Similar Literature
20 similar documents found
2.
A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactive compounds. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation, with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.
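The query-evolution idea described above can be sketched as a minimal (1+1) evolutionary search. This is an illustrative toy only: compounds are reduced to plain feature sets standing in for reduced-graph node labels, the data are hypothetical, and the fitness function (actives matched minus inactives matched) is one simple choice among many.

```python
import random

# Toy data: compounds as feature sets (stand-ins for reduced-graph labels).
# The actives all share features {1, 2}; no inactive contains both.
ACTIVES = [frozenset(s) for s in ({1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 2, 6})]
INACTIVES = [frozenset(s) for s in ({1, 3}, {2, 4}, {3, 5, 6})]
FEATURES = sorted(set().union(*ACTIVES, *INACTIVES))

def fitness(query):
    """Actives matched minus inactives matched (a query matches a
    compound when it is a subset of the compound's features)."""
    hits = sum(1 for m in ACTIVES if query <= m)
    misses = sum(1 for m in INACTIVES if query <= m)
    return hits - misses

def evolve(generations=200, seed=0):
    """(1+1) evolution: mutate one feature, keep the child if no worse."""
    rng = random.Random(seed)
    best = frozenset()                       # start from the empty query
    for _ in range(generations):
        child = set(best)
        child.symmetric_difference_update({rng.choice(FEATURES)})
        child = frozenset(child)
        if fitness(child) >= fitness(best):  # elitist acceptance
            best = child
    return best

query = evolve()   # converges on the discriminating query {1, 2}
```

With this data the search settles on {1, 2}, the only query matching all four actives and no inactive, which mirrors how an evolved reduced-graph query summarizes the SAR.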

3.
High-throughput screening (HTS) plays a pivotal role in lead discovery for the pharmaceutical industry. In tandem, cheminformatics approaches are employed to increase the probability of identifying novel biologically active compounds by mining the HTS data. HTS data are notoriously noisy, so the selection of the optimal data mining method is important for the success of such an analysis. Here, we describe a retrospective analysis of four HTS data sets using three mining approaches: Laplacian-modified naive Bayes, recursive partitioning, and support vector machine (SVM) classifiers, with increasing stochastic noise in the form of false positives and false negatives. All three data mining methods tolerated increasing levels of false positives, even when the ratio of misclassified compounds to true active compounds in the training set was 5:1; false negatives at a ratio of 1:1 were tolerated as well. SVM outperformed the other two methods in capturing active compounds and scaffolds in the top 1%. A Murcko scaffold analysis could explain the differences in enrichment among the four data sets. This study demonstrates that data mining methods can add real value to a screen even when the data are contaminated with a high level of stochastic noise.
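The noise tolerance of the Laplacian-modified naive Bayes approach can be illustrated with a toy screen. The formula below is one common form of the Laplacian-corrected feature weight; the data are hypothetical and deliberately contaminated with false positives at the 5:1 ratio mentioned above.

```python
from math import log

# Toy screen: each compound is (feature set, noisy label). Four true actives
# carry feature "A"; twenty inactives carry "B". To mimic a noisy screen,
# twenty further "B" compounds are mislabeled active (5:1 false positives
# to true actives).
true_actives = [({"A"}, 1)] * 4
false_positives = [({"B"}, 1)] * 20
inactives = [({"B"}, 0)] * 20
screen = true_actives + false_positives + inactives

def laplacian_bayes_weights(data):
    """One common Laplacian-corrected feature weight:
    w(f) = log(P_corr(f) / P_base), P_corr = (A_f + 1) / (T_f + 1/P_base),
    where A_f / T_f are active / total counts of compounds with feature f."""
    n = len(data)
    p_base = sum(label for _, label in data) / n
    totals, actives = {}, {}
    for feats, label in data:
        for f in feats:
            totals[f] = totals.get(f, 0) + 1
            actives[f] = actives.get(f, 0) + label
    return {f: log((actives[f] + 1) / (totals[f] + 1 / p_base) / p_base)
            for f in totals}

weights = laplacian_bayes_weights(screen)
```

Despite the heavy label noise, the feature unique to the true actives still receives a positive weight while the inactive-associated feature is penalized, which is the behavior the retrospective analysis above relies on.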

4.
The main goal of high-throughput screening (HTS) is to identify active chemical series rather than just individual active compounds. In light of this goal, a new method, called compound set enrichment, is proposed for identifying active chemical series from primary screening data. The method employs the scaffold tree compound classification in conjunction with the Kolmogorov-Smirnov statistic to assess the overall activity of a compound scaffold. The application of this method to seven PubChem data sets (containing between 9,389 and 263,679 molecules) is presented, and its ability to identify compound classes with only weakly active compounds (potentially latent hits) is demonstrated. The analysis presented here shows how methods based on an activity cutoff can distort activity information, leading to the incorrect activity assignment of compound series. These results suggest that the method may have utility in the rational selection of active classes of compounds, and not just individual active compounds, for follow-up and validation.
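The statistical core of compound set enrichment, comparing a scaffold class's activity distribution against the background with the two-sample Kolmogorov-Smirnov statistic, can be sketched in a few lines. The activity values below are hypothetical and stand in for primary-screen % inhibition readings; the real method also involves the scaffold tree classification, which is omitted here.

```python
def ks_statistic(sample, background):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    xs = sorted(set(sample) | set(background))
    d = 0.0
    for x in xs:
        f1 = sum(1 for v in sample if v <= x) / len(sample)
        f2 = sum(1 for v in background if v <= x) / len(background)
        d = max(d, abs(f1 - f2))
    return d

# Hypothetical % inhibition values: the scaffold's members are only weakly
# active, yet consistently shifted above the background distribution --
# exactly the "latent hit" pattern a single activity cutoff would miss.
scaffold_class = [22.0, 25.0, 28.0, 31.0, 35.0]
background = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0, 12.0, 14.0, 16.0, 18.0]

d = ks_statistic(scaffold_class, background)   # 1.0: fully separated CDFs
```

A large statistic flags the whole scaffold as active even though no individual member would pass a strict potency cutoff.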

9.
In continuation of our recent studies on the quality of conformational models generated with CATALYST and OMEGA, we present a large-scale survey of the impact of conformational model quality and several screening parameters on pharmacophore-based and shape-based virtual high-throughput screening (vHTS). To this end, we collected known active compounds of CDK2, p38 MAPK, PPAR-gamma, and factor Xa and built a set of druglike decoys using ilib:diverse. We then generated 3D structures using CORINA and calculated conformational models for all compounds using CAESAR, CATALYST FAST, and OMEGA. A widespread set of 103 structure-based pharmacophore models was developed with LigandScout for virtual screening with CATALYST. Both database search modes (FAST and BEST flexible database search) and both fit value calculation procedures (FAST and BEST fit) available in CATALYST were analyzed for their ability to discriminate between active and inactive compounds and for their efficiency. These results are also compared directly with the performance of the shape-based virtual screening platform ROCS. Our results show that high enrichment rates are not necessarily in conflict with efficient vHTS settings: in most of the experiments, we obtained the highest yield of actives in the hit list when parameter sets for the fastest search algorithm were used.
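The yield-of-actives comparison behind a survey like this typically rests on the enrichment factor: the hit rate in the top slice of a ranked list divided by the hit rate over the whole library. A minimal sketch, with a hypothetical ranking:

```python
def enrichment_factor(ranked_labels, fraction):
    """EF at the given fraction of a ranked list (1 = active, 0 = decoy):
    hit rate in the top slice divided by the overall hit rate."""
    n = len(ranked_labels)
    top = max(1, int(n * fraction))
    hits_top = sum(ranked_labels[:top])
    hits_all = sum(ranked_labels)
    return (hits_top / top) / (hits_all / n)

# Hypothetical screen: 100 compounds, 10 actives, all ranked first --
# the best case, giving the maximum possible EF at 10%.
ranked = [1] * 10 + [0] * 90
ef10 = enrichment_factor(ranked, 0.10)   # 10.0
```

Comparing EF values across search modes (FAST vs. BEST) at a fixed fraction is what makes statements like "high enrichment with the fastest settings" quantitative.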

10.
Parallel screening has been introduced as an in silico method to predict the potential biological activities of compounds by screening them against a multitude of pharmacophore models. This study presents an early application example employing a Pipeline Pilot-based screening platform for automatic large-scale virtual activity profiling. An extensive set of HIV protease inhibitor pharmacophore models was used to screen a selection of active and inactive compounds. We also addressed the frequently raised question of whether a parallel screening system can differentiate between similar molecules, or molecules acting on closely related proteins, by incorporating a collection of other protease inhibitors, including aspartic protease inhibitors. The results of the screening experiments show a clear trend: the most extensive retrieval was of known active ligands, followed by the general protease inhibitors, with the lowest recovery for inactive compounds.
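The core of parallel screening, matching one molecule against many models at once to build an activity profile, can be caricatured with feature sets. Everything here is hypothetical: the model names, the features, and the match rule (a molecule "hits" a model when it contains all of the model's required features) are illustrative stand-ins for real 3D pharmacophore matching.

```python
# Hypothetical pharmacophore "models" reduced to required feature sets.
models = {
    "hiv_protease_1": {"donor", "acceptor", "hydrophobe"},
    "hiv_protease_2": {"donor", "aromatic"},
    "aspartic_protease": {"acceptor", "aromatic"},
}

def activity_profile(molecule, model_dict):
    """Screen one molecule against every model in parallel; the resulting
    hit pattern is its predicted activity profile."""
    return {name: feats <= molecule for name, feats in model_dict.items()}

mol = {"donor", "acceptor", "hydrophobe", "aromatic"}
profile = activity_profile(mol, models)   # hits all three models
```

A molecule carrying only part of the feature inventory produces a sparser profile, which is how the approach distinguishes ligands of closely related targets.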

15.
This paper discusses the use of several rank-based virtual screening methods for prioritizing compounds in lead-discovery programs, given a training set for which both structural and bioactivity data are available. Structures from the NCI AIDS data set and from the Syngenta corporate database were represented by two types of fragment bit-string and by sets of high-level molecular features. These representations were processed using binary kernel discrimination, similarity searching, substructural analysis, support vector machines, and trend vector analysis, with the effectiveness of the methods judged by the extent to which active test set molecules were clustered toward the top of the resultant rankings. The binary kernel discrimination approach yielded consistently superior rankings and would appear to have considerable potential for chemical screening applications.
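Binary kernel discrimination scores a molecule by the ratio of its summed kernel similarity to the active and the inactive training compounds, with a binomial kernel driven by the Hamming distance between bit-strings. A minimal sketch on hypothetical 8-bit fingerprints (real fragment bit-strings are far longer, and the smoothing parameter lambda would be tuned on the training set):

```python
# Toy fragment bit-strings: actives and inactives occupy different bits.
ACTIVES = ["11110000", "11100001", "11010000"]
INACTIVES = ["00001111", "00011110", "00101101"]
LAM = 0.8   # kernel smoothing parameter, lambda in (0.5, 1)

def kernel(x, y, lam=LAM):
    """Binomial kernel: lam^(matching bits) * (1-lam)^(differing bits)."""
    d = sum(a != b for a, b in zip(x, y))   # Hamming distance
    return lam ** (len(x) - d) * (1 - lam) ** d

def bkd_score(x):
    """Ratio of summed kernel similarity to actives vs. inactives;
    scores above 1 favor the active class."""
    num = sum(kernel(x, a) for a in ACTIVES)
    den = sum(kernel(x, i) for i in INACTIVES)
    return num / den

active_like = bkd_score("11110001")     # close to the actives
inactive_like = bkd_score("00011111")   # close to the inactives
```

Ranking a library by this score pushes active-like bit patterns to the top, which is the behavior the comparison above evaluates.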

17.
A new application of TOPological Sub-structural MOlecular DEsign (TOPS-MODE) to herbicides is presented, using computer-aided molecular design. Two series of compounds, one containing herbicides and the other nonherbicides, were processed by k-means cluster analysis in order to design the training and prediction sets. A linear classification function was developed to discriminate the herbicides from the nonherbicides. The model correctly classified 88% of active and 94% of inactive compounds in the training set, a global classification rate of 91% (168 of 185 cases). In the prediction set, it achieved 91% and 92% predictability for active and inactive compounds, respectively, for a global classification rate of 92%. To assess the range of model applicability, a virtual screening of a structurally heterogeneous series of herbicidal compounds was carried out; 284 out of 332 compounds (86%) were correctly classified. Furthermore, this paper describes a fragment analysis performed to determine the contribution of several fragments to herbicidal activity; the presence of halogens in the selected fragments was also analyzed. The TOPS-MODE-based QSAR presented here appears to be the first general in silico alternative to experimentation in herbicide discovery.
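The linear classification function at the heart of this approach can be sketched with a perceptron, used here as a simple stand-in for the discriminant analysis actually employed; the 2D descriptor values below are hypothetical placeholders for TOPS-MODE spectral moments.

```python
# Linearly separable toy descriptors: herbicides (+1) vs nonherbicides (-1).
data = [((2.0, 3.0), 1), ((2.5, 2.5), 1), ((3.0, 3.5), 1),
        ((0.5, 0.5), -1), ((1.0, 0.2), -1), ((0.2, 1.0), -1)]

def train_perceptron(samples, epochs=20, lr=0.1):
    """Fit a linear decision function w.x + b by perceptron updates."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def classify(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

w, b = train_perceptron(data)
accuracy = sum(classify(w, b, x) == y for x, y in data) / len(data)
```

On separable data the perceptron converges to a perfect linear separator; the paper's reported percentages come from applying its own linear function to training, prediction, and external screening sets in the same way.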

19.
Recent technological advances in liquid chromatography–tandem mass spectrometry allow the simultaneous determination of tens, or even hundreds, of target analytes. In such cases, the traditional approach to quantitative method validation has three major drawbacks: (i) it is extremely laborious, repetitive, and rigid; (ii) it does not allow new target analytes to be introduced without restarting the validation from the very beginning; and (iii) it is performed on spiked blank matrices, whose very nature is significantly modified by the addition of a large number of spiking substances, especially at high concentration. In the present study, several predictive chemometric models were developed from closed sets of analytes in order to estimate validation parameters for molecules of the same class not included in the original training set. Retention time, matrix effect, recovery, and detection and quantification limits were predicted with partial least squares (PLS) regression. In particular, iterative stepwise elimination, iterative predictor weighting, and genetic algorithm approaches were applied and compared to achieve effective variable selection. These procedures were applied to data from our previously validated ultra-high-performance liquid chromatography–tandem mass spectrometry multi-residue method for the determination of pharmaceutical and illicit drugs in oral fluid samples, in accordance with national and international guidelines. The PLS model was then successfully tested on naloxone and lormetazepam, in order to introduce these new compounds into the validated oral fluid method, which adopts reverse-phase chromatography. The retention time, matrix effect, recovery, limit of detection, and limit of quantification predicted by the model for naloxone and lormetazepam compared positively with the corresponding experimental values.
The study represents a proof of concept of the potential of chemometrics to reduce the routine workload of multi-residue method validation, and suggests a rational alternative to ever-expanding procedures that progressively drift apart from real sample analysis.
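The prediction step, fitting a regression on descriptors of validated analytes and extrapolating a validation parameter for a new compound, can be sketched with ordinary least squares as a deliberately simplified stand-in for the PLS regression used in the study. The single descriptor, retention times, and query value below are all hypothetical.

```python
# Toy model: predict retention time (min) from one generic hydrophobicity
# descriptor. The training data follow rt = 2x + 1 exactly, so the fit
# is recoverable in closed form.
train_x = [0.5, 1.0, 1.5, 2.0, 2.5]   # descriptor values
train_y = [2.0, 3.0, 4.0, 5.0, 6.0]   # retention times (min)

def fit_ols(xs, ys):
    """Simple least-squares line fit: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit_ols(train_x, train_y)
predicted_rt = slope * 3.0 + intercept   # a new, unseen analyte
```

In the study itself, the same idea is scaled up: multivariate PLS with variable selection predicts retention time, matrix effect, recovery, and detection limits for compounds outside the training set, which are then checked against experiment.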
