Similar literature
Found 20 similar documents (search time: 56 ms)
1.
Various in vitro and in-silico methods have been used for drug genotoxicity tests, which show limited genotoxicity (GT+) and non-genotoxicity (GT−) identification rates. New methods and combinatorial approaches have been explored for enhanced collective identification capability. The rates of in-silico methods may be further improved by significantly diversified training data enriched by the large number of recently reported GT+ and GT− compounds, but a major concern is the increased noise levels arising from high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of the support vector machine (SVM) method, which is known to tolerate high noise levels in training data. Two SVMs of different diversity/noise levels were developed and tested. H-SVM, trained on higher-diversity, higher-noise data (GT+ in any in vivo or in vitro test), outperforms L-SVM, trained on lower-noise, lower-diversity data (GT+ in in vivo or Ames test only). H-SVM, trained on 4,763 GT+ compounds reported before 2008 and 8,232 GT− compounds excluding clinical trial drugs, correctly identified 81.6% of the 38 GT+ compounds reported since 2008, predicted 83.1% of the 2,008 clinical trial drugs as GT−, and predicted 23.96% of 168K MDDR and 27.23% of 17.86M PubChem compounds as GT+. These are comparable to the 43.1–51.9% GT+ and 75–93% GT− rates of existing in-silico methods, the 58.8% GT+ and 79% GT− rates of the Ames method, and the estimated percentages of 23% in vivo and 31–33% in vitro GT+ compounds in the “universe of chemicals”. There is a substantial level of agreement between the H-SVM and L-SVM predicted GT+ and GT− MDDR compounds and the predictions from TOPKAT. SVM showed good potential in identifying GT+ compounds from large compound libraries based on higher-diversity, higher-noise training data.
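
The GT+ and GT− identification rates quoted in this abstract are per-class fractions of correctly predicted compounds. A minimal sketch of how such rates can be computed from labels and predictions follows. The function name and the 1/0 label encoding are assumptions made for illustration, not the authors' code, and the count of 31 correct predictions is inferred from the reported 81.6% of 38 rather than stated in the abstract.

```python
import math


def identification_rates(y_true, y_pred):
    """Return (GT+ rate, GT- rate): fraction of each true class predicted correctly.

    Labels are assumed to be 1 for GT+ and 0 for GT- (illustrative convention).
    A rate is NaN when the corresponding class is absent from y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos_rate = tp / pos if pos else math.nan
    neg_rate = tn / neg if neg else math.nan
    return pos_rate, neg_rate


# Hypothetical data: 38 GT+ compounds, 31 of them predicted as GT+.
y_true = [1] * 38
y_pred = [1] * 31 + [0] * 7
gt_pos_rate, gt_neg_rate = identification_rates(y_true, y_pred)
print(f"GT+ rate: {gt_pos_rate:.1%}")
```

Because the test set here contains no GT− compounds, the second value is NaN; a GT− rate needs a separate set of known negatives.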

2.
3.
The combination of 3D pharmacophore fingerprints and the support vector machine classification algorithm has been used to generate robust models that are able to classify compounds as active or inactive in a number of G-protein-coupled receptor assays. The models have been tested against progressively more challenging validation sets where steps are taken to ensure that compounds in the validation set are chemically and structurally distinct from the training set. In the most challenging example, we simulate a lead-hopping experiment by excluding an entire class of compounds (defined by a core substructure) from the training set. The left-out active compounds comprised approximately 40% of the actives. The model trained on the remaining compounds is able to recall 75% of the actives from the "new" lead series while correctly classifying >99% of the 5000 inactives included in the validation set.
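
The lead-hopping validation described here amounts to a leave-one-class-out split: every compound sharing a chosen core is withheld from training and used only for validation. The sketch below shows one way to express that split and the recall over held-out actives. The record fields (`core`, `active`, `score`) and the threshold-based `predict` stand-in are hypothetical; the abstract does not describe the data layout, and a real model would replace the stand-in.

```python
import math


def leave_class_out(compounds, held_out_core):
    """Withhold every compound with the given core from the training set."""
    train = [c for c in compounds if c["core"] != held_out_core]
    validation = [c for c in compounds if c["core"] == held_out_core]
    return train, validation


def recall_of_actives(validation, predict):
    """Fraction of held-out actives that `predict` labels as active."""
    actives = [c for c in validation if c["active"]]
    if not actives:
        return math.nan
    return sum(1 for c in actives if predict(c)) / len(actives)


# Toy records; "score" stands in for the output of a trained model.
compounds = [
    {"id": 1, "core": "A", "active": True, "score": 0.9},
    {"id": 2, "core": "A", "active": True, "score": 0.2},
    {"id": 3, "core": "B", "active": True, "score": 0.8},
    {"id": 4, "core": "A", "active": False, "score": 0.1},
    {"id": 5, "core": "B", "active": False, "score": 0.3},
]

train, validation = leave_class_out(compounds, "A")
recall = recall_of_actives(validation, lambda c: c["score"] > 0.5)
print([c["id"] for c in train], [c["id"] for c in validation], recall)
```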

4.
The central idea of supervised classification in chemoinformatics is to design a classifying algorithm that accurately assigns a new molecule to one of a set of predefined classes. Tipping has devised a classifying scheme, the Relevance Vector Machine (RVM), which is equivalent to the Support Vector Machine (SVM) in terms of sparsity. However, unlike SVM classifiers, RVM classifiers are probabilistic in nature, which is crucial in the field of decision making and risk taking. In this work, we investigate the performance of RVM binary classifiers on classifying a subset of the MDDR data set, a standard molecular benchmark data set, into active and inactive compounds. Additionally, we present results that compare the performance of SVM and RVM binary classifiers.

5.
6.
Academic and industrial research continues to be focused on discovering new classes of compounds based on HTS. Post-HTS analyses need to prioritize compounds that are progressed to chemical probe or lead status. We report trends in probe, lead and drug discovery by examining the following categories of compounds: 385 leads and the 541 drugs that emerged from them; "active" (152) and "inactive" (1488) compounds from the Molecular Libraries Initiative Small Molecule Repository (MLSMR) tested by HTS; "active" (46) and "inactive" (72) compounds from Nature Chemical Biology (NCB) tested by HTS; compounds in the drug development phase (I, II, III and launched), as indexed in MDDR; and medicinal chemistry compounds from WOMBAT, separated into high-activity (5,784 compounds with nanomolar activity or better) and low-activity (30,690 with micromolar activity or less). We examined molecular weight (MW), molecular complexity, flexibility, the number of hydrogen bond donors and acceptors, LogP (the octanol/water partition coefficient, estimated by ClogP and ALOGPS), LogSw (intrinsic water solubility, estimated by ALOGPS) and the number of Rule of Five (Ro5) criteria violations. Based on the 50% and 90% distribution moments of the above properties, there was no significant difference between leads of known drugs and "actives" from MLSMR or NCB (chemical probes). "Inactives" from NCB and MLSMR were also found to exhibit similar properties. From these combined sets, we conclude that "actives" (569 compounds) are less complex, less flexible, and more soluble than drugs (1,651 drugs), and significantly smaller, less complex, less hydrophobic and more soluble than the 5,784 high-activity WOMBAT compounds. These trends indicate that chemical probes are similar to leads with respect to some properties, e.g., complexity, solubility, and hydrophobicity.
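
One of the properties examined above, the number of Rule of Five violations, is simple to compute once the four underlying descriptors are known. The sketch below uses the standard Lipinski cut-offs (molecular weight over 500, LogP over 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The function name and the descriptor values passed to it are illustrative assumptions; the abstract does not say how the authors implemented the count.

```python
def ro5_violations(mw, logp, hbd, hba):
    """Count Rule of Five violations for one compound.

    mw   : molecular weight (Da)
    logp : octanol/water partition coefficient estimate
    hbd  : number of hydrogen bond donors
    hba  : number of hydrogen bond acceptors
    """
    checks = (mw > 500, logp > 5, hbd > 5, hba > 10)
    return sum(checks)


# Hypothetical descriptor values, chosen only to exercise the function.
small_polar = ro5_violations(mw=180.2, logp=1.2, hbd=1, hba=4)
large_greasy = ro5_violations(mw=700.0, logp=6.1, hbd=6, hba=12)
print(small_polar, large_greasy)
```

Values that sit exactly on a cut-off (for example, a molecular weight of 500) are not counted as violations in this sketch.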

7.
The number of compounds available for evaluation as part of the drug discovery process continues to increase. These compounds may exist physically or be stored electronically, allowing screening by either actual or virtual means. This growing number of compounds has generated an increasing need for effective strategies to direct screening efforts. Initial efforts toward this goal led to the development of methods to select diverse sets of compounds for screening, methods to cluster actives into related groups of compounds, and tools to select compounds similar to actives of interest for further screening. In this work we extend these earlier efforts to exploit information about inactive compounds to help make rational decisions about which sets of compounds to include as part of a continuing screening campaign, or as part of a focused follow-up effort. This method uses the information from inactive compounds to "shave" off or deprioritize compounds similar to inactives from further consideration. This methodology can be used in two ways: first, to provide a rational means of deciding when sufficient compounds containing certain structural features have been tested, and second, as a tool to enhance similarity searching around known actives. Similarity searching is improved by deprioritizing compounds predicted to be inactive due to the presence of structural features associated with inactivity.
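
The "shaving" idea can be illustrated with a small similarity filter: candidates that look too much like a known inactive are moved to a deprioritized list. The abstract does not specify the similarity measure or cut-off, so the Tanimoto coefficient on fingerprint bit sets and the 0.7 threshold below are stand-ins chosen for illustration, and all names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shave(candidates, inactives, threshold=0.7):
    """Split candidates into (kept, deprioritized) by similarity to known inactives.

    candidates : dict mapping compound name -> set of fingerprint bits
    inactives  : list of fingerprint bit sets for tested inactive compounds
    """
    kept, deprioritized = [], []
    for name, fp in candidates.items():
        max_sim = max((tanimoto(fp, inactive) for inactive in inactives), default=0.0)
        if max_sim >= threshold:
            deprioritized.append(name)
        else:
            kept.append(name)
    return kept, deprioritized


# Toy fingerprints: c1 shares 4 of 5 bits with the inactive, c2 shares none.
candidates = {"c1": {1, 2, 3, 4}, "c2": {7, 8, 9}}
inactives = [{1, 2, 3, 4, 5}]
kept, deprioritized = shave(candidates, inactives)
print(kept, deprioritized)
```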

8.
Target identification is a critical step following the discovery of small molecules that elicit a biological phenotype. The present work seeks to provide an in silico correlate of experimental target fishing technologies in order to rapidly fish out potential targets for compounds on the basis of chemical structure alone. A multiple-category Laplacian-modified naïve Bayesian model was trained on extended-connectivity fingerprints of compounds from 964 target classes in the WOMBAT (World Of Molecular BioAcTivity) chemogenomics database. The model was employed to predict the top three most likely protein targets for all MDDR (MDL Drug Database Report) database compounds. On average, the correct target was found 77% of the time for compounds from 10 MDDR activity classes with known targets. For MDDR compounds annotated with only therapeutic or generic activities such as "antineoplastic", "kinase inhibitor", or "anti-inflammatory", the model was able to systematically deconvolute the generic activities to specific targets associated with the therapeutic effect. Examples of successful deconvolution are given, demonstrating the usefulness of the tool for improving knowledge in chemogenomics databases and for predicting new targets for orphan compounds.
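
A multiple-category Laplacian-modified naïve Bayesian model of this kind can be sketched in a few lines. The version below uses the commonly cited Laplacian-corrected feature weight, log((A_f + 1) / (N_f · P + 1)), where N_f is the number of training compounds containing feature f, A_f the number of those annotated with the target, and P the target's base rate; a compound's score for a target is the sum of the weights of its features. This is an illustrative reading, not the authors' implementation, and the string feature names and target labels are hypothetical placeholders for fingerprint bits and WOMBAT classes.

```python
import math
from collections import Counter


def feature_weights(samples, target):
    """Laplacian-corrected log weights of every feature for one target class.

    samples : list of (feature_set, target_label) training pairs
    """
    base_rate = sum(1 for _, label in samples if label == target) / len(samples)
    n_with, a_with = Counter(), Counter()
    for features, label in samples:
        for f in features:
            n_with[f] += 1
            if label == target:
                a_with[f] += 1
    return {f: math.log((a_with[f] + 1) / (n_with[f] * base_rate + 1)) for f in n_with}


def score(features, weights):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


def top_targets(features, samples, k=3):
    """Rank all target classes for one compound and return the k best."""
    targets = sorted({label for _, label in samples})
    scored = [(score(features, feature_weights(samples, t)), t) for t in targets]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Toy training set with three hypothetical target classes.
samples = [
    ({"f1", "f2"}, "kinase"),
    ({"f1"}, "kinase"),
    ({"f3"}, "GPCR"),
    ({"f3", "f4"}, "GPCR"),
    ({"f5"}, "protease"),
]
ranking = top_targets({"f1", "f2"}, samples)
print(ranking)
```

Recomputing the weights for every query is wasteful; a practical version would cache one weight table per target class.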

9.
10.
11.
12.
13.
Small molecule aggregators non‐specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high‐throughput screening campaigns. Computational methods for identifying aggregators have been explored, but they have not been tested in screening large compound libraries. We used 1319 aggregators and 128,325 non‐aggregators to develop a support vector machine (SVM) aggregator identification model, which was tested by four methods. The first is five‐fold cross‐validation, which showed comparable aggregator and significantly improved non‐aggregator identification rates against earlier studies. The second is the independent test of 17 aggregators discovered independently from the training aggregators, 71% of which were correctly identified. The third is retrospective screening of 13M PUBCHEM and 168K MDDR compounds, which predicted 97.9% and 98.7% of the PUBCHEM and MDDR compounds as non‐aggregators. The fourth is retrospective screening of 5527 MDDR compounds similar to the known aggregators, 1.14% of which were predicted as aggregators. SVM showed slightly better overall performance than two other machine learning methods based on five‐fold cross‐validation studies of the same settings. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM showed substantial capability in identifying aggregators from large libraries at low false‐hit rates. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
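
The first of the four tests, five‐fold cross‐validation, partitions the data into five disjoint folds and uses each fold once as the test set while training on the other four. A minimal index-based sketch follows. The function name, the fixed seed, and the plain (non-stratified) shuffling are assumptions for illustration; the abstract does not describe how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(23, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

With classes as imbalanced as 1319 aggregators against 128,325 non‐aggregators, drawing folds separately within each class (stratification) would keep the class ratio similar across folds; that refinement is omitted here.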

14.
15.
16.
17.
Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.

18.
19.
20.