Similar literature
 Found 20 similar articles (search time: 953 ms)
1.
Binary kernel discrimination (BKD) uses a training set of compounds, for which structural and qualitative activity data are available, to produce a model that can then be applied to the structures of other compounds in order to predict their likely activity. Experiments with the MDL Drug Data Report database show that the optimal value of the smoothing parameter, and hence the predictive power of BKD, is crucially dependent on the number of false positives in the training set. It is also shown that the best results for BKD are achieved using one particular optimization method for the determination of the smoothing parameter that lies at the heart of the method and using the Jaccard/Tanimoto coefficient in the kernel function that is used to compute the similarity between a test set molecule and the members of the training set.
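The abstract above does not give the kernel formula, so the following is only a minimal sketch of how a BKD-style score with a Tanimoto-based kernel might look. The kernel form `(lam**s * (1 - lam)**(1 - s))**beta`, the `beta` exponent, and all function names are assumptions made for illustration, not the authors' stated implementation. Fingerprints are represented as sets of "on" bit positions.

```python
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard/Tanimoto coefficient of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def kernel(a: Fingerprint, b: Fingerprint, lam: float, beta: float = 1.0) -> float:
    """Assumed kernel: grows with similarity s when the smoothing parameter lam > 0.5."""
    s = tanimoto(a, b)
    return (lam ** s * (1.0 - lam) ** (1.0 - s)) ** beta


def bkd_score(query: Fingerprint,
              actives: Sequence[Fingerprint],
              inactives: Sequence[Fingerprint],
              lam: float,
              beta: float = 1.0) -> float:
    """Ratio of summed kernel values over training actives and inactives."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if not inactives:
        raise ValueError("at least one inactive training compound is required")
    numerator = sum(kernel(query, a, lam, beta) for a in actives)
    denominator = sum(kernel(query, x, lam, beta) for x in inactives)
    return numerator / denominator
```

In such a sketch, `lam` would be chosen by some optimization over the training set (for example, by scanning candidate values and keeping the one that ranks held-out actives best); test compounds are then ranked by decreasing score.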

2.
This paper discusses the use of binary kernel discrimination (BKD) for identifying potential active compounds in lead-discovery programs. BKD was compared with established virtual screening methods in a series of experiments using pesticide data from the Syngenta corporate database. It was found to be superior to methods based on similarity searching and substructural analysis but inferior to a support vector machine. Similar conclusions resulted from application of the methods to a pesticide data set for which categorical activity data were available.

3.
4.
A new structure–activity relationship model predicting the probability for a compound to inhibit human cytochrome P450 3A4 has been developed using data for >800 compounds from various literature sources and tested on PubChem screening data. The novel GALAS (Global, Adjusted Locally According to Similarity) modeling methodology has been used, which combines a baseline global QSAR model with local similarity-based corrections. The GALAS modeling method allows the reliability of a prediction to be forecast, thus defining the model's applicability domain. For compounds within this domain, the statistical results of the final model approach the data consistency between experimental data from the literature and PubChem datasets, with an overall accuracy of 89%. However, the original model is applicable to less than half of the PubChem database. Since the similarity correction procedure of the GALAS modeling method allows straightforward model training, the possibility of expanding the applicability domain has been investigated. Experimental data from the PubChem dataset served as an example of in-house high-throughput screening data. The model successfully adapted itself to data classified using either the same IC50 threshold as the training set or a different one. In addition, adjustment of the CYP3A4 inhibition model to compounds with a novel chemical scaffold has been demonstrated. The reported GALAS model is proposed as a useful tool for virtual screening of compounds for possible drug-drug interactions even prior to the actual synthesis.

5.
Recent progress in combinatorial chemistry and parallel synthesis has radically changed the approach to drug discovery in the pharmaceutical industry. At present, thousands of compounds can be made in a short period, creating a need for fast and effective in silico methods to select the most promising lead candidates. Decision forest is a novel pattern recognition method, which combines the results of multiple distinct but comparable decision tree models to reach a consensus prediction. In this article, a decision forest model was developed using a structurally diverse training data set containing 232 compounds whose estrogen receptor binding activity was tested at the U.S. Food and Drug Administration (FDA)'s National Center for Toxicological Research (NCTR). The model was subsequently validated using a test data set of 463 compounds selected from the literature, and then applied to a large data set with 57,145 compounds as a screening example. The results show that the decision forest method is a fast, reliable and effective in silico approach, which could be useful in drug discovery.
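The consensus step described above can be illustrated with a short sketch. The abstract does not say how the individual tree outputs are combined, so averaging the per-tree probabilities and applying a 0.5 cutoff are assumptions here, and the stand-in "trees" and descriptor names (`logp`, `phenol`) are hypothetical.

```python
from statistics import mean
from typing import Callable, Mapping, Sequence

# A "tree" is any callable mapping a compound's descriptors to P(active).
Tree = Callable[[Mapping[str, float]], float]


def consensus_probability(trees: Sequence[Tree],
                          compound: Mapping[str, float]) -> float:
    """Average the activity probabilities predicted by the individual trees."""
    return mean([tree(compound) for tree in trees])


def consensus_label(trees: Sequence[Tree],
                    compound: Mapping[str, float],
                    cutoff: float = 0.5) -> bool:
    """Call a compound active when the mean probability reaches the cutoff."""
    return consensus_probability(trees, compound) >= cutoff


# Hypothetical stand-ins for trained decision trees, each using a different descriptor.
def tree_a(c: Mapping[str, float]) -> float:
    return 0.8 if c["logp"] > 3.0 else 0.2


def tree_b(c: Mapping[str, float]) -> float:
    return 0.7 if c["phenol"] >= 1.0 else 0.1


def tree_c(c: Mapping[str, float]) -> float:
    return 0.6 if c["logp"] > 2.0 and c["phenol"] >= 1.0 else 0.3
```

Because each tree is built from a different part of the descriptor space, an error made by one tree can be outvoted by the others, which is the usual motivation for combining them.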

6.
Virtual screening by molecular docking has become a widely used approach to lead discovery in the pharmaceutical industry when a high-resolution structure of the biological target of interest is available. The performance of three widely used docking programs (Glide, GOLD, and DOCK) for virtual database screening is studied when they are applied to the same protein target and ligand set. Comparisons of the docking programs and scoring functions using a large and diverse data set of pharmaceutically interesting targets and active compounds are carried out. We focus on the problem of docking and scoring flexible compounds which are sterically capable of docking into a rigid conformation of the receptor. The Glide XP methodology is shown to consistently yield enrichments superior to the two alternative methods, while GOLD outperforms DOCK on average. The study also shows that docking into multiple receptor structures can decrease the docking error in screening a diverse set of active compounds.

7.
8.
As the use of high-throughput screening systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of resulting data. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to manageable sizes while clustering enables the detection of natural groups in that reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. The above process, designed to handle primarily data sets of sizes much smaller than the ones currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it also ignores all screening information related to compounds with activity less than the threshold chosen and thus, in the best case, can only hope to discover a subset of the knowledge available in the screening data sets. To address the deficiencies of the current screening data analysis process, the authors have developed a new method that thoroughly analyzes large screening data sets. In this report we describe this new approach in detail and present its main differences from the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can significantly improve both the ease of extraction and the amount of knowledge discovered from screening data sets.

9.
We describe a novel method for ligand-based virtual screening, based on utilizing Self-Organizing Maps (SOM) as a novelty detection device. Novelty detection (or one-class classification) refers to the attempt to identify patterns that do not belong to the space covered by a given data set. In ligand-based virtual screening, chemical structures perceived as novel lie outside the known activity space and can therefore be discarded from further investigation. In this context, the concept of "novel structure" refers to a compound that is unlikely to share the activity of the query structures. Compounds not perceived as "novel" are suspected to share the activity of the query structures. Nowadays, various databases contain active structures, but access to compounds which have been found to be inactive in a biological assay is limited. This work addresses this problem via novelty detection, which does not require proven inactive compounds. The structures are described by spatial autocorrelation functions weighted by atomic physicochemical properties. Different methods for selecting a subset of targets from a larger set are discussed. A comparison with similarity search based on Daylight fingerprints followed by data fusion is presented. The two methods complement each other to a large extent. In a retrospective screening of the WOMBAT database, novelty detection with SOM gave enrichment factors between 105 and 462, an improvement of between 25% and 100% over the similarity search based on Daylight fingerprints, when the 100 top-ranked structures were considered. Novelty detection with SOM is applicable (1) to improve the retrieval of potentially active compounds, also in concert with other virtual screening methods; (2) as a library design tool for discarding a large number of compounds that are unlikely to possess a given biological activity; and (3) for selecting a small number of potentially active compounds from a large data set.
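For readers unfamiliar with the enrichment factors quoted above, the sketch below shows the commonly used definition: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole database. The abstract does not spell out its exact definition, so this standard form, the function name, and the toy numbers are assumptions.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], top_n: int) -> float:
    """Enrichment of actives in the top_n entries of a ranked screening list.

    ranked_is_active[i] is True when the compound ranked at position i is active.
    """
    total = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if not 0 < top_n <= total:
        raise ValueError("top_n must be between 1 and the list length")
    if total_actives == 0:
        raise ValueError("the ranked list contains no actives")
    hits = sum(ranked_is_active[:top_n])
    # (hits / top_n) / (total_actives / total), rearranged to a single division
    return (hits * total) / (top_n * total_actives)


# Toy example: 1000 compounds, 10 actives, 5 of them ranked in the top 100.
ranking = [True] * 5 + [False] * 95 + [True] * 5 + [False] * 895
print(enrichment_factor(ranking, top_n=100))  # 5.0
```

An enrichment factor of 1 corresponds to random selection, so values in the hundreds, as reported for the 100 top-ranked structures, imply that actives are very rare in the database and heavily concentrated at the top of the ranking.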

10.
Performance of small molecule automated docking programs has conceptually been divided into docking, scoring, ranking, and screening power, which focus on the crystal pose prediction, affinity prediction, ligand ranking, and database screening capabilities of the docking program, respectively. Benchmarks show that different docking programs can excel in individual benchmarks, which suggests that the scoring function employed by the programs can be optimized for a particular task. Here the scoring function of Smina is re-optimized towards enhancing the docking power using a supervised machine learning approach and a manually curated database of ligands and cross-docking receptor pairs. The optimization method does not need associated binding data for the receptor-ligand examples used in the data set and works with small training sets. The re-optimization of the weights for the scoring function results in a similar docking performance with regard to docking power on a cross-docking test set. A ligand decoy based benchmark indicates a better discrimination between poses with high and low RMSD. The reported parameters for Smina are compatible with Autodock Vina and represent ready-to-use alternative parameters for researchers who aim at pose prediction rather than affinity prediction.

11.
12.
The use of high throughput screening (HTS) to identify lead compounds has greatly challenged conventional quantitative structure-activity relationship (QSAR) techniques that typically correlate structural variations in similar compounds with continuous changes in biological activity. A new QSAR-like methodology that can correlate less quantitative assay data (i.e., "active" versus "inactive"), as initially generated by HTS, has been introduced. In the present study, we have, for the first time, applied this approach to a drug discovery problem; that is, the study of the estrogen receptor ligands. The binding affinities of 463 estrogen analogues were transformed into a binary data format, and a predictive binary QSAR model was derived using 410 estrogen analogues as a training set. The model was applied to predict the activity of 53 estrogen analogues not included in the training set. An overall accuracy of 94% was obtained.

13.
14.
Combinatorial chemistry and high-throughput screening technologies produce huge amounts of data on a regular basis. Sieving through these libraries of compounds and their associated assay data to identify appropriate series for follow-up is a daunting task, which has created a need for computational techniques that can find coherent islands of structure-activity relationships in this sea. Structural unit analysis (SUA) examines an entire data set so as to identify the molecular substructures or fragments that distinguish compounds with high activity from those with average activity. The algorithm is iterative and follows set heuristics in order to generate the structural units. It produces graphs that represent a set of units, which become SUA rules. Finding all of the input structures that match these graphs generates clusters. The Apriori algorithm for association rule mining is adapted to explore all of the combinations of structural units that define useful series. User-defined constraints are applied toward series selection and the refinement of rules. The significance of a series is determined by applying statistical methods appropriate to each data set. Application to the NCI-H23 (DTP Human Tumor Cell Line Screen) database serves to illustrate the process by which structural series are identified. An application of the method to scaffold hopping is then discussed in connection with proprietary screening data from a lead optimization project directed toward the treatment of respiratory tract infections at Bayer Healthcare. SUA was able to successfully identify promising alternative core structures in addition to identifying compounds with above-average activity and selectivity.
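The abstract says only that Apriori is "adapted" for SUA and gives no details of the adaptation, so the sketch below shows the textbook level-wise Apriori search for frequent itemsets, with each compound treated as the set of structural units it contains. The unit names and the support threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def frequent_itemsets(compounds: Iterable[Iterable[str]],
                      min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every combination of units found in at least min_support compounds."""
    transactions = [frozenset(c) for c in compounds]
    # Level 1 candidates: every individual structural unit.
    candidates: Set[FrozenSet[str]] = {
        frozenset([unit]) for t in transactions for unit in t
    }
    result: Dict[FrozenSet[str], int] = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join step: merge frequent sets into candidates one unit larger.
        # Prune step: keep a candidate only if all of its subsets are frequent.
        size += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            merged = a | b
            if len(merged) == size and all(
                frozenset(sub) in frequent for sub in combinations(merged, size - 1)
            ):
                candidates.add(merged)
    return result


# Hypothetical compounds described by the structural units they contain.
library = [
    {"amide", "phenyl", "triazine"},
    {"amide", "phenyl"},
    {"amide", "triazine"},
    {"phenyl"},
]
series = frequent_itemsets(library, min_support=2)
```

The pruning step is what keeps the search tractable: a combination of units is counted only if every smaller combination it contains already met the support threshold.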

15.
Docking and scoring are critical issues in virtual drug screening methods. Fast and reliable methods are required for the prediction of binding affinity especially when applied to a large library of compounds. The implementation of receptor flexibility and refinement of scoring functions for this purpose are extremely challenging in terms of computational speed. Here we propose a knowledge-based multiple-conformation docking method that efficiently accommodates receptor flexibility thus permitting reliable virtual screening of large compound libraries. Starting with a small number of active compounds, a preliminary docking operation is conducted on a large ensemble of receptor conformations to select the minimal subset of receptor conformations that provides a strong correlation between the experimental binding affinity (e.g., Ki, IC50) and the docking score. Only this subset is used for subsequent multiple-conformation docking of the entire data set of library (test) compounds. In conjunction with the multiple-conformation docking procedure, a two-step scoring scheme is employed by which the optimal scoring geometries obtained from the multiple-conformation docking are re-scored by a molecular mechanics energy function including desolvation terms. To demonstrate the feasibility of this approach, we applied this integrated approach to the estrogen receptor alpha (ERalpha) system for which published binding affinity data were available for a series of structurally diverse chemicals. The statistical correlation between docking scores and experimental values was significantly improved from those of single-conformation dockings. This approach led to substantial enrichment of the virtual screening conducted on mixtures of active and inactive ERalpha compounds.

16.
17.
A quantitative structure–activity relationship study is performed on a set of organophosphorus compounds to reveal structural and quantum‐chemical features influencing the toxic effect. The properties derived from the topological analysis of the electron density have been used to model the toxicity data. A multiple linear regression analysis in conjunction with a genetic algorithm is used in the study, followed by subsequent validation of the results. The QSAR models obtained are beneficial for virtual screening of toxicity for new compounds of interest. Because the toxicity of organophosphorus compounds is dependent on conformational properties, a conformational search has been performed before optimization of geometries. All quantum‐chemical calculations are carried out at the DFT/B3LYP level of theory with the 6‐311++G(d,p) basis set. Frequency calculations are performed after full geometry optimization. Ab initio wave functions were obtained for further analysis and evaluation of quantum topological properties of target molecules. © 2011 Wiley Periodicals, Inc. Int J Quantum Chem, 2012

18.
Molecular fingerprints are widely used for similarity-based virtual screening in drug discovery projects. In this paper we discuss the performance and the complementarity of nine two-dimensional fingerprints (Daylight, Unity, AlFi, Hologram, CATS, TRUST, Molprint 2D, ChemGPS, and ALOGP) in retrieving active molecules by similarity searching against a set of query compounds. For this purpose, we used biological data from HTS screening campaigns of four protein families (GPCRs, kinases, ion channels, and proteases). We have established threshold values for the similarity index (Tanimoto index) to be used as starting points for similarity searches. Based on the complementarities between the selections made by using different fingerprints we propose a multifingerprint approach as an efficient tool to balance the strengths and weaknesses of various fingerprints.

19.
20.
Screening of more than 2 million compounds comprising 41 distinct encoded combinatorial libraries revealed a novel structural class of p38 mitogen-activated protein (MAP) kinase inhibitors. The methodology used for screening large encoded combinatorial libraries combined with the statistical interpretation of screening results is described. A strong preference for a particular triaminotriazine aniline amide was discovered based on biological activity observed in the screening campaign. Additional screening of a focused follow-up combinatorial library yielded data expanding the unique combinatorial SAR and emphasizing an extraordinary preference for this particular building block and structural class. The preference is further highlighted when the p38 inhibitor data set is compared to data obtained for a panel of other kinases.
