Similar Articles
A total of 20 similar articles were found.
1.
A novel method (in the context of quantitative structure–activity relationship (QSAR) modeling) based on the k-nearest-neighbour (kNN) principle has recently been introduced for the derivation of predictive structure–activity relationships. Its performance was tested by estimating the estrogen binding affinity of a diverse set of 142 organic molecules, and highly predictive models were obtained. Moreover, consensus-type kNN QSAR models, derived from the arithmetic mean of individual QSAR models, were shown to be statistically robust and provided more accurate predictions than the great majority of the individual QSAR models. Finally, the consensus QSAR method was tested with 3D QSAR and log P data from a widely used steroid benchmark data set.
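
The kNN principle and the consensus step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes numeric descriptor vectors, Euclidean distance, and a consensus formed by averaging kNN models that differ in a hypothetical (descriptor_indices, k) setting; all function names are illustrative.

```python
import math
from statistics import mean


def knn_predict(train_X, train_y, x, k):
    """Predict the activity of x as the mean activity of its k nearest training compounds."""
    # Euclidean distance in descriptor space; ties are broken by activity value
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    return mean(yi for _, yi in ranked[:k])


def project(row, indices):
    """Keep only the descriptors listed in indices."""
    return [row[i] for i in indices]


def consensus_predict(train_X, train_y, x, models):
    """Consensus prediction: arithmetic mean of several individual kNN models.

    Each model is a hypothetical (descriptor_indices, k) pair, so the individual
    models differ in the descriptor subset they use and in their neighbour count.
    """
    predictions = [
        knn_predict([project(r, idx) for r in train_X], train_y, project(x, idx), k)
        for idx, k in models
    ]
    return mean(predictions)
```

For example, `consensus_predict(X, y, x_new, [((0, 1), 1), ((0,), 2), ((1,), 2)])` averages a 1-NN model on two descriptors with two 2-NN models that each use a single descriptor. Averaging tends to damp the idiosyncratic errors of any one model, which is one plausible reason a consensus can outperform most of its members.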

2.
Although artificial neural networks (ANNs) have been shown to exhibit superior predictive power in quantitative structure-activity relationship (QSAR) studies, they have also been labeled a "black box": they provide little insight into the relative influence of the independent variables on the prediction, so little can be learned about how and why compounds work. Here we turn to their explanatory capacity. We propose a method for assessing the relative importance of variables describing molecular structure, based on the axon connection weights and on the partial derivatives of the ANN output with respect to its inputs. The method identifies the variables that contribute significantly to network predictions and thereby also provides a variable selection method for ANNs. We show that, by extending this approach to ANNs, the "black box" mechanics of ANNs can be greatly illuminated, which makes the approach very useful for understanding environmental chemical QSAR models.
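
The two ingredients named above, connection weights and output-to-input partial derivatives, can be sketched for a single-hidden-layer network. This is an assumed setup, not the authors' exact method: tanh hidden units, a linear output, weights W[i][j] from input i to hidden unit j, and a common connection-weight importance score (the sum over hidden units of input-hidden times hidden-output weights). The function names are illustrative.

```python
import math


def hidden_activations(x, W, b):
    """tanh hidden layer: h_j = tanh(sum_i W[i][j] * x[i] + b[j])."""
    return [
        math.tanh(sum(W[i][j] * x[i] for i in range(len(x))) + b[j])
        for j in range(len(b))
    ]


def predict(x, W, b, v, c):
    """Linear output unit: y = sum_j v[j] * h_j + c."""
    h = hidden_activations(x, W, b)
    return sum(vj * hj for vj, hj in zip(v, h)) + c


def connection_weight_importance(W, v):
    """Connection-weight importance of input i: sum_j W[i][j] * v[j] (assumed scoring rule)."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]


def input_sensitivities(x, W, b, v):
    """Analytic partial derivatives dy/dx_i at x, using tanh'(z) = 1 - tanh(z)^2."""
    h = hidden_activations(x, W, b)
    return [
        sum(v[j] * (1.0 - h[j] ** 2) * W[i][j] for j in range(len(v)))
        for i in range(len(W))
    ]
```

At an input where all hidden pre-activations are zero, the derivatives reduce exactly to the connection-weight scores; elsewhere the derivatives also reflect how saturated each hidden unit is, which is why the two views can rank descriptors differently.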

5.
Methods have recently been developed in the field of artificial intelligence (AI), specifically in the expert-systems area, that use rule induction to extract rules from data. We have applied these methods to the analysis of molecular series with the objective of generating rules that are predictive and reliable. The input to rule induction is a number of examples with known outcomes (a training set), and the output is a tree-structured series of rules. Unlike most other analysis methods, the results take the form of simple statements that can be easily interpreted. These rules are readily applied to new data, giving both a classification and a probability of correctness. Rule induction has been applied to in-house and published QSAR data sets, and the methodology, application and results of these analyses are discussed. The results imply that in some cases it would be advantageous to use rule induction as a complementary technique alongside conventional statistical and pattern-recognition methods.
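
How a training set becomes a tree of readable rules, each carrying a probability of correctness, can be sketched with an ID3-style learner. This is an assumed stand-in, since the specific rule-induction system used in the study is not named here. It assumes categorical descriptors, splits on the attribute with the highest information gain, and turns each leaf into a rule whose probability is the share of training examples at that leaf with the predicted label. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def induce_rules(examples, attributes, conditions=()):
    """ID3-style rule induction on categorical descriptors (an assumed stand-in).

    examples: list of (descriptor_dict, label) pairs (the training set).
    Returns a list of (conditions, label, probability) rules, one per leaf.
    """
    labels = [label for _, label in examples]
    majority, count = Counter(labels).most_common(1)[0]
    if len(set(labels)) == 1 or not attributes:
        return [(conditions, majority, count / len(labels))]

    def information_gain(attr):
        remainder = 0.0
        for value in {d[attr] for d, _ in examples}:
            subset = [label for d, label in examples if d[attr] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=information_gain)
    remaining = [a for a in attributes if a != best]
    rules = []
    for value in sorted({d[best] for d, _ in examples}):
        subset = [(d, label) for d, label in examples if d[best] == value]
        rules.extend(induce_rules(subset, remaining, conditions + ((best, value),)))
    return rules


def classify(rules, descriptors):
    """Return (label, probability of correctness) from the first matching rule."""
    for conditions, label, probability in rules:
        if all(descriptors.get(attr) == value for attr, value in conditions):
            return label, probability
    return None, 0.0
```

Each rule reads as a plain statement, for example "IF logP = high AND aromatic = yes THEN active (p = 1.0)", where descriptor names such as `logP` and `aromatic` are hypothetical.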

6.
QSAR models have been under development for decades, but acceptance and use of their results have been slow, in part because there is no widely accepted metric for assessing their reliability. We reapply a method commonly used in quantitative epidemiology and medical decision-making to evaluate the results of screening tests, here to assess the reliability of a QSAR model. It quantifies the accuracy of a QSAR model (expressed as sensitivity and specificity) as conditional probabilities of correct and incorrect classification of a chemical characteristic, given the true characteristic. Using Bayes' formula, these conditional probabilities are combined with prior information to generate a posterior distribution that gives the probability that a specific chemical has a particular characteristic, given a model prediction. As an example, we apply this approach to evaluate the predictive reliability of a CATABOL model and of a "ready" versus "not ready" biodegradability classification based on it. Finally, we show how the predictive capability of the model can be improved by using two models sequentially, the first with high sensitivity and the second with high specificity.
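
The Bayes step can be made concrete with a short worked sketch. Here the "characteristic" is taken to be "ready biodegradable", and the sensitivities, specificities and prior are hypothetical numbers, not CATABOL results. The sequential scheme is modeled by feeding the posterior of the first (high-sensitivity) model in as the prior of the second (high-specificity) model. This relies on an assumption that the two models' errors are conditionally independent given the true class.

```python
def posterior_positive(sensitivity, specificity, prior):
    """P(chemical has the characteristic | model predicts it has it), by Bayes' formula."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def posterior_negative(sensitivity, specificity, prior):
    """P(chemical has the characteristic | model predicts it does not)."""
    false_neg = (1.0 - sensitivity) * prior
    true_neg = specificity * (1.0 - prior)
    return false_neg / (false_neg + true_neg)


def sequential_positive(models, prior):
    """Chain positive predictions: each model's posterior becomes the next model's prior.

    models: list of (sensitivity, specificity) pairs, applied in order.
    Assumes the models' errors are conditionally independent given the true class.
    """
    p = prior
    for sensitivity, specificity in models:
        p = posterior_positive(sensitivity, specificity, p)
    return p
```

With a hypothetical prior of 0.3, a first model with sensitivity 0.95 and specificity 0.70 raises the probability of "ready" after a positive call to about 0.58. A second positive call from a model with sensitivity 0.75 and specificity 0.95 raises it to about 0.95, which illustrates why the high-specificity model is placed second.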

9.
Quantitative structure–activity relationship (QSAR) modeling, a regression methodology that quantitatively establishes a statistical correlation between structural features and apparent behavior for a series of congeneric molecules, has been widely used to evaluate the activity, toxicity and properties of various small-molecule compounds such as drugs, toxicants and surfactants. Surprisingly, however, this useful technique has found only very limited application to biomacromolecules, even though solved 3D atomic-resolution structures of proteins, nucleic acids and their complexes have accumulated rapidly in past decades. Here we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482–491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to, or even better than, those of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR offers additional features compared with the traditional methods, such as adaptability, interpretability, deep validation and high efficiency. The BioQSAR scheme could readily be modified to infer the biological behavior and functions of other biomacromolecules, provided their X-ray crystal structures, NMR conformational ensembles or computationally modeled structures are available.

15.
Quantitative structure–activity relationship (QSAR) studies are useful computational tools, often used in drug discovery research and in many other scientific disciplines. In this study, a robust fragment-similarity-based QSAR (FS-QSAR) algorithm was developed to correlate structures with biological activities by integrating the fragment-based drug design concept with a multiple linear regression method. The similarity between any pair of training and testing fragments was determined by calculating the difference between the lowest or highest eigenvalues of the chemistry-space BCUT matrices of the corresponding fragments. In addition to this BCUT similarity function, a molecular fingerprint Tanimoto coefficient (Tc) similarity function was also used as an alternative for comparison. For validation, the FS-QSAR algorithm was applied to several case studies, including a data set of COX2 inhibitors and a data set of cannabinoid CB2 triaryl bis-sulfone antagonist analogues, yielding predictive models with average coefficients of determination (r²) of 0.62 and 0.68, respectively. The FS-QSAR method was shown to give more accurate predictions than the traditional and one-nearest-neighbour QSAR methods, and it can be a useful tool for ligand activity prediction in fragment-based drug discovery.
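
The fingerprint Tanimoto coefficient used above as the alternative similarity function is Tc = |A ∩ B| / |A ∪ B| for two binary fingerprints. A minimal sketch follows. It assumes fingerprints are stored as sets of "on"-bit indices and that two empty fingerprints count as identical. The `most_similar` helper and fragment names are hypothetical, and the BCUT eigenvalue similarity is not reproduced.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient Tc = |A ∩ B| / |A ∪ B| for binary fingerprints.

    Fingerprints are given as sets of the indices of their "on" bits.
    """
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return len(fp_a & fp_b) / union


def most_similar(query, library):
    """Return (name, Tc) of the library fragment most similar to the query (hypothetical helper)."""
    return max(
        ((name, tanimoto(query, fp)) for name, fp in library.items()),
        key=lambda pair: pair[1],
    )
```

For instance, fingerprints with bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits, so Tc = 0.5.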

16.
Quantitative structure–activity relationship (QSAR) is a term describing a variety of approaches of substantial interest in chemistry. QSAR can be defined as indirect molecular design: the chemical compound space is sampled iteratively to optimize a certain property, which indirectly designs a molecular structure having that property. However, modeling the interactions of chemical molecules in biological systems produces highly noisy data, which makes prediction something of a game of roulette. In this paper we briefly review the origins of this noise, particularly in multidimensional QSAR, and classify it as data, superimposition, molecular similarity, conformational and molecular recognition noise. We also indicate possible robust answers that can improve the modeling and predictive ability of QSAR, especially the self-organizing mapping of molecular objects, in particular molecular surfaces, a method brought into chemistry by Gasteiger and Zupan.

18.
The growing interest in epigenetic probes and drug discovery, as revealed by several epigenetic drugs in clinical use or in the drug development pipeline, is boosting the generation of screening data. To make the most of structure–activity relationships, there is a clear need to develop robust and accurate models of the underlying structure–activity relationships. Such models should also be able to guide the rational screening of compound libraries. Herein we introduce a novel approach to epigenetic quantitative structure–activity relationship (QSAR) modelling using conformal prediction. As a case study, we discuss the development of models for 11 sets of inhibitors of histone deacetylases (HDACs), one of the major epigenetic target families that have been screened. All derived models, for every HDAC endpoint and all three significance levels, were found to be valid with respect to predictions for the external test sets as well as the internal validation on the corresponding training sets. Furthermore, the prediction efficiencies are above 80% for most data sets and above 90% for four data sets at different significance levels. These findings encourage prospective applications of conformal prediction to other epigenetic target data sets.
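
The notions of validity and efficiency used above can be illustrated with a class-conditional (Mondrian) inductive conformal predictor. This is a minimal sketch under stated assumptions, not the authors' pipeline. The underlying classifier's probabilities are taken as given, nonconformity is assumed to be 1 minus the probability of the candidate label, and efficiency is assumed to mean the fraction of single-label prediction sets. Significance levels and function names are illustrative.

```python
def p_value(calibration_scores, score):
    """Conformal p-value: share of calibration scores at least as large as score (+1 smoothing)."""
    at_least = sum(1 for s in calibration_scores if s >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration, probabilities, significance):
    """Class-conditional (Mondrian) inductive conformal prediction for one compound.

    calibration: dict label -> nonconformity scores of calibration compounds with that label.
    probabilities: dict label -> underlying classifier's probability for the new compound.
    Nonconformity is assumed to be 1 - probability of the candidate label.
    A label is kept when its p-value exceeds the significance level.
    """
    return {
        label
        for label, prob in probabilities.items()
        if p_value(calibration[label], 1.0 - prob) > significance
    }


def efficiency(prediction_sets):
    """Fraction of prediction sets containing exactly one label (assumed definition)."""
    return sum(1 for s in prediction_sets if len(s) == 1) / len(prediction_sets)


def error_rate(prediction_sets, true_labels):
    """Fraction of compounds whose true label is missing from the set (validity check)."""
    misses = sum(1 for s, y in zip(prediction_sets, true_labels) if y not in s)
    return misses / len(true_labels)
```

A model is called valid at significance level ε when its error rate on new compounds does not exceed ε (up to sampling noise). Lowering ε makes the sets larger, which is why efficiency can differ across significance levels.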

19.
One popular metric for estimating the accuracy of prospective quantitative structure-activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to the compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be as important as similarity, or more so. Here we make use of two additional parameters: the variation of the prediction among random forest trees (less variation among trees indicates a more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model building and stored as a three-dimensional array of bins. This is an obvious extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912–1928]. We show that using these three parameters simultaneously discriminates prediction accuracy much better than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of the root-mean-square errors for compounds tested after the model was built.
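
The three-dimensional bin array can be sketched as a lookup table keyed by (similarity bin, tree-spread bin, predicted-value bin). Each cell holds the RMSE of the cross-validated predictions that fell into it. This is an illustrative sketch only. The bin edges, the use of the population standard deviation over trees as the "variation among trees", and all function names are assumptions, not the published scheme.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, pstdev


def tree_spread(tree_predictions):
    """Ensemble prediction (mean over trees) and its spread (population SD over trees)."""
    return mean(tree_predictions), pstdev(tree_predictions)


def bin_key(similarity, spread, predicted, edges):
    """Map the three parameters to a 3D bin index using sorted (hypothetical) edges per axis."""
    sim_edges, spread_edges, pred_edges = edges
    return (
        bisect_right(sim_edges, similarity),
        bisect_right(spread_edges, spread),
        bisect_right(pred_edges, predicted),
    )


def build_error_table(cv_records, edges):
    """RMSE per 3D bin from cross-validation records (similarity, spread, predicted, observed)."""
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum of squared errors, count]
    for similarity, spread, predicted, observed in cv_records:
        cell = totals[bin_key(similarity, spread, predicted, edges)]
        cell[0] += (predicted - observed) ** 2
        cell[1] += 1
    return {key: math.sqrt(sse / n) for key, (sse, n) in totals.items()}


def estimate_error(table, similarity, spread, predicted, edges, default=None):
    """Expected RMSE for a new prediction; default if its bin was never populated."""
    return table.get(bin_key(similarity, spread, predicted, edges), default)
```

The table is built once at model-building time. A new compound's error estimate is then a single dictionary lookup. Sparse bins are a practical concern in three dimensions, so a real implementation would likely need coarser edges or smoothing.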
