Similar Documents
20 similar documents found
3.
A new method for analyzing a structure-activity relationship (SAR) is proposed. By use of a simple quantitative index, one can readily identify "structure-activity cliffs": pairs of molecules that are most similar yet show the largest change in activity. We show how this provides a graphical representation of the entire SAR that allows its salient features to be grasped quickly. In addition, the approach allows us to view the SARs in a data set at different levels of detail. The method is tested on two data sets that highlight its ability to extract SAR information easily. Finally, we demonstrate that the method is robust using a variety of computational control experiments and discuss possible applications of this technique to QSAR model evaluation.
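The abstract does not give the index's formula. As a minimal sketch, assume the index divides the activity difference of a pair by its dissimilarity, |A_i − A_j| / (1 − sim(i, j)), with Tanimoto similarity computed on fingerprint bit sets. Both the formula and the helper names (`tanimoto`, `cliff_index`, `rank_cliffs`) are illustrative assumptions. Pairs at the top of the ranking are the candidate cliffs.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cliff_index(act_i: float, act_j: float, sim: float, eps: float = 1e-6) -> float:
    """Assumed index: activity change divided by dissimilarity (large value = cliff)."""
    return abs(act_i - act_j) / max(1.0 - sim, eps)


def rank_cliffs(fps: list[set[int]], acts: list[float]) -> list[tuple[float, int, int]]:
    """Score every pair of molecules and sort from the largest index downwards."""
    scores = []
    for i, j in combinations(range(len(fps)), 2):
        s = tanimoto(fps[i], fps[j])
        scores.append((cliff_index(acts[i], acts[j], s), i, j))
    return sorted(scores, reverse=True)
```

Identical fingerprints would give a zero denominator, so the sketch clamps it with a small `eps`; how the original method treats that case is not stated.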

4.
Predictive accuracy is the primary concern of computational chemists in quantitative structure-activity relationship (QSAR) investigations. It is hypothesized that a model built from analogous chemicals will predict better than one derived from structurally diverse compounds. This paper develops a novel scheme called "clustering first, and then modeling", which builds local QSAR models for the subsets obtained by clustering the training set according to structural similarity. For validation and prediction, the compounds in the validation and test sets were first assigned to the subsets corresponding to those of the training set, and each compound was then predicted by the local model of its subset. The approach was validated on two independent data sets by local modeling and prediction of baseline toxicity to the fathead minnow. Hierarchical clustering was employed for cluster analysis, k-nearest neighbors for classification, and partial least squares for model generation. The statistical results indicated that the local models built on the subsets predicted much better than the global model built on the whole training set, consistent with the hypothesis. The approach proposed here is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.
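A minimal sketch of the "clustering first, and then modeling" flow, under loud assumptions: single-linkage agglomerative clustering stands in for the hierarchical clustering, k-nearest-neighbor voting assigns a new compound to a cluster, and ordinary least squares on the first descriptor replaces partial least squares to keep the example dependency-free. All function names are hypothetical.

```python
import math


def single_linkage(X, n_clusters):
    """Agglomerative single-linkage clustering of descriptor vectors; one label per row."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(math.dist(X[i], X[j]) for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters.pop(q))  # q > p, so index p is unaffected
    labels = [0] * len(X)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def knn_label(x, X, labels, k=3):
    """Assign x to the majority cluster among its k nearest training compounds."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)


def fit_line(xs, ys):
    """Ordinary least squares on one descriptor (a stand-in for PLS)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((v - mx) ** 2 for v in xs)
    slope = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def local_qsar(X, y, n_clusters=2, k=3):
    """Cluster the training set, fit one local model per cluster, return a predictor."""
    labels = single_linkage(X, n_clusters)
    models = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_line([X[i][0] for i in idx], [y[i] for i in idx])

    def predict(x):
        # classify the new compound into a cluster, then use that cluster's model
        slope, intercept = models[knn_label(x, X, labels, k)]
        return slope * x[0] + intercept

    return predict
```

With two well-separated groups that follow different linear trends, each local line fits its own group, which a single global line cannot do.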

6.
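卢昂 (Lu Ang), 陈壮志 (Chen Zhuangzhi), 巫秀美 (Wu Xiumei), 马秀英 (Ma Xiuying), 赵昱 (Zhao Yu). 《化学通报》 (Huaxue Tongbao / Chemistry), 2022, 85(10): 1261-1266
Quantitative structure-activity relationship (QSAR) analysis was applied to elucidate the relationship between the substructure fingerprints of flavonoid compounds (FCs) and their 1,1-diphenyl-2-picrylhydrazyl (DPPH) radical-scavenging capacity, in order to guide the design and discovery of potent antioxidants. Seventy-seven flavonoids with well-defined antioxidant activity were collected from the PubMed database, and 86 flavonoids without anti-DPPH activity were collected from the ChEMBL database. Substructure fingerprints for these 163 flavonoids were generated with the PubChem system; fingerprints significantly associated with antioxidant activity were then selected by the chi-square test, and a predictive QSAR model was finally built by discriminant analysis. The accuracy and robustness of the model were verified by resubstitution and cross-validation. The results show that the anti-DPPH radical activity of flavonoids is related to factors such as the count of ESSSR rings, the types of simple neighboring atoms, and simple SMARTS patterns. Furthermore, the established QSAR model predicts the DPPH radical-scavenging activity of flavonoids well and can be used to evaluate the potential of candidate antioxidants.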
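The chi-square screening step can be sketched as a 2×2 contingency test per fingerprint bit (bit present/absent versus active/inactive). This assumes Pearson's statistic without continuity correction and the df = 1, α = 0.05 critical value of 3.841; the abstract does not state which variant or threshold was used, and the discriminant-analysis step is not shown. Function names are hypothetical.

```python
def chi_square_2x2(bit_on, active):
    """Pearson chi-square statistic (no continuity correction): bit presence vs activity."""
    a = sum(1 for on, act in zip(bit_on, active) if on and act)          # present, active
    b = sum(1 for on, act in zip(bit_on, active) if on and not act)      # present, inactive
    c = sum(1 for on, act in zip(bit_on, active) if not on and act)      # absent, active
    d = sum(1 for on, act in zip(bit_on, active) if not on and not act)  # absent, inactive
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_bits(fingerprints, active, critical=3.841):
    """Keep the bit columns whose chi-square exceeds the df=1, alpha=0.05 critical value."""
    n_bits = len(fingerprints[0])
    return [j for j in range(n_bits)
            if chi_square_2x2([fp[j] for fp in fingerprints], active) > critical]
```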

7.
Estimating the accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties is a critical problem. The "distance to model" (DM) parameter is defined as a metric of similarity between the training-set and test-set compounds subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics based on the standard deviation within an ensemble of QSAR models. The current study applies this analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported in the 2009 QSAR challenge. We demonstrate that DMs based on an ensemble (consensus) model perform systematically better than other DMs. The approach identifies 30-60% of compounds whose prediction accuracy is similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, in silico predictions can halve the cost of experimental measurements while providing similar accuracy. The developed model has been made publicly available (http://ochem.eu/models/1).
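A minimal sketch of an ensemble-based distance to model, assuming DM is the standard deviation of the predictions that the individual ensemble members make for one compound (the abstract says only that the DMs are based on the standard deviation within an ensemble). Compounds whose spread falls at or below a chosen threshold are treated as reliably predicted; the threshold and the function names are hypothetical.

```python
import statistics


def ensemble_dm(predictions):
    """Return (consensus mean, spread) for one compound; the spread serves as the DM."""
    return statistics.fmean(predictions), statistics.pstdev(predictions)


def confident_subset(all_predictions, threshold):
    """Indices of compounds whose ensemble spread is at or below the threshold."""
    return [i for i, preds in enumerate(all_predictions)
            if statistics.pstdev(preds) <= threshold]


# Each inner list: outputs of four hypothetical ensemble members for one compound
# (1 = mutagenic, 0 = non-mutagenic).
preds = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0]]
print(confident_subset(preds, 0.1))  # the members disagree only on the second compound
```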

8.
In this article, as a first step toward an efficient approximation for predicting molecular electronic excited-state properties at the ab initio level, we propose the local excitation approximation (LEA). In the LEA scheme, only local electron excitations within a selected substructure (the chromophore) are treated when calculating the targeted excited-state wavefunctions, whereas all other electron excitations (local excitations in other substructures and charge-transfer excitations between different regions) are simply discarded. This concept is realized by using localized molecular orbitals (LMOs) that are localized on the chromophore substructure. If the targeted transitions have strong local character and an adequate substructure is selected as the chromophore region, the LEA scheme can provide excited-state properties without a large loss of accuracy. The severe slowdown in the convergence of Davidson's iterative diagonalization caused by the use of LMOs can be avoided by an additional transformation of the LMOs. To assess the accuracy and efficiency of the LEA scheme, we performed test calculations on various compounds at the configuration interaction singles (CIS) and time-dependent Hartree-Fock (TDHF) levels of theory. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
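The selection rule of the LEA scheme can be illustrated with a toy enumeration of single excitations. The sketch assumes each LMO has already been assigned to one region; it keeps only occupied-to-virtual pairs with both orbitals on the chromophore, and so drops local excitations elsewhere as well as charge-transfer pairs. The subsequent CIS/TDHF diagonalization and the LMO transformation are not shown, and the names are hypothetical.

```python
from itertools import product


def local_excitation_space(occupied, virtual, chromophore):
    """occupied/virtual map an LMO name to the region it is localized on.

    Returns the (occupied, virtual) single excitations kept under the LEA-style
    rule: both orbitals must lie on the chromophore region.
    """
    occ = [i for i, region in occupied.items() if region == chromophore]
    vir = [a for a, region in virtual.items() if region == chromophore]
    return list(product(occ, vir))


occupied = {"o1": "A", "o2": "A", "o3": "B"}  # hypothetical region assignments
virtual = {"v1": "A", "v2": "B", "v3": "B"}
kept = local_excitation_space(occupied, virtual, "A")
print(len(kept), "of", len(occupied) * len(virtual), "single excitations kept")
```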

11.
One popular metric for estimating the accuracy of prospective quantitative structure-activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to the compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be as important as similarity, or more so. Here we use two additional parameters: the variation of the prediction among random forest trees (less variation among trees indicates a more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, measured as the root-mean-square error, can be estimated by cross-validation on the training set at model-building time and stored as a three-dimensional array of bins. This is a straightforward extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912-1928]. We show that using these three parameters simultaneously discriminates prediction accuracy much better than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of the root-mean-square errors of compounds tested after the model was built.
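The three-dimensional error table can be sketched as follows: cross-validated predictions are binned by similarity to the training set, spread among trees, and predicted value, and the RMSE of each bin is stored for later lookup. The bin edges and function names are arbitrary assumptions; the abstract does not give the binning scheme.

```python
import bisect
import math
from collections import defaultdict


def bin_key(sim, tree_sd, pred, sim_edges, sd_edges, pred_edges):
    """Map the three parameters to a 3D bin index using sorted interior bin edges."""
    return (bisect.bisect_right(sim_edges, sim),
            bisect.bisect_right(sd_edges, tree_sd),
            bisect.bisect_right(pred_edges, pred))


def build_error_table(cv_records, sim_edges, sd_edges, pred_edges):
    """cv_records: (similarity, tree_sd, prediction, observed) from cross-validation.

    Returns {bin key: RMSE of the cross-validated predictions falling in that bin}.
    """
    squared = defaultdict(list)
    for sim, sd, pred, obs in cv_records:
        key = bin_key(sim, sd, pred, sim_edges, sd_edges, pred_edges)
        squared[key].append((pred - obs) ** 2)
    return {k: math.sqrt(sum(v) / len(v)) for k, v in squared.items()}


def estimate_rmse(table, sim, tree_sd, pred, sim_edges, sd_edges, pred_edges):
    """Expected error for a new prediction; None if its bin was never populated."""
    return table.get(bin_key(sim, tree_sd, pred, sim_edges, sd_edges, pred_edges))
```

Empty bins return `None` here; in practice one would need a fallback, such as merging sparse bins, which the abstract does not describe.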

12.
A Gaussian process method (GPM) is described and applied to building several QSAR models. These models can potentially solve a number of problems that arise in QSAR modeling, in that no parameters have to be supplied and only one hyperparameter is used in finding the optimal solution. The application of the method to QSAR is illustrated using data sets of compounds active at the benzodiazepine and muscarinic receptors, as well as a data set of the toxicity of substituted benzenes to the ciliate Tetrahymena pyriformis.
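For orientation, a minimal Gaussian process regression sketch: a zero-mean prior with a squared-exponential kernel whose length scale is the single hyperparameter, plus a small fixed noise term on the diagonal for numerical stability. This is a generic GP, not necessarily the authors' GPM; hyperparameter optimization (for example by maximizing the marginal likelihood) is omitted, and plain Gaussian elimination replaces a library solver. Function names are hypothetical.

```python
import math


def rbf(x1, x2, length_scale):
    """Squared-exponential kernel; the length scale is the only hyperparameter."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_fit_predict(X, y, X_new, length_scale=1.0, noise=1e-6):
    """Posterior mean k_*^T (K + noise*I)^-1 y at each point of X_new."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, y)
    return [sum(rbf(x, xi, length_scale) * a for xi, a in zip(X, alpha)) for x in X_new]
```

Far from all training compounds the kernel values vanish and the prediction falls back to the prior mean of zero, which is one reason activities are usually centered before fitting.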

14.
QSAR models have been under development for decades, but acceptance and use of model results have been slow, in part because there is no widely accepted metric for assessing their reliability. To assess the reliability of a QSAR model, we reapply a method commonly used in quantitative epidemiology and medical decision-making to evaluate the results of screening tests. It quantifies the accuracy of a QSAR model (expressed as sensitivity and specificity) as conditional probabilities of correct and incorrect classification of a chemical characteristic, given the true characteristic. Using Bayes' formula, these conditional probabilities are combined with prior information to generate a posterior distribution that gives the probability that a specific chemical has a particular characteristic, given a model prediction. As an example, we apply this approach to evaluate the predictive reliability of a CATABOL model and of a "ready"/"not ready" biodegradability classification based on it. Finally, we show how the predictive capability of the model can be improved by using two models sequentially, the first with high sensitivity and the second with high specificity.
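Bayes' formula as described can be written directly in terms of sensitivity, specificity, and the prior. The sketch also chains two models by using the first model's posterior as the second model's prior; that chaining assumes the two models' errors are conditionally independent given the true characteristic, an assumption not stated in the abstract. Function names are hypothetical.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(chemical truly has the characteristic | model predicts it has it)."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def sequential_posterior(prior, first, second):
    """Apply a second model to the chemicals flagged positive by the first.

    first, second: (sensitivity, specificity) pairs. Assumes conditionally
    independent errors, so the first posterior can serve as the second prior.
    """
    p1 = posterior_positive(prior, *first)
    return p1, posterior_positive(p1, *second)


p1, p2 = sequential_posterior(0.3, (0.95, 0.70), (0.80, 0.95))
print(round(p1, 2), round(p2, 2))
```

For example, with a prior of 0.3, a first model with sensitivity 0.95 and specificity 0.70 raises the probability to about 0.58, and a second model with sensitivity 0.80 and specificity 0.95 raises it to about 0.96. These numbers are illustrative, not CATABOL results.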

16.
Validation is a crucial aspect of quantitative structure–activity relationship (QSAR) model development. External validation is generally considered the most conclusive proof of the predictive capacity of a QSAR model. In the absence of a truly external data set, external validation is usually performed on test-set compounds, which are members of the original data set but were not used in model development. With small data sets, QSAR researchers face a problem: the developed models may be less reliable because of the small number of training-set compounds, and they may also show poor external predictability because they may not have captured all the features required for the particular structure–activity relationship. The present paper attempts to show that the 'true r(LOO)' statistic, calculated for the model derived from the undivided data set with the variable selection strategy applied at each cycle of leave-one-out (LOO) validation, may reflect the external validation characteristics of the developed model, thus obviating the need to split the data set into training and test sets. This approach may be helpful for small data sets, as it uses all available data for model development and validation, making the resulting model more reliable. Copyright © 2009 John Wiley & Sons, Ltd.
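The key point, redoing variable selection inside every leave-one-out cycle so that the left-out compound never influences which descriptors are chosen, can be sketched with a deliberately simple model: selection picks the single descriptor with the largest |r|, and the model is a one-descriptor least-squares line. Both are stand-ins for the paper's unspecified selection strategy and regression method, and the statistic computed is the conventional q² = 1 − PRESS/SS_total, which may differ in detail from the paper's 'true r(LOO)'. Function names are hypothetical.

```python
import math


def abs_pearson(xs, ys):
    """Absolute Pearson correlation (0.0 for a constant column)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / math.sqrt(sxx * syy)


def select_descriptor(X, y):
    """Stand-in variable selection: the single column with the largest |r|."""
    return max(range(len(X[0])), key=lambda j: abs_pearson([row[j] for row in X], y))


def fit_line(xs, ys):
    """Ordinary least-squares line on one descriptor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def true_q2_loo(X, y):
    """LOO q^2 = 1 - PRESS/SS_total, repeating variable selection in every cycle."""
    press = 0.0
    for k in range(len(y)):
        X_tr, y_tr = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        j = select_descriptor(X_tr, y_tr)  # selection never sees compound k
        slope, intercept = fit_line([row[j] for row in X_tr], y_tr)
        press += (y[k] - (slope * X[k][j] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total
```

Selecting the descriptor once on the full data set and only then running LOO would let every left-out compound influence the selection, which tends to inflate the statistic.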



Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP Registration No. 09084417)