Similar Articles
20 similar articles found
2.
One popular metric for estimating the accuracy of prospective quantitative structure-activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to the compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be equally or more important than similarity. Here we make use of two additional parameters: the variation of the prediction among random forest trees (less variation among trees indicates a more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model building and stored as a three-dimensional array of bins. This is a natural extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912-1928]. We show that using these three parameters simultaneously discriminates prediction accuracy much better than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of the root-mean-square errors for compounds tested after the model was built.
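The three-dimensional error table described above can be pictured with a short sketch. It is not the authors' code. It assumes that cross-validation has already produced four values for each training compound: its similarity to the rest of the training set, the standard deviation of its prediction across random-forest trees, the prediction itself, and the observed value. The bin edges and the function names `build_error_bins` and `lookup_rmse` are hypothetical.

```python
import bisect
import math
from collections import defaultdict


def bin_index(value, edges):
    """Index of the bin containing `value`; `edges` are sorted interior cut points."""
    return bisect.bisect_right(edges, value)


def build_error_bins(records, sim_edges, sd_edges, pred_edges):
    """records: iterable of (similarity, tree_sd, prediction, observed) from cross-validation.

    Returns a dict mapping 3-D bin coordinates (i, j, k) to the RMSE of the
    cross-validated predictions that fell into that bin.
    """
    sq_errors = defaultdict(list)
    for sim, sd, pred, obs in records:
        key = (bin_index(sim, sim_edges), bin_index(sd, sd_edges), bin_index(pred, pred_edges))
        sq_errors[key].append((pred - obs) ** 2)
    return {key: math.sqrt(sum(v) / len(v)) for key, v in sq_errors.items()}


def lookup_rmse(table, sim, sd, pred, sim_edges, sd_edges, pred_edges):
    """Expected error for a new prediction, or None if its bin was never populated."""
    key = (bin_index(sim, sim_edges), bin_index(sd, sd_edges), bin_index(pred, pred_edges))
    return table.get(key)
```

The table is built once, when the model is built. After that, estimating the error of a new prediction costs only three bin lookups. A real implementation would also need a policy for empty or sparsely populated bins, such as merging or smoothing neighbouring bins. This sketch simply returns `None` for them.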

4.
The relevance of terms other than linear ones when deriving quantitative structure-activity relationship/quantitative structure-property relationship (QSAR/QSPR) models has rarely been considered so far. In this study, the impact of quadratic and interaction terms has been taken into account. The first effect of including such highly structured terms is a significant extension of the parametric domain, which grows from the initial N to N(N + 3)/2 parameters. This substantial enlargement beyond the conventional linear boundaries entails a higher computational cost because of the increased combinatorial number of resulting theoretical QSAR/QSPR models. To address this issue, novel genetic-algorithm-based software, MGZ (multigenetic zooming), was developed and used for both variable selection and model building. To speed up the search of the whole domain, MGZ was equipped with multiple independently evolving populations and genetic storms. In addition, a novel fitness function was developed that scores models on the basis of their internal predictive capability (assessed on the training set), their structural complexity, and the presence of nonlinear terms. The models were further validated by monitoring model redundancy and performing intensive randomization runs. The Selwood data set was used as a reference set to derive QSAR models. Furthermore, a QSPR study was conducted on a solubility data set covering a large array of organic compounds. The results reported in the present paper demonstrate that our approach succeeds both in finding linear models that are at least as good as those previously derived using standard statistical approaches and in deriving new nonlinear models with good statistical figures.
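The count N(N + 3)/2 follows from adding terms to the N linear ones: N squared terms and N(N - 1)/2 pairwise products. Together these give N + N + N(N - 1)/2 = N(N + 3)/2. For example, 10 descriptors yield 65 candidate terms. The minimal sketch below shows the expansion. The descriptor names are placeholders, and this is not the MGZ software.

```python
from itertools import combinations


def expand_terms(names):
    """Labels for linear, squared, and pairwise-interaction terms of N descriptors."""
    linear = list(names)
    quadratic = [f"{a}^2" for a in names]
    interactions = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return linear + quadratic + interactions  # N + N + N(N-1)/2 = N(N+3)/2 terms


def expand_row(values):
    """Apply the same expansion to one row of descriptor values (same term order)."""
    squares = [v * v for v in values]
    products = [a * b for a, b in combinations(values, 2)]
    return list(values) + squares + products
```

Even though the term pool grows only quadratically, the number of possible models built from subsets of it grows combinatorially. That combinatorial growth is what motivates a genetic-algorithm search in place of exhaustive enumeration.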

7.
Applications of QSAR/QSPR in the Fate and Risk Assessment of POPs   Total citations: 4, self-citations: 0, citations by others: 4
Wang Bin  Yu Gang  Huang Jun  Hu Hongying  Progress in Chemistry (《化学进展》) 2007, 19(10): 1612-1619
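Persistent organic pollutants (POPs) are highly hazardous organic pollutants of great current concern to the international community. Analyzing their environmental fate and assessing their risks requires large amounts of reliable property and toxicity data, and quantitative structure-activity/property relationship (QSAR/QSPR) methods make it possible to obtain such data quickly and efficiently. QSAR/QSPR models have been widely used to predict the biological activities and properties of POPs, to fill gaps in missing basic data, and to explore the mechanisms behind the environmental processes and ecological effects of POPs. In recent years they have also found further or potential applications in screening new POPs, simulating their fate, and assessing their risks. This paper introduces the basic applications of QSAR/QSPR in predicting the properties and biological activities of POPs and their extended applications in POPs fate and risk assessment, and offers an outlook on the prospects of QSAR/QSPR in POPs research.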

10.
In this paper, we report on the potential of a recently developed neural network for structures, applied to the prediction of the physicochemical properties of compounds. The proposed recursive neural network (RecNN) model can take a structured representation of the molecule directly as input and model a direct, adaptive relationship between the molecular structure and the target property. It therefore combines, in a single learning system, the flexibility and general advantages of a neural network model with the representational power of a structured domain. As a result, a completely new approach to quantitative structure-activity relationship/quantitative structure-property relationship (QSPR/QSAR) analysis is obtained. An original representation of the molecular structures has been developed that accounts for both the occurrence of specific atoms/groups and the topological relationships among them. The Gibbs free energy of solvation in water, ΔsolvG°, has been chosen as a benchmark for the model. The different approaches proposed in the literature for predicting this property have been reconsidered from a general perspective. The advantages of RecNN as a suitable tool for automating fundamental parts of QSPR/QSAR analysis have been highlighted. The RecNN model has been applied to the analysis of ΔsolvG° in water for 138 monofunctional acyclic organic compounds and tested on an external data set of 33 compounds. For the predictive accuracy estimated on the test set, the statistical analysis gave a correlation coefficient R = 0.9985, a standard deviation S = 0.68 kJ mol⁻¹, and a mean absolute error MAE = 0.46 kJ mol⁻¹. The inherent ability of RecNN to abstract chemical knowledge through the adaptive learning process has been investigated by principal component analysis of the internal representations computed by the network.
It has been found that the model recognizes chemical compounds on the basis of a nontrivial combination of their chemical structure and target property.

12.
ABSTRACT

A method for combining the statistical-based QSAR predictions of two or more binary classification models is presented. All models were assumed to be independent. This facilitated the combination of positive and negative predictions using a quantitative weight of evidence (qWoE) procedure based on Bayesian statistics and the additivity of the logarithms of the likelihood ratios. Previous studies combined more than one prediction but used arbitrary strengths for positive and negative predictions. In our approach, the combined models were validated by determining their sensitivity and specificity. These performance metrics serve as the starting point for obtaining values that measure the weight of evidence of positive and negative predictions. The method was applied to the prediction of Ames mutagenicity. When the overall prediction was obtained by combining the individual predictions of more than one model, the method achieved an accuracy similar to that of the experimental Ames test for this endpoint. Calculating the qWoE value would reduce the need for expert knowledge and decrease the subjectivity of the prediction. The method could be applied to other endpoints that have binary classification models, such as developmental toxicity and skin sensitisation.
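The additivity of log likelihood ratios can be illustrated with a minimal sketch. It assumes three things: each model's sensitivity and specificity come from its validation, the models are independent as the abstract states, and base-10 logarithms are used. The likelihood ratios are LR+ = sensitivity / (1 - specificity) for a positive call and LR- = (1 - sensitivity) / specificity for a negative call. The function names are hypothetical, and the paper's exact scaling of the qWoE value may differ.

```python
import math


def log_likelihood_ratio(prediction_positive, sensitivity, specificity):
    """log10 likelihood ratio contributed by one binary model's call."""
    if prediction_positive:
        # LR+ = P(predicted + | truly +) / P(predicted + | truly -)
        return math.log10(sensitivity / (1.0 - specificity))
    # LR- = P(predicted - | truly +) / P(predicted - | truly -)
    return math.log10((1.0 - sensitivity) / specificity)


def combined_weight_of_evidence(calls):
    """calls: list of (prediction_positive, sensitivity, specificity), one per model.

    Under the independence assumption the log likelihood ratios simply add.
    """
    return sum(log_likelihood_ratio(p, se, sp) for p, se, sp in calls)


def posterior_probability(prior, total_log_lr):
    """Bayes' rule on the odds scale: posterior odds = prior odds * combined LR."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * 10 ** total_log_lr
    return post_odds / (1.0 + post_odds)
```

A positive total favours a positive (for example, mutagenic) call and a negative total favours a negative one. Two models with equal sensitivity and specificity that disagree cancel each other out exactly.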

14.
The predictive accuracy of the model is the primary concern of computational chemists in quantitative structure-activity relationship (QSAR) investigations. It is hypothesized that a model based on analogous chemicals will exhibit better predictive performance than one derived from diverse compounds. This paper develops a novel scheme called "clustering first, and then modeling" to build local QSAR models for the subsets that result from clustering the training set according to structural similarity. For validation and prediction, the compounds in the validation set and test set were first classified into the same subsets as the training set, and each compound was then predicted by the local model for its subset. This approach was validated on two independent data sets by local modeling and prediction of baseline toxicity to the fathead minnow. In this process, hierarchical clustering was employed for cluster analysis, k-nearest neighbor for classification, and partial least squares for model generation. The statistical results indicated that the predictive performance of the local models based on the subsets was much better than that of the global model based on the whole training set, which is consistent with the hypothesis. The approach proposed here is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.
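The "clustering first, and then modeling" workflow can be outlined as a compact sketch. It combines the three techniques the abstract names: hierarchical (here average-linkage agglomerative) clustering, k-nearest-neighbor assignment of new compounds to a cluster, and partial least squares (here single-response PLS via NIPALS). Several details are assumptions, not taken from the paper: Euclidean distance on numeric descriptors stands in for structural similarity, and the number of clusters, `k`, and the number of PLS components are free parameters. The clustering is a naive O(n³) loop meant only for small illustrative sets, and all function names are hypothetical.

```python
import numpy as np


def agglomerative_labels(X, n_clusters):
    """Naive average-linkage hierarchical clustering; returns one label per row of X."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average pairwise distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


def knn_assign(X_train, labels, X_new, k=3):
    """Assign each new compound to the majority cluster among its k nearest training compounds."""
    D = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(D, axis=1)[:, :k]
    return np.array([np.bincount(labels[row]).argmax() for row in nearest])


def pls1_fit(X, y, n_comp):
    """Single-response PLS via NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain; stop adding components
            break
        w = w / norm
        t = Xr @ w  # scores
        tt = t @ t
        p = Xr.T @ t / tt  # X loadings
        q = (yr @ t) / tt  # y loading
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - q * t  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P = np.column_stack(W), np.column_stack(P)
    coef = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return x_mean, y_mean, coef


def fit_local_models(X, y, n_clusters, n_comp=1):
    """Cluster the training set first, then fit one PLS model per cluster."""
    labels = agglomerative_labels(X, n_clusters)
    models = {c: pls1_fit(X[labels == c], y[labels == c], n_comp) for c in range(n_clusters)}
    return labels, models


def predict_local(X_train, labels, models, X_new, k=3):
    """Route each new compound to its cluster's local model and predict."""
    assigned = knn_assign(X_train, labels, X_new, k)
    preds = []
    for x, c in zip(X_new, assigned):
        x_mean, y_mean, coef = models[c]
        preds.append((x - x_mean) @ coef + y_mean)
    return np.array(preds)
```

When two well-separated groups follow different structure-activity trends, each local model can capture its own trend. A single global fit would have to average the two. This is the intuition behind the hypothesis above. In practice, a library implementation such as scikit-learn's clustering, neighbors, and PLS estimators would replace these hand-written helpers.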

19.
The blood-brain permeation of a structurally diverse set of 281 compounds was modeled using linear regression and a multivariate genetic partial least squares (G/PLS) approach. Key structural features affecting the logarithm of blood-brain partitioning (logBB) were captured through statistically significant quantitative structure-activity relationship (QSAR) models. These relationships reveal the importance of logP, polar surface area, and a variety of electrotopological indices for accurate predictions of logBB. The best models show an excellent correlation (r > 0.9) for a training set of 58 compounds. Likewise, comparing the average logBB values obtained from an ensemble of QSAR models with the experimental values also confirms the statistical quality of the models (r > 0.9). For the 34 molecules in the external validation set, the models give good agreement (r ≈ 0.7) between predicted and experimental logBB values. To further validate the models for use in the drug discovery process, a prediction set of 181 drugs with reported CNS penetration data was used. Each of the QSAR models achieves a success rate above 70% in qualitatively predicting CNS-permeable (active) drugs. A lower success rate (≈60%) was obtained with the best model for CNS-impermeable (inactive) drugs. Combining the predictions of all the models (consensus) did not significantly improve the discrimination between CNS-active and CNS-inactive molecules. Finally, using the therapeutic classification as a guide, the CNS penetration capability of over 2000 compounds in the Synthline database was estimated. The results were very similar to those for the smaller set of 181 compounds.
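The consensus step and the per-class success rates mentioned above can be sketched briefly. The abstract does not state the logBB cut-off used to call a drug CNS-permeable, so `threshold` is left as a parameter. The function names are hypothetical, and the sketch assumes both classes are present in the input.

```python
from statistics import mean


def consensus_logbb(predictions_per_model):
    """Average logBB predictions across models.

    predictions_per_model: one list of predictions per model, all in the same compound order.
    """
    return [mean(values) for values in zip(*predictions_per_model)]


def success_rates(pred_logbb, is_cns_active, threshold):
    """Return (fraction of CNS-active drugs predicted permeable,
    fraction of CNS-inactive drugs predicted impermeable) at a logBB cut-off."""
    hits_active = [p >= threshold for p, active in zip(pred_logbb, is_cns_active) if active]
    hits_inactive = [p < threshold for p, active in zip(pred_logbb, is_cns_active) if not active]
    return sum(hits_active) / len(hits_active), sum(hits_inactive) / len(hits_inactive)
```

The two rates are reported separately because a model can do well on one class and poorly on the other, as with the 70% versus 60% figures above. A single overall accuracy would hide that asymmetry.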

20.
Lei Bin  Zang Yunlei  Xue Zhiwei  Ge Yiqing  Li Wei  Zhai Qian  Jiao Long  Chinese Journal of Chromatography (《色谱》) 2021, 39(3): 331-337
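The chromatographic retention index (RI) is an important parameter in chromatographic analysis, and different compounds show different retention behavior on stationary phases of different polarity. Aldehydes and ketones comprise a large number of compounds, and determining their RI values experimentally is costly in both time and money. This paper used ensemble modeling combined with the hologram quantitative structure-activity relationship (HQSAR) method to study aldehyde and ketone compounds on two stationary phases (DB-210 and H...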
