Similar literature
1.
2.
3.
4.
5.
6.
Robust cross-validation of linear regression QSAR models   Cited by: 1 (self-citations: 0, citations by others: 1)
A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from the compounds' molecular structures. The "gold standard" of model validation is blind prediction, in which the model's predictive power is assessed from how well it predicts the activity values of compounds that were not considered in any way during model development or calibration. However, during the development of a QSAR model, it is necessary to obtain some indication of the model's predictive power. This is often done by some form of cross-validation (CV). In this study, the concepts of the predictive power and fitting ability of a multiple linear regression (MLR) QSAR model were examined in the CV context, allowing for the presence of outliers. Commonly used predictive power and fitting ability statistics were assessed via Monte Carlo cross-validation when applied to percent human intestinal absorption, blood-brain partition coefficient, and saxitoxin toxicity QSAR data sets, as well as three benchmark data sets with known outlier contamination. It was found that (1) a robust version of MLR should always be preferred over ordinary-least-squares MLR, regardless of the degree of outlier contamination, and that (2) the model's predictive power should only be assessed via robust statistics. The MATLAB and Java source code used in this study is freely available from the QSAR-BENCH section of www.dmitrykonovalov.org for academic use. The Web site also contains the Java-based QSAR-BENCH program, which can be run online via Java Web Start (supporting Windows, Mac OS X, and Linux/Unix) to reproduce most of the reported results or to apply the reported procedures to other data sets.
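
The comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say which robust MLR variant or which robust statistics were used, so everything below is an assumption made for illustration: a Huber M-estimator fitted by iteratively reweighted least squares stands in for "robust MLR", the median absolute error stands in for a "robust statistic", and the data are synthetic. The function names are hypothetical and are not taken from QSAR-BENCH.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary-least-squares MLR; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_huber(X, y, delta=1.345, n_iter=50):
    """Huber M-estimator via iteratively reweighted least squares.
    Illustrative stand-in for 'robust MLR'; not the study's method."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    for _ in range(n_iter):
        r = y - A @ coef
        # Robust residual scale from the median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        w = np.minimum(1.0, delta * scale / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def monte_carlo_cv(X, y, fit, n_splits=200, test_fraction=0.2, seed=0):
    """Repeated random train/test splits; returns pooled absolute test errors."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        coef = fit(X[train], y[train])
        errors.append(np.abs(y[test] - predict(coef, X[test])))
    return np.concatenate(errors)

def summarize(errors):
    """Classical (RMSE) and robust (median absolute error) summaries."""
    return {"rmse": float(np.sqrt(np.mean(errors ** 2))),
            "median_abs_error": float(np.median(errors))}

# Synthetic data with three gross outliers (not data from the study).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=60)
y[:3] += 10.0

ols_cv = summarize(monte_carlo_cv(X, y, fit_ols))
huber_cv = summarize(monte_carlo_cv(X, y, fit_huber))
print("OLS  :", ols_cv)
print("Huber:", huber_cv)
```

Both fits see the same random splits because they share a seed, so any difference in the summaries comes from the fitting method rather than from the partitioning.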

7.
8.
9.
10.
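Quantitative structure-activity relationship and molecular design of imidazoline derivative corrosion inhibitors   Cited by: 5 (self-citations: 0, citations by others: 5)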
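Quantum-chemical density functional theory (DFT) and linear regression analysis were used to carry out a quantitative structure-activity relationship (QSAR) study of how well undecyl imidazoline derivative corrosion inhibitors resist H2S and CO2 corrosion. Regression analysis was used to screen the main factors affecting inhibition performance and to build a QSAR model, and leave-one-out cross-validation was used to analyze the model's stability and predictive ability. The results show that the electron transfer parameter ΔN, the sum of the charges on the non-hydrogen atoms of the imidazole ring ΣQring, and the molecular polarizability α contribute strongly to the inhibition performance of imidazoline inhibitors. The fitted correlation coefficient (R2) and the cross-validated correlation coefficient (q2) of the resulting model are 0.924 and 0.917, respectively, and the model predicts the H2S and CO2 corrosion resistance of this class of inhibitors well. The QSAR results were then applied to molecular design: several new imidazoline derivatives with theoretically higher resistance to H2S and CO2 corrosion were proposed, providing a theoretical reference for experimentalists synthesizing new corrosion inhibitors.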
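
As a rough illustration of how a fitted R2 and a leave-one-out q2 such as those reported above are usually computed for a linear model, the sketch below uses the common definition q2 = 1 - PRESS / SS_tot. That definition, the function names, and the synthetic three-descriptor data (standing in for ΔN, ΣQring, and α) are assumptions; the abstract does not give the formulas or the data.

```python
import numpy as np

def fit_predict(X_train, y_train, X_new):
    """Fit a linear model with intercept and predict for X_new."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef[0] + X_new @ coef[1:]

def r_squared(X, y):
    """Fitted R2: the model is trained and evaluated on the same data."""
    fitted = fit_predict(X, y, X)
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """Leave-one-out q2: each compound is predicted by a model
    trained on all the other compounds."""
    n = len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        pred = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in for three descriptors and an inhibition measure.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 2.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=20)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print(f"R2 = {r2:.3f}, q2 = {q2:.3f}")
```

For ordinary least squares, q2 computed this way never exceeds the fitted R2, so a small gap between the two (as with 0.924 and 0.917) is generally read as a sign that the model is not overfitted.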

11.
Holographic quantitative structure-activity relationship (HQSAR) is an emerging QSAR technique that combines the molecular hologram, which encodes the frequency of occurrence of various molecular fragment types, with subsequent partial least squares (PLS) regression analysis. Based on the molecular hologram, alignment-free QSAR models can be developed rapidly and easily with high statistical significance and predictive ability. In this paper, the toxicity data for a series of 83 benzene derivatives to the autotrophic Chlorella vulgaris (IGC50, the negative logarithm of the 6-h 50% population growth inhibition concentration in mmol/l) were subjected to HQSAR analysis, which resulted in a model with high predictive ability. The robustness and predictive ability of the model were validated by a "leave-one-out" (LOO) cross-validation procedure and an external test set. The influence of fragment distinction parameters and fragment size on the quality of the HQSAR model is also discussed.
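
The idea of a molecular hologram, a fixed-length vector of fragment occurrence counts, can be sketched in a few lines. This is a deliberately simplified toy: it treats contiguous substrings of a SMILES string as "fragments", whereas HQSAR software enumerates connected substructures of the molecular graph. The fragment size range, the hologram length, the CRC32 hash, and the function name are all illustrative assumptions and are not taken from the paper.

```python
import zlib

def molecular_hologram(smiles, min_len=4, max_len=7, n_bins=53):
    """Toy hologram: hash every substring of the SMILES string whose
    length lies in [min_len, max_len] into one of n_bins counters."""
    hologram = [0] * n_bins
    for length in range(min_len, max_len + 1):
        for start in range(len(smiles) - length + 1):
            fragment = smiles[start:start + length]
            # Deterministic hash, so the same fragment always lands in
            # the same bin; different fragments may collide.
            bin_index = zlib.crc32(fragment.encode("utf-8")) % n_bins
            hologram[bin_index] += 1
    return hologram

# Chlorobenzene written as a SMILES string (10 characters).
hologram = molecular_hologram("c1ccccc1Cl")
print(sum(hologram), "fragments counted in", len(hologram), "bins")
```

In an HQSAR workflow, one such vector per compound forms the descriptor matrix that is then passed to PLS regression; the PLS step is not shown here.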

12.
13.
14.
The performance of three "spectroscopic" quantitative structure-activity relationship (QSAR) methods (eigenvalue (EVA), electronic eigenvalue (EEVA), and comparative spectra analysis (CoSA)) for relating molecular structure and estrogenic activity is critically evaluated. The methods were tested with respect to the relative binding affinities (RBA) of a diverse set of 36 estrogens previously examined in detail by the comparative molecular field analysis method. The CoSA method with 13C chemical shifts appears to provide a predictive QSAR model for this data set. EEVA (i.e., molecular orbital energy in this context) is a borderline case, whereas the performances of EVA (i.e., vibrational normal mode) and CoSA with 1H shifts are substandard and only semiquantitative. The CoSA method with 13C chemical shifts provides an alternative and supplement to conventional 3D QSAR methods for rationalizing and predicting the estrogenic activity of molecules. If CoSA is to be applied to large data sets, however, it is desirable that the chemical shifts are available from common databases or, alternatively, that they can be estimated with sufficient accuracy using fast prediction schemes. Calculations of NMR chemical shifts by quantum mechanical methods, as in this case study, seem to be too time-consuming at this moment, but the situation is changing rapidly. An inherent shortcoming common to all spectroscopic QSAR methods is that they cannot take the chirality of molecules into account, at least as formulated at present. Moreover, the symmetry of molecules may cause additional problems. There are three pairs of enantiomers and nine symmetric (C2 or C2v) molecules present in the data set, so that the predictive ability of full 3D QSAR methods is expected to be better than that of spectroscopic methods. This is demonstrated with SOMFA (self-organizing molecular field analysis). In general, the use of external test sets with randomized data is encouraged as a validation tool in QSAR studies.

15.
16.
17.
18.
19.
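A quantitative structure-property relationship (QSPR) model relating the molecular structures of 147 organic compounds to their thermal conductivity values was built in order to explore the structural factors that affect the thermal conductivity of organic compounds. Of the 147 compounds in the sample set, 118 were randomly selected as the training set and 29 as the test set. Constitutional, topological, geometrical, electrostatic, and quantum-chemical descriptors were calculated with the CODESSA software. The heuristic method (HM) was used to select 5 structural parameters, from which a linear regression model was built; the same 5 parameters were then used as inputs to a support vector machine (SVM) to build a nonlinear SVM regression model. The results show that although the fitting performance of the SVM regression model (squared multiple correlation coefficient R2 = 0.9240) is slightly lower than that of the heuristic regression model (R2 = 0.9267), the predictive performance of the SVM method (R2 = 0.9682) is higher than that of the heuristic method (R2 = 0.9574), and for a QSPR model predictive performance matters more. Overall, therefore, the SVM method is superior to the heuristic method. The SVM and heuristic methods provide engineering practice with a new way to predict the thermal conductivity of organic compounds from molecular structure.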
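
The evaluation protocol in this abstract (a random 118/29 split of 147 compounds, with R2 reported separately for the training and test sets) can be sketched as follows. Only the linear model is shown, fitted by ordinary least squares; the SVM model, the CODESSA descriptors, and the heuristic descriptor selection are not reproduced. The data are synthetic, R2 is assumed to be the coefficient of determination, and the function names are hypothetical.

```python
import numpy as np

def random_split(n_samples, n_train, seed=0):
    """Randomly partition sample indices into training and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return perm[:n_train], perm[n_train:]

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in: 147 compounds described by 5 structural parameters.
rng = np.random.default_rng(42)
X = rng.normal(size=(147, 5))
y = X @ np.array([1.0, -2.0, 0.5, 1.5, -1.0]) + rng.normal(scale=0.3, size=147)

train_idx, test_idx = random_split(147, 118)

# Fit the linear model on the training set only.
A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

def predict(X_new):
    return coef[0] + X_new @ coef[1:]

r2_train = r_squared(y[train_idx], predict(X[train_idx]))
r2_test = r_squared(y[test_idx], predict(X[test_idx]))
print(f"training R2 = {r2_train:.4f}, test R2 = {r2_test:.4f}")
```

Reporting the two values separately is what allows a comparison like the one in the abstract, where one model fits the training set slightly better while the other predicts the held-out compounds better.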

20.