Predictive accuracy is the primary concern of computational chemists in quantitative structure-activity relationship (QSAR) studies. It is hypothesized that a model built on structurally analogous chemicals will predict better than one derived from structurally diverse compounds. This paper develops a novel scheme, "clustering first, and then modeling", which builds a local QSAR model for each subset obtained by clustering the training set according to structural similarity. For validation and prediction, compounds in the validation and test sets were first classified into the subsets defined on the training set, and each compound was then predicted by the local model of its subset. The approach was validated on two independent data sets by local modeling and prediction of baseline toxicity to the fathead minnow. Hierarchical clustering was used for cluster analysis, k-nearest neighbor for classification, and partial least squares for model generation. The statistical results showed that the local models built on the subsets predicted much better than the global model built on the whole training set, consistent with the hypothesis. The approach is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.
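The "clustering first, and then modeling" scheme can be sketched in plain Python as below. This is a minimal illustration under our own assumptions, not the authors' implementation: Euclidean distance on descriptor vectors, average-linkage hierarchical clustering cut at a fixed number of clusters, a k-nearest-neighbor majority vote to route a new compound to a cluster, and a one-component PLS1 regression per cluster. The helper names (`agglomerative`, `knn_assign`, `fit_pls1`, `local_qsar`) and the toy data are hypothetical.

```python
# Minimal sketch of "clustering first, and then modeling" for local QSAR.
# Assumptions (not from the paper): Euclidean distance, average-linkage
# clustering cut at a fixed cluster count, k-NN majority vote for assigning
# new compounds, and one-component PLS1 per cluster. All names are hypothetical.
from collections import Counter
from math import dist
from statistics import mean


def agglomerative(X, n_clusters):
    """Average-linkage hierarchical clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(X))]

    def linkage(a, b):
        return mean(dist(X[i], X[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance (p < q).
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p].extend(clusters.pop(q))
    return clusters


def knn_assign(x, X, labels, k=3):
    """Assign x to the cluster most common among its k nearest training compounds."""
    nearest = sorted(range(len(X)), key=lambda i: dist(x, X[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def fit_pls1(X, y):
    """One-component PLS1 regression; returns a predict(x) function."""
    m = len(X[0])
    x_mean = [mean(row[j] for row in X) for j in range(m)]
    y_mean = mean(y)
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to Xc^T yc (the first PLS direction).
    w = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(m)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:  # no covariance with y (e.g. a one-member cluster): predict the mean
        return lambda x: y_mean
    w = [v / norm for v in w]
    t = [sum(r[j] * w[j] for j in range(m)) for r in Xc]  # latent scores
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)  # y-loading

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(m))
        return y_mean + q * score

    return predict


def local_qsar(X_train, y_train, n_clusters=2, k=3):
    """Cluster the training set, fit one PLS model per cluster, route queries by k-NN."""
    clusters = agglomerative(X_train, n_clusters)
    labels = [0] * len(X_train)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    models = [fit_pls1([X_train[i] for i in m], [y_train[i] for i in m]) for m in clusters]
    return lambda x: models[knn_assign(x, X_train, labels, k)](x)


# Toy data (hypothetical): two structural families with opposite trends.
X = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (12, 10)]
y = [0, 2, 4, 10, 9, 8]
local_model = local_qsar(X, y)
global_model = fit_pls1(X, y)
for query in [(1.5, 0.2), (11.5, 10.1)]:
    print(query, local_model(query), global_model(query))
```

On this toy set each local model follows its own family's linear trend (local predictions 3.0 and 8.5), whereas the single global PLS model fitted to all six compounds cannot follow both opposite trends at once. The cluster count and k are tuning choices; the values used here are arbitrary.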

6.

Combinatorial QSAR of ambergris fragrance compounds (Total citations: 4; self-citations: 0; citations by others: 4)

Kovatcheva A, Golbraikh A, Oloff S, Xiao YD, Zheng W, Wolschann P, Buchbauer G, Tropsha A. Journal of Chemical Information and Computer Sciences, 2004, 44(2): 582-595

7.

Development of quantitative structure-activity relationship and classification models for a set of carbonic anhydrase inhibitors

Mattioni BE, Jurs PC. Journal of Chemical Information and Computer Sciences, 2002, 42(1): 94-102

8.

Identification of P-gp substrates based on random forest and Chemistry Development Kit descriptors

Ma GL, Zhao XP, Cheng YY. Chemical Journal of Chinese Universities, 2007, 28(10): 1885-1888

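A model for identifying P-glycoprotein (P-gp) substrates was built using the random forest method, descriptors computed with the open-source software CDK (Chemistry Development Kit), and a training set of 170 compounds, 96 of which are P-gp substrates. The relationship between the CDK descriptors and P-gp substrate identification was examined; the results show that molecular properties such as atomic polarizability and charged partial surface area play an important role in identifying P-gp substrates. The model's prediction accuracy on the training set was 99.42%. On an external test set of 42 compounds (24 of them P-gp substrates), the identification accuracies for P-gp substrates, non-substrates, and the whole test set were 87.50%, 83.33%, and 85.71%, respectively. The leave-one-out cross-validation accuracy on the full 212-compound data set was 77.4%.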
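The test-set figures above are per-class and overall identification accuracies, and the 77.4% figure is a leave-one-out estimate. The plain-Python sketch below shows how such numbers can be computed; it is an illustration under assumptions, not the authors' code. Labels are coded 1 = P-gp substrate and 0 = non-substrate, the classifier is any `fit(X, y)` function returning a predictor, and the 1-nearest-neighbor `fit_1nn` is a hypothetical stand-in for the random forest. The confusion counts used (21 of 24 substrates and 15 of 18 non-substrates correct) are back-calculated from the reported percentages; the abstract does not state them.

```python
# Sketch of identification-accuracy metrics and leave-one-out (LOO) validation.
# Assumptions: labels 1 = P-gp substrate, 0 = non-substrate; fit_1nn is a
# hypothetical 1-nearest-neighbour stand-in, NOT the random forest of the paper.
from math import dist


def class_accuracies(y_true, y_pred):
    """Return (substrate accuracy, non-substrate accuracy, overall accuracy)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    acc_pos = sum(p == 1 for p in pos) / len(pos)
    acc_neg = sum(p == 0 for p in neg) / len(neg)
    acc_all = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc_pos, acc_neg, acc_all


def leave_one_out(X, y, fit):
    """Hold out each compound once, train on the rest, and return LOO accuracy."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += model(X[i]) == y[i]
    return correct / len(X)


def fit_1nn(X, y):
    """Hypothetical stand-in classifier: predict the label of the nearest neighbour."""
    def predict(x):
        j = min(range(len(X)), key=lambda i: dist(x, X[i]))
        return y[j]
    return predict


# Counts back-calculated from the reported test-set percentages.
y_true = [1] * 24 + [0] * 18
y_pred = [1] * 21 + [0] * 3 + [0] * 15 + [1] * 3
acc = class_accuracies(y_true, y_pred)
print([round(100 * a, 2) for a in acc])  # [87.5, 83.33, 85.71]

# LOO on a tiny hypothetical descriptor set with two well-separated classes.
toy_X = [(0, 0), (0, 1), (5, 5), (5, 6)]
toy_y = [0, 0, 1, 1]
print(leave_one_out(toy_X, toy_y, fit_1nn))  # 1.0
```

Passing a random forest training function in place of `fit_1nn` and running `leave_one_out` over all 212 compounds would correspond to the LOO estimate quoted above; this sketch does not reproduce that number.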

9.

Comparison of support vector machine and artificial neural network systems for drug/nondrug classification

Byvatov E, Fechner U, Sadowski J, Schneider G. Journal of Chemical Information and Computer Sciences, 2003, 43(6): 1882-1889

10.

Spline-fitting with a genetic algorithm: a method for developing classification structure-activity relationships

Sutherland JJ, O'Brien LA, Weaver DF. Journal of Chemical Information and Computer Sciences, 2003, 43(6): 1906-1915

11.

Genetic Algorithm guided Selection: variable selection and subset selection (Total citations: 3; self-citations: 0; citations by others: 3)

Cho SJ, Hermsmeier MA. Journal of Chemical Information and Computer Sciences, 2002, 42(4): 927-936