1.

Benchmarking of QSAR models for blood-brain barrier permeation

Konovalov DA Coomans D Deconinck E Heyden YV 《Journal of chemical information and modeling》2007,47(4):1648-1656

2.

Statistical confidence for variable selection in QSAR models via Monte Carlo cross-validation

Konovalov DA Sim N Deconinck E Vander Heyden Y Coomans D 《Journal of chemical information and modeling》2008,48(2):370-383

3.

Probability issues in molecular design: predictive and modeling ability in 3D-QSAR schemes

Polanski J Gieleciak R Bak A 《Combinatorial chemistry & high throughput screening》2004,7(8):793-807

In the current work we investigated 3D-QSAR data using coupled leave-several-out (LSO) and leave-one-out (LOO) cross-validation (CV) procedures. We verified this scheme using both simulated data and real 3D-QSAR data describing a series of CoMFA steroids, heterocyclic azo dyes and styrylquinoline HIV integrase inhibitors. Unlike standard analyses, this technique characterizes an individual method not by a single performance metric but by screening the whole possible modeling space, sampling different molecules into the training and test sets. This allowed us to discuss the information contained in the estimators that validate cross-validation procedures and to compare the efficiency of several 3D-QSAR schemes, in particular Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Surface Analysis (CoMSA). Moreover, it provides some general knowledge about predictive and modeling ability in 3D-QSAR methods.
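The repeated sampling of molecules into training and test sets described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the single-descriptor least-squares model, the synthetic data, and the names `fit_line`, `q2` and `leave_several_out` are all hypothetical, and q² is taken here as 1 - PRESS/SS with SS measured about the training-set mean.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x for a single descriptor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def q2(y_true, y_pred, y_train_mean):
    """Predictive q2 = 1 - PRESS / SS (SS about the training mean: an assumption)."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - y_train_mean) ** 2 for t in y_true)
    return 1.0 - press / ss

def leave_several_out(xs, ys, n_test, n_rounds, seed=0):
    """Draw n_rounds random test sets of n_test compounds; return one q2 per round."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_rounds):
        test = sorted(rng.sample(idx, n_test))
        held = set(test)
        train = [i for i in idx if i not in held]
        train_y = [ys[i] for i in train]
        a, b = fit_line([xs[i] for i in train], train_y)
        preds = [a + b * xs[i] for i in test]
        scores.append(q2([ys[i] for i in test], preds, statistics.fmean(train_y)))
    return scores

# Synthetic data standing in for descriptor/activity pairs (hypothetical).
data_rng = random.Random(42)
xs = [i / 2 for i in range(30)]
ys = [1.5 * x + 0.3 + data_rng.gauss(0, 0.5) for x in xs]

scores = leave_several_out(xs, ys, n_test=6, n_rounds=200)
print(min(scores), statistics.median(scores), max(scores))
```

Looking at the spread of `scores`, rather than a single value, is the point of the exercise: it shows how much the estimate depends on which molecules happen to be held out.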

4.

Diagnostic tools to determine the quality of "transparent" regression-based QSARs: the "modelling power" plot

Sagrado S Cronin MT 《Journal of chemical information and modeling》2006,46(3):1523-1532

5.

A self‐adaptive genetic algorithm‐artificial neural network algorithm with leave‐one‐out cross validation for descriptor selection in QSAR study

Jingheng Wu Juan Mei Sixiang Wen Siyan Liao Jincan Chen Yong Shen 《Journal of computational chemistry》2010,31(10):1956-1968

6.

A novel semi-empirical topological descriptor Nt and the application to study on QSPR/QSAR

Zhou C Nie C Li S Li Z 《Journal of computational chemistry》2007,28(15):2413-2423

7.

Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis

Zhu H Tropsha A Fourches D Varnek A Papa E Gramatica P Oberg T Dao P Cherkasov A Tetko IV 《Journal of chemical information and modeling》2008,48(4):766-784

Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance in the development of robust and predictive models of chemical toxicity. To address this issue in a systematic way, we have formed an international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology. We have compiled an aqueous toxicity data set containing 983 unique compounds tested in the same laboratory over a decade against Tetrahymena pyriformis. A modeling set including 644 compounds was selected randomly from the original set and distributed to all groups, which used their own QSAR tools for model development. The remaining 339 compounds in the original set (external set I) as well as 110 additional compounds (external set II) published recently by the same laboratory (after this computational study was already in progress) were used as two independent validation sets to assess the external predictive power of individual models. In total, our virtual collaboratory has developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal prediction accuracy for the modeling set ranged from 0.76 to 0.93 as measured by the leave-one-out cross-validation correlation coefficient (Q_abs²). The prediction accuracy for the external validation sets I and II ranged from 0.71 to 0.85 (linear regression coefficient R_absI²) and from 0.38 to 0.83 (linear regression coefficient R_absII²), respectively. The use of an applicability domain threshold implemented in most models generally improved the external prediction accuracy but at the same time led to a decrease in chemical space coverage. Finally, several consensus models were developed by averaging the predicted aquatic toxicity for every compound using all 15 models, with or without taking into account their respective applicability domains. We find that consensus models afford higher prediction accuracy for the external validation data sets with the highest space coverage as compared to individual constituent models. Our studies prove the power of a collaborative and consensual approach to QSAR model development. The best validated models of aquatic toxicity developed by our collaboratory (both individual and consensus) can be used as reliable computational predictors of aquatic toxicity and are available from any of the participating laboratories.
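The consensus step described in this abstract, averaging the predictions of all models for each compound with or without their applicability domains, can be sketched as below. The function names, the boolean applicability-domain flags and the example numbers are hypothetical; the abstract does not say how each group defined its domain or how uncovered compounds were handled.

```python
def consensus_prediction(predictions, in_domain=None):
    """Average the per-model predictions for one compound.

    predictions: one predicted toxicity per model.
    in_domain:   optional flags, one per model; when given, only models whose
                 applicability domain covers the compound are averaged.
    Returns None when no model covers the compound (an assumed convention).
    """
    if in_domain is None:
        used = list(predictions)
    else:
        used = [p for p, ok in zip(predictions, in_domain) if ok]
    if not used:
        return None
    return sum(used) / len(used)

def coverage(consensus_values):
    """Fraction of compounds that received a consensus prediction."""
    return sum(v is not None for v in consensus_values) / len(consensus_values)

# Three hypothetical models, two hypothetical compounds.
preds = [[1.0, 1.5, 0.5], [2.0, 2.5, 3.0]]
domains = [[True, False, True], [False, False, False]]

plain = [consensus_prediction(p) for p in preds]
with_ad = [consensus_prediction(p, d) for p, d in zip(preds, domains)]
print(plain, coverage(plain))
print(with_ad, coverage(with_ad))
```

The second call illustrates the trade-off the abstract reports: restricting the average to in-domain models can change the prediction, and it lowers coverage when no model covers a compound.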

8.

Comparative study of QSAR/QSPR correlations using support vector machines, radial basis function neural networks, and multiple linear regression (cited by: 5; self-citations: 0; citations by others: 5)

Yao XJ Panaye A Doucet JP Zhang RS Chen HF Liu MC Hu ZD Fan BT 《Journal of chemical information and computer sciences》2004,44(4):1257-1266

9.

A new amino acid descriptor and its application in peptide QSAR

仝建波李康楠吴英纪占培李玲霄《分析测试学报》2017,36(2):224-229

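Principal component analysis was performed separately on 41 non-zero Randic molecular profile descriptors, 44 non-zero eigenvalue-based indices descriptors and 47 non-zero walk and path counts descriptors of the 20 natural amino acids, yielding a new amino acid descriptor, SVREW. The descriptor was applied to the structural characterization of angiotensin-converting enzyme (ACE) inhibitory dipeptides and tripeptides, bitter dipeptides and tetrapeptides, oxytocin analogues, and HLA-A*0201-restricted CTL epitope peptides. Quantitative structure-activity relationship models were built by multiple linear regression (MLR), and their stability was checked by both internal and external validation. For the ACE inhibitory dipeptides, ACE inhibitory tripeptides, bitter dipeptides, bitter tetrapeptides, oxytocin analogues and HLA-A*0201-restricted CTL epitope peptides, the multiple correlation coefficients (R²cum) of the models were 0.994, 0.797, 0.948, 0.878, 0.686 and 0.720, respectively; the leave-one-out cross-validated multiple correlation coefficients (R²cv) were 0.955, 0.859, 0.879, 0.958, 0.796 and 0.843, respectively; and the external validation correlation coefficients (Q²ext) were 0.990, 0.954, 0.890, 0.950, 0.748 and 0.773, respectively. The results indicate that models built with the SVREW descriptor for peptide structural characterization have good stability and predictive ability. SVREW is expected to become an effective structural characterization method in peptide QSAR research and could provide guidance for the discovery and study of new drugs.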

10.

Rational selection of training and test sets for the development of validated QSAR models

Golbraikh A Shen M Xiao Z Xiao YD Lee KH Tropsha A 《Journal of computer-aided molecular design》2003,17(2-4):241-253

Quantitative Structure–Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q²! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q² for the training set and accuracy of prediction (R²) for the test set and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.
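The two quantities contrasted in this abstract, the LOO cross-validated q² of the training set and the R² of an external test set, can be computed side by side as in the sketch below. It assumes a single-descriptor least-squares model on synthetic data and defines the external R² as 1 - SSE/SST on the test set; the kNN variable-selection method and the rational splitting schemes of the paper are not reproduced, and all names are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x for a single descriptor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out q2: refit without compound i, predict it, accumulate PRESS."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = statistics.fmean(ys)
    return 1.0 - press / sum((y - my) ** 2 for y in ys)

def external_r2(xs_train, ys_train, xs_test, ys_test):
    """R2 on compounds never used for fitting (1 - SSE/SST: an assumed definition)."""
    a, b = fit_line(xs_train, ys_train)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs_test, ys_test))
    mt = statistics.fmean(ys_test)
    return 1.0 - sse / sum((y - mt) ** 2 for y in ys_test)

# Synthetic descriptor/activity pairs, split at random into 30 training
# and 10 external compounds (a plain random split, not a rational one).
rng = random.Random(7)
xs = [i * 0.5 for i in range(40)]
ys = [2.0 * x - 1.0 + rng.gauss(0, 0.4) for x in xs]
order = list(range(40))
rng.shuffle(order)
train, test = order[:30], order[30:]

q2_train = loo_q2([xs[i] for i in train], [ys[i] for i in train])
r2_ext = external_r2([xs[i] for i in train], [ys[i] for i in train],
                     [xs[i] for i in test], [ys[i] for i in test])
print(q2_train, r2_ext)
```

On well-behaved synthetic data like this the two numbers agree closely; the abstract's argument is that on real datasets a high q² does not guarantee a high external R², which is why both are worth reporting.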

11.

Boosted leave-many-out cross-validation: the effect of training and test set diversity on PLS statistics

Clark RD 《Journal of computer-aided molecular design》2003,17(2-4):265-275

It is becoming increasingly common in quantitative structure/activity relationship (QSAR) analyses to use external test sets to evaluate the likely stability and predictivity of the models obtained. In some cases, such as those involving variable selection, an internal test set (i.e., a cross-validation set) is also used. Care is sometimes taken to ensure that the subsets used exhibit response and/or property distributions similar to those of the data set as a whole, but more often the individual observations are simply assigned 'at random'. In the special case of MLR without variable selection, it can be analytically demonstrated that this strategy is inferior to others. Most particularly, D-optimal design performs better if the form of the regression equation is known and the variables involved are well behaved. This report introduces an alternative, non-parametric approach termed 'boosted leave-many-out' (boosted LMO) cross-validation. In this method, relatively small training sets are chosen by applying optimizable k-dissimilarity selection (OptiSim) using a small subsample size (k = 4, in this case), with the unselected observations being reserved as a test set for the corresponding reduced model. Predictive errors for the full model are then estimated by aggregating results over several such analyses. The countervailing effects of training and test set size, diversity, and representativeness on PLS model statistics are described for CoMFA analysis of a large data set of COX2 inhibitors.

12.

Use of Self-Training Artificial Neural Networks in a QSRR Study of a Diverse Set of Organic Compounds

Zahra Garkani-Nejad 《Chromatographia》2009,70(5-6):869-874

13.

Assessing model fit by cross-validation (cited by: 8; self-citations: 0; citations by others: 8)

Hawkins DM Basak SC Mills D 《Journal of chemical information and computer sciences》2003,43(2):579-586

When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, and the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empiric study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use cross-validation, but ensure that this is done properly.

14.

Predicting the Acute Toxicity of Aromatic Amines by Linear and Nonlinear Regression Methods (cited by: 1; self-citations: 0; citations by others: 1)

张晓龙周志祥刘阳华范雪兰李捍东王建涛《结构化学》2014,(2):244-252

15.

A QSAR study and molecular design of benzothiazole derivatives as potent anticancer agents

Chen JinCan Qian Li Shen Yong Chen LanMei Zheng KangCheng 《中国科学B辑(英文版)》2008,51(2):111-119

A quantitative structure-activity relationship (QSAR) of a series of benzothiazole derivatives showing potent and selective cytotoxicity against a tumorigenic cell line has been studied by using density functional theory (DFT), molecular mechanics (MM) and statistical methods, and the QSAR equation was established via a correlation analysis and a stepwise regression analysis. A new scheme for determining outliers by the "leave-one-out" (LOO) cross-validation coefficient (q²_(n-i)) was suggested and successfully used. In the established optimal equation (excluding two outliers), the steric parameter (MRR) and the net charge (QFR) of the first atom of the substituent (R), as well as the square of the hydrophobic parameter (lgP)² of the whole molecule, are the main independent factors contributing to the anticancer activities of the compounds. The fitting correlation coefficient (R²) and the cross-validation coefficient (q²) values are 0.883 and 0.797, respectively. This indicates that the model has significant statistical quality and excellent predictive ability. Based on the QSAR studies, 4 new compounds with high predicted anticancer activities have been theoretically designed and they are expected to be confirmed experimentally.

16.

Development of QSAR Model for Predicting the Mutagenicity of Aromatic Compounds

LIU Yang-Hua ZHOU Zhi-Xiang ZHANG Xiao-Long LI Han-Dong 《结构化学》2015,34(3)

17.

Nonparametric regression applied to quantitative structure-activity relationships

Constans P Hirst JD 《Journal of chemical information and computer sciences》2000,40(2):452-459

18.

Retention prediction of adrenoreceptor agonists and antagonists on unmodified silica phase in hydrophilic interaction chromatography

Quiming NS Denola NL Saito Y Jinno K 《Analytical and bioanalytical chemistry》2007,388(8):1693-1706

19.

QSPR checking and validation: a case study with hydroxy radical reaction rate constant

D.M. Hawkins J.J. Kraker S.C. Basak D. Mills 《SAR and QSAR in environmental research》2013,24(5-6):525-539

Traditionally, QSAR and QSPR models have been fitted by splitting the available compounds into separate learning and validation sets. The model is then fitted to the learning set and assessed using the validation set. Cross-validation (CV) uses all available compounds for both purposes, so that the full body of available information is brought to bear on both the learning and the validation portions of the study. The price paid for this additional information is a substantially greater computational load. A common mistake in using CV is to omit some of the repetitive computations. This mistake leads to substantial bias in the assessment. A hydroxyl radical reaction rate dataset is used to illustrate the superiority of CV and the pitfalls from its improper execution when modeling using nearest neighbors, paralleling behavior in the well-studied linear model setting.
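The abstract does not spell out which repetitive computations are commonly skipped. One reading, assumed here, is that descriptor selection is done once on all compounds instead of being repeated inside every CV fold. The sketch below contrasts the two on pure-noise data with a single-descriptor linear model rather than the nearest-neighbor models of the paper; all names and data are hypothetical.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def select_descriptor(columns, ys, rows):
    """Index of the descriptor with the largest |r| against y, using only `rows`."""
    sub_y = [ys[i] for i in rows]
    return max(range(len(columns)),
               key=lambda j: abs(pearson([columns[j][i] for i in rows], sub_y)))

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x for a single descriptor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def loo_q2(columns, ys, reselect):
    """LOO q2; reselect=True repeats descriptor selection inside every fold."""
    everyone = list(range(len(ys)))
    fixed = select_descriptor(columns, ys, everyone)  # has seen every compound
    press = 0.0
    for i in everyone:
        train = [k for k in everyone if k != i]
        j = select_descriptor(columns, ys, train) if reselect else fixed
        a, b = fit_line([columns[j][k] for k in train], [ys[k] for k in train])
        press += (ys[i] - (a + b * columns[j][i])) ** 2
    my = statistics.fmean(ys)
    return 1.0 - press / sum((y - my) ** 2 for y in ys)

# Pure noise: 20 compounds, 100 descriptors, no real structure-activity link.
rng = random.Random(1)
n, p = 20, 100
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
ys = [rng.gauss(0, 1) for _ in range(n)]

q2_shortcut = loo_q2(columns, ys, reselect=False)  # selection done once
q2_proper = loo_q2(columns, ys, reselect=True)     # selection repeated per fold
print(q2_shortcut, q2_proper)
```

Because the data contain no signal, an honest estimate should be near or below zero. The shortcut typically looks better than that, since the left-out compound already influenced which descriptor was chosen.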

20.

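Quantitative structure-activity relationship modeling of angiotensin-converting enzyme inhibitory peptides based on multiple linear regression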

刘静彭剑秋管骁《分析科学学报》2012,28(1):16-22

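The amino acid structural descriptor SVHEHS was used to characterize the sequences of competitive angiotensin I-converting enzyme (ACE) inhibitory dipeptides, tripeptides and tetrapeptides, and multiple linear regression (MLR) models relating structure to activity were then built. For the ACE inhibitory dipeptide model, the correlation coefficient, cross-validated correlation coefficient, root mean square error and external validation correlation coefficient were 0.851, 0.781, 0.327 and 0.792, respectively; for the tripeptide model they were 0.805, 0.717, 0.339 and 0.817; and for the tetrapeptide model they were 0.792, 0.553, 0.393 and 0.630. The study shows that the MLR models of ACE inhibitory peptides built with this descriptor have good fitting and predictive ability and explain well the relationship between the activity and structure of ACE inhibitory peptides.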