Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
A new strategy of outlier detection for QSAR/QSPR   Total citations: 1, self-citations: 0, citations by others: 1
A crucial step in building a high-performance QSAR/QSPR model is the detection of outliers. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist in the model. The classical identification methods do not always identify them, because they are based on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods emphasize only some types of outliers rather than all of them. To avoid these problems and detect all kinds of outliers simultaneously, we provide a new strategy based on Monte‐Carlo cross‐validation, termed the MC method. The MC method inherently provides a feasible way to detect different kinds of outliers by establishing many cross‐predictive models. With the help of the distribution of predictive residuals thus obtained, it seems to be able to reduce the risk caused by the masking effect. In addition, a new display is proposed, in which the absolute values of the mean of the predictive residuals are plotted versus the standard deviations of the predictive residuals. The plot divides the data into normal samples, y direction outliers and X direction outliers. Several examples are used to demonstrate the detection ability of the MC method through comparison with different diagnostic methods. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
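
The residual statistics behind the proposed display can be sketched as follows. This is a minimal illustration and not the authors' implementation: ordinary least squares as the base model, the 80/20 random split, the iteration count, and the function name `mc_residual_stats` are all assumptions made here. The abstract gives no thresholds for separating normal samples from y direction and X direction outliers, so the sketch only returns the two plot coordinates.

```python
import numpy as np

def mc_residual_stats(X, y, n_iter=500, train_frac=0.8, seed=0):
    """Monte Carlo cross-validation: per-sample |mean| and std of predictive residuals.

    Assumed base model: ordinary least squares with an intercept.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    Xb = np.column_stack([np.ones(n), X])  # prepend an intercept column
    sums = np.zeros(n)
    sq_sums = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_iter):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Fit on the random training part, predict the held-out part.
        coef, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        res = y[test] - Xb[test] @ coef
        sums[test] += res
        sq_sums[test] += res ** 2
        counts[test] += 1
    if np.any(counts == 0):
        raise ValueError("some samples were never held out; increase n_iter")
    mean = sums / counts
    var = np.maximum(sq_sums / counts - mean ** 2, 0.0)
    return np.abs(mean), np.sqrt(var)
```

Plotting the first returned array against the second gives one point per sample; a sample with a large absolute mean residual is a candidate outlier in the response, under the assumptions stated above.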

2.
Validation is a crucial aspect of quantitative structure–activity relationship (QSAR) model development. External validation is generally considered the most conclusive proof of the predictive capacity of a QSAR model. In the absence of a truly external data set, external validation is usually performed on test set compounds, which are members of the original data set but are not used in model development. With small data sets, QSAR researchers face two problems: the developed models may be less reliable because of the small number of training set compounds, and they may also show poor external predictability because they may not have captured all the features required for the particular structure–activity relationships. The present paper attempts to show that the ‘true r(LOO)’ statistic, calculated from the model derived from the undivided data set with the variable selection strategy applied at each cycle of leave‐one‐out (LOO) validation, may reflect the external validation characteristics of the developed model, thus obviating the need to split the data set into training and test sets. This approach may be helpful for small data sets, as it uses all available data for model development and validation, making the resulting model more reliable. Copyright © 2009 John Wiley & Sons, Ltd.
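
The idea of repeating variable selection inside every LOO cycle can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: the abstract does not say which selection strategy or regression method is used, so a simple top-k correlation filter (`select_top_k`, a hypothetical helper) and ordinary least squares stand in for them, and the statistic is computed as 1 − PRESS/SS.

```python
import numpy as np

def select_top_k(X, y, k):
    """Stand-in selector (assumption): the k descriptors most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(corr)[::-1][:k]

def loo_q2_with_selection(X, y, k=2):
    """Leave-one-out statistic with variable selection redone in every cycle."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Selection sees only the n-1 retained compounds, never compound i.
        cols = select_top_k(X[keep], y[keep], k)
        A = np.column_stack([np.ones(n - 1), X[keep][:, cols]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        preds[i] = coef[0] + X[i, cols] @ coef[1:]
    press = np.sum((y - preds) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total
```

Because the left-out compound influences neither the selected descriptors nor the fitted coefficients, each prediction is made the way a prediction for an external compound would be, which is the property the abstract relies on.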

3.
A new computer program, ARTE-QSAR, has been designed to build and analyze quantitative structure-activity relationship (QSAR) models through regression analysis. The user is provided with a range of regression and validation techniques. The emphasis of the program lies mainly in the validation of QSAR models in chemical applications. ARTE-QSAR produces easily interpretable output from which the user can conclude whether the obtained model is suitable for prediction and analysis.

4.
5.
6.
7.
CORrelation And Logic (CORAL) is software that generates quantitative structure-activity relationships (QSAR) for different endpoints. This study is dedicated to the QSAR analysis of acute toxicity in fathead minnow (Pimephales promelas). Statistical quality for the external test set is a complex function of the split (into training and test subsets), the number of epochs of the Monte Carlo optimization, and the threshold used as the criterion for dividing the correlation weights into two classes: rare (blocked) and not rare (active). Computational experiments with three random splits (data on 568 compounds) indicated that this approach can satisfactorily predict the desired endpoint (the negative decimal logarithm of the 50% lethal concentration, in mmol/L, pLC50). The average correlation coefficients (r²) are 0.675 ± 0.0053, 0.824 ± 0.0242, and 0.787 ± 0.0101 for the subtraining, calibration, and test sets, respectively. The average standard errors of estimation (s) are 0.837 ± 0.021, 0.555 ± 0.047, and 0.606 ± 0.049 for the subtraining, calibration, and test sets, respectively. The CORAL software, together with the three random splits into subtraining, calibration, and test sets, can be downloaded on the Internet ( http://www.insilico.eu/coral/ ). © 2012 Wiley Periodicals, Inc.

8.
9.
We describe the application of particle swarms to the development of quantitative structure-activity relationship (QSAR) models based on k-nearest neighbor and kernel regression. Particle swarm optimization is a population-based stochastic search method based on the principles of social interaction. Each individual explores the feature space guided by its previous success and that of its neighbors. Success is measured using leave-one-out (LOO) cross validation on the resulting model as determined by k-nearest neighbor kernel regression. The technique is shown to compare favorably to simulated annealing on three classical data sets from the QSAR literature.
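
The success measure described above can be sketched as a fitness function that scores one candidate descriptor subset. The swarm update itself is omitted, and the details here are assumptions rather than the authors' choices: a Gaussian kernel, a fixed `bandwidth`, k = 3 neighbours, a binary `mask` encoding of the subset, and the function name `loo_knn_kernel_q2`.

```python
import numpy as np

def loo_knn_kernel_q2(X, y, mask, k=3, bandwidth=1.0):
    """LOO score of k-nearest-neighbour kernel regression on the masked descriptors."""
    Xs = X[:, np.asarray(mask, dtype=bool)]
    if Xs.shape[1] == 0:
        return -np.inf  # an empty descriptor subset cannot be scored
    # Pairwise squared Euclidean distances in the selected subspace.
    d2 = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour (LOO)
    nn = np.argsort(d2, axis=1)[:, :k]            # indices of the k nearest samples
    nn_d2 = np.take_along_axis(d2, nn, axis=1)    # their squared distances
    w = np.exp(-nn_d2 / (2.0 * bandwidth ** 2)) + 1e-12  # Gaussian kernel weights
    preds = (w * y[nn]).sum(axis=1) / w.sum(axis=1)
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

A search method such as a particle swarm or simulated annealing would call this function for each candidate mask and keep the masks with the highest score; the function requires k to be smaller than the number of samples.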

10.
The application of machine learning methods to the construction of quantitative structure–activity relationship models is a complex computational problem in which dimensionality reduction of the representation of the molecular structure plays a fundamental role in predicting a target activity. Feature selection as a pre-processing step has been shown to be effective for dimensionality reduction, yielding simpler and more understandable models. In this paper, a comparative performance study of 13 state-of-the-art feature selection filter methods is conducted. Structure–activity relationship models are constructed using three widely used classifiers and a diverse collection of datasets. The comparative study uses robust statistical tests to compare the algorithms. According to the experimental results, there are substantial differences in performance among the evaluated feature selection methods. The methods that exhibit the best performance are correlation-based feature selection, fast clustering-based feature selection and the set cover method.

11.
Quantum chemical parameters and topological indices have been calculated to predict the toxicity of amino-benzenes in the environment, and models have been built with multiple regression and neural networks. The combination of CoMFA with the heat of formation yields greatly improved results. A good model has been obtained, which provides a basis for studies of the mechanism of toxic action.

12.
13.
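A simple quantum chemical method was used to calculate the frontier molecular orbital (FMO) site charge densities and energies of more than twenty halogenated benzene and phenol derivatives, and a quantitative structure-activity relationship (quantitative structure-biodegradability) study was carried out with satisfactory results. Finally, a new interpretation of QSAR is proposed, based on the nature of enzyme-catalyzed biological reactions and on orbital-controlled reactions between pollutants and enzymes.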

14.
15.
16.
17.
18.
Quantitative structure–activity relationship (QSAR) methods are urgently needed for predicting ADME/T (absorption, distribution, metabolism, excretion and toxicity) properties, both to select lead compounds for optimization at the early stage of drug discovery and to screen drug candidates for clinical trials. Use of suitable QSAR models ultimately results in lower time cost and a lower attrition rate during drug discovery and development. Among the ADME/T parameters, drug metabolism is a key determinant of metabolic stability, drug–drug interactions, and drug toxicity. QSAR models for predicting drug metabolism have undergone significant advances recently. However, most of the models used lack sufficient interpretability and offer poor predictability for novel drugs. In this review, we describe some considerations to be taken into account when using QSAR to model drug metabolism, such as the accuracy and consistency of the entire data set, the representation and diversity of the training and test sets, and variable selection. We also describe some novel statistical techniques (ensemble methods, multivariate adaptive regression splines and graph machines) that are not yet frequently used to develop QSAR models for drug metabolism. Subsequently, rational recommendations for developing predictive and interpretable QSAR models are made. Finally, the recent advances in QSAR models for cytochrome P450-mediated drug metabolism prediction, including in vivo hepatic clearance, in vitro metabolic stability, and inhibitors and substrates of cytochrome P450 families, are briefly summarized.

19.
A QSAR study on a series of pyrimidinyl and triazinyl amines was performed to explore the physico-chemical parameters responsible for their anti-HIV activity and cytotoxicity. Physico-chemical parameters were calculated using WIN CAChe 6.1. Stepwise multiple linear regression analysis was carried out to derive QSAR models, which were further evaluated for statistical significance and predictive power by internal and external validation. The best QSAR models selected showed a correlation coefficient R of 0.914 and 0.901, and a cross-validated squared correlation coefficient Q² of 0.685 and 0.691, for anti-HIV activity and cytotoxicity, respectively. The developed QSAR model indicates that the hydrophobicity of the whole molecule plays an important role in the anti-HIV activity and cytotoxicity of pyrimidinyl and triazinyl amine derivatives. Increasing hydrophobicity decreases the anti-HIV activity of the present series of compounds and leads to high cytotoxicity.

20.