Similar articles
1.

An approach is proposed for interpreting backpropagation neural network models in quantitative structure-activity and structure-property relationship (QSAR/QSPR) studies. The method analyzes the first and second moments of the distribution of the first and second partial derivatives of the network outputs with respect to the inputs, calculated at the data points. These statistics yield essentially the same characteristics as traditional "interpretable" statistical methods such as linear regression analysis, and they also reveal additional information about the non-linear character of QSAR/QSPR relationships. The approach is illustrated by interpreting a backpropagation neural network model that predicts the position of the long-wave absorption band of cyanine dyes.
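The derivative statistics described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes central finite differences in place of analytic derivatives, and `toy_model`, `derivative_moments`, and the step size `h` are hypothetical names chosen for the example.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def _std(values):
    # Population standard deviation (square root of the second central moment).
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def derivative_moments(model, points, h=1e-3):
    """For each input, return the mean and standard deviation, over the data
    points, of the first and second partial derivatives of model(x).
    Derivatives are approximated by central finite differences (an assumption
    of this sketch)."""
    stats = []
    for i in range(len(points[0])):
        first, second = [], []
        for x in points:
            up = list(x)
            up[i] += h
            down = list(x)
            down[i] -= h
            f_up, f_mid, f_down = model(up), model(x), model(down)
            first.append((f_up - f_down) / (2 * h))
            second.append((f_up - 2 * f_mid + f_down) / (h * h))
        stats.append({
            "mean_d1": _mean(first), "std_d1": _std(first),
            "mean_d2": _mean(second), "std_d2": _std(second),
        })
    return stats

def toy_model(x):
    # Hypothetical stand-in for a trained network: one non-linear input
    # and one linear input.
    return math.tanh(0.8 * x[0]) + 0.5 * x[1]

if __name__ == "__main__":
    data = [[-1.0, 0.2], [0.0, 1.5], [0.7, -0.3], [2.0, 0.9]]
    for i, s in enumerate(derivative_moments(toy_model, data)):
        print(i, s)
```

In this reading, the mean first derivative plays the role of a regression coefficient, while a non-zero standard deviation of the first derivative or a non-zero mean second derivative flags an input whose effect is non-linear.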


5.

This paper presents freeware, shareware, and commercial statistical tools that are available via the Internet and could be used to derive QSAR models. Programming environments useful in statistics, newsgroups, and FAQs are also introduced because of their relevance to the discipline.


7.
In this paper we report a novel three-dimensional QSAR approach, kNN-MFA, based on the principles of the k-nearest neighbor method combined with various variable selection procedures. The kNN-MFA approach was used to generate models for three different data sets and to predict the activity of test molecules with each of these models. The three data sets were the standard steroid benchmark, an anti-inflammatory data set, and an anticancer data set. The resulting kNN-MFA models had better statistical parameters than the reported CoMFA models for all three data sets. Stochastic variable selection methods were also found to generate better models, and hence more accurate predictions, than stepwise forward selection. Thus, the kNN-MFA method represents a good alternative to CoMFA-like methods.

8.
Validation is a crucial aspect of quantitative structure–activity relationship (QSAR) model development. External validation is generally considered the most conclusive proof of the predictive capacity of a QSAR model. In the absence of a truly external data set, external validation is usually performed on test set compounds, which are members of the original data set but are not used in model development. With small data sets, the developed models may be less reliable because of the small number of training set compounds, and they may also show poor external predictability because they may not have captured all the features required for the particular structure–activity relationship. The present paper attempts to show that the ‘true r(LOO)’ statistic, calculated from the model derived from the undivided data set with the variable selection strategy applied at each cycle of leave‐one‐out (LOO) validation, may reflect the external validation characteristics of the developed model, thus obviating the need to split the data set into training and test sets. This approach may be helpful for small data sets because it uses all available data for model development and validation, making the resulting model more reliable. Copyright © 2009 John Wiley & Sons, Ltd.
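The idea of repeating variable selection inside every leave-one-out cycle can be sketched as below. This is an illustrative simplification and not the paper's procedure: it assumes that "variable selection" means picking the single descriptor with the highest absolute Pearson correlation on the retained compounds, and it reports a cross-validated q^2 computed as 1 - PRESS/SS with SS taken around the mean of all responses. All function names are hypothetical.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def pearson(u, v):
    mu, mv = _mean(u), _mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return suv / math.sqrt(suu * svv)

def fit_line(x, y):
    # Ordinary least squares for one descriptor: returns (slope, intercept).
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def q2_loo_with_selection(X, y):
    """Leave-one-out q^2 where the descriptor is re-selected in every cycle,
    so the left-out compound never influences variable selection."""
    n = len(y)
    press = 0.0
    for left_out in range(n):
        train = [i for i in range(n) if i != left_out]
        y_train = [y[i] for i in train]
        # Variable selection uses only the compounds kept in this cycle.
        best = max(
            range(len(X[0])),
            key=lambda j: abs(pearson([X[i][j] for i in train], y_train)),
        )
        slope, intercept = fit_line([X[i][best] for i in train], y_train)
        prediction = intercept + slope * X[left_out][best]
        press += (y[left_out] - prediction) ** 2
    y_bar = _mean(y)
    ss = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - press / ss

if __name__ == "__main__":
    X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]]
    y = [3.1, 4.8, 7.2, 9.0, 10.9, 13.1]
    print(round(q2_loo_with_selection(X, y), 3))
```

Selecting descriptors once on the full data set and only then cross-validating lets the left-out compound leak into the model; re-selecting per cycle, as in the loop above, avoids that optimistic bias.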

9.
Robust cross-validation of linear regression QSAR models (total citations: 1; self-citations: 0; citations by others: 1)
A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from the compounds' molecular structures. The "gold standard" of model validation is blindfold prediction, in which the model's predictive power is assessed from how well it predicts the activity values of compounds that were not considered in any way during model development and calibration. However, during the development of a QSAR model it is necessary to obtain some indication of the model's predictive power. This is often done by some form of cross-validation (CV). In this study, the concepts of the predictive power and fitting ability of a multiple linear regression (MLR) QSAR model were examined in the CV context, allowing for the presence of outliers. Commonly used predictive power and fitting ability statistics were assessed via Monte Carlo cross-validation on QSAR data sets for percent human intestinal absorption, blood-brain partition coefficient, and toxicity values of saxitoxin, as well as on three benchmark data sets with known outlier contamination. It was found that (1) a robust version of MLR should always be preferred over ordinary-least-squares MLR, regardless of the degree of outlier contamination, and (2) the model's predictive power should only be assessed via robust statistics. The MATLAB and Java source code used in this study is freely available for academic use from the QSAR-BENCH section of www.dmitrykonovalov.org. The Web site also contains the Java-based QSAR-BENCH program, which can be run online via Java's Web Start technology (supporting Windows, Mac OS X, and Linux/Unix) to reproduce most of the reported results or to apply the reported procedures to other data sets.
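A small sketch can make the two findings concrete. It is not the QSAR-BENCH code: it assumes a single descriptor, swaps in the Theil-Sen estimator as a simple stand-in for robust MLR, uses the median absolute error as the robust predictive statistic, and runs on synthetic data with one planted outlier. All names and parameter values are hypothetical.

```python
import random
from statistics import median

def ols_fit(x, y):
    # Ordinary least squares for one descriptor: returns (slope, intercept).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def theil_sen_fit(x, y):
    # Robust stand-in: median of pairwise slopes, then median intercept.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([b - slope * a for a, b in zip(x, y)])
    return slope, intercept

def monte_carlo_cv(x, y, fit, n_splits=200, test_size=3, seed=0):
    """Repeated random train/test splits; returns the median absolute
    prediction error pooled over all test compounds (a robust statistic)."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    abs_errors = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:test_size], indices[test_size:]
        slope, intercept = fit([x[i] for i in train], [y[i] for i in train])
        abs_errors.extend(abs(y[i] - (intercept + slope * x[i])) for i in test)
    return median(abs_errors)

if __name__ == "__main__":
    x = list(range(10))
    y = [2 * v + 1 for v in x]
    y[9] = 100  # planted outlier
    print("OLS fit:", ols_fit(x, y))
    print("Theil-Sen fit:", theil_sen_fit(x, y))
    print("OLS median |error|:", monte_carlo_cv(x, y, ols_fit))
    print("Theil-Sen median |error|:", monte_carlo_cv(x, y, theil_sen_fit))
```

Using the same seed for both fits means both estimators are scored on identical splits, so the difference in the pooled error comes from the estimator rather than from the random partitioning.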

10.
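Yue Wei (岳玮), He Hongmei (何红梅), Feng Changjun (冯长君). Huaxue Tongbao (《化学通报》), 2018, 81(7): 636-640
Based on topological chemistry theory, atom-type electrotopological state indices (M_k) were used to characterize the chemical microenvironment of 18 triazine oxadiazolyl pyrazole derivatives. Using best-subset regression, quantitative structure-activity relationship (QSAR) models were built relating M_k to the inhibitory activities of these compounds against protein tyrosine phosphatase 1B (PTP1B) and cell division cycle 25 phosphatase B (Cdc25B), denoted P_t and C_d respectively. The best three-variable QSAR models have coefficients of determination (R^2) of 0.896 and 0.828, and leave-one-out cross-validation correlation coefficients (R_cv^2) of 0.830 and 0.688, respectively. According to the R_cv^2, VIF, FT, and AC tests, the models have good robustness and predictive ability. Validation with the training set indicated that the models also have good external predictive ability. The models show that P_t and C_d are affected both by different structural groups (-CH3, -O-, -NH2, and aromatic -N=) and by a common factor (aromatic -C=).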

12.
Applying machine learning methods to the construction of quantitative structure–activity relationship models is a complex computational problem in which dimensionality reduction of the molecular structure representation plays a fundamental role in predicting a target activity. Feature selection as a pre-processing step has been shown to be effective for dimensionality reduction and for building simpler, more understandable models. In this paper, a comparative performance study of 13 state-of-the-art feature selection filter methods is conducted. Structure–activity relationship models are constructed using three widely used classifiers and a diverse collection of datasets, and robust statistical tests are used to compare the algorithms. According to the experimental results, there are substantial differences in performance among the evaluated feature selection methods. The best-performing methods are correlation-based feature selection, fast clustering-based feature selection, and the set cover method.
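As an illustration of one of the filter methods named above, the sketch below implements a simplified correlation-based feature selection (CFS). It assumes the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation, with absolute Pearson correlation as the association measure and a plain greedy forward search. The original study's exact configuration is not given in the abstract, so these choices and all function names are assumptions.

```python
import math

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return suv / math.sqrt(suu * svv)

def cfs_merit(subset, features, target):
    """CFS merit of a feature subset; `features` is a list of columns."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[j], target)) for j in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(features, target):
    """Greedy forward search: add the feature that most improves the merit,
    stop as soon as no candidate gives a strict improvement."""
    selected, best_merit = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        scored = [(cfs_merit(selected + [j], features, target), j) for j in remaining]
        merit, j = max(scored, key=lambda t: t[0])
        if merit <= best_merit:
            break
        selected.append(j)
        remaining.remove(j)
        best_merit = merit
    return selected

if __name__ == "__main__":
    a = [1, -1, 1, -1]
    b = [1, 1, -1, -1]
    target = [p + q for p, q in zip(a, b)]
    # Feature 1 is an exact copy of feature 0, so it is redundant.
    print(cfs_forward_select([a, list(a), b], target))
```

The merit rewards features that correlate with the target and penalizes features that correlate with each other, which is why the redundant copy is skipped in favor of the independent descriptor.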

13.
The quality of QSAR models: problems and solutions (total citations: 1; self-citations: 0; citations by others: 1)
Goodness-of-fit and confidence in predictivity (predictive power) are the main terms used to define the statistical quality of QSAR models. The assessment has three parts: (1) measurement of goodness-of-fit, (2) validation of model stability, and (3) predictivity analysis. Currently there are no mandatory requirements for the validation methods to be used, nor rules for quantitative confidence estimates. To compare the statistical quality of QSAR models, an overall statistical quality index is needed that depends jointly on the goodness-of-fit, validation, and predictivity results. This requires defining a set of mandatory parameters for all three parts of the assessment and developing an approach for overall quality estimates based on these parameters. The overall index must also include a penalty mechanism for absent parameters. The goal of the present study is to analyse the parameters for all three parts of the statistical quality assessment of QSAR models and to investigate a flexible weighting approach for developing the overall statistical quality index. Because different statistical parameters are traditionally used to assess goodness-of-fit, a mechanism is needed that allows a flexible set of parameters to be used in the overall index. The final set of mandatory parameters can be selected only after approval by the scientific community and regulatory boards.

14.
QSAR models using a large diverse set of estrogens (total citations: 12; self-citations: 0; citations by others: 12)
Endocrine disruptors (EDs) have a variety of adverse effects in humans and animals. About 58,000 chemicals, most having little safety data, must be tested in a group of tiered assays. As the assays will take years, it is important to develop rapid methods to help in priority setting. For application to large data sets, we have developed an integrated system of four sequential phases to predict the ability of chemicals to bind to the estrogen receptor (ER), a prevalent mechanism for estrogenic EDs. Here we report the results of evaluating two types of QSAR models for inclusion in phase III to quantitatively predict chemical binding to the ER. Our data set of relative binding affinities (RBAs) to the ER consists of 130 chemicals covering a wide range of structural diversity and a spread of 6 orders of magnitude in RBA. CoMFA and HQSAR models were constructed and compared for performance. The CoMFA model had r^2 = 0.91 and q^2_LOO = 0.66. HQSAR showed reduced performance compared to CoMFA, with r^2 = 0.76 and q^2_LOO = 0.59. A number of parameters were examined to improve the CoMFA model. Of these, a phenol indicator increased q^2_LOO to 0.71. When up to 50% of the chemicals were left out in leave-N-out cross-validation, q^2 remained significant. Finally, the models were tested on two test sets; the q^2_pred values for these were 0.71 and 0.62, a significant result that demonstrates the utility of the CoMFA model for predicting the RBAs of chemicals not included in the training set. If used in conjunction with phases I and II, which reduce the size of the data set dramatically by eliminating most inactive chemicals, the current CoMFA model (phase III) can be used to predict the RBA of chemicals with sufficient accuracy and to provide quantitative information for priority setting.

15.

Background

The new European regulation on chemical safety, REACH (Registration, Evaluation, Authorisation and Restriction of CHemical substances), is in the process of being implemented. Many chemicals used in industry require additional testing to comply with the REACH regulations. At the same time, EU member states are attempting to reduce the number of animals used in experiments under the 3 Rs policy (refining, reducing, and replacing the use of animals in laboratory procedures). Computational techniques such as QSAR have the potential to offer an alternative for generating REACH data. The FP6 project CAESAR was aimed at developing QSAR models for 5 key toxicological endpoints, of which skin sensitisation was one.

Results

This paper reports the development of two global QSAR models using two different computational approaches, which contribute to the hybrid model freely available online.

Conclusions

The QSAR models for assessing skin sensitisation have been developed and tested under stringent quality criteria to fulfil the principles laid down by the OECD. The final models, accessible from the CAESAR website, offer a robust and reliable method of assessing skin sensitisation for regulatory use.

18.
Perfluorinated compounds (PFCs) are a class of emerging pollutants still widely used in different materials as non-adhesives, waterproof fabrics, fire-fighting foams, etc. Their toxic effects include potential endocrine-disrupting activity, but the amount of experimental data available for these pollutants is limited. The use of predictive strategies such as quantitative structure-activity relationships (QSARs) is recommended under the REACH regulation to fill data gaps and to screen and prioritize chemicals for further experimentation, with a consequent reduction in costs and in the number of animals tested. In this study, local classification models for PFCs were developed to predict their T4-TTR (thyroxin-transthyretin) competing potency. The best models were selected by maximizing sensitivity and external predictive ability. These models, characterized by robustness, good predictive power, and a defined applicability domain, were applied to predict the activity of 33 other PFCs of environmental concern. Finally, classification models recently published by our research group for T4-TTR binding of brominated flame retardants and for estrogenic and anti-androgenic activity were applied to the studied perfluorinated chemicals to compare results and to further evaluate the potential of these PFCs to cause endocrine disruption.
