Similar literature
Found 20 similar documents; search took 31 ms
1.
2.
3.
4.
A novel projection modeling method for quantitative structure activity relationship (QSAR) and quantitative structure property relationship (QSPR) is developed in this paper. Orthogonalization of block variables is introduced to deal with the problem of variable selection. Projections based on least squares are used to construct the modeling space in order to search for the best regression directions for chemical modeling. A suitable prediction space for such a model is further defined to confine the usage range of the model. Three real data sets were analyzed to check the performance of the proposed modeling method. The results obtained from Monte‐Carlo cross‐validation (MCCV) showed that the proposed modeling method might provide better results for QSAR and QSPR modeling than PCR and PLS with respect to both fitting and prediction abilities. Copyright © 2007 John Wiley & Sons, Ltd.
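To make the Monte-Carlo cross-validation (MCCV) mentioned in this abstract concrete, here is a minimal sketch. It is not the paper's projection method: a one-descriptor least-squares line stands in for the model, and the names `fit_line`, `mccv_rmse`, `n_splits`, and `leave_out` are illustrative assumptions.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mccv_rmse(xs, ys, n_splits=200, leave_out=0.3, seed=0):
    """Monte-Carlo cross-validation: repeat random train/validation splits
    and pool the squared validation errors into one RMSE.

    Assumes the training part always contains at least two distinct x values.
    """
    rng = random.Random(seed)
    n = len(xs)
    n_val = max(1, int(round(leave_out * n)))
    idx = list(range(n))
    sq_errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)                      # new random split each round
        val, train = idx[:n_val], idx[n_val:]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sq_errors.extend((ys[i] - (slope * xs[i] + intercept)) ** 2
                         for i in val)
    return fmean(sq_errors) ** 0.5


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]          # noise-free toy data
    print(mccv_rmse(xs, ys))
```

Unlike leave-one-out, MCCV leaves out a sizeable fraction of the data in each round, so the pooled error reflects prediction on compounds the model has not seen in larger groups.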

5.
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). Advantages of PLS compared to MLR are illustrated with typical applications. Genetic algorithm (GA) is a novel optimization technique which can be used as a search engine in variable selection. A novel hybrid approach comprising GA and PLS for variable selection developed in our group (GAPLS) is described. The more advanced method for comparative molecular field analysis (CoMFA) modeling called GA-based region selection (GARGS) is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools in chemometrics and QSAR.

6.

The CORAL software (http://www.insilico.eu/coral) has been suggested as a tool for building quantitative structure–property/activity relationships (QSPRs/QSARs). The software is based on the conception that “a QSPR/QSAR model should be interpreted as a random event.” This reflects the fact that different distributions of substances into the training set (substances involved in the modeling process) and the validation set (substances that are not known at the moment of the modeling process) give models with significant dispersion in the statistical quality of the QSPR/QSAR. Results of experiments with the software and possible ways of further improving it are discussed. The most attractive new ways to estimate the predictive potential of a CORAL model seem to be (i) the index of ideality of correlation and (ii) the correlation contradiction index. These can also be proposed as criteria of predictive potential for an arbitrary QSPR/QSAR.

7.
8.
Thyroid function diagnosis is an important classification problem. We reanalyzed human thyroid data, previously analyzed by multivariate analysis, using two notable neural networks. One is the self-organizing map (SOM) approach, which clusters the patients and visually displays the characteristics of their distribution according to laboratory tests. We found that the SOM consists of three well-separated clusters corresponding to hyperthyroid, hypothyroid, and normal, and that more detailed information about patients is obtained from their position in the map. In addition, the missing-value SOM, which we had introduced to investigate QSAR problems, turned out to be useful for this kind of classification problem as well. We estimated the classification rates of thyroid disease using a Bayesian regularized neural network (BRNN) and found that its prediction accuracy is better than that of multivariate analysis. The automatic relevance determination (ARD) method of BRNN was verified to be effective by directly calculating classification rates using BRNN without ARD for all possible combinations of laboratory tests.

9.
10.
The predictive accuracy of a model is the main concern of computational chemists in quantitative structure-activity relationship (QSAR) investigations. It is hypothesized that a model based on analogous chemicals will exhibit better predictive performance than one derived from diverse compounds. This paper develops a novel scheme called "clustering first, and then modeling" to build local QSAR models for the subsets resulting from clustering of the training set according to structural similarity. For validation and prediction, the validation set and test set were first classified into the subsets corresponding to those of the training set, and the prediction was then performed by the relevant local model for each subset. This approach was validated on two independent data sets by local modeling and prediction of the baseline toxicity for the fathead minnow. In this process, hierarchical clustering was employed for cluster analysis, k-nearest neighbor for classification, and partial least squares for model generation. The statistical results indicated that the predictive performance of the local models based on the subsets was much superior to that of the global model based on the whole training set, which was consistent with the hypothesis. The approach proposed here is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.

11.
Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance in the development of robust and predictive models of chemical toxicity. To address this issue in a systematic way, we have formed an international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology. We have compiled an aqueous toxicity data set containing 983 unique compounds tested in the same laboratory over a decade against Tetrahymena pyriformis. A modeling set including 644 compounds was selected randomly from the original set and distributed to all groups that used their own QSAR tools for model development. The remaining 339 compounds in the original set (external set I) as well as 110 additional compounds (external set II) published recently by the same laboratory (after this computational study was already in progress) were used as two independent validation sets to assess the external predictive power of individual models. In total, our virtual collaboratory has developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal prediction accuracy for the modeling set ranged from 0.76 to 0.93 as measured by the leave-one-out cross-validation correlation coefficient ($Q^2_{abs}$). The prediction accuracy for the external validation sets I and II ranged from 0.71 to 0.85 (linear regression coefficient $R^2_{absI}$) and from 0.38 to 0.83 (linear regression coefficient $R^2_{absII}$), respectively. The use of an applicability domain threshold implemented in most models generally improved the external prediction accuracy but at the same time led to a decrease in chemical space coverage. Finally, several consensus models were developed by averaging the predicted aquatic toxicity for every compound using all 15 models, with or without taking into account their respective applicability domains. We find that consensus models afford higher prediction accuracy for the external validation data sets with the highest space coverage as compared to individual constituent models. Our studies prove the power of a collaborative and consensual approach to QSAR model development. The best validated models of aquatic toxicity developed by our collaboratory (both individual and consensus) can be used as reliable computational predictors of aquatic toxicity and are available from any of the participating laboratories.
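The consensus step described in this abstract (averaging each compound's predictions across models, with or without applicability domains) can be sketched as follows. This is only an illustration under assumed data shapes: `predictions[m][i]` is model `m`'s prediction for compound `i`, and `in_domain[m][i]` is a hypothetical flag saying whether compound `i` falls inside model `m`'s applicability domain. The function names are not from the paper.

```python
from statistics import fmean


def consensus(predictions, in_domain=None):
    """Average per-compound predictions over models.

    If in_domain is given, only models whose applicability domain covers
    the compound contribute; a compound covered by no model gets None.
    """
    n_compounds = len(predictions[0])
    result = []
    for i in range(n_compounds):
        values = [model[i] for m, model in enumerate(predictions)
                  if in_domain is None or in_domain[m][i]]
        result.append(fmean(values) if values else None)
    return result


def coverage(consensus_values):
    """Fraction of compounds that received a consensus prediction."""
    predicted = sum(v is not None for v in consensus_values)
    return predicted / len(consensus_values)


if __name__ == "__main__":
    preds = [[1.0, 2.0, 3.0],
             [3.0, 2.0, 5.0]]
    domains = [[True, False, False],
               [True, True, False]]
    print(consensus(preds))             # plain average over all models
    print(consensus(preds, domains))    # average restricted by domain
    print(coverage(consensus(preds, domains)))
```

The trade-off the abstract reports shows up directly here: restricting by domain can leave some compounds without a prediction, which lowers coverage.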

12.
The OECD has proposed five principles for validation of QSAR models used for regulatory purposes. Here we present a case study investigating how these principles can be applied to models based on Kohonen and counter propagation neural networks. The study is based on a counter propagation network model that has been built using toxicity data for the fathead minnow fish for 541 compounds. The study demonstrates that most, if not all, of the OECD criteria may be met when modeling using this neural network approach.

13.

14.

15.
The main utility of QSAR models is their ability to predict activities/properties for new chemicals, and this external prediction ability is evaluated by means of various validation criteria. As a measure for such evaluation the OECD guidelines have proposed the predictive squared correlation coefficient $Q^2_{F1}$ (Shi et al.). However, other validation criteria have been proposed by other authors: the Golbraikh-Tropsha method, $r^2_m$ (Roy), $Q^2_{F2}$ (Schüürmann et al.), $Q^2_{F3}$ (Consonni et al.). In QSAR studies these measures are usually in accordance, though this is not always the case, thus doubts can arise when contradictory results are obtained. It is likely that none of the aforementioned criteria is the best in every situation, so a comparative study using simulated data sets is proposed here, using threshold values suggested by the proponents or those widely used in QSAR modeling. In addition, a different and simple external validation measure, the concordance correlation coefficient (CCC), is proposed and compared with other criteria. Huge data sets were used to study the general behavior of validation measures, and the concordance correlation coefficient was shown to be the most restrictive. On using simulated data sets of a more realistic size, it was found that CCC was broadly in agreement, about 96% of the time, with other validation measures in accepting models as predictive, and in almost all the examples it was the most precautionary. The proposed concordance correlation coefficient also works well on real data sets, where it seems to be more stable, and helps in making decisions when the validation measures are in conflict. Since it is conceptually simple, and given its stability and restrictiveness, we propose the concordance correlation coefficient as a complementary, or alternative, more prudent measure of a QSAR model to be externally predictive.
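The external validation measures named in this abstract can be written down compactly. The sketch below uses their commonly cited definitions (residual sum of squares on the external set, scaled against the training-set mean, the external-set mean, or the training-set variance, plus Lin's concordance correlation coefficient). The function and argument names are illustrative assumptions, and the acceptance thresholds discussed in the paper are not reproduced.

```python
from statistics import fmean


def _press(y_ext, y_pred):
    """Sum of squared prediction errors on the external set."""
    return sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))


def q2_f1(y_ext, y_pred, y_train):
    """Q2_F1: external deviations taken from the TRAINING-set mean."""
    mean_tr = fmean(y_train)
    return 1.0 - _press(y_ext, y_pred) / sum((y - mean_tr) ** 2 for y in y_ext)


def q2_f2(y_ext, y_pred):
    """Q2_F2: external deviations taken from the EXTERNAL-set mean."""
    mean_ext = fmean(y_ext)
    return 1.0 - _press(y_ext, y_pred) / sum((y - mean_ext) ** 2 for y in y_ext)


def q2_f3(y_ext, y_pred, y_train):
    """Q2_F3: mean squared external error over training-set variance."""
    mean_tr = fmean(y_train)
    mse_ext = _press(y_ext, y_pred) / len(y_ext)
    var_tr = sum((y - mean_tr) ** 2 for y in y_train) / len(y_train)
    return 1.0 - mse_ext / var_tr


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient: penalizes both scatter and
    systematic shift away from the line y_pred = y_obs."""
    n = len(y_obs)
    mo, mp = fmean(y_obs), fmean(y_pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    ss_o = sum((o - mo) ** 2 for o in y_obs)
    ss_p = sum((p - mp) ** 2 for p in y_pred)
    return 2.0 * cov / (ss_o + ss_p + n * (mo - mp) ** 2)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    shifted = [2.0, 3.0, 4.0]   # perfectly correlated, but biased by +1
    print(ccc(observed, shifted), q2_f2(observed, shifted))
```

In the toy call, the predictions are perfectly correlated with the observations yet systematically offset, so an ordinary correlation coefficient would be 1 while CCC drops well below it. That sensitivity to bias is one way to read the abstract's description of CCC as the more restrictive measure.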

16.
17.
Validation is a crucial aspect of quantitative structure–activity relationship (QSAR) model development. External validation is considered, in general, as the most conclusive proof of the predictive capacity of a QSAR model. In the absence of a truly external data set, external validation is usually performed on test set compounds, which are members of the original data set but are not used in the model development exercise. In the case of small data sets, QSAR researchers experience problems in model development because the developed models may be less reliable on account of the small number of training set compounds, and such models may also show poor external predictability because they may not have captured all the features required for the particular structure–activity relationships. The present paper attempts to show that the ‘true r(LOO)’ statistic, calculated from the model derived from the undivided data set with the variable selection strategy applied at each cycle of leave‐one‐out (LOO) validation, may reflect the external validation characteristics of the developed model, thus obviating the requirement of splitting the data set into training and test sets. This approach may be helpful in the case of small data sets, as it uses all available data for model development and validation, thus making the resulting model more reliable. Copyright © 2009 John Wiley & Sons, Ltd.
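The key idea in this abstract, repeating variable selection inside every leave-one-out cycle so that the left-out compound never influences which descriptors are chosen, can be sketched as below. This is a deliberately simplified stand-in: selection here means picking the single descriptor with the lowest training error, the model is a straight line, and the returned value is a generic cross-validated Q², which is not claimed to be the paper's exact statistic. The name `loo_q2_with_selection` is hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares line; returns None if the descriptor is constant."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loo_q2_with_selection(X, y):
    """Leave-one-out Q2 with descriptor selection redone in every cycle.

    X is a list of rows (one row of descriptors per compound).
    Assumes at least one non-constant descriptor remains in each cycle.
    """
    n, p = len(X), len(X[0])
    press = 0.0
    for i in range(n):
        rows = [r for k, r in enumerate(X) if k != i]   # training rows
        ys = [v for k, v in enumerate(y) if k != i]
        best = None
        for j in range(p):                              # selection step
            col = [r[j] for r in rows]
            fit = fit_line(col, ys)
            if fit is None:
                continue
            slope, intercept = fit
            sse = sum((v - (slope * c + intercept)) ** 2
                      for c, v in zip(col, ys))
            if best is None or sse < best[0]:
                best = (sse, j, slope, intercept)
        _, j, slope, intercept = best
        press += (y[i] - (slope * X[i][j] + intercept)) ** 2
    mean_y = fmean(y)
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0],
         [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
    y = [3.0 * row[0] for row in X]     # depends on descriptor 0 only
    print(loo_q2_with_selection(X, y))
```

Selecting descriptors once on the full data set and only then running LOO would let the left-out compound leak into the selection, which tends to make the cross-validated statistic look better than the model's real predictive ability.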

18.
Quantitative structure–activity relationship (QSAR) models for predicting acute toxicity to Daphnia magna often perform poorly and need improvement to meet REACH requirements. The aim of this study was to evaluate the accuracy, stability and reliability of a previously published QSAR model by means of further external validation, and to optimize its performance by extending it to new data and by a consensus approach. The previously published model was validated with a large set of new molecules and then compared with the ChemProp model, from which most of the validation data were taken. Results showed better performance of the proposed model in terms of accuracy and percentage of molecules outside the applicability domain. The model was re-calibrated on all the available data to confirm the efficacy of the similarity-based approach. The extended dataset was also used to develop a novel model based on the same similarity approach but using binary fingerprints to describe the chemical structures. The fingerprint-based model gave lower regression statistics, but also fewer unpredicted compounds. Finally, consensus modelling was successfully used to enhance the accuracy of the predictions and to halve the percentage of molecules outside the applicability domain.

19.
There are many pathogenic microbial species with very different susceptibilities to antimicrobial drugs. In this work, we selected pairs of antifungal drugs with similar/dissimilar predicted species-activity profiles and represented them as a large network, which may be used to identify drugs with similar mechanisms of action. Computational prediction of biological activity based on quantitative structure-activity relationships (QSAR) substantially increases the potential of this kind of network, avoiding time- and resource-consuming experiments. Unfortunately, most QSAR models are unspecific or predict activity against only one species. To solve this problem we developed a multispecies QSAR classification model, whose outputs were the inputs of the aforementioned network. Overall model classification accuracy was 87.0% (161/185 compounds) in training, 83.4% (50/61) in validation, and 83.7% for 288 additional antifungal compounds used to extend model validation for network construction. The predicted network has 59 nodes (compounds), 648 edges (pairs of compounds with similar activity), low coverage density d = 37.8%, and a distribution closer to normal than to exponential. These results are more characteristic of a not-overestimated random network, clustering different drug mechanisms of action, than of a less useful power-law network with few mechanisms (network hubs).

20.