Similar Literature
 Found 20 similar articles; search took 31 ms
1.
Over the last two decades, the volume of chemical and biological data has grown steadily. Converting data sets into knowledge is both expensive and time-consuming; as a result, workflow technology, with platforms such as KNIME, was developed to facilitate searching multiple heterogeneous data sources, filtering by specific criteria, and extracting hidden information from these large data sets. Manual data curation is strongly recommended before any QSAR modeling. This is feasible for small datasets, but hardly so for the extensive data recently accumulated in public databases. In this work, we suggest using KNIME as an automated workflow solution for data curation and for the development and validation of predictive QSAR models from a very large dataset. We started from 250250 structures from the NCI database; only 3520 compounds passed through our workflow with their corresponding experimental log P values. This property was investigated as a case study to improve some existing log P calculation algorithms.

2.
3.
Abstract

Computational chemistry provides a means for the calculation or estimation of three-dimensional chemical structure, the organization and analysis of chemical data, the classification of industrial chemicals by structure and properties, the prediction of toxicity, and the identification of chemical structure. The development of the EPA National Environmental Supercomputer Center (NESC) in Bay City, Michigan, gives scientists at EPA Headquarters the ability to perform advanced QSAR modeling. This provides the means to develop and apply QSAR models for chemicals acting by a variety of molecular mechanisms. The work makes possible improved programmatic support to the Office of Pollution Prevention and Toxics under the Toxic Substances Control Act and the Pollution Prevention Act.

4.
5.
6.
7.
Datasets of molecular compounds often contain outliers, that is, compounds that differ from the rest of the dataset. Outliers, while often interesting, may affect data interpretation, model generation, and decision making, and should therefore be removed from the dataset prior to modeling efforts. Here, we describe a new method for the iterative identification and removal of outliers based on a k‐nearest neighbors optimization algorithm. We demonstrate for three different datasets that the removal of outliers using the new algorithm provides filtered datasets which are better than those provided by four alternative outlier removal procedures as well as by random compound removal in two important aspects: (1) they better maintain the diversity of the parent datasets; (2) they give rise to quantitative structure activity relationship (QSAR) models with much better prediction statistics. The new algorithm is, therefore, suitable for the pretreatment of datasets prior to QSAR modeling. © 2014 Wiley Periodicals, Inc.
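
The abstract does not describe the optimization algorithm itself, so the following is only a rough sketch of the general idea of iterative, kNN-based outlier removal, not the authors' method. It repeatedly drops the compound whose mean distance to its k nearest neighbours is largest. The descriptor vectors, `k`, and `n_remove` are hypothetical values chosen for illustration.

```python
import math

def knn_score(i, points, k):
    """Mean Euclidean distance from points[i] to its k nearest neighbours."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return sum(dists[:k]) / k

def remove_outliers(points, k=2, n_remove=1):
    """Iteratively drop the point with the largest kNN score; return kept indices."""
    kept = list(range(len(points)))
    for _ in range(n_remove):
        current = [points[i] for i in kept]
        # Scores are recomputed after every removal, which makes the filter iterative.
        scores = [knn_score(pos, current, k) for pos in range(len(current))]
        worst = max(range(len(scores)), key=scores.__getitem__)
        kept.pop(worst)
    return kept

# Hypothetical 2-D descriptors: four clustered compounds and one distant compound.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept = remove_outliers(descriptors, k=2, n_remove=1)
```

Recomputing the scores after each removal matters because dropping one compound changes the neighbourhoods of the remaining ones.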

8.
9.
10.
ABSTRACT

Several 3D-QSAR models were built based on 196 hepatitis C virus (HCV) NS5A protein inhibitors. EC90 bioactivity values against three variants, the wild type (GT1a) and two mutants (GT1a Y93H and GT1a L31V), were collected to build three datasets. The programs OMEGA and ROCS were used for generating conformations and aligning the molecules of each dataset, respectively. Each dataset was randomly divided into a training set and a test set three times to reduce the dependence on a single random selection. QSAR models were computed by comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA). For the datasets GT1a, GT1a Y93H, and GT1a L31V, the best models CoMFA-INDX, CoMSIA-SEHA, and CoMSIA-SEHA showed r² values of 0.682 ± 0.033, 0.779 ± 0.036, and 0.782 ± 0.022 on the test sets, respectively. From the contour maps of the three best models, we summarized the favourable and unfavourable substituents on the tetracyclic core, the Z group, the proline group, and the valine group of the inhibitors. We hypothesize that the mutations could change the electrostatic surface of the wild-type active pocket. In addition, we used ECFP analyses to find important substructures and to interpret the results of the QSAR models intuitively.

11.
Self-Organizing Molecular Field Analysis (SOMFA) comes with a built-in regression methodology, the Self-Organizing Regression (SOR), instead of relying on external methods such as PLS. In this article we present a proof of the equivalence between SOR and SIMPLS with one principal component. Thus, the modest performance of SOMFA on complex datasets can be primarily attributed to the low performance of the SOMFA regression methodology. A multi-component extension of the original SOR methodology (MCSOR) is introduced, and the performances of SOR, MCSOR and SIMPLS are compared using several datasets. The results indicate that in general the performance of SOMFA models is greatly improved if SOR is replaced with a more sophisticated regression method. The results obtained for the Cramer (CBG) dataset further underline the fact that it is a very poor benchmark dataset and should not be used to evaluate the performance of QSAR techniques.

12.
13.
14.
15.
ABSTRACT

The aryl hydrocarbon receptor (AhR) plays an important role in several biological processes such as reproduction, immunity and homoeostasis. However, little is known about the chemical-structural and physicochemical features that influence the activity of AhR antagonistic modulators. In the present report, in vitro AhR antagonistic activity evaluations, based on a chemical-activated luciferase gene expression (AhR-CALUX) bioassay, and an extensive literature review were performed with the aim of constructing a structurally diverse database of contaminants and potentially toxic chemicals. Subsequently, QSAR models based on Linear Discriminant Analysis and Logistic Regression, as well as two toxicophoric hypotheses, were proposed to model the AhR antagonistic activity of the built dataset. The QSAR models were rigorously validated, yielding satisfactory performance for all classification parameters. Likewise, the toxicophoric hypotheses were validated using a diverse set of 350 decoys, demonstrating adequate robustness and predictive power. Chemical interpretations of both the QSAR and toxicophoric models suggested that hydrophobic constraints, the presence of aromatic rings and electron-acceptor moieties are critical for AhR antagonism. It is therefore hoped that the deductions obtained in the present study will help to further elucidate the structural and physicochemical factors influencing the AhR antagonistic activity of chemical compounds.

16.
The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of "distance to model" (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1.
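
A minimal sketch of a DM based on the standard deviation within an ensemble might look as follows. It assumes each compound has one predicted score per ensemble member. The compound names, scores, and the `dm_threshold` cut-off are hypothetical and not taken from the study.

```python
import statistics

def consensus_with_dm(ensemble_predictions):
    """Return (consensus prediction, DM), with DM taken as the ensemble standard deviation."""
    mean = statistics.mean(ensemble_predictions)
    dm = statistics.pstdev(ensemble_predictions)
    return mean, dm

def select_reliable(predictions_by_compound, dm_threshold):
    """Names of compounds whose ensemble disagreement (DM) is at or below the threshold."""
    reliable = []
    for name, preds in predictions_by_compound.items():
        _, dm = consensus_with_dm(preds)
        if dm <= dm_threshold:
            reliable.append(name)
    return reliable

# Hypothetical mutagenicity scores from three ensemble members per compound.
preds = {
    "cmpd_a": [0.90, 0.92, 0.88],  # members agree: small DM
    "cmpd_b": [0.10, 0.90, 0.50],  # members disagree: large DM
}
reliable = select_reliable(preds, dm_threshold=0.1)
```

A small DM means the ensemble members agree, which is the situation in which the consensus prediction is treated as trustworthy. In practice the threshold would have to be calibrated against observed accuracy.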

17.
Abstract

The ability to determine the biodegradability of chemicals without resorting to expensive tests is ecologically and economically desirable. Models based on quantitative structure–activity relations (QSAR) provide some promise in this direction. However, QSAR models in the literature rarely provide uncertainty estimates in more detail than aggregated statistics such as the sensitivity and specificity of the model’s predictions. Almost never is there a means of assessing the uncertainty in an individual prediction. Without an uncertainty estimate, it is impossible to assess the trustworthiness of any particular prediction, which leaves the model with a low utility for regulatory purposes. In the present work, a QSAR model with uncertainty estimates is used to predict biodegradability for a set of substances from a publicly available data set. Separation was performed using a partial least squares discriminant analysis model, and the uncertainty was estimated using bootstrapping. The uncertainty prediction allows for confidence intervals to be assigned to any of the model’s predictions, allowing for a more complete assessment of the model than would be possible through a traditional statistical analysis. The results presented here are broadly applicable to other areas of modelling as well, because the calculation of the uncertainty will clearly demonstrate where additional tests are needed.
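
To illustrate how bootstrapping can attach an interval to a single prediction, here is a minimal sketch. It replaces the partial least squares discriminant analysis model with a deliberately simple stand-in (the midpoint between two class means of one descriptor), so it shows only the resampling logic, not the study's model. The training values, `n_boot`, and `alpha` are assumptions.

```python
import random
import statistics

def fit_midpoint(sample):
    """Stand-in model: midpoint between the two class means of a 1-D descriptor."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def bootstrap_interval(train, query, n_boot=200, alpha=0.05, seed=0):
    """Percentile interval for the query's signed distance to the decision boundary."""
    rng = random.Random(seed)
    scores = []
    while len(scores) < n_boot:
        sample = rng.choices(train, k=len(train))  # resample with replacement
        if len({y for _, y in sample}) < 2:
            continue  # this resample lost a class; draw again
        scores.append(query - fit_midpoint(sample))
    scores.sort()
    lower = scores[int(alpha / 2 * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical training data: (descriptor value, class label).
train = [(1.0, 0), (1.2, 0), (0.8, 0), (3.0, 1), (3.2, 1), (2.8, 1)]
lower, upper = bootstrap_interval(train, query=2.9)
```

An interval that lies entirely on one side of zero indicates a prediction that is stable under resampling. An interval that straddles zero flags a compound for which additional tests would be most informative.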

18.
Quantitative Structure–Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q²! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q² for the training set and the accuracy of prediction (R²) for the test set, and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.
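
The two statistics being contrasted can be sketched as follows. The example assumes a one-descriptor least-squares model in place of the kNN QSAR method, and uses one common definition of each quantity: q² = 1 - PRESS / SS_total for leave-one-out cross-validation, and a coefficient of determination on the test set for external validation. The abstract does not give the paper's exact definitions, and the data are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = statistics.mean(ys)
    return 1 - press / sum((y - my) ** 2 for y in ys)

def external_r2(train_x, train_y, test_x, test_y):
    """R2 of test-set predictions from a model fitted on the training set only."""
    a, b = fit_line(train_x, train_y)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(test_x, test_y))
    my = statistics.mean(test_y)
    return 1 - ss_res / sum((y - my) ** 2 for y in test_y)

# Hypothetical, perfectly linear data: both statistics are close to 1 here.
train_x, train_y = [1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 5.0, 7.0, 9.0, 11.0]
test_x, test_y = [6.0, 7.0, 8.0], [13.0, 15.0, 17.0]
q2 = loo_q2(train_x, train_y)
r2_ext = external_r2(train_x, train_y, test_x, test_y)
```

The key difference is that `loo_q2` only ever sees the training compounds, while `external_r2` scores compounds the model never saw during fitting.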

19.
Quantitative structure–activity relationship (QSAR) models for predicting acute toxicity to Daphnia magna are often associated with poor performance, which makes improvement necessary to meet REACH requirements. The aim of this study was to evaluate the accuracy, stability and reliability of a previously published QSAR model by means of further external validation and to optimize its performance by means of extension to new data as well as a consensus approach. The previously published model was validated with a large set of new molecules and then compared with the ChemProp model, from which most of the validation data were taken. Results showed better performance of the proposed model in terms of accuracy and percentage of molecules outside the applicability domain. The model was re-calibrated on all the available data to confirm the efficacy of the similarity-based approach. The extended dataset was also used to develop a novel model based on the same similarity approach but using binary fingerprints to describe the chemical structures. The fingerprint-based model gave lower regression statistics, but also fewer unpredicted compounds. Finally, consensus modelling was successfully used to enhance the accuracy of the predictions and to halve the percentage of molecules outside the applicability domain.
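
As a hedged illustration of a similarity-based model on binary fingerprints with an applicability domain and a simple consensus, consider the sketch below. It is not the published model. The Tanimoto-weighted neighbour average, the `min_similarity` cut-off, and the toy fingerprints (sets of on-bit indices) are assumptions made for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_prediction(query, training, k=2, min_similarity=0.3):
    """Similarity-weighted mean of the k most similar training compounds.

    Returns None when the query is outside the applicability domain,
    i.e. when even its nearest neighbour is below min_similarity.
    """
    ranked = sorted(((tanimoto(query, fp), y) for fp, y in training), reverse=True)[:k]
    if ranked[0][0] < min_similarity:
        return None
    total = sum(s for s, _ in ranked)
    return sum(s * y for s, y in ranked) / total

def consensus(predictions):
    """Average the models that produced a prediction; None if none did."""
    valid = [p for p in predictions if p is not None]
    return sum(valid) / len(valid) if valid else None

# Hypothetical training set: (fingerprint, toxicity value).
training = [({1, 2, 3}, 1.0), ({1, 2, 4}, 3.0), ({7, 8, 9}, 10.0)]
inside = similarity_prediction({1, 2, 3}, training)
outside = similarity_prediction({20, 21}, training)
combined = consensus([1.0, None, 3.0])
```

Because the consensus falls back on whichever model can still predict, a compound is left unpredicted only when every model places it outside its domain, which is one way a consensus can reduce the share of unpredicted molecules.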

20.