Similar articles
 20 similar articles found (search time: 15 ms)
1.
Alongside validation, the applicability domain (AD) is one of the most important factors determining the quality and reliability of quantitative structure–activity relationship (QSAR) models. A variety of approaches to AD estimation have been devised for particular types of QSAR models, and their practical use is extensively described in the literature. The present study introduces a novel, simple, and effective distance-based method for estimating the AD of developed and validated predictive counter-propagation artificial neural network (CP ANN) models, using the Euclidean distance (ED) metric in the structure-representation vector space. The performance of the method was evaluated and explained in a case study using a pre-built and validated CP ANN model that predicts the transport activity of the transmembrane protein bilitranslocase for a diverse set of compounds. The method was tested on two further datasets to confirm its performance in evaluating the applicability domain of CP ANN models. The compounds identified as potential outliers, i.e., outside the AD of the CP ANN model, were confirmed in a comparative AD assessment using the leverage approach. Moreover, the method offers a graphical depiction of the AD for fast and simple identification of extreme points.
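
As a rough illustration of a distance-based applicability domain, the sketch below flags query compounds whose Euclidean distance to the training-set centroid exceeds a cutoff. It is not the CP ANN procedure of the abstract: the centroid reference point, the cutoff of mean plus `z` standard deviations, and all function names are assumptions made for this example.

```python
import math
from statistics import mean, stdev


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(rows):
    """Column-wise mean of the training descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_domain(train, z=3.0):
    """Return (centroid, cutoff). The cutoff is an assumed convention:
    mean training distance to the centroid plus z standard deviations."""
    c = centroid(train)
    dists = [euclidean(row, c) for row in train]
    return c, mean(dists) + z * stdev(dists)


def in_domain(query, c, cutoff):
    """True if the query lies within the cutoff distance of the centroid."""
    return euclidean(query, c) <= cutoff


# Hypothetical two-descriptor training set, for illustration only.
train = [[0.0, 0.0], [1.0, 0.2], [0.3, 1.1], [1.2, 0.9], [0.6, 0.5]]
c, cutoff = fit_domain(train)
print(in_domain([0.6, 0.5], c, cutoff))    # close to the training data
print(in_domain([10.0, 10.0], c, cutoff))  # far from the training data
```

Plotting each compound's distance against its index, with the cutoff as a horizontal line, gives a simple graphical view of the extreme points.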

2.
3.
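Linear feature-selection methods can improve the predictive ability of quantitative structure-activity relationship (QSAR) models, but they tend to overlook nonlinear relationships between features (physicochemical properties) and molecular activity. This paper proposes a stepwise nonlinear regression feature-selection algorithm based on support vector regression (SVR), termed SSNR, and applies it to a QSAR study of angiotensin-converting enzyme (ACE) inhibitory peptides, which act as antihypertensive agents. Peptide sequences are first characterized separately with five groups of molecular descriptors of different backgrounds, and SSNR performs feature selection on each group. An intelligent consensus model (ICM) then combines, by weighting, the activities predicted by the submodels for the individual descriptor groups to give the final predicted activity. Results on two datasets, ACE-inhibitory dipeptides and tripeptides, show that the feature subsets selected by SSNR, combined with the ICM strategy, effectively improve predictive ability (mean Q² of 0.675±0.002 for dipeptides and 0.663±0.013 for tripeptides), outperforming genetic algorithm-partial least squares (0.538±0.049 and 0.599±0.047) and stepwise linear regression (0.583±0.041 and 0.675±0.010). Finally, the activities of all peptides of unknown activity were predicted from the peptide sequences with known inhibitory activity, and the highly active peptides and their amino acid preferences were analyzed, providing candidate sequence combinations for the synthesis of potentially highly active ACE-inhibitory peptides.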

4.
5.
6.
7.
8.
Here we report a new predictive model for autoignition temperature (AIT), an important physical parameter widely used to assess the potential safety hazards of combustible materials. Available structure-AIT data extracted from different sources were critically analysed. Support vector regression (SVR) models were built on different data subsets in order to identify a reliable compound set on which a realistic model could be built. This led to the selection of a dataset containing 875 compounds annotated with AIT values. The SVR model built on this dataset performs reasonably well in cross-validation, with a determination coefficient r² = 0.77 and a mean absolute error MAE = 37.8°C. External validation on 20 industrial compounds absent from the training set confirmed its good predictive power (MAE = 28.7°C).

9.
Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance in the development of robust and predictive models of chemical toxicity. To address this issue in a systematic way, we have formed an international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology. We have compiled an aqueous toxicity data set containing 983 unique compounds tested in the same laboratory over a decade against Tetrahymena pyriformis. A modeling set of 644 compounds was selected randomly from the original set and distributed to all groups, which used their own QSAR tools for model development. The remaining 339 compounds in the original set (external set I), as well as 110 additional compounds (external set II) published recently by the same laboratory (after this computational study was already in progress), were used as two independent validation sets to assess the external predictive power of individual models. In total, our virtual collaboratory has developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal prediction accuracy for the modeling set ranged from 0.76 to 0.93 as measured by the leave-one-out cross-validation correlation coefficient (Q²_abs). The prediction accuracy for the external validation sets I and II ranged from 0.71 to 0.85 (linear regression coefficient R²_absI) and from 0.38 to 0.83 (linear regression coefficient R²_absII), respectively. The use of an applicability domain threshold, implemented in most models, generally improved the external prediction accuracy but at the same time decreased chemical space coverage. Finally, several consensus models were developed by averaging the predicted aquatic toxicity for every compound using all 15 models, with or without taking into account their respective applicability domains. We find that consensus models afford higher prediction accuracy for the external validation data sets, with the highest space coverage, compared to the individual constituent models. Our studies prove the power of a collaborative and consensual approach to QSAR model development. The best validated models of aquatic toxicity developed by our collaboratory (both individual and consensus) can be used as reliable computational predictors of aquatic toxicity and are available from any of the participating laboratories.
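
The consensus step described above (averaging the predictions of all models for each compound, with or without regard to each model's applicability domain) can be sketched as follows. This is a minimal illustration under assumed conventions: the function names, the boolean in-domain flags, and the choice to return `None` when no model covers a compound are not taken from the study.

```python
def consensus(predictions, in_domain=None):
    """Average per-model predictions for one compound.

    predictions: list of predicted values, one per model.
    in_domain:   optional list of booleans, one per model; when given,
                 only models whose applicability domain covers the
                 compound contribute to the average.
    Returns None if no model contributes (compound not covered).
    """
    if in_domain is None:
        used = list(predictions)
    else:
        used = [p for p, ok in zip(predictions, in_domain) if ok]
    if not used:
        return None
    return sum(used) / len(used)


def coverage(consensus_values):
    """Fraction of compounds that received a consensus prediction."""
    covered = sum(1 for v in consensus_values if v is not None)
    return covered / len(consensus_values)


# Hypothetical predictions from three models for two compounds.
preds = [[1.0, 2.0, 6.0], [0.5, 0.7, 0.9]]
flags = [[True, True, False], [False, False, False]]

plain = [consensus(p) for p in preds]
with_ad = [consensus(p, f) for p, f in zip(preds, flags)]
print(plain, coverage(plain))
print(with_ad, coverage(with_ad))
```

The second call shows the trade-off noted in the abstract: restricting each model to its domain changes the averaged value and can leave some compounds without any prediction, which lowers coverage.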

10.
11.
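QSAR modeling based on geostatistics and support vector regression   Total citations: 4, self-citations: 0, citations by others: 4
Based on principal component analysis (PCA), geostatistics (GS), and support vector regression (SVR), a new individualized prediction method for quantitative structure-activity relationships (QSAR), Weight-PCA-GS-SVR, is proposed. The basic idea is as follows. PCA first reduces dimensionality and removes redundant information among the independent variables. SVR-based nonlinear screening then removes the principal components unrelated to the dependent variable. The retained principal components are used to compute weighted distances between samples, and high-dimensional GS then determines a common range. Each sample to be predicted, taking itself as the center, finds in the training set its own k nearest neighbors whose weighted distance is smaller than the common range, and an SVR model trained on these neighbors gives the individualized prediction. Weight-PCA-GS-SVR optimizes the model in both the row and column directions, provides a new weighting method for the independent variables, offers a new approach to the problem of choosing the optimal k nearest neighbors, and retains the original advantages of SVR. Validated on three compound-activity datasets, the new method achieved the highest prediction accuracy among all reference models and clearly outperformed results reported in the literature. Weight-PCA-GS-SVR has broad application prospects in QSAR and other regression-prediction fields.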
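
The neighbor-selection step of this method can be illustrated with a short sketch: each query collects the training samples whose weighted distance falls below a common range and predicts from those samples only. Everything here is an assumption made for illustration. The weights and the range are supplied by hand rather than derived from PCA, SVR screening, or geostatistics, and the local SVR model is replaced by a plain mean of the neighbors' activities to keep the example dependency-free.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two samples."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def private_neighbors(query, train_x, weights, common_range):
    """Indices of training samples closer to the query than the common range."""
    return [
        i for i, row in enumerate(train_x)
        if weighted_distance(query, row, weights) < common_range
    ]


def local_predict(query, train_x, train_y, weights, common_range):
    """Predict from the query's own neighbors only.

    Stand-in for the local SVR model: returns the mean activity of the
    neighbors, or None when no training sample lies within the range.
    """
    idx = private_neighbors(query, train_x, weights, common_range)
    if not idx:
        return None
    return sum(train_y[i] for i in idx) / len(idx)


# Hypothetical data: two retained components with equal weights.
train_x = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
train_y = [1.0, 3.0, 10.0]
print(local_predict([0.5, 0.0], train_x, train_y, [1.0, 1.0], 2.0))
print(local_predict([20.0, 20.0], train_x, train_y, [1.0, 1.0], 2.0))
```

Because the neighbor set is defined by a distance cutoff rather than a fixed count, the number of neighbors k varies from query to query.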

12.
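Fast nonlinear screening of high-dimensional features based on support vector machines and peptide QSAR modeling   Total citations: 1, self-citations: 0, citations by others: 1
Peptide structures were characterized directly by 531 physicochemical property parameters of amino acids, and a new fast nonlinear screening method for high-dimensional features was developed based on support vector regression. The method was applied to quantitative sequence-activity relationship (QSAR) modeling of two peptide systems, bitter-tasting dipeptides and angiotensin-converting enzyme inhibitors; for each system, screening yielded 10 retained descriptors with clear meaning. Support vector regression models built on the retained descriptors showed substantially better fitting accuracy, leave-one-out cross-validation accuracy, and external prediction accuracy than results reported in the literature. Significance tests of the nonlinear regression and of the relative importance of single factors, together with single-factor effect analysis, were carried out on the models, enhancing their interpretability. The new method has broad application prospects in high-dimensional regression prediction, such as QSAR modeling of peptides and proteins.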

13.
A QSAR study on a series of pyrimidinyl and triazinyl amines was performed to explore the physico-chemical parameters responsible for their anti-HIV activity and cytotoxicity. Physico-chemical parameters were calculated using WIN CAChe 6.1. Stepwise multiple linear regression analysis was carried out to derive QSAR models, which were further evaluated for statistical significance and predictive power by internal and external validation. The best QSAR models selected showed correlation coefficients R of 0.914 and 0.901, and cross-validated squared correlation coefficients Q² of 0.685 and 0.691, for anti-HIV activity and cytotoxicity, respectively. The significant QSAR models developed indicate that the hydrophobicity of the whole molecule plays an important role in the anti-HIV activity and cytotoxicity of pyrimidinyl and triazinyl amine derivatives. Increasing hydrophobicity decreases the anti-HIV activity of the present series of compounds and leads to high cytotoxicity.

14.
15.
A new quantitative structure–activity relationship (QSAR) model for the inhibition of mild steel corrosion in 1 M hydrochloric acid by furan derivatives was developed by proposing a two-stage sparse multiple linear regression. Sparse multiple linear regression using a ridge penalty and sparse multiple linear regression using the elastic net (SMLRE) were used to develop the QSAR model. The results show that the SMLRE-based model has higher predictive power than the model based on sparse multiple linear regression with a ridge penalty, according to the mean-squared errors for both training and test datasets, leave-one-out internal validation (Q²_int = 0.98), and external validation (Q²_ext = 0.95). In addition, the results of applicability domain assessment using the leverage approach reveal a reliable and robust SMLRE-based model. In conclusion, the QSAR model developed using SMLRE can be used efficiently in studies of corrosion inhibition efficiency. Copyright © 2016 John Wiley & Sons, Ltd.
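
The leverage approach mentioned above can be sketched as follows: the leverage of a compound with descriptor vector x is h = xᵀ(XᵀX)⁻¹x, where X is the training descriptor matrix, and compounds whose leverage exceeds a warning value are treated as outside the domain. The abstract gives no details, so the intercept column, the warning value 3(p + 1)/n (a common convention, with p descriptors and n training compounds), and all names below are assumptions.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination
    with partial pivoting (a is a square list of lists)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(train, queries):
    """Leverage h = x^T (X^T X)^-1 x of each query, with an intercept
    column prepended to the descriptors (an assumed convention)."""
    X = [[1.0] + list(row) for row in train]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    result = []
    for q in queries:
        x = [1.0] + list(q)
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        result.append(sum(a * b for a, b in zip(x, z)))
    return result


def warning_leverage(n_train, n_descriptors):
    """Assumed cutoff h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical one-descriptor training set.
train = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
h_star = warning_leverage(len(train), 1)
for h in leverages(train, [[2.5], [12.0]]):
    print(round(h, 3), "inside" if h <= h_star else "outside")
```

Plotting standardized residuals against leverage, with the cutoff as a vertical line, gives the Williams plot commonly used with this approach.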

16.
Membrane transport proteins are essential for the cellular uptake of numerous salts, nutrients and drugs. Bilitranslocase is a transporter specific for water-soluble organic anions and is the only known carrier of nucleotides and nucleotide-like compounds. Experimental data on bilitranslocase ligand specificity for 120 compounds were used to construct classification models using counter-propagation artificial neural networks (CP-ANNs) and support vector machines (SVMs). A subset of active compounds with experimentally determined transport rates was used to build predictive QSAR models for estimating the transport rates of unknown compounds. Several modelling methods and techniques were applied, i.e. CP-ANN, genetic algorithm, self-organizing mapping and multiple linear regression. The best predictions were achieved using CP-ANN coupled with a genetic algorithm, with an external validation parameter Q²_V of 0.96. The applicability domains of the models were defined to determine the chemical space in which reliable predictions can be obtained. The models were applied to estimate bilitranslocase transport activity for two sets of pharmaceutically interesting compounds, antioxidants and antiprions. We found that relative planarity and a high potential for hydrogen bond formation are the common structural features of anticipated substrates of bilitranslocase. These features may serve as guidelines in the design of new pharmaceuticals transported by bilitranslocase.

17.
18.
The spread of multidrug-resistant Escherichia coli clinical isolates is a major problem in the treatment of infectious diseases. Modern scientific approaches to solving this problem therefore require not only a prevention strategy but also the development of new effective inhibitory compounds with a selective molecular mechanism of action and low toxicity. The goal of this work is to identify more potent molecules active against E. coli strains by using machine learning, docking studies, synthesis and biological evaluation. A set of predictive QSAR models was built with two publicly available, structurally diverse data sets, including recent data deposited in PubChem. The predictive ability of these models, tested by 5-fold cross-validation, resulted in balanced accuracies (BA) of 59–98% for the binary classifiers. Test-set validation showed that the models could be instrumental in predicting antimicrobial activity, with BA = 60–99% within the applicability domain. The models were applied to screen a virtual chemical library designed to have activity against resistant E. coli strains. The eight most promising compounds were identified, synthesized and tested. All of them showed different levels of anti-E. coli activity and acute toxicity. The docking results show that all studied compounds are potential DNA gyrase inhibitors through the estimated interactions with amino acid residues and the magnesium ion in the enzyme active center. The synthesized compounds could be an interesting starting point for the further development of drugs with low toxicity and a selective molecular mechanism of action against resistant E. coli strains. The developed QSAR models are freely available online at OCHEM http://ochem.eu/article/112525 and can be used for virtual screening of potential compounds with anti-E. coli activity.

19.
20.