Similar Documents
20 similar documents found.
1.
Cong Yong, Xue Ying. Acta Physico-Chimica Sinica, 2013, 29(8): 1639-1647
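A quantitative structure-activity relationship (QSAR) study was carried out on 89 benzisothiazole and benzothiazine non-nucleoside inhibitors of hepatitis C virus (HCV) NS5B polymerase. Two feature-selection methods were used to select optimal descriptor subsets: a genetic algorithm combined with partial least squares (GA-PLS) and linear stepwise regression analysis (LSRA). Multiple linear regression and partial least squares regression models were then built on these subsets. For the first time, a genetic algorithm coupled with support vector machine (GA-SVM) was also used to build nonlinear support vector regression models on the descriptor subsets selected by each of the two methods. Models built with all three machine-learning methods gave fairly satisfactory predictions. For the three QSAR models built on the 6 descriptors selected by LSRA, test-set correlation coefficients were 0.958-0.962, and GA-SVM gave the best predictive accuracy (0.962). For the three QSAR models built on the 7 descriptors selected by GA-PLS, test-set correlation coefficients were 0.918-0.960, and the partial least squares regression model performed best (0.960). This work provides an effective way to predict the biological activity of HCV inhibitors, and the approach can be extended to other, similar QSAR studies.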
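The feature-selection step of GA-PLS is a genetic algorithm searching over descriptor subsets, and it can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code. `ga_select` and `toy_fitness` are hypothetical names. The operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are common textbook choices. The fitness function is a placeholder; in GA-PLS it would typically be a cross-validated score of a PLS model fitted on the selected descriptors.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = descriptor selected, 0 = descriptor dropped


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Evolve binary descriptor masks; return the best mask and its fitness.

    `fitness` is a user-supplied (hypothetical) scoring function, higher is better.
    Requires n_features >= 2 for one-point crossover.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament(scored):
        # Pick the fitter of two randomly chosen (mask, score) pairs.
        a, b = rng.sample(scored, 2)
        return a[0] if a[1] >= b[1] else b[0]

    for _ in range(generations):
        scored = [(ind, fitness(ind)) for ind in pop]
        scored.sort(key=lambda s: s[1], reverse=True)
        new_pop = [scored[0][0][:]]  # elitism: keep the current best mask unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop

    best = max(pop, key=fitness)
    return best, fitness(best)


# Toy example: a hypothetical "ideal" subset; fitness = minus the Hamming distance.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_fitness(mask: Mask) -> float:
    return -sum(m != t for m, t in zip(mask, TARGET))


best_mask, best_score = ga_select(len(TARGET), toy_fitness)
```

Because the best mask is always carried over, the best fitness never decreases from one generation to the next. Swapping in a PLS cross-validation score as `fitness` gives the wrapper structure that GA-PLS uses.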

4.
Mei Hu, Zhou Yuan, Liao Zhihua, Li Zhiliang. Acta Chimica Sinica, 2006, 64(9): 949-952
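VHSE amino-acid structural descriptors were used to characterize the structures of HLA-A*0201-restricted epitopes. A QSAR model was built for a training set of 102 epitopes by combining a genetic algorithm with partial least squares (GA-PLS). After 3 outliers were removed, the optimal PLS model (A = 2) was selected on the basis of cross-validation of the candidate models and predictions for an external test set of 50 epitopes. Its R², Q² and [symbol missing] were 0.755, 0.621 and 0.680, respectively. The structure-activity analysis showed that CTL epitope activity is closely related mainly to three properties: the hydrophobicity of the residues at positions 1, 2, 7, 8 and 9, the steric properties of the residues at positions 1 and 2, and the electronic properties of the residue at position 6.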

10.
In this work, different approaches to variable selection are studied in the context of near-infrared (NIR) multivariate calibration for textiles. First, a model-based regression method is proposed, consisting of genetic algorithm optimisation combined with partial least squares regression (GA-PLS). The second approach is a relevance measure for spectral variables based on mutual information (MI), which can be computed independently of any given regression model. Because MI makes no assumption about the relationship between X and Y, non-linear methods such as feed-forward artificial neural networks (ANN) are well suited for modelling in a prediction context (MI-ANN). GA-PLS and MI-ANN models are developed for NIR quantitative prediction of cotton content in cotton-viscose textile samples. The results are compared with a full-spectrum (480 variables) PLS model (FS-PLS). The FS-PLS model requires 11 latent variables and yields a 3.74% RMS prediction error over the 0-100% range. GA-PLS provides a more robust model based on 120 variables, with slightly better prediction performance (3.44% RMS error). The MI variable selection procedure gives a considerable improvement, as only 12 variables are retained. A 12-input ANN model trained on these variables gives an RMS prediction error of 3.43%.
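MI-based relevance ranking can be illustrated with a plug-in mutual-information estimator built on equal-width histograms. This is a sketch under assumptions. The abstract does not say which MI estimator was used, and `mutual_information`, `rank_variables` and the bin count are illustrative choices. Histogram estimates are biased for small samples, so dedicated estimators (for example k-nearest-neighbour ones) are common in practice. The variables retained here would become the inputs of the ANN, which is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Map values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable falls into a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], n_bins: int = 4) -> float:
    """Plug-in MI estimate (in nats) from a 2-D histogram of x and y."""
    xb, yb = discretize(x, n_bins), discretize(y, n_bins)
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    mi = 0.0
    for (i, j), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[i] / n) * (py[j] / n)))
    return mi


def rank_variables(X: Sequence[Sequence[float]], y: Sequence[float], k: int, n_bins: int = 4) -> List[int]:
    """Return the indices of the k columns of X with the highest MI with y."""
    n_vars = len(X[0])
    scores = [mutual_information([row[j] for row in X], y, n_bins) for j in range(n_vars)]
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)[:k]
```

No linear model is involved in the ranking. A variable can therefore score highly even when its relationship to the target is non-linear, which is the reason the abstract pairs MI with an ANN.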

11.
Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance for developing robust and predictive models of chemical toxicity. To address this issue systematically, we formed an international virtual collaboratory of six independent groups with shared interests in computational chemical toxicology. We compiled an aqueous toxicity data set containing 983 unique compounds, tested against Tetrahymena pyriformis in the same laboratory over a decade. A modeling set of 644 compounds was selected at random from the original set and distributed to all groups, each of which used its own QSAR tools for model development. Two independent validation sets were used to assess the external predictive power of the individual models: the remaining 339 compounds in the original set (external set I), and 110 additional compounds (external set II) published recently by the same laboratory after this computational study was already in progress. In total, the collaboratory developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal prediction accuracy for the modeling set ranged from 0.76 to 0.93, as measured by the leave-one-out cross-validation correlation coefficient (Q²abs). The prediction accuracy for external validation sets I and II ranged from 0.71 to 0.85 (linear regression coefficient R²abs,I) and from 0.38 to 0.83 (linear regression coefficient R²abs,II), respectively. Applying the applicability domain threshold implemented in most models generally improved external prediction accuracy, but it also reduced chemical space coverage. Finally, several consensus models were developed by averaging the predicted aquatic toxicity of every compound over all 15 models, with or without taking their respective applicability domains into account.
We find that, compared with the individual constituent models, the consensus models afford higher prediction accuracy for the external validation data sets with the highest chemical space coverage. Our studies demonstrate the power of a collaborative and consensual approach to QSAR model development. The best validated models of aquatic toxicity developed by our collaboratory (both individual and consensus) can be used as reliable computational predictors of aquatic toxicity and are available from any of the participating laboratories.
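Consensus prediction averages each compound's predicted toxicity over all models. It can optionally count only the models whose applicability domain covers the compound. The sketch below shows both variants. The function name and the data layout (one list of predictions per model, plus an optional matching list of in-domain flags) are assumptions for illustration. How each group computes its applicability domain is not specified here.

```python
from statistics import mean
from typing import List, Optional, Sequence


def consensus_predict(
    predictions: Sequence[Sequence[float]],
    in_domain: Optional[Sequence[Sequence[bool]]] = None,
) -> List[Optional[float]]:
    """Average per-compound predictions across models.

    predictions[m][c] is model m's prediction for compound c.
    in_domain[m][c] (optional) says whether compound c lies inside model m's
    applicability domain; out-of-domain predictions are ignored. A compound
    covered by no model gets None (it is left unpredicted).
    """
    n_compounds = len(predictions[0])
    result: List[Optional[float]] = []
    for c in range(n_compounds):
        votes = [
            preds[c]
            for m, preds in enumerate(predictions)
            if in_domain is None or in_domain[m][c]
        ]
        result.append(mean(votes) if votes else None)
    return result
```

The fraction of non-`None` entries is the chemical space coverage. It shrinks as the domain filter becomes stricter, which is the accuracy/coverage trade-off the study reports.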

12.
The blood-brain permeation of a structurally diverse set of 281 compounds was modeled using linear regression and a multivariate genetic partial least squares (G/PLS) approach. Statistically significant quantitative structure-activity relationship (QSAR) models captured the key structural features affecting the logarithm of blood-brain partitioning (logBB). These relationships reveal the importance of logP, polar surface area, and a variety of electrotopological indices for accurate logBB prediction. The best models show an excellent correlation (r > 0.9) for a training set of 58 compounds. Likewise, comparing the average logBB values from an ensemble of QSAR models with experimental values confirms the statistical quality of the models (r > 0.9). The models give good agreement (r ≈ 0.7) between predicted and experimental logBB values for the 34 molecules in the external validation set. To further validate the models for use in drug discovery, a prediction set of 181 drugs with reported CNS penetration data was used. Each of the QSAR models achieves a success rate above 70% in qualitatively predicting CNS-permeable (active) drugs. The best model achieves a lower success rate (~60%) for CNS-impermeable (inactive) drugs. Combining the predictions of all the models (consensus) did not significantly improve the discrimination between CNS-active and CNS-inactive molecules. Finally, using therapeutic classification as a guide, the CNS penetration capability of over 2000 compounds in the Synthline database was estimated. The results were very similar to those for the smaller set of 181 compounds.

14.
Simultaneous multicomponent analysis is usually carried out with full-spectrum multivariate calibration models such as partial least squares (PLS). Both experimental and theoretical considerations have shown that better results can be obtained by properly selecting the spectral range included in the calculations. The genetic algorithm is one of the most popular methods for selecting variables for PLS calibration of mixtures with almost identical spectra without loss of predictive capacity. In this work, a simple and precise method is proposed for the rapid simultaneous determination of sulfide and sulfite ions. It is based on the addition reaction of these ions with new fuchsin at pH 8 and 25°C and uses PLS regression with a genetic algorithm (GA) for variable selection. The concentrations of sulfide and sulfite ions ranged over 0.05–2.50 and 0.15–2.00 μg/mL, respectively. A series of synthetic solutions containing different concentrations of sulfide and sulfite was used to check the predictive ability of the GA-PLS models. The root mean square error of prediction (RMSEP) with PLS on the whole data set was 0.19 μg/mL for sulfide and 0.09 μg/mL for sulfite. After GA was applied, these values fell to 0.04 and 0.03 μg/mL, respectively. The text was submitted by the authors in English.
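RMSEP is the figure of merit used to compare PLS and GA-PLS. It is the square root of the mean squared difference between reference and predicted concentrations over an independent prediction set: RMSEP = sqrt(Σ(cᵢ - ĉᵢ)² / n). A minimal Python version follows. Some texts divide by n - 1 or apply another degrees-of-freedom correction; this sketch assumes plain n.

```python
import math
from typing import Sequence


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of prediction over an independent prediction set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / len(y_true))
```

RMSEP is expressed in the units of the analyte (here μg/mL), so the errors for sulfide and sulfite can be read directly as concentration errors.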

15.
Microcrystalline naphthalene extraction was used to preconcentrate p-benzoquinone and tetrachloro-p-benzoquinone (chloranil) after their reaction with aniline. This was followed by simultaneous spectrophotometric analysis with genetic algorithm-partial least squares (GA-PLS) calibration. The chemical variables affecting the analytical performance of the method were studied and optimized. The optimum conditions were [aniline] = 0.05 M and [naphthalene] = 2.2% (w/v). Under these conditions, preconcentration of 25 mL of sample solution permitted the detection of 0.32 and 0.23 μg mL⁻¹ of p-benzoquinone and chloranil, respectively. The predictive abilities of partial least squares regression (PLS) and genetic algorithm-partial least squares regression (GA-PLS) were examined for the simultaneous determination of the two quinones. GA-PLS outperforms the other PLS methods because the genetic algorithm selects the wavelengths used in PLS calibration without loss of predictive capacity. It also provides useful information about the chemical system.

17.
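ω-Conotoxins are marine bioactive polypeptides composed of 24-31 amino-acid residues. They act specifically on voltage-sensitive calcium channels (VGCCs) and can be developed directly into drugs or used as lead compounds for new drug development. In this work, a new amino-acid residue structural descriptor, cscales, was combined with a genetic partial least squares algorithm in a quantitative structure-activity relationship (QSAR) study of ω-conotoxins. A virtual combinatorial peptide library of 2244 compounds was designed and constructed for N-type and P/Q-type VGCC antagonists. The library was then virtually screened by QSAR model prediction and by similarity searching. The QSAR models built for N-type and P/Q-type VGCC antagonists both show good predictive ability, with cross-validated correlation coefficients (CV-r²) above 0.89. Principal component analysis and cluster analysis show that the compounds in the virtual combinatorial library have good structural diversity and dissimilarity. Virtual screening yielded 6 N-type and 19 P/Q-type calcium channel antagonists with high predicted activity, laying a theoretical foundation for further synthesis and activity evaluation. The peptide QSAR prediction models and virtual screening strategy established here also provide a reference for QSAR studies and virtual screening of other peptide compounds.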

18.
Estimating the accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties is a critical problem. The "distance to model" (DM) parameter developed for this purpose is a metric of similarity between the training-set and test-set compounds subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics based on the standard deviation within an ensemble of QSAR models. The current study applies this analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported in the 2009 QSAR challenge. We show that DMs based on an ensemble (consensus) model systematically perform better than other DMs. The presented approach identifies 30-60% of compounds whose prediction accuracy is similar to the interlaboratory accuracy of the Ames test, estimated at 90%. Thus, in silico predictions can halve the cost of experimental measurements while providing similar prediction accuracy. The developed model is publicly available at http://ochem.eu/models/1 .
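An ensemble-based DM can be sketched as the spread of the individual models' predictions for each compound. Compounds on which the models agree (small DM) are treated as the most reliably predicted. This illustration rests on assumptions: `ensemble_dm` and `most_reliable` are hypothetical helpers, the sample standard deviation is assumed as the spread measure, and the OCHEM implementation may differ in detail.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def ensemble_dm(predictions: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each compound, return (consensus prediction, DM).

    predictions[m][c] is model m's prediction for compound c (at least two models).
    DM is the sample standard deviation of the ensemble's predictions for c.
    """
    n_compounds = len(predictions[0])
    out = []
    for c in range(n_compounds):
        values = [preds[c] for preds in predictions]
        out.append((mean(values), stdev(values)))
    return out


def most_reliable(predictions: Sequence[Sequence[float]], fraction: float) -> List[int]:
    """Indices of the given fraction of compounds with the smallest DM."""
    dm = [d for _, d in ensemble_dm(predictions)]
    k = max(1, round(fraction * len(dm)))
    return sorted(range(len(dm)), key=lambda c: dm[c])[:k]
```

For a binary endpoint such as Ames mutagenicity, each model's output would typically be a probability or a 0/1 call. Selecting the low-DM subset corresponds to the 30-60% of compounds for which the abstract reports near-experimental accuracy.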

19.
In this study, the simultaneous determination of paracetamol, ibuprofen and caffeine in pharmaceuticals by chemometric approaches using UV spectrophotometry is reported as a simple alternative to using a separate model for each component. Spectra of paracetamol, ibuprofen and caffeine were recorded at several concentrations within their linear ranges. They were used to compute the calibration mixtures over 200-400 nm at 1 nm intervals in methanol:0.1 HCl (3:1). Three methods were used for chemometric analysis of the data: partial least squares regression (PLS), genetic algorithm coupled with PLS (GA-PLS), and principal component-artificial neural network (PC-ANN). The parameters of each procedure were optimized. The analytical performance of these methods was characterized by relative prediction errors and recoveries (%), and the methods were compared with each other. GA-PLS outperformed the other multivariate methods applied, owing to the genetic algorithm's wavelength selection in PLS calibration without loss of predictive capacity. Although the components show considerable spectral overlap, they were determined simultaneously and rapidly without any separation step. All three methods were successfully applied to a pharmaceutical formulation (capsules), with no interference from excipients, as the recovery study results show. The proposed methods are simple and rapid and can easily be used as alternative analytical tools in drug quality control.

20.
A novel three-dimensional holographic vector of atomic interaction field (3D-HoVAIF) was used to describe the chemical structures of 23 benzoxazinone derivatives studied as antithrombotic drugs. A quantitative structure-activity relationship (QSAR) model was built by partial least-squares (PLS) regression. The estimation stability and predictive ability of the model were rigorously analyzed by both internal and external validation. The correlation coefficients for the fitted PLS model and for leave-one-out (LOO) cross-validation were R² = 0.899 and R²CV = 0.854. For predicted versus experimental values of the external samples, Q²ext = 0.868. These values indicate that the PLS model has both good estimation stability and good predictive capability. The satisfactory results also show that 3D-HoVAIF can effectively express the structural information related to the biological activity of benzoxazinone derivatives.
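The leave-one-out statistic reported as R²CV is commonly computed as Q² = 1 - PRESS/SS_tot. Here PRESS sums the squared errors for each sample when it is predicted by a model refitted without it. The sketch below uses a one-descriptor ordinary least-squares line as a stand-in for the authors' PLS model on 3D-HoVAIF descriptors. `fit_line` and `q2_loo` are hypothetical names, and any fitter with the same shape (training lists in, predictor function out) can be passed instead.

```python
from typing import Callable, List, Sequence

Fitter = Callable[[List[float], List[float]], Callable[[float], float]]


def fit_line(x: List[float], y: List[float]) -> Callable[[float], float]:
    """Ordinary least-squares line y = a + b*x (a simple stand-in for PLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda v: a + b * v


def q2_loo(x: Sequence[float], y: Sequence[float], fit: Fitter = fit_line) -> float:
    """Leave-one-out Q2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(y)):
        # Refit without sample i, then predict the held-out sample.
        x_train = [v for j, v in enumerate(x) if j != i]
        y_train = [v for j, v in enumerate(y) if j != i]
        model = fit(x_train, y_train)
        press += (y[i] - model(x[i])) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot
```

Unlike the fitted R², Q² uses only held-out predictions. It is therefore usually lower than R², which is consistent with 0.854 versus 0.899 above.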


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号