Similar articles
20 similar articles found.
1.
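Several machine learning methods (support vector machine, artificial neural network, regularized logistic regression, and k-nearest neighbor) were used to build activity classification models for 761 dihydrofolate reductase inhibitors. Constitutional and topological descriptors were used to characterize the molecular structures and physicochemical properties of the inhibitors, the Kennard-Stone method was used to design the training set, and Metropolis Monte Carlo simulated annealing was used for variable selection. The results show that the support vector machine outperformed the other machine learning methods, and the best model predicted well, with a prediction accuracy of 91.62%. This indicates that, with appropriate training set design and variable selection, the support vector machine is well suited to activity classification of dihydrofolate reductase inhibitors.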
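The Kennard-Stone training set design mentioned above can be sketched as follows. This is a minimal illustration of the usual max-min selection rule, assuming Euclidean distance in descriptor space; the function name `kennard_stone` and the toy descriptor values are hypothetical and are not taken from the study.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    # Pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two points that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Add the candidate whose nearest selected point is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy two-descriptor data set (invented values).
descriptors = [(0.0, 0.0), (1.0, 0.0), (0.9, 0.1), (5.0, 5.0), (2.5, 2.5)]
train_idx = kennard_stone(descriptors, 3)
```

The selected indices form the training set and the remaining molecules can serve as validation or test data. The rule favors molecules that span the descriptor space, which is why it is often preferred over a purely random split.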

2.
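To predict the antifungal activity of molecules, 67 molecular descriptors characterizing electronic, topological, geometric, and molecular shape features were calculated and used to build a support vector machine classification model of antifungal activity and to predict activity. The model was validated by leave-one-out and five-fold cross-validation. In the five-fold cross-validation, the 94 molecules studied were first divided into several classes according to the similarity of their three-dimensional structures; molecules were then randomly selected from each class to form several training sets, and the remaining molecules formed the corresponding test sets. The results show that the two validation methods gave similar results and that the model is highly predictive, with a cross-validated prediction accuracy of 84.0%.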
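The class-wise random selection described above can be sketched as follows. This is a minimal illustration that assumes each molecule already carries a structural-similarity class label; the function name `split_by_cluster`, the labels, and the 20% hold-out fraction are hypothetical.

```python
import random

def split_by_cluster(cluster_labels, test_fraction, seed):
    """Randomly hold out about test_fraction of each cluster as test data."""
    rng = random.Random(seed)
    by_cluster = {}
    for idx, label in enumerate(cluster_labels):
        by_cluster.setdefault(label, []).append(idx)
    train, test = [], []
    for members in by_cluster.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # At least one molecule per cluster goes to the test set, so a
        # single-member cluster ends up entirely in the test set.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)

# Ten molecules in two invented similarity classes.
clusters = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
train_idx, test_idx = split_by_cluster(clusters, test_fraction=0.2, seed=0)
```

Repeating the call with different seeds gives several training and test set pairs, each drawing from every structural class.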

3.
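Compared with traditional non-steroidal anti-inflammatory drugs, selective cyclooxygenase-2 (COX-2) inhibitors are free of serious side effects such as gastrointestinal mucosal damage, ulcers, and renal dysfunction, so designing selective COX-2 inhibitors is of great importance. In this work, two machine learning methods, support vector machine (SVM) and neural network, were used to build activity prediction models for selective COX-2 inhibitors, with the aim of providing lead compounds for the synthesis of selective COX-2 inhibitor drugs. The 467 COX-2 inhibitors were divided into a training set, a validation set, and an independent test set using the Kennard-Stone method. For each inhibitor, 463 molecular descriptors, including constitutional and topological descriptors, were calculated to characterize its molecular structure, and the most important descriptors were selected by the F-score method for building the classification models. The results show that the SVM method predicts well after variable screening, with a prediction accuracy of 93.30%.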
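F-score descriptor ranking can be sketched as follows. The abstract does not give the formula, so this sketch assumes one commonly used definition: the squared distances of the two class means from the overall mean, divided by the sum of the within-class sample variances. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of one descriptor from its values in the positive and negative class."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within

def rank_descriptors(X, y):
    """Return descriptor column indices sorted from highest to lowest F-score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

# Invented data: column 0 separates the classes, column 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [1, 1, 1, 0, 0, 0]
order = rank_descriptors(X, y)
```

A descriptor with a high score has well-separated class means relative to its spread within each class. The score treats each descriptor independently, so it does not detect redundancy between descriptors.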

4.
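To predict the anticancer activity of epothilone derivatives, a set of descriptors characterizing molecular shape, the K-th order shape parameters, was defined, and 67 molecular descriptors characterizing the electronic, topological, and geometric structure of the molecules were calculated. The descriptors were screened by a genetic algorithm and used to build a support vector machine (SVM) classification model of anticancer activity; the SVM model parameters were optimized using leave-one-out and 5-fold cross-validation. The results show that the model is highly predictive and that the two methods gave similar results, with a cross-validated prediction accuracy of 80.6%. Thirty descriptors remained after screening, including five K-th order shape parameters; these descriptors play a relatively important role in modeling the anticancer activity of epothilone derivatives.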

5.
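Histone deacetylase (HDAC) plays an important role in chromatin distribution and gene regulation, and is a new target for the treatment of cancer and other diseases. Hydroxamic acid inhibitors are currently the most studied HDAC inhibitors. Comparative molecular field analysis (CoMFA) was applied to study the structure-activity relationship of a series of sulfonamide hydroxamic acid HDAC inhibitors, and the resulting model had a high cross-validated coefficient (q² = 0.704). On this basis, a non-cross-validated partial least squares (PLS) model was built. This model was used to predict a test set of 6 randomly selected compounds, with satisfactory results, indicating that the model has good predictive ability. The study offers guidance for the design of highly active HDAC inhibitors and anticancer drugs.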

6.
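A quantitative structure-property relationship (QSPR) model was built relating the molecular structures of 147 organic compounds to their thermal conductivities, in order to explore the structural factors that affect the thermal conductivity of organic compounds. Of the 147 compounds in the sample set, 118 were randomly selected as the training set and 29 as the test set. The CODESSA software was used to calculate constitutional, topological, geometric, electrostatic, and quantum-chemical descriptors; five structural parameters were selected by the heuristic method (HM) and used to build a linear regression model. The same five structural parameters were then used as inputs to a support vector machine (SVM) to build a nonlinear SVM regression model. The results show that although the fit of the SVM regression model (multiple correlation coefficient R² = 0.9240) was slightly lower than that of the heuristic regression model (R² = 0.9267), the predictive performance of the SVM method (R² = 0.9682) was higher than that of the heuristic method (R² = 0.9574), and for a QSPR model predictive performance matters more. Overall, therefore, the SVM method outperformed the heuristic method. The SVM and heuristic methods offer engineers a new way to predict the thermal conductivity of organic compounds from molecular structure.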

7.
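Classification models of HERG potassium channel inhibitors based on the support vector machine method   Cited by: 1 (self-citations: 0, citations by others: 1)
For inhibitors of the human ether-a-go-go related gene (HERG) potassium channel, 1559 molecular descriptors characterizing molecular composition, charge distribution, topology, geometric structure, and physicochemical properties were calculated. Fisher score (F-score) ranking filtering combined with Monte Carlo simulated annealing was used to select the descriptors relevant to the classification of HERG potassium channel inhibitors. Using the support vector machine (SVM) method, three classification models were built with IC50 = 1.0 and 10.0 μmol·L⁻¹ as classification thresholds. For the 367 training set molecules, five-fold cross-validation gave average prediction accuracies of 84.8%-96.6% for positive samples and 80.7%-97.7% for negative samples, with an overall average prediction accuracy of 87.1%-97.2%, better than results reported elsewhere in the literature. For the 97 external test set molecules, the overall prediction accuracy of the three models was between 67.0% and 90.1%, close to or better than results reported elsewhere in the literature.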

8.
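Matrix metalloproteinase-13 (MMP-13) is a promising target for the prevention and treatment of osteoarthritis (OA). Blocking MMP-13 activity with inhibitors could be of therapeutic value in OA. However, broad-spectrum inhibitors also inhibit other members of the MMP family, particularly MMP-1, which leads to musculoskeletal syndrome. Designing and discovering potent inhibitors that are selective for MMP-13 over MMP-1 is therefore of considerable practical importance for the development of new OA drugs. In this study, two machine learning (ML) methods, support vector machine (SVM) and random forest (RF), were used to build classification models for predicting structurally diverse inhibitors that are selective for MMP-13 over MMP-1. The models reached satisfactory prediction accuracy. Of the two ML models, RF reached accuracies of 97.58% and 100% for MMP-13 selective inhibitors and non-inhibitors, respectively. The molecular descriptors most relevant to the selective inhibition of MMP-13 over MMP-1 were also identified for both models using different feature selection methods. Finally, the best-performing RF model was used to virtually screen the "fragment-like" subset of the ZINC database, yielding a series of potential drug candidates. The study shows that machine learning methods, RF in particular, are effective for discovering potential MMP-13 selective inhibitors.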

9.
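Lü Wei, Xue Ying, Acta Physico-Chimica Sinica (物理化学学报), 2011, 27(6): 1407-1416
Non-structural protein 5B (NS5B), an RNA-dependent RNA polymerase, plays an important role in gene replication and protein maturation of the hepatitis C virus (HCV). Inhibiting the NS5B polymerase blocks HCV RNA replication and is therefore an effective approach to treating hepatitis C. Virtual screening and prediction of NS5B polymerase inhibitors by computational methods have become increasingly important. In this work, machine learning methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) were used to build classification models for known HCV NS5B inhibitors and non-inhibitors. A set of 1248 structurally diverse compounds (552 NS5B inhibitors and 696 non-inhibitors) was used to test the classification system, and recursive feature elimination was used to select the property descriptors relevant to NS5B inhibitors in order to improve prediction accuracy. On the independent validation set, the overall prediction accuracy was 84.1%-85.0%, the accuracy for NS5B inhibitors was 81.4%-91.7%, and the accuracy for non-inhibitors was 78.2%-87.2%. SVM gave the best accuracy for NS5B inhibitors (91.7%), the C4.5 decision tree gave the best accuracy for non-inhibitors (87.2%), and k-NN gave the best overall accuracy (85.0%). The study shows that machine learning methods can effectively predict potential NS5B inhibitors in unknown data sets and help identify the molecular descriptors related to them.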

10.
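Lü Wei, Xue Ying, Acta Physico-Chimica Sinica (物理化学学报), 2010, 26(2): 471-477
In adipose tissue, hormone-sensitive lipase (HSL) is considered the key rate-limiting enzyme regulating fatty acid metabolism. HSL plays an important role in the pathogenesis of diabetes, and inhibiting HSL activity helps in its treatment, so the search for novel HSL inhibitors has become an active research topic. Because the mechanism of action and three-dimensional structure of HSL are not available, methods for predicting HSL inhibitors need to be developed. In this work, several machine learning methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) were used to build classification models for known HSL inhibitors and non-inhibitors. A set of 252 structurally diverse compounds (123 HSL inhibitors and 129 HSL non-inhibitors) was used to test the classification system, and recursive feature elimination was used to select the property descriptors relevant to HSL inhibitors in order to improve prediction accuracy. On the independent validation set, the overall prediction accuracy was 75.0%-80.0%, the accuracy for HSL inhibitors was 85.7%-90.5%, and the accuracy for non-inhibitors was 63.2%-68.4%. The SVM method gave the best overall accuracy (80.0%). The study shows that SVM and other machine learning methods can effectively predict potential HSL inhibitors in unknown data sets and help identify the molecular descriptors related to them.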

11.
Fourier transform near-infrared spectrometry has been widely used in combination with multivariate chemometric methods in agriculture and food analysis. In this paper, we used linear partial least squares and nonlinear least squares support vector machine regression methods to establish calibration models for Fourier transform near-infrared spectrometric determination of pectin in shaddock peel samples. In particular, the tunable kernel parameters of the linear and nonlinear models were varied over a moderate range and were optimally selected in conjunction with a Savitzky–Golay smoother. The smoothing parameters and the linear/nonlinear modeling parameters were combined for simultaneous optimization. To investigate the robustness of the calibration models, parameter uncertainty was estimated directly for the optimal linear and nonlinear models. Our results show that the nonlinear least squares support vector machine method gives more accurate predictions and is substantially more robust to spectral noise than linear partial least squares regression. Furthermore, the optimized least squares support vector machine model was evaluated on randomly selected test samples, and the test results were satisfactory. We anticipate that these linear and nonlinear methods and the methodology for determining model parameter uncertainty will be applied to other analytes in the fields of near-infrared or Fourier transform near-infrared spectroscopy.

12.
In this paper, we study the classification of unbalanced data sets of drugs. As an example we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We have collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of these data, we have built classifiers based on machine learning methods. Data sets with different class distributions lead to the effect that conventional machine learning methods are biased toward the larger class. To overcome this problem and to obtain sensitive but also accurate classifiers we combine machine learning and feature selection methods with techniques addressing the problem of unbalanced classification, such as oversampling and threshold moving. We have used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results from our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable the effective high throughput in silico classification of potential drug candidates.
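The two balancing techniques named above, oversampling and threshold moving, can be sketched in their simplest forms as follows. This is an illustration under stated assumptions, not the authors' implementation: the oversampler duplicates random minority rows, the thresholding acts on arbitrary classifier scores, and all names and values are hypothetical.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def classify(scores, threshold):
    """Threshold moving: label a compound positive when its score reaches threshold."""
    return [1 if s >= threshold else 0 for s in scores]

# Invented data: four non-inhibitors and one inhibitor.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)

# Invented classifier scores; lowering the threshold favors the minority class.
scores = [0.10, 0.35, 0.45, 0.80]
default_labels = classify(scores, 0.5)
moved_labels = classify(scores, 0.3)
```

Oversampling changes the training data, whereas threshold moving leaves the trained model untouched and only shifts the decision point, so the two can be combined.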

13.
14.
There is growing interest in the application of machine learning techniques in bioinformatics. The supervised machine learning approach has been widely applied to bioinformatics and has had considerable success in this research area. With this learning approach researchers first develop a large training set, which is a time-consuming and costly process. Moreover, the proportion of positive and negative examples in the training set may not represent the real-world data distribution, which causes concept drift. Active learning avoids these problems. Unlike most conventional learning methods, where the training set used to derive the model remains static, the classifier can actively choose the training data, and the size of the training set increases. We introduced an algorithm for performing active learning with a support vector machine and applied the algorithm to gene expression profiles of colon cancer, lung cancer, and prostate cancer samples. We compared the classification performance of active learning with that of passive learning. The results showed that employing the active learning method can achieve high accuracy and significantly reduce the need for labeled training instances. For lung cancer classification, to achieve 96% of the total positives, only 31 labeled examples were needed in active learning, whereas in passive learning 174 labeled examples were required. That is a reduction of over 82% with active learning. In active learning the areas under the receiver operating characteristic (ROC) curves were over 0.81, while in passive learning the areas under the ROC curves were below 0.50.
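The reduction quoted for the lung cancer case follows directly from the two label counts given in the abstract. A short check, with variable names chosen here for illustration:

```python
passive_labels = 174  # labeled examples needed with passive learning
active_labels = 31    # labeled examples needed with active learning

# Fraction of labeling effort saved by active learning.
reduction = (passive_labels - active_labels) / passive_labels
print(f"{reduction:.1%}")  # prints 82.2%
```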

15.
16.
17.
18.
A probabilistic support vector machine (SVM) in combination with ECFP_4 (Extended Connectivity Fingerprints) was applied to establish a druglikeness filter for molecules. Here, the World Drug Index (WDI) and the Available Chemicals Directory (ACD) were used as surrogates for druglike and nondruglike molecules, respectively. Compared with published methods using the same data sets, the classifier significantly improved the prediction accuracy, especially when using a larger data set of 341 601 compounds, which pushed the correct classification rate up to 92.73%. In addition, the most characteristic features of drugs and nondrugs found by the current method were visualized, which might be useful as guiding fragments for de novo drug design and fragment-based drug design.

19.
Ke Yu, Talanta, 2007, 71(2): 676-682
Three machine learning techniques, back propagation artificial neural network (BP-ANN), radial basis function artificial neural network (RBF-ANN), and support vector regression (SVR), were applied to predict peptide mobility in capillary zone electrophoresis through the development of quantitative structure-mobility relationship (QSMR) models. A data set containing 102 peptides with a large range of size, charge, and hydrophobicity was used as a typical study. The optimal modeling parameters were determined by a grid search using 10-fold cross-validation. The predicted results were compared with those obtained by the multiple linear regression (MLR) method. The results showed that the relative standard errors (R.S.E.) of the developed models for the test set obtained by MLR, BP-ANN, RBF-ANN, and SVR were 11.21%, 7.47%, 5.79%, and 5.75%, respectively, while the R.S.E.s for the external validation set were 11.18%, 7.87%, 7.54%, and 7.18%, respectively. The QSMR models developed by machine learning techniques thus generalized better than MLR. The machine learning techniques were shown to be effective for developing accurate and reliable QSMR models.
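Grid search with k-fold cross-validation, as used above to tune the models, can be sketched generically as follows. The abstract does not name the tuned parameters, so `C` and `gamma` are hypothetical placeholders, and `toy_error` stands in for "fit on the training folds, score on the validation fold".

```python
from itertools import product

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def grid_search(param_grid, n_samples, k, cv_error):
    """Return the parameter combination with the lowest mean validation error."""
    names = sorted(param_grid)
    best_params, best_err = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        errors = [cv_error(params, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_err = sum(errors) / len(errors)
        if mean_err < best_err:
            best_params, best_err = params, mean_err
    return best_params

def toy_error(params, train, val):
    """Hypothetical error surface with its minimum at C=10, gamma=0.1."""
    return (params["C"] - 10) ** 2 + (params["gamma"] - 0.1) ** 2

best = grid_search({"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
                   n_samples=102, k=10, cv_error=toy_error)
```

In practice the samples would be shuffled before folding, and the error function would train a real model on `train` and evaluate it on `val`.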

20.
Dual-specific tyrosine phosphorylation regulated kinase 1 (DYRK1A) has been regarded as a potential therapeutic target for neurodegenerative diseases, and considerable progress has been made in the discovery of DYRK1A inhibitors. Identification of pharmacophoric fragments provides valuable information for structure- and fragment-based design of potent and selective DYRK1A inhibitors. In this study, seven machine learning methods along with five molecular fingerprints were employed to develop qualitative classification models of DYRK1A inhibitors, which were evaluated by cross-validation, a test set, and an external validation set with four performance indicators: predictive classification accuracy (CA), the area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), and balanced accuracy (BA). The PubChem fingerprint-support vector machine model (CA = 0.909, AUC = 0.933, MCC = 0.717, BA = 0.855) and the PubChem fingerprint-artificial neural network model (CA = 0.862, AUC = 0.911, MCC = 0.705, BA = 0.870) were considered the optimal models for the training set and test set, respectively. A hybrid data balancing method, SMOTETL, which combines the synthetic minority over-sampling technique (SMOTE) and Tomek link (TL) algorithms, was applied to explore the impact of balanced learning on model performance. Based on frequency analysis and information gain, pharmacophoric fragments related to DYRK1A inhibition were also identified. The results provide theoretical support and clues for the screening and design of novel DYRK1A inhibitors.
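Three of the performance indicators above (CA, BA, and MCC) can be computed from a binary confusion matrix using their standard definitions, as sketched below. The counts in the example are invented for illustration and are not taken from the study; AUC is omitted because it needs ranked scores rather than counts.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return CA, BA, and MCC for a binary confusion matrix."""
    total = tp + fp + tn + fn
    ca = (tp + tn) / total                 # classification accuracy
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    ba = (sensitivity + specificity) / 2   # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"CA": ca, "BA": ba, "MCC": mcc}

# Invented counts: 45 actual positives, 55 actual negatives.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

BA and MCC are less sensitive than CA to class imbalance, which is why they are often reported alongside it for inhibitor data sets where one class dominates.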
