Similar literature
1.
Protein methylation is involved in dozens of biological processes and plays an important role in adjusting protein physicochemical properties, conformation and function. However, with the rapid increase in protein sequences entering databanks, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. It is therefore highly important to develop a computational method for quick and accurate identification of methylation sites. In this study, a novel predictor (Methy_SVMIACO) based on a support vector machine (SVM) and an improved ant colony optimization algorithm (IACO) is developed to identify methylation sites. The IACO is used to find the optimal feature subset and SVM parameters, while the SVM performs the identification of methylation sites. Comparison of the IACO with conventional ACO shows that the IACO converges quickly toward the global optimal solution and is a more useful tool for feature selection and SVM parameter optimization. In a 10-fold cross-validation test, Methy_SVMIACO achieves a sensitivity of 85.71%, a specificity of 86.67%, an accuracy of 86.19% and a Matthews correlation coefficient (MCC) of 0.7238 for lysine, and a sensitivity of 89.08%, a specificity of 94.07%, an accuracy of 91.56% and an MCC of 0.8323 for arginine. Analysis of the optimal feature subset shows that some upstream and downstream residues play an important role in the methylation of arginine and lysine. Compared with other existing methods, Methy_SVMIACO provides higher Acc, Sen and Spe, indicating that the current method may serve as a powerful complementary tool to other existing approaches in this area. Methy_SVMIACO can be obtained freely on request from the authors.
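
The four measures quoted in this abstract (sensitivity, specificity, accuracy and MCC) all follow from the counts of a binary confusion matrix. The sketch below shows the standard formulas; the function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion-matrix counts."""
    sen = tp / (tp + fn)                    # fraction of true sites recovered
    spe = tn / (tn + fp)                    # fraction of non-sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)   # fraction of all residues classified correctly
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, spe, acc, mcc


# Hypothetical counts, chosen only to illustrate the calculation.
sen, spe, acc, mcc = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"Sen={sen:.2%} Spe={spe:.2%} Acc={acc:.2%} MCC={mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the matrix, which is why abstracts in this list often report it alongside sensitivity and specificity for unbalanced data.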

2.
3.
In this paper, a support vector machine was trained to capture the relationship between the pair-coupled amino acid composition and the content of protein secondary structural elements, including α-helix, 3₁₀-helix, π-helix, β-strand, β-bridge, turn, bend and the remaining random coil. Self-consistency and cross-validation tests were carried out to assess the performance of the method. The results obtained are superior to or competitive with those of popular theoretical and experimental methods.
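
The abstract does not define the pair-coupled amino acid composition. A simple reading, assumed here, is the normalized frequency of each of the 400 ordered pairs of adjacent residues; the paper's actual encoding may differ. The names `AMINO_ACIDS`, `PAIRS` and `pair_composition` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def pair_composition(sequence):
    """Return a 400-dimensional vector of adjacent-residue pair frequencies.

    Assumption: pairs containing non-standard residues (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


features = pair_composition("ACAC")
print(len(features), features[PAIRS.index("AC")])
```

A vector of this kind has a fixed length regardless of the protein length, which is what allows it to be fed directly to an SVM.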

4.
In Gram-negative bacteria, a wide range of proteins are secreted by highly specialized secretion systems. These secreted proteins play essential roles in the response of bacteria to their environment and in several physiological processes such as adhesion, pathogenicity, adaptation and survival. Identifying secreted proteins in Gram-negative bacteria may therefore assist in understanding the secretion mechanism and in developing new antimicrobial strategies. Because a single-feature model is unlikely to cover this information comprehensively, three kinds of feature models were used in this paper to represent protein samples: composition analysis, correlation analysis and a smoothing encoding method applied to position-specific scoring matrix profiles. A support vector machine-based ensemble method using these hybrid features was developed to predict multiple types of Gram-negative bacterial secreted proteins. The method achieves overall accuracies of 97.09% and 96.51% in an independent dataset test and a jackknife test on a public test dataset, which are 3.49% and 2.32% higher, respectively, than the results obtained by other methods. These results show the effectiveness and stability of the proposed ensemble method. It is anticipated that the method will provide useful information for further research on bacterial secreted proteins and secretion systems.

5.
Protein–peptide interactions are essential for all cellular processes, including DNA repair, replication, gene expression, and metabolism. As most protein–peptide interactions are uncharacterized, it is cost-effective to investigate them computationally as a first step. All existing approaches for predicting protein–peptide binding sites, however, are based on protein structures, even though the structures of most proteins are not yet solved. This article proposes the first machine-learning method, called SPRINT, for Sequence-based prediction of Protein–peptide Residue-level Interactions. SPRINT yields robust and consistent performance in 10-fold cross-validation and an independent test. The most important feature is evolution-generated sequence profiles. For the test set (1056 binding and non-binding residues), it yields a Matthews’ Correlation Coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique is comparable to or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at http://sparks-lab.org/. © 2016 Wiley Periodicals, Inc.

6.
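This paper applies a support vector machine method that combines a genetic algorithm with the conjugate gradient method (GA-CG-SVM) to build a classification model for predicting drug-induced phospholipidosis. The descriptors were first optimized, and 19 descriptors were selected to build the model. The resulting model achieved a prediction accuracy of 81.6% on the training set and 87.5% on the test set, indicating that the SVM classification model not only correctly predicts drug-induced phospholipidosis for the training set but also, for other compounds, has...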

7.
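Identification and analysis of sticky notes by differential Raman spectroscopy combined with SVM   Cited by: 1 (self-citations: 0; citations by others: 1)
刘津彤, 张岚泽, 姜红, 陈相全, 段斌, 刘峰. 《化学通报》, 2022, 85(2): 259-263, 246
Based on differential Raman spectroscopy and a support vector machine (SVM) model, a rapid, visual method for identifying sticky-note evidence is proposed. Differential Raman spectra were collected for 40 groups of sticky-note samples of different brands. After the spectra were denoised and baseline-corrected using a BP neural network and the differential technique, spectral-band information was extracted by means of the F-test and principal component analysis, and an SVM classification model was built. The results show that when a linear kernel is used for the SVM model, the test-set samples are classified entirely correctly, and the K-fold cross-validation results are satisfactory. Compared with traditional cluster analysis, this method can select an effective feature matrix from the original high-dimensional spectral data, and the SVM model is both efficient and accurate, offering a new approach for distinguishing paper evidence in police practice.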

8.
9.
Malonylation is a recently discovered post-translational modification (PTM) in which a malonyl group attaches to a lysine (K) residue of a protein. In this work, a novel machine learning model, SPRINT-Mal, is developed to predict malonylation sites by employing sequence and predicted structural features. Evolutionary information and physicochemical properties are found to be the two most discriminative features, whereas a structural feature called half-sphere exposure provides additional improvement in prediction performance. SPRINT-Mal trained on mouse data yields robust performance in 10-fold cross-validation and on an independent test set, with Area Under the Curve (AUC) values of 0.74 and 0.76 and Matthews’ Correlation Coefficient (MCC) values of 0.213 and 0.20, respectively. Moreover, SPRINT-Mal achieved comparable performance when tested on H. sapiens proteins without species-specific training, but not on the bacterium S. erythraea. This suggests similar underlying physicochemical mechanisms in mouse and human but not in mouse and bacterium. SPRINT-Mal is freely available as an online server at http://sparks-lab.org/server/SPRINT-Mal/. © 2018 Wiley Periodicals, Inc.

10.
Amidation plays an important role in a variety of pathological processes and serious diseases such as neural dysfunction and hypertension. However, identifying protein amidation sites through traditional experimental methods is time-consuming and expensive. In this paper, we propose a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporates four representative feature types: position-based features, physicochemical and biochemical property features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection, is proposed to optimize the features. PrAS achieved an AUC of 0.96, an accuracy of 92.1%, a sensitivity of 81.2%, a specificity of 94.9% and an MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg.

11.
A support vector machine (SVM) algorithm for predicting beta- and gamma-turns in proteins is proposed, in which sequence information is expressed by a composite vector combining the increment of diversity, a position conservation scoring function, and predicted secondary structures. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25, 143) are used for training and testing the predictive models for beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for gamma-turns. These results are significantly higher than those of other algorithms for predicting beta- and gamma-turns on the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive models for beta- and gamma-turns, and better results are obtained. This algorithm may help to improve the performance of protein turn prediction. To demonstrate the ability of the SVM method to correctly classify beta-turns and non-beta-turns (and gamma-turns and non-gamma-turns), threshold-independent receiver operating characteristic curves are provided.

12.
13.
14.
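The application of support vector machine (SVM) regression, which is based on statistical learning theory, to quantitative X-ray fluorescence spectrometry was studied. Thirty-nine farmland soil samples were used as experimental material, 32 of which served as the calibration set. Three SVM kernel functions (Linear, Poly and RBF) were used to build regression models relating As content to the fluorescence spectral data. The three models were then used to predict the As content of the 7 soil samples in the prediction set. The correlation coefficients R² between the predicted As content and the As content determined by inductively coupled plasma optical emission spectrometry were all greater than 0.99, and the ratios of performance to deviation (RPD) were all greater than 3, indicating that the SVM models have good practical value. To further examine the predictive performance of the SVM regression model, its results were compared with those of the well-established PLS regression model. The SVM predictions were better, showing that the SVM regression model can also be used for quantitative prediction in portable X-ray fluorescence spectrometry.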
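
The two acceptance criteria quoted above (R² greater than 0.99, RPD greater than 3) can be computed from paired reference and predicted values. The sketch below makes two assumptions that the abstract does not confirm: R² is taken as the coefficient of determination, and RPD as the standard deviation of the reference values divided by the root mean square error of prediction. The sample values are invented for illustration.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSEP (assumed definition)."""
    n = len(y_true)
    mean = sum(y_true) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in y_true) / (n - 1))  # sample standard deviation
    rmsep = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return sd / rmsep


# Hypothetical reference and predicted contents, not data from the study.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]
print(f"R2={r_squared(reference, predicted):.3f} RPD={rpd(reference, predicted):.2f}")
```

With only 7 prediction samples, as in the study, both statistics are sensitive to individual points, so they are best read together rather than in isolation.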

15.
16.
17.
Formylation is one of the newly discovered post-translational modifications of lysine residues and is responsible for various diseases. In this work, a novel predictor, named predForm-Site, has been developed to predict formylation sites with higher accuracy. We have integrated multiple sequence features to develop a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier has been optimized on the skewed formylation dataset during prediction model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity and 99.8% overall accuracy with an AUC of 0.999 in the jackknife test. In the independent test, it achieved more than 97% sensitivity and 99% specificity. Similarly, in benchmarking against the recent method CKSAAP_FormSite, the proposed predictor significantly outperformed it on all measures, particularly sensitivity by around 20%, specificity by nearly 30% and overall accuracy by more than 22%. These experimental results show that predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.

18.
Human cytochrome P450 2B6 can metabolize a number of clinical drugs. Inhibition of CYP2B6 by multiple coadministered drugs may lead to drug–drug interactions and undesired drug toxicity. The aim of this investigation is to develop an in silico model to predict the interactions between P450 2B6 and novel inhibitors using a novel hierarchical support vector regression (HSVR) approach, which simultaneously takes into account the coverage of the applicability domain (AD) and the level of predictivity. Thirty-seven molecules were deliberately selected from the literature data and rigorously scrutinized, of which 26 were used as the training set to generate the models and 11 as the test set to validate them. The generated HSVR model gave an r² value of 0.97 for observed versus predicted pKm values for the training set, a q² value of 0.93 in 10-fold cross-validation, and an r² value of 0.82 for the test set. Additionally, the predicted results show that the HSVR model outperformed the individual local models, the global model, and the consensus model. Thus, this HSVR model provides an accurate tool for the prediction of human cytochrome P450 2B6-substrate interactions and can be used as a primary filter to eliminate potential selective inhibitors of CYP2B6. © 2008 Wiley Periodicals, Inc. J Comput Chem, 2009

19.
A total of 200 properties related to structural characteristics were used to represent the structures of 400 HA-coded proteins of influenza virus as training samples. Recognition models for HA proteins of avian influenza virus (AIV) were developed using a support vector machine (SVM) and linear discriminant analysis (LDA). With LDA, the identification accuracy (Ria) is 99.8% for the training samples and 99.5% by leave-one-out cross-validation. With the SVM model, Ria is 99.8% for the training samples and 99.3% by leave-one-out cross-validation. An external set of 200 HA proteins of influenza virus was used to validate the external predictive power of the resulting models. The external Ria is 95.5% by LDA and 96.5% by SVM, which shows that HA proteins of AIVs are well recognized by both SVM and LDA, and that SVM performs better than LDA.

20.
The rapid identification of pathogens is crucial in controlling food quality and safety. The proposed system for the rapid and label-free identification of pathogens is based on the principle of laser scattering from bacterial microbes. The clinical prototype consists of three parts: the laser beam, photodetectors, and the data acquisition system. The bacterial testing sample was mixed with 10 mL of distilled water and placed inside the machine chamber. When the bacterial microbes pass by the lase...
