Similar literature
1.
In this paper, a support vector machine was trained to capture the relationship between the pair-coupled amino acid composition and the content of protein secondary structural elements, including α-helix, 3₁₀-helix, π-helix, β-strand, β-bridge, turn, bend and the remaining random coil. Self-consistency and cross-validation tests were performed to assess the performance of the method. The results are superior to or competitive with those of popular theoretical and experimental methods.
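The abstract does not define "pair-coupled amino acid composition". A minimal sketch, assuming it means the normalized frequencies of adjacent residue pairs (a 400-dimensional dipeptide vector), could look as follows. The function name and the handling of non-standard residues are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature vector.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_coupled_composition(sequence):
    """Return the frequencies of adjacent residue pairs (assumed definition)."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        pair = a + b
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]
```

Such a vector would be the input to the regression or classification step. The secondary-structure content values used as targets are not part of this sketch.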

2.
3.
Protein methylation is involved in dozens of biological processes and plays an important role in adjusting protein physicochemical properties, conformation and function. However, with the rapid increase in protein sequences entering databanks, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. It is therefore important to develop a computational method for the quick and accurate identification of methylation sites. In this study, a novel predictor (Methy_SVMIACO) based on a support vector machine (SVM) and an improved ant colony optimization algorithm (IACO) is developed to identify methylation sites. The IACO is used to find the optimal feature subset and SVM parameters, while the SVM performs the identification of methylation sites. A comparison of the IACO with conventional ACO shows that the IACO converges quickly toward the global optimal solution and is a more useful tool for feature selection and SVM parameter optimization. In a 10-fold cross-validation test, Methy_SVMIACO achieves a sensitivity of 85.71%, a specificity of 86.67%, an accuracy of 86.19% and a Matthews correlation coefficient (MCC) of 0.7238 for lysine, and a sensitivity of 89.08%, a specificity of 94.07%, an accuracy of 91.56% and an MCC of 0.8323 for arginine. Analysis of the optimal feature subset shows that some upstream and downstream residues play an important role in the methylation of arginine and lysine. Compared with other existing methods, Methy_SVMIACO provides higher accuracy (Acc), sensitivity (Sen) and specificity (Spe), indicating that the method may serve as a powerful complementary tool to existing approaches in this area. Methy_SVMIACO can be obtained freely on request from the authors.
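The four measures reported above follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions. The counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sen = tp / (tp + fn)                      # true positive rate
    spe = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)     # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sen": sen, "spe": spe, "acc": acc, "mcc": mcc}
```

For example, `binary_metrics(90, 10, 80, 20)` gives a sensitivity of 0.9, a specificity of 0.8 and an accuracy of 0.85.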

4.
Li-Juan Tang  Hai-Long Wu 《Talanta》2009,79(2):260-1694
One problem with discriminant analysis of microarray data is that each sample is represented by a large number of genes that may be irrelevant, insignificant or redundant. Methods of variable selection are therefore of great significance in microarray data analysis. To circumvent the problem, a new gene mining approach is proposed, based on the similarity between the probability density functions of each gene for the class of interest and for the other classes. The method identifies significant genes that are informative for discriminating each individual class rather than for maximizing the separability of all classes, so that genes carrying important information about particular disease subtypes can be selected. Based on the significant genes mined for individual classes, a support vector machine with a local kernel transform is constructed for the classification of different diseases. The combination of the gene mining approach with the support vector machine is demonstrated for cancer classification using two public data sets. The results show that significant genes are identified for each cancer and that the classification model performs satisfactorily in training and prediction for both data sets.

5.
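In this paper, a support vector machine method combining a genetic algorithm and the conjugate gradient method (GA-CG-SVM) was used to build a classification model for predicting drug-induced phospholipidosis. The descriptors were first optimized, and 19 descriptors were selected for model construction. The resulting model achieved a prediction accuracy of 81.6% on the training set and 87.5% on the test set, indicating that the SVM classification model not only correctly predicts drug-induced phospholipidosis for the training set but also, for other compounds, ...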

6.
7.
8.
In this work, the chemiluminescence (CL) behavior of two selected phenothiazines, promazine and fluphenazine hydrochloride, was investigated for their simultaneous determination using the oxidation of Ru(bipy)₃²⁺ by Ce⁴⁺ ions in acidic media. The method is based on the kinetic distinction between the CL reactions of fluphenazine and promazine with the Ru(bipy)₃²⁺ and Ce⁴⁺ system in a sulfuric acid medium. Least squares support vector regression models were constructed to relate the concentrations of both compounds to their CL profiles. The model parameters, σ² and γ, were optimized by evaluating all possible combinations of σ² and γ and selecting the model with the minimum root mean square error of cross-validation. Under optimized conditions, the univariate calibration curves were linear over the concentration ranges of 0.4-30.0 μg mL⁻¹ and 0.07-5.0 μg mL⁻¹, with detection limits of 0.1 μg mL⁻¹ and 0.04 μg mL⁻¹, for promazine and fluphenazine, respectively. The influence of potential interfering substances on the determination of promazine and fluphenazine was studied. The proposed method was used for the simultaneous determination of both compounds in synthetic mixtures and in spiked human plasma.
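The exhaustive search over (σ², γ) described above can be sketched in pure Python. This is an illustration under stated assumptions, not the authors' code: an RBF kernel with width σ² is assumed, leave-one-out is assumed as the cross-validation scheme, and all function names are hypothetical. The least squares SVR is fitted by solving its standard linear system, with `b` as the bias and `alpha` as the dual weights.

```python
import math
from itertools import product


def rbf(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / (2 * sigma2)); the kernel choice is assumed."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lssvr_fit(X, y, sigma2, gamma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma2) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvr_predict(X_train, b, alpha, sigma2, x):
    return b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X_train))


def loo_rmse(X, y, sigma2, gamma):
    """Root mean square error of leave-one-out cross-validation."""
    sq = 0.0
    for i in range(len(X)):
        Xt, yt = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        b, alpha = lssvr_fit(Xt, yt, sigma2, gamma)
        sq += (lssvr_predict(Xt, b, alpha, sigma2, X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(X))


def grid_search(X, y, sigma2_grid, gamma_grid):
    """Try every (sigma2, gamma) combination and keep the one with the lowest RMSECV."""
    return min(product(sigma2_grid, gamma_grid),
               key=lambda p: loo_rmse(X, y, p[0], p[1]))
```

A call such as `grid_search(X, y, [0.1, 1.0, 10.0], [1.0, 10.0, 100.0])` returns the selected pair. In practice the grids would be chosen on a logarithmic scale that suits the data.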

9.
It is known that, in the three-dimensional structure of a protein, certain amino acids can interact with each other to provide structural integrity or to aid its catalytic function. If these positions are mutated, the loss of the interaction usually leads to a non-functional protein. Directed evolution experiments, which probe the sequence space of a protein through mutations in search of an improved variant, frequently produce such inactive sequences. In this work, we address the use of two machine learning algorithms, Boolean learning and support vector machines (SVMs), to find such pairs of amino acid positions. The recombination method of introducing mutations was simulated to create in silico sequences that were used as training data for the algorithms. The two algorithms were combined to develop an approach that weighs the structural risk as well as the empirical risk. This strategy was adapted to a multi-round experimental framework in which the data generated in the current round are used to design the experiments for the next round, improving both the generated library and the estimate of the interacting positions. The strategy is observed to greatly increase the number of functional variants generated as well as the average number of mutations that can be made in the library.

10.
Qi Shen  Wei-Min Shi  Bao-Xian Ye 《Talanta》2007,71(4):1679-1683
In the analysis of gene expression profiles, the number of tissue samples with gene expression levels available is usually small compared with the number of genes. This can lead to overfitting or even to a complete failure of the microarray data analysis. Selecting genes that are truly indicative of the tissue classification concerned is becoming one of the key steps in microarray studies. In the present paper, we combine a modified discrete particle swarm optimization (PSO) with support vector machines (SVM) for tumor classification. The modified discrete PSO is applied to select genes, while the SVM is used as the classifier or evaluator. The proposed approach was applied to microarray data from 22 normal and 40 colon tumor tissues and showed good prediction performance. The results demonstrate that the modified PSO is a useful tool for gene selection and for mining high-dimensional data.
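The abstract does not say how the discrete PSO was modified. The sketch below therefore substitutes the standard binary PSO, in which a sigmoid of the velocity gives the probability that a bit is set, and the inertia and acceleration constants (0.7 and 1.5) are arbitrary choices. The `fitness` callable is a hypothetical stand-in for the SVM evaluator. In the setting described above it would return a cross-validated classification score for the gene subset encoded by the bit mask.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iter=30, seed=0):
    """Standard binary PSO (not the paper's modified variant); maximizes fitness."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                # Velocity update pulls toward the personal and global best bits.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))  # clamp the velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))   # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(mask):
    """Hypothetical score: reward features 0 and 3, penalize the subset size."""
    return 2.0 * (mask[0] + mask[3]) - 0.5 * sum(mask)
```

Calling `binary_pso(toy_fitness, 8)` returns the best bit mask found and its score. Replacing `toy_fitness` with a function that trains and validates an SVM on the selected genes gives the wrapper scheme that the abstract outlines.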

11.
Owing to the degeneracy of the observed binding sites, the in silico prediction of bacterial sigma(70)-like promoters remains a challenging problem. A large number of sigma(70)-like promoters have been biologically identified in only two species, Escherichia coli and Bacillus subtilis. In this paper we investigate the issues that arise when searching for promoters in other species using an ensemble of SVM classifiers trained on E. coli promoters. DNA sequences are represented using a tagged mismatch string kernel. The major benefit of our approach is that it does not require a prior definition of the typical -35 and -10 hexamers, which gives the SVM classifiers the freedom to discover other features relevant to promoter prediction. We use our approach to predict sigma(A) promoters in B. subtilis and sigma(66) promoters in Chlamydia trachomatis. We extended the analysis to identify specific regulatory features of gene sets in C. trachomatis that have different expression profiles, and found a strong -35 hexamer and TGN/-10 associated with a set of early expressed genes. Our analysis highlights the advantage of using TSS-PREDICT as a starting point for predicting promoters in species where few are known.
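For orientation, the plain (k, m)-mismatch string kernel can be sketched as below: each k-mer in a sequence contributes to every k-mer within m substitutions of it, and the kernel is the dot product of the resulting count vectors. The "tagged" variant used in the paper is not specified in the abstract and is not reproduced here. The function names and the defaults k=3, m=1 are illustrative assumptions.

```python
from collections import Counter


def neighbors(kmer, m, alphabet="ACGT"):
    """All strings within Hamming distance m of kmer."""
    if m == 0 or not kmer:
        return {kmer}
    first, rest = kmer[0], kmer[1:]
    result = {first + tail for tail in neighbors(rest, m, alphabet)}
    for ch in alphabet:
        if ch != first:  # spend one mismatch on the first position
            result.update(ch + tail for tail in neighbors(rest, m - 1, alphabet))
    return result


def mismatch_features(seq, k, m, alphabet="ACGT"):
    """Count, for each k-mer, how many k-mers of seq lie within m mismatches of it."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for nb in neighbors(seq[i:i + k], m, alphabet):
            feats[nb] += 1
    return feats


def mismatch_kernel(s, t, k=3, m=1):
    """Dot product of the two mismatch feature vectors."""
    fs, ft = mismatch_features(s, k, m), mismatch_features(t, k, m)
    return sum(v * ft[key] for key, v in fs.items())
```

With k=3 and m=1, `mismatch_kernel("ACG", "ACG")` is 10, because a 3-mer has 10 neighbors (itself plus 9 single substitutions). `mismatch_kernel("ACG", "ACT")` is 4, because only ACA, ACC, ACG and ACT lie within one mismatch of both.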

12.
Constraint generation for 3D structure prediction and structure-based database searches benefit from fine-grained prediction of local structure. In this work, we present LOCUSTRA, a novel scheme for the multiclass prediction of local structure that uses two layers of support vector machines (SVM). Using a 16-letter structural alphabet from de Brevern et al. (Proteins: Struct., Funct., Bioinf. 2000, 41, 271-287), we assess its prediction ability on an independent test set of 222 proteins and compare our method to three-class secondary structure prediction and direct prediction of dihedral angles. The prediction accuracy is Q16=61.0% for the 16 classes of the structural alphabet and Q3=79.2% for a simple mapping to the three secondary structure classes helix, sheet and coil. We achieve a mean phi (psi) error of 24.74 degrees (38.35 degrees) and a median RMSDA (root-mean-square deviation of the dihedral angles) per protein chain of 52.1 degrees. These results compare favorably with related approaches. The LOCUSTRA web server is freely available to researchers at http://www.fz-juelich.de/nic/cbb/service/service.php.
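Errors on dihedral angles have to respect the 360-degree periodicity, so that 170 and -170 degrees count as 20 degrees apart. The sketch below shows one common way to compute a mean absolute angular error and an RMSDA. The assumption that the RMSDA pools phi and psi deviations over all residues of a chain is not taken from the paper, and the function names are hypothetical.

```python
import math


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mean_abs_angle_error(pred, true):
    """Mean periodic absolute error for one angle type, for example phi."""
    return sum(angular_diff(p, t) for p, t in zip(pred, true)) / len(pred)


def rmsda(pred_phi_psi, true_phi_psi):
    """Root-mean-square deviation over all phi and psi angles of a chain (assumed pooling)."""
    sq = []
    for (pphi, ppsi), (tphi, tpsi) in zip(pred_phi_psi, true_phi_psi):
        sq.append(angular_diff(pphi, tphi) ** 2)
        sq.append(angular_diff(ppsi, tpsi) ** 2)
    return math.sqrt(sum(sq) / len(sq))
```

For example, `mean_abs_angle_error([170.0, 0.0], [-170.0, 10.0])` is 15.0, the average of 20 and 10 degrees.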

13.
14.
15.
To understand the molecular mechanism underlying any disease, knowledge of the interacting proteins in the disease pathway is essential. The number of known protein-protein interactions (PPI) is still very limited compared with the number of available protein sequences from different organisms. Experimental high-throughput technologies provide some data about these interactions, but the data are often fairly noisy. Computational techniques for predicting protein-protein interactions are therefore important. A total of 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using the crystallographic data of 15,000 protein complexes in the PDB server. In a case study, these fingerprints were created for proteins implicated in type 2 diabetes mellitus. The fingerprints were input into an SVM-based model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC value of 0.78) on an external data set composed of proteins retrieved via text mining of diabetes-related literature. A PPI network was constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, for functional annotation and for other proteomics studies.

16.
Understanding and identifying proteins adapted to hypersaline environments is a challenging task that would help in the design of stable proteins. Here, we have systematically analyzed the normalized amino acid compositions of 2121 halophilic and 2400 non-halophilic proteins. The results show that halophilic proteins contain more Asp at the expense of Lys, Ile, Cys and Met, contain fewer small and hydrophobic residues, and show a large excess of acidic over basic amino acids. We then introduce a support vector machine method to discriminate between halophilic and non-halophilic proteins using a novel Pearson VII universal function based kernel. Across the three validation methods, it achieved overall accuracies of 97.7%, 91.7% and 86.9%, and outperformed other machine learning algorithms. We also examined the influence of protein size on prediction accuracy and found that the poorer performance for small proteins may arise because some significant residues (Cys and Lys) are missing from those proteins.
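The two ingredients named above, a normalized amino acid composition and a Pearson VII universal kernel (PUK), can be sketched as follows. The kernel uses the commonly cited PUK form K(x, y) = 1 / [1 + (2·‖x − y‖·√(2^(1/ω) − 1) / σ)²]^ω. The parameter values and function names are illustrative assumptions, since the abstract does not state the settings used.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard residues; other symbols are ignored."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 20


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel; sigma sets the width, omega the tailing."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega
```

The kernel equals 1 for identical inputs and falls to 0.5 when the distance equals σ/2, so σ acts as the full width at half maximum of the peak.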

17.
18.
19.
This paper considers the qualitative evaluation of chromatographic data in the framework of external quality assurance schemes. The homogeneity of the evaluation of chromatographic data among human experts was examined for samples with analytes close to the limit of detection of the analytical methods, and a Support Vector Machine (SVM) was developed as an alternative to experts for a more homogeneous and automatic evaluation. A set of 105 ion chromatograms obtained by anti-doping control laboratories was used in this study. The quality of the ion chromatograms was evaluated qualitatively by nine independent experts (who assigned a score from 0 to 4) and also more objectively by taking into account chromatographic parameters (peak width, asymmetry, resolution and S/N ratio). The results showed a high degree of variability among experts when judging ion chromatograms. Experts applying extremely outlying evaluation criteria were identified and excluded from the data used to develop the SVM. The machine was built by providing the system with qualitative information (scores assigned by experts) and with objective data (parameters) of the ion chromatograms. A seven-fold cross-validation approach was used to train the machine and to evaluate its predictive ability. According to the results, the SVM was found to be close to the reasoning process followed by the homogeneous human expert group. The machine could also provide a scoring system to sort laboratories according to the quality of their results. The qualitative evaluation of analytical records using a scoring system allowed the identification of the main factors affecting the quality of chromatographic analytical data, such as the specific analytical technique applied and the adherence to guidelines for reporting positive results.
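A seven-fold split of 105 samples places 15 chromatograms in each held-out fold. The sketch below shows one way to generate such folds. The shuffling, the seed and the function names are assumptions, and the abstract does not say how the folds were actually assigned.

```python
import random


def k_fold_indices(n_samples, k=7, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=7, seed=0):
    """Yield (train, test) index lists, holding out each fold once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Iterating over `train_test_splits(105)` yields seven splits of 90 training and 15 test indices. A model would be trained on each training part and scored on the matching held-out fold.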

20.
