首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
3.
We propose a 6D representation of protein sequences consisting of 20 amino acids. Based on this 6D representation, we propose a proteome distance measure for constructing phylogenic tree. And we make use of the corresponding similarity matrix to construct phylogenic tree. The examination of phylogenic tree belong to 30 mitochondrial sequence illustrates the utility of our approach. © 2006 Wiley Periodicals, Inc. Int J Quantum Chem, 2007  相似文献   

4.
基于氨基酸模糊聚类分析的跨膜区域预测   总被引:2,自引:0,他引:2  
邓勇  刘琪  李亦学 《化学学报》2004,62(19):1968-1972
跨膜蛋白在进化过程中,序列保守性较差,即使是同源蛋白序列的一致性程度也较低,因而在跨膜区预测算法中,通过序列的一致性程度来选取训练集并不能有效地消除预测结果对训练集的过度适应性.本文提出了一种基于氨基酸模糊聚类分析的预测算法,通过氨基酸在各个区域分布的相似性程度进行模糊聚类,从而根据一类氨基酸的分布特性而不是各个氨基酸的分布特性进行跨膜区预测.结果表明,该方法能在一定程度上消除训练集的选取对测试结果的影响,提高跨膜蛋白拓扑结构预测的准确度,特别是提高对目前知之甚少的跨膜蛋白的预测准确度.  相似文献   

5.
Based on the chaos game representation, a 2D graphical representation of protein sequences was introduced in which the 20 amino acids are rearranged in a cyclic order according to their physicochemical properties. The Euclidean distances between the corresponding amino acids from the 2‐D graphical representations are computed to find matching (or conserved) fragments of amino acids between the two proteins. Again, the cumulative distance of the 2D‐graphical representations is defined to compare the similarity of protein. And, the examination of the similarity among sequences of the ND5 proteins of nine species shows the utility of our approach. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010  相似文献   

6.
7.
In terms of the classification of the protein secondary structures, we propose a 2D representation of protein secondary structure sequences. The representation are used to display, analyze, and compare the secondary structure sequences. Based on this representation, we assign the structural class to the protein, and verify the advantage or disadvantage of the methods of predicted protein second structure.  相似文献   

8.
Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we proposed a novel scheme to improve prediction of protein structural classes. Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features to combine novel sequence information. Then we added the PSSF to the combine sequence information to predict protein structural classes. The proposed method was tested on four benchmark datasets in low homology and achieved the overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. The comparison with existing methods demonstrates that the overall improvements range from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.  相似文献   

9.
A new two-dimensional graphical representation of protein sequences is introduced. Twenty concentric evenly spaced circles divided by n radial lines into equal divisions are selected to represent any protein sequence of length n. Each circle represents one of the different 20 amino acids, and each radial line represents a single amino acid of the protein sequence. An efficient numerical method based on the graph is proposed to measure the similarity between two protein sequences. To prove the accuracy of our approach, the method is applied to NADH dehydrogenase subunit 5 (ND5) proteins of nine different species and 24 transferrin sequences from vertebrates. High values of correlation coefficient between our results and the results of ClustalW are obtained (approximately perfect correlations). These values are higher than the values obtained in many other related works.  相似文献   

10.
In this paper, we propose a method to create the 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, total distance of each amino acid from the first amino acid in the protein sequence and the distribution of 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three proteins: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and the output from the multiple sequence alignment program ClustalW, which is widely used. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear and space required is less as compared with other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.  相似文献   

11.
Knowledge of structural classes is useful in understanding of folding patterns in proteins. Although existing structural class prediction methods applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To this end, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of the structural class prediction with the proposed representation. The best, classifier support vector machine achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. Our comprehensive comparison shows superiority of the proposed representation, which results in error rate reductions that range between 14% and 26% when compared with predictions of the best-performing, previously published classifiers on the considered datasets. The study also shows that, for the benchmark dataset that includes sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets that include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of the structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.  相似文献   

12.
13.
The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acidic identity percentage between a query sequence and others with similar functions (remote homologues) or a similar sequence is not recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These sequences of QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, previous 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMMs values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. The MMM-QSAR showed a discrimination power of RNAses from other proteins of 97.35% without using alignment, which is a result as good as for the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation avoiding sequence alignment with the same accuracy of classic HMM models.  相似文献   

14.
15.
Position-Specific Scoring Matrix (PSSM) is an excellent feature extraction method that was proposed early in protein classifying prediction, but within the restriction of feature shape in PSSM, researchers make a lot attempts to process it so that PSSM can be input to the traditional machine learning algorithms. These processes drop information provided by PSSM in a way thus the feature representation is limited. Moreover, the high-dimensional feature representation of PSSM makes it incompatible with other feature extraction methods. We use the PSSM as the input of Recurrent Neural Network without any post-processing, the amino acids in protein sequences are regarded as time step in RNN. This way takes full advantage of the information that PSSM provides. In this study, the PSSM is input to the model directly and the internal information of PSSM is fully utilized, we propose an end-to-end solution and achieve state-of-the-art performance. Ultimately, the exploration of how to combine PSSM with traditional feature extraction methods is carried out and achieve slightly improved performance. Our network architecture is implemented in Python and is available at https://github.com/YellowcardD/RNN-for-membrane-protein-types-prediction.  相似文献   

16.
MotivationPrimary and secondary active transport are two types of active transport that involve using energy to move the substances. Active transport mechanisms do use proteins to assist in transport and play essential roles to regulate the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful model in transfer learning, a deep learning language representation model developed by Google and one of the highest performing pre-trained model for Natural Language Processing (NLP) tasks. The idea of transfer learning with pre-trained model from BERT is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. Therefore, the contextualized word representations of proteins are introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in the context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence.ResultsThe performance of the proposed method is evaluated using five-fold cross-validation and independent test. The proposed method achieves an accuracy of 85.44 %, 88.74 % and 92.84 % for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform from other feature extraction methods using context information, effectively classify two types of active transport and improve the overall performance.  相似文献   

17.
18.
We consider a spectrum-like two-dimensional graphical representation of proteins based on a reduced protein model in which 20 amino acids are grouped into five classes. This particular grouping of amino acids was suggested by Riddle and co-workers in 1997. The graphical representation is based on depicting sequentially the amino acids on five horizontal lines at equal separations. One-letter codes, B, O, U, X and Y, to which numerical values 1 to 5 have been assigned, are suggested as labels for the fictional amino acids that represent all the amino acids within each group. The approach is illustrated on ND6 proteins of eight species having from 168 to 175 amino acids. While visual inspection of the novel spectral graphical representations of proteins may reveal local similarities and dissimilarities of protein sequences, arithmetic manipulations of spectra offer an elegant route to graphic visualization of the degree of similarity for selected pairs of proteins.  相似文献   

19.
During last few decades accurate determination of protein structural class using a fast and suitable computational method has been a challenging problem in protein science. In this context a meaningful representation of a protein sample plays a key role in achieving higher prediction accuracy. In this paper based on the concept of Chou's pseudo amino acid composition (Chou, K.C., 2001. Proteins 43, 246-255), a new feature representation method is introduced which is composed of the amino acid composition information, the amphiphilic correlation factors and the spectral characteristics of the protein. Thus the sample of a protein is represented by a set of discrete components which incorporate both the sequence order and the length effect. On the basis of such a statistical framework a simple radial basis function network based classifier is introduced to predict protein structural class. A set of exhaustive simulation studies demonstrates high success rate of classification using the self-consistency and jackknife test on the benchmark datasets.  相似文献   

20.
Neutralizing antibodies often recognize conformational, discontinuous epitopes. Linear peptides mimicking such conformational epitopes can be selected from phage display peptide libraries by screening with the respective antibodies. However, it is difficult to localize these "mimotopes" within the three-dimensional (3D) structures of the target proteins. Knowledge of conformational epitopes of neutralizing antibodies would help to design antigens able to elicit protective immune responses. Therefore, we provide here a software that allows to localize linear peptide sequences within 3D structures of proteins. The 3D-Epitope-Explorer (3DEX) software allows to map conformational epitopes in 3D protein structures based on an algorithm that takes into account the physicochemical neighborhood of C(alpha)- or C(beta)-atoms of individual amino acids. A given amino acid of a peptide sequence is localized within the protein and the software searches within predefined distances for the amino acids neighboring that amino acid in the peptide. Surface exposure of the amino acids can also be taken into consideration. The procedure is then repeated for the remaining amino acids of the peptide. The introduction of a joker function allows to map peptide mimotopes, which do not necessarily have 100% sequence homology to the protein. Using this software we were able to localize mimotopes selected from phage displayed peptide libraries with polyclonal antibodies from HIV-positive patient plasma within the 3D structure of gp120, the exterior glycoprotein of HIV-1. We also analyzed two recently published peptide sequences corresponding to known conformational epitopes to further confirm the integrity of 3DEX.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号