Similar literature
 20 similar documents found (search time: 93 ms)
1.
In this paper, the optimally weighted fuzzy k-nearest neighbors (OWFKNN) algorithm is used to predict the subcellular locations of proteins from their amino acid composition. The datasets cover two groups of organisms: 997 prokaryotic and 2427 eukaryotic protein sequences. In a jackknife test, the overall prediction accuracy is about 88.5% for the prokaryotic sequences and 86.2% for the eukaryotic sequences. Compared with other algorithms developed for predicting protein subcellular location, OWFKNN gives very satisfactory results and can therefore serve as an alternative method for predicting protein localization.
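
The evaluation described in this abstract (amino acid composition features scored with a jackknife, i.e. leave-one-out, test) can be sketched roughly as follows. This is only an illustration: it uses a plain 1-nearest-neighbor rule with Euclidean distance, not the OWFKNN algorithm, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition of a non-empty sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_label(query, examples):
    """Label of the example closest to `query` (squared Euclidean distance).

    Assumption: plain 1-NN stands in for the weighted fuzzy k-NN of the abstract.
    """
    closest = min(
        examples,
        key=lambda ex: sum((q - v) ** 2 for q, v in zip(query, ex[0])),
    )
    return closest[1]


def jackknife_accuracy(dataset):
    """Leave each protein out in turn and predict it from all the others."""
    vectors = [(composition(seq), label) for seq, label in dataset]
    hits = 0
    for i, (vec, label) in enumerate(vectors):
        rest = vectors[:i] + vectors[i + 1:]
        hits += nearest_label(vec, rest) == label
    return hits / len(vectors)


# Hypothetical toy data, not taken from the datasets in the abstract.
toy = [("AAAAAC", "x"), ("AAAACC", "x"), ("WWWWWY", "y"), ("WWWWYY", "y")]
print(jackknife_accuracy(toy))
```

Each protein is predicted once from a model that never saw it, so the reported accuracy does not depend on a random train/test split.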

2.
The subcellular location of a protein is closely correlated with its biological function. In this paper, two new pattern classification methods, Nearest Feature Line (NFL) and Tunable Nearest Neighbor (TNN), are introduced to predict the subcellular location of proteins from their amino acid composition alone. The simulation experiments were performed with the jackknife test on a previously constructed data set of 2,427 eukaryotic and 997 prokaryotic proteins. All protein sequences in the data set fall into four eukaryotic subcellular locations and three prokaryotic subcellular locations. The NFL classifier reached total prediction accuracies of 82.5% for the eukaryotic proteins and 91.0% for the prokaryotic proteins. The TNN classifier reached total prediction accuracies of 83.6% and 92.2%, respectively. Compared with the Support Vector Machine (SVM) and Nearest Neighbor methods, these two methods display similar or even higher prediction accuracies. Hence, we conclude that NFL and TNN can be used as complementary methods for the prediction of protein subcellular locations.

3.
Predicting the location where a protein resides within a cell is important in cell biology. Computational approaches to this problem have attracted increasing attention from the biomedical community. Among the protein features used to predict the subcellular localization of proteins, features derived from Gene Ontology (GO) have been shown to be superior to others. However, most work in this field focuses on the presence or absence of some predefined GO terms. We propose a method to derive information from the intrinsic structure of the GO graph. The feature vector was constructed so that each element represents the information content of a GO term annotating the protein under investigation, and a support vector machine was used as the classifier to test the extracted features. Evaluation experiments were conducted on three protein datasets, and the results show that our method improves eukaryotic and human subcellular location prediction accuracy by up to 1.1% over previous studies that also used GO-based features. In particular, when the cellular component annotation is absent, our method achieves satisfactory results, with an overall accuracy of more than 87%.

4.
Using the pseudo amino acid (PseAA) composition to represent the sample of a protein can incorporate a considerable amount of sequence pattern information so as to improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article, the grey modeling approach is introduced, which is particularly efficient in coping with complicated systems such as one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted for its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introducing the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type, among many others.

5.
6.
Apoptosis proteins play an essential role in the development and homeostasis of an organism. The accurate prediction of the subcellular location of apoptosis proteins is helpful for understanding the mechanism of programmed cell death and their biological functions. In this article, a new apoptosis protein localization algorithm, named PSSP, is proposed based on the predicted cleavage sites of primary protein sequences. First, protein chains are divided into N‐terminal signal parts and mature protein parts according to their cleavage sites as predicted by SignalP. Then, the amino acid composition (ACC) of the individual subsequences, together with the pseudo‐ACC and stereochemical properties of the whole chain, is extracted to represent a given protein sequence. Jackknife tests with a support vector machine on three widely used datasets of apoptosis proteins (ZD98, ZW225, and CL317) demonstrated that the total accuracies of this approach are 93.9, 87.6, and 91.5%, respectively. In addition, an independent nonapoptosis benchmark dataset (NNPSL) was used to evaluate the performance of this method, and the predictive accuracies for eukaryotic and prokaryotic proteins are comparable to those of existing methods. © 2013 Wiley Periodicals, Inc.

7.
Since it was observed that the structural class of a protein is related to its amino acid composition, various methods based on amino acid composition have been proposed to predict protein structural classes. Though those methods are effective to some degree, their predictive quality is limited because amino acid composition cannot sufficiently capture the information in protein sequences. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike the previous methods, this new approach is based on comparisons of subsequence distributions, so the effect of residue order on protein structure is taken into account. The predictive results of the new approach on the same data set are better than those of the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall correctness rates of the resubstitution test and the jackknife test are 99.4 and 75.02%, respectively, and similar results are obtained on other data sets. All tests demonstrate that the residue order along protein sequences plays an important role in the recognition of protein structural classes, especially for alpha/beta proteins and alpha+beta proteins. In addition, the tests show that the new method is simple and efficient.

8.
A new computational method (EpiDock) is proposed for predicting peptide binding to class I MHC proteins from the amino acid sequence of any protein of immunological interest. Starting from the primary structure of the target protein, individual three-dimensional structures of all possible MHC-peptide (8-, 9- and 10-mers) complexes are obtained by homology modelling. A free energy scoring function (Fresno) is then used to predict the absolute binding free energy of all possible peptides to the class I MHC restriction protein. Assuming that immunodominant epitopes are usually found among the top MHC binders, the method can thus be applied to predict the location of immunogenic peptides on the sequence of the protein target. When applied to the prediction of HLA-A*0201-restricted T-cell epitopes from the Hepatitis B virus, EpiDock was able to recover 92% of known high affinity binders and 80% of known epitopes within a filtered subset of all possible nonapeptides corresponding to about one tenth of the full theoretical list. The proposed method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture. As it requires very little experimental starting data, EpiDock can be used (i) to predict potential T-cell epitopes from viral genomes and (ii) to roughly predict still unknown peptide binding motifs for novel class I MHC alleles.

9.
The structural class is an important feature widely used to characterize the overall folding type of a protein. How to improve the prediction quality for protein structural classification by effectively incorporating the sequence-order effects is an important and challenging problem. Based on the concept of the pseudo amino acid composition [Chou, K. C. Proteins Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60], a novel approach for measuring the complexity of a protein sequence was introduced. The advantage of incorporating the complexity measure factor into the pseudo amino acid composition as one of its components is that it can catch the essence of the overall sequence pattern of a protein and hence more effectively reflect its sequence-order effects. It was demonstrated through the jackknife cross-validation test that the overall success rate of the new approach was significantly higher than those of the others. It has not escaped our notice that the complexity measure factor can also be used to improve the prediction quality for, among many other protein attributes, subcellular localization, enzyme family class, membrane protein type, and G-protein-coupled receptor type.

10.
A quantitative conformational theory of proteins is developed that enables one to predict the native structure of a protein from its amino acid sequence. The theory is based on the following principles: (1) the spatial structure and conformational properties of a protein are predetermined by its amino acid sequence; (2) the native conformation of a protein corresponds to the free energy minimum; (3) all interactions within a protein molecule are classified as short-, medium-, and long-range types, with interactions of different types being consistent with each other. The role of the short-, medium-, and long-range interactions in the spatial organization of a protein globule is discussed, and a step-by-step analysis of amino acid sequences with gradually increasing lengths is presented. The proposed theory is based on a semiempirical computational method that involves quantitative evaluation of all pairwise atomic interactions within a protein molecule in an aqueous medium. Examples illustrating the suggested approach are presented.

11.
During the last few decades, accurate determination of protein structural class using a fast and suitable computational method has been a challenging problem in protein science. In this context, a meaningful representation of a protein sample plays a key role in achieving higher prediction accuracy. In this paper, based on the concept of Chou's pseudo amino acid composition (Chou, K.C., 2001. Proteins 43, 246-255), a new feature representation method is introduced that is composed of the amino acid composition information, the amphiphilic correlation factors and the spectral characteristics of the protein. Thus the sample of a protein is represented by a set of discrete components that incorporate both the sequence order and the length effect. On the basis of such a statistical framework, a simple classifier based on a radial basis function network is introduced to predict protein structural class. A set of exhaustive simulation studies demonstrates a high classification success rate in the self-consistency and jackknife tests on the benchmark datasets.

12.
Protein structures can be classified mainly into four classes according to their chain fold topologies: all-alpha, all-beta, alpha/beta, and alpha + beta. To predict the protein structural class, a new algorithm is presented in which the increment of diversity is combined with quadratic discriminant analysis. On the basis of the concept of the pseudo amino acid composition (Chou, Proteins: Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60), the 400 dipeptide components and the 20 amino acid composition components are, respectively, selected as parameters of the diversity source. A total of 204 nonhomologous proteins constructed by Chou (Chou, Biochem Biophys Res Commun 1999, 264, 216) are used for training and testing the predictive model. The pseudo amino acid approach proposed in this paper can remarkably improve the success rates, and hence the current method may play a complementary role to other existing methods for predicting protein structural classification.
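
As a rough illustration of the two ingredients named in this abstract, the sketch below counts the 400 dipeptide components of a sequence and computes an increment of diversity between two count vectors. The abstract does not state the formula, so the code assumes the definition commonly used for this measure, D(X) = N log N - sum(n_i log n_i) and ID(X, S) = D(X + S) - D(X) - D(S). The quadratic discriminant step is omitted, and all function names are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_counts(seq):
    """Count overlapping dipeptides; pairs with non-standard residues are skipped."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] for d in DIPEPTIDES]


def diversity(counts):
    """Assumed definition: D = N log N - sum(n_i log n_i), with 0 log 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """Assumed definition: ID(X, S) = D(X + S) - D(X) - D(S)."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


query = dipeptide_counts("ACDACDACD")
print(increment_of_diversity(query, query))
```

Under this definition the increment is zero when the two sources have the same relative composition and grows as they diverge, so a query would be assigned to the class whose pooled counts give the smallest increment.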

13.
Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we propose a novel scheme to improve the prediction of protein structural classes. Given an amino acid sequence, we first transform it into a reduced amino acid sequence and calculate its word frequencies and word position features to form the combined sequence information. Then we add the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. The comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.

14.
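Prediction and analysis of transmembrane segments of membrane proteins   Total citations: 6, self-citations: 0, citations by others: 6
By combining the time-frequency localization property of the continuous wavelet transform with the hydrophobicity of amino acids, a new method is proposed for predicting the number and positions of transmembrane segments in membrane proteins. Taking the membrane protein with code 1YST as an example, the wavelet scale and the type of hydrophobicity values were selected, and the procedure by which the method predicts the number and positions of transmembrane helical regions is described. Thirty-six proteins (containing 232 transmembrane helical regions) were randomly drawn from a membrane protein database as a test set and their transmembrane helical regions were predicted with this method; 222 of the regions were predicted accurately, an accuracy of 96.1%. The results show that the method has high prediction accuracy.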
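
The general idea of this abstract (map residues to hydrophobicity values, then analyse the resulting signal with a continuous wavelet transform) might be sketched as below. Everything specific here is an assumption made for illustration, not the authors' choice: the Kyte-Doolittle hydropathy scale, the Mexican hat wavelet, a single fixed scale, and a simple threshold rule for picking candidate segments.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; the abstract compares several).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mexican_hat(t, scale):
    """Unnormalised Mexican hat (Ricker) wavelet evaluated at offset t."""
    u = t / scale
    return (1 - u * u) * math.exp(-u * u / 2)


def cwt_row(seq, scale):
    """Wavelet coefficients of the hydropathy signal at one scale."""
    values = [KYTE_DOOLITTLE[a] for a in seq]
    half = int(4 * scale)  # the wavelet is negligible beyond 4 scale units
    row = []
    for i in range(len(values)):
        acc = 0.0
        for j in range(max(0, i - half), min(len(values), i + half + 1)):
            acc += values[j] * mexican_hat(j - i, scale)
        row.append(acc / math.sqrt(scale))
    return row


def candidate_segments(row, threshold, min_len):
    """Runs of at least `min_len` positions whose coefficient exceeds `threshold`."""
    segments, start = [], None
    for i, v in enumerate(row + [float("-inf")]):  # sentinel closes a final run
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments


# Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
toy = "K" * 10 + "L" * 20 + "K" * 10
row = cwt_row(toy, scale=5)
print(candidate_segments(row, threshold=0.0, min_len=5))
```

Varying `scale` changes the segment length the wavelet responds to, which is presumably why the abstract treats the wavelet scale as a parameter to be selected.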

15.
Precise information about protein locations in a cell facilitates the understanding of the function of a protein and its interactions in the cellular environment. This information further helps in the study of specific metabolic pathways and other biological processes. We propose an ensemble approach called "CE-PLoc" for predicting subcellular locations based on the fusion of individual classifiers. The proposed approach utilizes features obtained from both dipeptide composition (DC) and amphiphilic pseudo amino acid composition (PseAAC) based feature extraction strategies. Different feature spaces are obtained by varying the dimensionality using PseAAC for a selected base learner. The performance of individual learning mechanisms such as support vector machine, nearest neighbor, probabilistic neural network, and covariant discriminant, trained using PseAAC based features, is first analyzed. Classifiers are then developed using the same learning mechanism but trained on PseAAC based feature spaces of varying dimensions. These classifiers are combined through a voting strategy, and an improvement in prediction performance is achieved. Prediction performance is further enhanced by developing CE-PLoc through the combination of different learning mechanisms trained on both the DC based feature space and PseAAC based feature spaces of varying dimensions. The predictive performance of the proposed CE-PLoc is evaluated on two benchmark datasets of protein subcellular locations using accuracy, MCC, and Q-statistics. Using the jackknife test, prediction accuracies of 81.47 and 83.99% are obtained for the 12 and 14 subcellular location datasets, respectively. In the independent dataset test, prediction accuracies are 87.04 and 87.33% for the 12 and 14 class datasets, respectively.

16.
On the basis of information on the evolution of the 20 amino acids and their physicochemical characteristics, we propose a new two-dimensional (2D) graphical representation of protein sequences in this article. With this representation, we use 2D data to represent three-dimensional information constructed from the amino acids' evolution index, the class of each amino acid based on its physicochemical characteristics, and the order in which the amino acids appear in the protein sequence. Then, using the discrete Fourier transform, sequence signals of different lengths can be transformed to the frequency domain, where all sequences have the same length. The new method is used to analyze protein sequence similarity and to predict the protein structural class. The experiments indicate that our method is effective and useful.
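
One simple way to obtain equal-length frequency-domain descriptors from signals of different lengths is to keep a fixed number of low-frequency discrete Fourier transform magnitudes. The sketch below shows that idea only; the abstract does not say how the authors equalise lengths, so the truncation rule, the normalisation by signal length, and the function name are assumptions.

```python
import cmath


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first `n_coeffs` DFT coefficients, divided by signal length.

    Assumption: truncating to a fixed number of coefficients is used here to make
    descriptors comparable across signals of different lengths.
    """
    n = len(signal)
    magnitudes = []
    for k in range(n_coeffs):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


# Hypothetical numeric signals of different lengths.
short_signal = [1.0, 0.0, 2.0, 1.0, 3.0, 0.5]
long_signal = [0.5, 1.5, 1.0, 2.0, 0.0, 1.0, 2.5, 1.0, 0.0]
print(len(dft_magnitudes(short_signal, 4)), len(dft_magnitudes(long_signal, 4)))
```

Both calls return four values, so the two descriptors can be compared directly, for example with a Euclidean distance.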

17.
Although there are many techniques available for the analysis of amino acids, deproteinization is still one of the major problems in the analysis of amino acids in physiological fluids. The method used to prepare the plasma and to remove the plasma protein has a marked effect on the final results. The most widely used method of deproteinization is precipitation with 5-sulphosalicylic acid followed by centrifugation to remove the precipitated protein. We have not had success in using this deproteinization agent for the analysis of plasma amino acids by a high-performance liquid chromatographic method with automatic pre-column o-phthaldialdehyde-3-mercaptopropionic acid and 9-fluorenylmethyl chloroformate derivatization because of the adverse effect of the sulphosalicylic acid supernatant on the quantitation and separation. Ultrafiltration was used as an alternative method for the preparation of plasma samples in this experiment. The results were satisfactory for the analysis of plasma amino acids in 1500 samples during a period of four years. Some factors that might influence the results of the ultrafiltration were investigated.

18.
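Prediction of the specific optical rotation of amino acids based on the quantitative structure-property relationship method   Total citations: 1, self-citations: 0, citations by others: 1
The quantitative structure-property relationship (QSPR) method was used to predict the specific optical rotation of the amino acids essential to the human body, and the resulting models were validated by leave-one-out (LOO) cross-validation. A heuristic algorithm was applied to select descriptors and build a linear regression model, whose correlation coefficient (R2) was 0.918. After molecular chirality was encoded as 1 and -1 for the levorotatory and dextrorotatory forms, respectively, a new multiple linear regression model was built, and its correlation coefficient (R2) rose to 0.970. The QSPR model established in this study provides an effective new method for predicting the specific optical rotation of chiral compounds.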

19.
As several structural proteomics projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein function from protein structure are urgently needed. In this paper, we present a method that explores the clustering patterns of amino acids in three-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented as a graph in which each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.

20.
This is a continuation of our studies on using very basic information about an enzyme to predict the optimal parameters of its enzymatic reactions, motivated by the widening gap between the number of available enzyme sequences and the number with known reaction parameters. In this study, 23 features, selected from more than 540 features of individual amino acids, together with one feature combining whole-protein information, were screened as independent variables in a 20-1 feedforward backpropagation neural network for predicting the optimal pH of the hydrolytic reaction of beta-glucosidase, an enzyme that has recently drawn attention because of its role in the biofuel industry. The results show that 11 features can be used as independent variables for the prediction, and that the amino acid distribution probability works better than the other independent variables. Our study paves the way for predicting the optimal reaction parameters of enzymes from the amino acid features of enzyme sequences.
