Similar Articles
20 similar articles found (search time: 31 ms)
1.
Prediction of protein folding rates from amino acid sequences is one of the most important challenges in molecular biology. In this work, I have related protein folding rates to the physical-chemical, energetic and conformational properties of amino acid residues. I found that classifying proteins into structural classes yields an excellent correlation between amino acid properties and the folding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein folding rates. I have formulated a simple linear regression model for predicting protein folding rates from amino acid sequences together with structural class information, and obtained excellent agreement between predicted and experimentally observed folding rates; the correlation coefficients are 0.99, 0.96 and 0.95 for all-alpha, all-beta and mixed-class proteins, respectively. This is the first available method capable of predicting protein folding rates just from the amino acid sequence, with the aid of generic amino acid properties and structural class information.
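As a rough illustration of a class-specific linear model, the sketch below averages one amino acid property scale over each sequence and fits ln(k_f) = a + b·P separately for each structural class by ordinary least squares. It is a minimal sketch assuming a single property; the helper names, the toy scale and the toy data are hypothetical placeholders, not the author's parameters or dataset.

```python
def mean_property(seq, scale):
    """Average a per-residue property scale over a sequence, skipping residues not in the scale."""
    values = [scale[aa] for aa in seq if aa in scale]
    if not values:
        raise ValueError("no residue of the sequence is covered by the scale")
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_per_class(records, scale):
    """records: iterable of (sequence, structural_class, ln_kf); returns {class: (a, b)}."""
    groups = {}
    for seq, cls, ln_kf in records:
        xs, ys = groups.setdefault(cls, ([], []))
        xs.append(mean_property(seq, scale))
        ys.append(ln_kf)
    return {cls: fit_line(xs, ys) for cls, (xs, ys) in groups.items()}


def predict_ln_kf(seq, cls, models, scale):
    """Predict ln(kf) with the regression line of the protein's structural class."""
    a, b = models[cls]
    return a + b * mean_property(seq, scale)


# Hypothetical toy scale and data, for illustration only.
toy_scale = {"A": 1.0, "L": 2.0, "G": 0.0}
toy_records = [
    ("AAAA", "all-alpha", 3.0), ("LLLL", "all-alpha", 5.0), ("GGGG", "all-alpha", 1.0),
    ("AL", "all-beta", 2.5), ("GA", "all-beta", 3.5),
]
models = fit_per_class(toy_records, toy_scale)
```

Fitting one line per class is what lets the class label carry the topology information the abstract emphasizes; a protein of unknown class would need either a class predictor or a pooled model.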

2.
Understanding the relationship between amino acid sequences and the folding rates of proteins is an important task in computational and molecular biology. In this work, we have systematically analyzed the amino acid composition of proteins with different ranges of folding rates. We observed that the polar residues Asn, Gln, Ser, and Lys are dominant in fast-folding proteins, whereas the hydrophobic residues Ala, Cys, Gly, and Leu prefer to be in slow-folding proteins. Further, we have developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted protein folding rates in leave-one-out cross-validation. Classifying proteins by structural class improved the correlation to 0.98; it is 0.99, 0.98, and 0.96 for all-alpha, all-beta, and mixed-class proteins, respectively. In addition, we have used Bayesian classification theory to discriminate two- and three-state proteins, with an accuracy of 90%. We have developed a web server for predicting protein folding rates, available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
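Leave-one-out cross-validation refits the model once per protein with that protein held out, predicts it, and then correlates the held-out predictions with the experimental values. The sketch below shows that loop; as a simplifying assumption it swaps in a one-variable linear fit for the paper's quadratic response surface model, and the helper names and toy data are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b). Stand-in for the real model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def loo_predictions(xs, ys, fit=fit_line):
    """For each sample i: refit on all other samples, then predict sample i."""
    preds = []
    for i in range(len(xs)):
        a, b = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


# Hypothetical descriptor values and ln(kf) values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ln_kf = [0.2, 1.1, 1.9, 3.2, 3.9]
r_loo = pearson_r(loo_predictions(xs, ln_kf), ln_kf)
```

The key property is that no held-out protein ever influences the coefficients used to predict it, which is why the cross-validated correlation is usually lower than the fitting correlation.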

3.
Understanding the relationship between amino acid sequences and protein folding rates is a challenging task, similar to the protein folding problem itself. In this work, we have analyzed the relative importance of protein sequence and structure for predicting protein folding rates, in terms of amino acid properties and contact distances, respectively. We found that the sequence-derived parameters (physical-chemical, energetic, and conformational properties of amino acid residues) show very weak correlation (|r| < 0.39) with the folding rates of 28 two-state proteins, indicating that sequence information alone is not sufficient to understand the folding rates of two-state proteins. However, the highest positive correlations, obtained for the number of medium-range contacts and alpha-helical tendency, reveal the importance of local interactions in initiating protein folding. On the other hand, a remarkable correlation (r from -0.74 to -0.88) was obtained between the structural parameters (contact order, long-range order, and total contact distance) and protein folding rates. Further, we found that secondary structure content and solvent accessibility play a marginal role in determining the folding rates of two-state proteins. Multiple regression analysis with a combination of three properties (beta-strand tendency, enthalpy change, and total contact distance) improved the correlation with protein folding rates to 0.92. The relative importance of existing methods and of the multiple-regression model proposed in this work is discussed. Our results demonstrate that native-state topology is the major determinant of the folding rates of two-state proteins.
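Two of the structural parameters named above can be computed from a residue contact list. The sketch follows the commonly used definitions: relative contact order is the mean sequence separation of contacting residues divided by chain length, and long-range order counts contacts separated by more than 12 residues, normalized by chain length. Published definitions differ in atom choice and distance cutoff (contact order, for example, is often computed from heavy atoms within 6 Å), so the C-alpha style 8 Å cutoff and `min_sep=2` defaults here are assumptions; total contact distance is not implemented.

```python
from math import dist


def contacts_from_coords(coords, cutoff=8.0, min_sep=2):
    """Residue pairs (i, j) with j - i >= min_sep whose coordinates lie within cutoff (same units)."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


def relative_contact_order(contacts, length):
    """Mean sequence separation |i - j| over all contacts, divided by the chain length."""
    if not contacts:
        return 0.0
    return sum(abs(i - j) for i, j in contacts) / (length * len(contacts))


def long_range_order(contacts, length, min_sep=12):
    """Number of contacts with |i - j| > min_sep, normalized by the chain length."""
    return sum(1 for i, j in contacts if abs(i - j) > min_sep) / length


# Hypothetical contact list for a 40-residue chain.
toy_contacts = [(0, 5), (1, 20), (2, 30)]
rco = relative_contact_order(toy_contacts, 40)
lro = long_range_order(toy_contacts, 40)
```

Both quantities grow when contacts link residues far apart in sequence, which is the topological feature the abstract finds most predictive (negatively correlated with folding rate).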

4.
Prediction of the change in protein folding rate upon amino acid substitution is an important and challenging problem in protein folding kinetics and design. In this work, we have analyzed the relationship between amino acid properties and folding rate change upon mutation. Our analysis showed that the correlation is not significant for any of the studied properties in a dataset of 476 mutants. Further, we have classified the mutants by their location in different secondary structures and by solvent accessibility. For each category, we have selected a specific combination of amino acid properties using a genetic algorithm and developed a prediction scheme based on quadratic regression models for predicting the folding rate change upon mutation. Our results showed a 10-fold cross-validation correlation of 0.72 between experimental and predicted changes in protein folding rates. The correlation is 0.73, 0.65 and 0.79 in strand, helix and coil segments, respectively. The method has been further tested with an extended dataset of 621 mutants and a blind dataset of 62 mutants, and we observed good agreement with experiments. We have developed a web server for predicting the folding rate change upon mutation and it is available at .
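The 10-fold cross-validation reported above partitions the mutants into ten disjoint test folds, trains on the other nine folds each time, and pools the held-out predictions. A minimal sketch of the splitting step, assuming a simple random shuffle (the helper name and seed are hypothetical; the paper's actual fold assignment is not described):

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices 0..n-1 and split them into k near-equal folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin assignment keeps sizes within 1
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, test))
    return splits


# 476 mutants, as in the abstract's main dataset.
splits = k_fold_indices(476, k=10, seed=1)
```

Every mutant appears in exactly one test fold, so each prediction used to compute the cross-validated correlation comes from a model that never saw that mutant.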

5.
6.
The ability to predict protein folding rates constitutes an important step in understanding overall folding mechanisms. Although many prediction methods are structure based, successful predictions can also be obtained from the sequence. We developed a novel method called prediction of protein folding rates (PPFR) for predicting protein folding rates from protein sequences. PPFR implements a linear regression model for each of the mainstream folding dynamics, including two-, multi-, and mixed-state proteins. The proposed method provides predictions that correlate strongly with the experimental folding rates: 0.87 for the two- and multistate proteins and 0.82 for the mixed-state proteins, when evaluated with an out-of-sample jackknife test. Based on in-sample and out-of-sample tests, PPFR's predictions are shown to be better than those of most other sequence-only and structure-based predictors and complementary to the predictions of the most recent sequence-based QRSM method. We show that simultaneously incorporating several characteristics, including the sequence, physicochemical properties of residues, and predicted secondary structure, improves prediction quality. This hybridized prediction model was analyzed to reveal the complementary factors that can be used in tandem to predict folding rates. We show that bigger proteins require more time to fold; higher helical and coil content and the presence of Phe, Asn, and Gln may accelerate folding; the inclusion of Ile, Val, Thr, and Ser may slow folding down; and, for two-state proteins, increased beta-strand content may decelerate folding. Finally, PPFR provides strong correlations when predicting sequences with low similarity.

7.
The function of a eukaryotic protein is closely correlated with its subcellular location. With the success of the human genome project, the number of newly found protein sequences entering data banks is increasing rapidly, so it is highly desirable to predict a protein's subcellular location automatically from its amino acid sequence. In this paper, amino acid hydrophobic patterns and average power-spectral density (APSD) are introduced to define a pseudo amino acid composition. The covariant-discriminant predictor is used to predict subcellular location, and an immune-genetic algorithm (IGA) is used to find the fittest weight factors, which are very important in this method. High success rates are obtained in both the self-consistency test (86%) and the jackknife test (73%), and more than 80% predictive accuracy is achieved in the independent dataset test. The results demonstrate that the proposed method is practical, and they illustrate that protein subcellular location can be predicted from the surface physicochemical characteristics of the folded protein.

8.
Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. Accurate and reliable prediction of protein domain linkers and boundaries is often considered the initial step in predicting protein tertiary structure and function. In this paper, we introduce CISA, a method for predicting inter-domain linker regions solely from amino acid sequence information. The method first computes an amino acid compositional index from a protein sequence dataset of domain-linker segments and from amino acid composition. A preference profile is then generated by averaging the compositional index values along the amino acid sequence with a sliding window. Finally, the protein sequence is segmented into intervals, and a simulated annealing algorithm is employed to enhance the prediction by finding, for each segment, the optimal threshold value that separates domains from inter-domain linkers. The method was tested on two standard protein datasets and showed considerable improvement over state-of-the-art domain linker prediction methods.
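The index, profile and threshold pipeline described above can be sketched as follows. The abstract does not give CISA's exact compositional index formula, so this sketch assumes a simple log-ratio of linker to domain residue frequencies (positive values mean linker-enriched residues), and it uses one global threshold instead of the per-segment thresholds that CISA optimizes with simulated annealing. All helper names and the toy sequences are hypothetical.

```python
from math import log


def composition(seqs):
    """Relative residue frequencies pooled over a list of sequences."""
    counts, total = {}, 0
    for s in seqs:
        for aa in s:
            counts[aa] = counts.get(aa, 0) + 1
            total += 1
    return {aa: c / total for aa, c in counts.items()}


def compositional_index(linker_seqs, domain_seqs, pseudo=1e-6):
    """Assumed index: log(f_linker / f_domain) per residue, with a small pseudo-frequency."""
    fl, fd = composition(linker_seqs), composition(domain_seqs)
    return {aa: log((fl.get(aa, 0.0) + pseudo) / (fd.get(aa, 0.0) + pseudo))
            for aa in set(fl) | set(fd)}


def preference_profile(seq, index, window=5):
    """Average index over a centred sliding window, truncated at the sequence ends."""
    half = window // 2
    vals = [index.get(aa, 0.0) for aa in seq]
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        profile.append(sum(vals[lo:hi]) / (hi - lo))
    return profile


def linker_mask(profile, threshold):
    """True where the smoothed profile exceeds the threshold (predicted linker residue)."""
    return [p > threshold for p in profile]


# Hypothetical training fragments and query.
idx = compositional_index(["PPSG"], ["LLVI"])
mask = linker_mask(preference_profile("LLLPPPLLL", idx, window=3), 0.0)
```

Smoothing with a window suppresses isolated linker-like residues inside domains; a per-segment threshold, as in CISA, would further let the cut-off adapt to local composition.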

9.
The prediction of protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analyzing the relationship between protein unfolding rates and the physical-chemical, energetic, and conformational properties of amino acid residues provides valuable information for understanding and predicting the unfolding rates of two- and three-state proteins. We found that classifying proteins into structural classes yields an excellent correlation between amino acid properties and the unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein unfolding rates. We have formulated three independent linear regression equations for the different structural classes of proteins to predict their unfolding rates from amino acid sequences, and obtained excellent agreement between predicted and experimentally observed unfolding rates; the correlation coefficients are 0.999, 0.990, and 0.992 for all-alpha, all-beta, and mixed-class proteins, respectively. Further, we have derived a general equation applicable to all structural classes of proteins, which can be used to predict the unfolding rates of proteins of unknown structural class. We observed correlations of 0.987 and 0.930 in the back-check and jackknife tests, respectively. These accuracy levels are better than those of other methods in the literature.

10.
Supersecondary structures (SSSs) are the building blocks of protein 3D structures. Accurate prediction of SSSs can be an important step toward building a tertiary structure from a specified secondary structure. How to improve the accuracy of SSS prediction by effectively incorporating sequence-order effects is an important and challenging problem. Based on a different form of Chou's pseudo amino acid composition, a novel approach to the feature representation of SSSs is proposed. Basic amino acid composition, dipeptide components, and amino acid composition distribution are incorporated to represent the compositional features of proteins, and each supersecondary structural motif is characterized as a 36-dimensional vector. In addition, we propose a novel prediction system that uses SVM and the IDQD algorithm as classifiers. Our method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The highest overall accuracies for the training dataset and the independent testing dataset are 77.7% and 69.4%, respectively. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2011

11.
Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important problem, both for detecting outer membrane proteins in genomic sequences and for successfully predicting their secondary and tertiary structures. In this work, we have systematically analyzed the distribution of amino acid residues in the sequences of globular and outer membrane proteins. We observed that two neighboring aliphatic and polar residues occur significantly more often in outer membrane proteins than in globular proteins. From the dipeptide composition information, we have devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly picked up the outer membrane proteins with an accuracy of 95% for the training set of 337 proteins. It also correctly excluded globular proteins with an accuracy of 79% in a non-redundant dataset of 674 proteins, and alpha-helical membrane proteins with an accuracy of up to 87%. These accuracy levels are comparable to those of other methods in the literature. The influence of protein size and structural class on discrimination is discussed.
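The abstract does not specify the statistical discrimination model, so only its input feature is sketched here: the 400-dimensional dipeptide composition, that is, the relative frequency of each ordered pair of adjacent standard residues. Pairs involving non-standard residues are skipped, which is an assumption of this sketch; the helper name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 20 x 20 = 400 pairs


def dipeptide_composition(seq):
    """Relative frequencies of the 400 adjacent residue pairs, in DIPEPTIDES order."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


features = dipeptide_composition("AAC")
```

A statistical discriminator would then compare such vectors against the average dipeptide profiles of outer membrane and globular training proteins; enrichment of aliphatic-polar neighbor pairs, as reported above, shows up directly in these 400 frequencies.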

12.
Using the pseudo amino acid (PseAA) composition to represent a protein sample can incorporate a considerable amount of sequence-pattern information and thus improve the prediction quality for its structural or functional classification. However, how to formulate the PseAA composition optimally is an important problem yet to be solved. In this article, the grey modeling approach is introduced; it is particularly efficient in coping with complicated systems, such as one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each protein sequence are adopted as its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can capture the essence of a protein sequence and better reflect its overall pattern. In our study, we have demonstrated that introducing the grey-PseAA composition can remarkably enhance the success rates in predicting protein structural class. It is anticipated that the concept of grey-PseAA composition can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type.
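The abstract does not say which four grey-model coefficients form the grey-PseAA components, so this sketch only shows the standard GM(1,1) estimate of two coefficients: the development coefficient a and the grey input b. For a positive series x0 (for example, a sequence mapped onto some property scale, which is an assumption here), GM(1,1) accumulates x1 = cumsum(x0), forms background values z1(k) = (x1(k) + x1(k-1)) / 2, and fits x0(k) = -a·z1(k) + b by least squares. The helper name is hypothetical.

```python
from itertools import accumulate


def gm11_coefficients(x0):
    """Least-squares estimate of (a, b) in the GM(1,1) model x0(k) + a*z1(k) = b."""
    if len(x0) < 3:
        raise ValueError("GM(1,1) needs at least three data points")
    x1 = list(accumulate(x0))                                  # accumulated series
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # background values
    y = x0[1:]
    u = [-z for z in z1]          # regressor: x0(k) = a * (-z1(k)) + b
    n = len(y)
    mu, my = sum(u) / n, sum(y) / n
    suu = sum((v - mu) ** 2 for v in u)
    suy = sum((v - mu) * (w - my) for v, w in zip(u, y))
    a = suy / suu
    b = my - a * mu
    return a, b


# A series generated exactly by a = -0.5, b = 1 (x0(1) = 1).
a_hat, b_hat = gm11_coefficients([1.0, 2.0, 10 / 3, 50 / 9])
```

Because the design matrix has columns (-z1, 1), the least-squares solution reduces to a simple linear regression of x0(k) on -z1(k), which is why no matrix inversion is needed here.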

13.
Proteins carry out the most important and difficult tasks in all living organisms. To do so, they must often interact specifically with other small and large molecules. This requires that they fold to a globular conformation with a unique active site that is used for the specific interaction. Consequently, protein folding can be regarded as the “secret of life”. Biochemists and chemists have a great interest in elucidating the mechanism by which proteins fold and in predicting the folded conformation and its stability given just the amino acid sequence. This challenge is sometimes called the “protein folding problem”. The ability to construct proteins differing in sequence by one or more amino acids and to analyze their three-dimensional structures by X-ray crystallography and NMR spectroscopy is a powerful tool for investigating the conformational stability and folding of proteins. Several proteins are now under intensive study by this approach. One of these is ribonuclease T1.

14.
Two-dimensional electrophoretic separation and immobilization of proteins onto inert membranes for subsequent amino acid sequence and amino acid composition analysis is described as a rapid procedure for identifying or characterizing proteins from complex mixtures. This method avoids the drawbacks of classical purification and isolation methods, which involve time-consuming operations with low resolution and, often, insufficient yields. The excellent overall yields obtained from minor amounts (in the low microgram range) with this method allow sequence determination of previously inaccessible proteins. Solubilized cell proteins of mouse brain were separated by high-resolution two-dimensional electrophoresis and electroblotted onto a siliconized glass fiber membrane. The immobilized proteins were stained with Coomassie Brilliant Blue R-250, and twelve protein spots were then submitted to both Edman degradation and amino acid analysis. Proteins were identified by comparing the experimentally determined amino acid composition with a dataset derived from the Protein Identification Resource (PIR) protein sequence database. Eight of the twelve proteins tested were identified by amino acid analysis and confirmed by N-terminal sequence determination.

15.
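Folding rate prediction is of great significance for elucidating the mechanism of protein folding. In this work, 115 proteins with currently known folding rates (including two-state, multi-state, and mixed-state proteins) were collected. To characterize the primary-structure information of the protein molecules comprehensively, a total of 9357 features were extracted: sequence length, multi-scale amino acid residue composition, k-space features of residue pairs, and geostatistical correlations based on the physicochemical properties of residues. After filtering with an improved binary matrix shuffling filter and several rounds of nonlinear backward elimination of the lowest-ranked features, 23 retained features with clear physicochemical meaning were obtained. The resulting nonlinear support vector regression model achieved a jackknife cross-validated correlation coefficient of R = 0.95, better than results reported in the literature and those of other reference feature-selection methods. The support vector regression interpretation system showed that the nonlinear regression between folding rate and the retained descriptors is highly significant. The influence of each retained descriptor on folding rate was analyzed, and the results show that protein folding rate is closely related to sequence length, medium- and short-range correlation features, triplet residue composition features, and other factors.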
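The abstract names geostatistical correlation features but does not define them. One standard geostatistical statistic that fits the description is the empirical semivariogram, γ(h) = Σ (v_i − v_{i+h})² / (2·N(h)), where v is a residue property profile along the sequence and N(h) is the number of residue pairs at lag h. The sketch below computes it for lags 1..max_lag as an assumed stand-in, not the paper's exact feature; the helper names and toy scale are hypothetical.

```python
def semivariance(values, lag):
    """Empirical semivariance at a given lag: sum of squared differences / (2 * number of pairs)."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    if not pairs:
        raise ValueError("lag must be smaller than the profile length")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def semivariogram_features(seq, scale, max_lag):
    """Map a sequence onto a property scale, then compute gamma(h) for h = 1..max_lag.

    Every residue must be present in the scale (a KeyError is raised otherwise),
    so that lags are measured along the true sequence positions.
    """
    values = [scale[aa] for aa in seq]
    return [semivariance(values, h) for h in range(1, max_lag + 1)]


# Hypothetical two-residue scale.
features = semivariogram_features("ACAC", {"A": 1.0, "C": 3.0}, 2)
```

Small semivariance at short lags indicates that the property varies smoothly along the chain, which is one way such features can encode the short- and medium-range correlations the abstract highlights.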

16.
Since it was observed that the structural class of a protein is related to its amino acid composition, various methods based on amino acid composition have been proposed to predict protein structural classes. Though those methods are effective to some degree, their predictive quality is limited because amino acid composition cannot sufficiently capture the information in protein sequences. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike the previous methods, this approach is based on comparisons of subsequence distributions, so the effect of residue order on protein structure is taken into account. The predictive results of the new approach on the same data set are better than those of the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall correctness rates of the resubstitution test and the jackknife test are 99.4% and 75.02%, respectively, and similar results are obtained for other data sets. All tests demonstrate that residue order along protein sequences plays an important role in recognizing protein structural classes, especially alpha/beta and alpha+beta proteins. The tests also show that the new method is simple and efficient.

17.
In the last few decades, the development of novel experimental techniques, such as new types of disulfide (SS)-forming reagents and genetic and chemical technologies for synthesizing designed artificial proteins, has opened a new realm of oxidative folding study, in which peptides and proteins can be folded under physiologically more relevant conditions. In this review, after a brief overview of the historical and physicochemical background of oxidative protein folding studies, recently revealed folding pathways of several representative peptides and proteins are summarized, including those having two, three, or four SS bonds in the native state, as well as those with an odd number of Cys residues or consisting of two peptide chains. Comparing the updated pathways with those reported in the early years reveals the flexible nature of protein folding pathways. The significantly different pathways characterized for hen egg-white lysozyme and bovine milk α-lactalbumin, which belong to the same protein superfamily, suggest that information on the folding pathway, not only on the native folded structure, is encoded in the amino acid sequence. Applying the flexible pathways of peptides and proteins to the engineering of folded three-dimensional structures is an interesting and important issue in the new realm of current oxidative protein folding studies.

18.
Prediction of transmembrane beta-strands in outer membrane proteins (OMPs) is one of the important problems in computational chemistry and biology. In this work, we propose a method based on neural networks for identifying membrane-spanning beta-strands. We introduce the concept of "residue probability" for assigning residues to transmembrane beta-strand segments. The performance of our method is evaluated with single-residue accuracy, correlation, specificity, and sensitivity. Our predicted segments show good agreement with experimental observations, with an accuracy of 73% from amino acid sequence information alone. Further, the predictive power of the N- and C-terminal residues in each segment, the number of segments in each protein, and the influence of the cutoff probability for identifying membrane-spanning beta-strands are discussed. We have developed a Web server for predicting transmembrane beta-strands from the amino acid sequence, and the prediction results are available at http://psfs.cbrc.jp/tmbeta-net/.
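Turning per-residue probabilities into predicted strand segments, with a cutoff probability as discussed above, can be sketched as follows. The neural network itself is not shown: the probabilities are assumed to be given, and the cutoff of 0.5 and minimum segment length of 5 residues are hypothetical defaults, not the paper's settings.

```python
def predicted_segments(probs, cutoff=0.5, min_len=5):
    """Return (start, end) index pairs (inclusive) of runs with probability >= cutoff.

    Runs shorter than min_len residues are discarded as too short to span the membrane.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= cutoff:
            if start is None:
                start = i          # a new run begins
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None           # the run has ended
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs) - 1))  # close a run reaching the C-terminus
    return segments


# Hypothetical per-residue probabilities for a 17-residue stretch.
toy_probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.2, 0.6, 0.6,
             0.3, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
segments = predicted_segments(toy_probs)
```

Raising the cutoff trades sensitivity for specificity, which is the effect of the cutoff probability that the abstract says is discussed.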

19.
It is important to establish whether a recombinant protein is an authentic copy of the predicted cDNA sequence. In this study, recombinant native peptidyl prolyl cis-trans isomerase (N-PPIase) and double-labeled (13C- and 15N-) protein (DL-PPIase) appeared on sodium dodecyl sulfate (SDS) electropherograms as two bands for N-PPIase and four bands for DL-PPIase. Since the N-terminal amino acid residues of all bands were the same, we characterized these bands using peptide mapping and amino acid composition analysis. The peptide maps of the proteins seemed almost identical, but they could not reflect the whole amino acid sequences of the proteins. The bands on the polyvinylidene difluoride (PVDF) membrane, electroblotted after SDS-polyacrylamide gel electrophoresis (SDS-PAGE), were hydrolyzed, and their amino acid composition was analyzed using highly sensitive 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) amino acid analysis and compared with the cDNA sequences of the proteins. The matching score Σ(T% − E%)² for protein similarity was calculated by summing the squared differences between the theoretical (T%) and experimental (E%) amino acid compositions of the recombinant protein. The amino acid compositions of all bands of both proteins agreed with the theoretical values to more than 93%. The major molecular weights of the two proteins were 16812 and 17694 by electrospray ionization (ESI) mass spectrometry. However, the purified proteins also contained minor compounds with Mr of 3721 for N-PPIase and 5285 for DL-PPIase. These compounds were considered to be nonpeptidyl products that comigrated with the protein. The amino acid compositions of the four bands were more than 98% similar. Our results indicate that AQC amino acid analysis is the most suitable method for characterizing a recombinant protein.
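The matching score Σ(T% − E%)² defined above is simple to compute once both compositions are expressed as mole percentages; a lower score means a closer match. In this sketch the theoretical composition is derived from a sequence string, and the experimental values are hypothetical. In practice, residues not recovered as such after acid hydrolysis (for example Trp, and Asn/Gln, which are read as Asp/Glu) would have to be merged or excluded before comparison; that preprocessing is not shown.

```python
from collections import Counter


def composition_percent(seq):
    """Theoretical amino acid composition (mol %) derived from a sequence string."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: 100.0 * c / n for aa, c in counts.items()}


def matching_score(theoretical, experimental):
    """Sum over residues of (T% - E%)^2; residues missing from one side count as 0%."""
    keys = set(theoretical) | set(experimental)
    return sum((theoretical.get(k, 0.0) - experimental.get(k, 0.0)) ** 2 for k in keys)


# Hypothetical theoretical sequence and measured composition.
t_pct = composition_percent("AACG")           # A 50%, C 25%, G 25%
e_pct = {"A": 48.0, "C": 27.0, "G": 25.0}
score = matching_score(t_pct, e_pct)
```

Squaring the differences penalizes a few large deviations more than many small ones, which makes the score sensitive to a truncated or contaminated band.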

20.
Protein structures can be classified mainly into four classes according to their chain-fold topologies: all-alpha, all-beta, alpha/beta, and alpha + beta. To predict protein structural class, a new algorithm in which the increment of diversity is combined with quadratic discriminant analysis is presented. On the basis of the concept of pseudo amino acid composition (Chou, Proteins: Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60), the 400 dipeptide components and the 20 amino acid composition values are each selected as sources of diversity. A total of 204 nonhomologous proteins constructed by Chou (Chou, Biochem Biophys Res Commun 1999, 264, 216) are used for training and testing the predictive model. The predicted results obtained with the pseudo amino acid approach proposed in this paper remarkably improve the success rates, so the current method may play a complementary role to other existing methods for predicting protein structural class.
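The increment of diversity (ID) is commonly defined through the diversity measure of a count vector X with total N, D(X) = N ln N − Σ n_i ln n_i, as ID(X, Y) = D(X + Y) − D(X) − D(Y). ID is zero when the two count vectors are proportional and grows as they diverge, so a query can be assigned to the class whose pooled count vector gives the smallest ID. The sketch shows only this ID part under those assumptions; the quadratic discriminant step that the paper combines with ID is not reproduced, and the class count vectors are hypothetical.

```python
from math import log


def diversity(counts):
    """Diversity measure D(X) = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * log(n) - sum(c * log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def classify(query_counts, class_counts):
    """Assign the query to the class whose pooled counts give the minimum ID."""
    return min(class_counts,
               key=lambda c: increment_of_diversity(query_counts, class_counts[c]))


# Hypothetical 3-component count vectors (in practice 20 or 400 components).
toy_classes = {"alpha": [10, 0, 2], "beta": [0, 10, 2]}
label = classify([5, 0, 1], toy_classes)
```

With the 20 amino acid counts or the 400 dipeptide counts of a sequence as X, and the summed counts of each class's training proteins as Y, the same functions give one ID value per class, which can then serve as input features for a downstream discriminant.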


Copyright © Beijing Qinyun Technology Development Co., Ltd. (ICP registration 京ICP备09084417号)