Similar articles
1.
De novo and inverse folding predictions of protein structure and dynamics   (Total citations: 6; self-citations: 0; citations by others: 6)
Summary: In the last two years, the use of simplified models has facilitated major progress in the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem, where one examines the compatibility of a given sequence with a given (and already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. Extension to produce structural templates or fingerprints from idealized structures is discussed, and for eight-membered β-barrel proteins, it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure including β-hairpins, helical hairpins and α/β/α fragments. Then, we describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo, i.e., using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and a redesigned monomeric version of a naturally occurring four-helix dimer, rop. Based on comparison to the rop dimer, the simulations predict conformations with rms values of 3–4 Å from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted from the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models. Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem.

2.
Understanding the relationship between amino acid sequences and folding rates of proteins is an important task in computational and molecular biology. In this work, we have systematically analyzed the composition of amino acid residues for proteins with different ranges of folding rates. We observed that the polar residues, Asn, Gln, Ser, and Lys, are dominant in fast folding proteins whereas the hydrophobic residues, Ala, Cys, Gly, and Leu, prefer to be in slow folding proteins. Further, we have developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted protein folding rates using the leave-one-out cross-validation method. The classification of proteins based on structural class improved the correlation to 0.98, and it is 0.99, 0.98, and 0.96, respectively, for all-alpha, all-beta, and mixed class proteins. In addition, we have utilized Bayesian classification theory for discriminating two- and three-state proteins, which showed an accuracy of 90%. We have developed a web server for predicting protein folding rates and it is available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
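
The leave-one-out correlation quoted in this abstract can be illustrated with a minimal sketch. It swaps in a straight-line least-squares fit for the quadratic response surface models the abstract names, and the function names and the toy data are assumptions made only for illustration; nothing here reproduces the authors' method or results.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(us, vs):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = mean(us), mean(vs)
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5


def leave_one_out_r(xs, ys):
    """Predict each sample from a model fitted on all the others,
    then correlate the held-out predictions with the observed values."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        preds.append(a + b * xs[i])
    return pearson_r(ys, preds)


# Hypothetical toy data: one sequence-derived descriptor (x) against
# a folding-rate-like quantity (y). Not taken from the paper.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [1.0, 3.0, 5.0, 7.0, 9.0]
toy_r = leave_one_out_r(toy_x, toy_y)
```

Because every prediction comes from a model that never saw the held-out protein, the resulting correlation is a less optimistic estimate than one computed on the training data itself.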

3.
A computational model, IMP-TYPE, is proposed for the classification of five types of integral membrane proteins from protein sequence. The proposed model aims not only at providing accurate predictions but, most importantly, at incorporating interesting and transparent biological patterns. When contrasted with the best-performing existing models, IMP-TYPE reduces the error rates of these methods by 19% and 34% for two out-of-sample tests performed on benchmark datasets. Our empirical evaluations also show that the proposed method provides even bigger improvements, i.e., 29% and 45% error rate reductions, when predictions are performed for sequences that share low (40%) identity with sequences from the training dataset. We also show that IMP-TYPE can be used in a standalone mode, i.e., it duplicates a significant majority of the correct predictions provided by other leading methods, while providing additional correct predictions for sequences that are incorrectly classified by the other methods. Our method computes predictions using a Support Vector Machine classifier that takes a feature-based encoded sequence as its input. The input feature set includes hydrophobic AA pairs, which were selected by utilizing a consensus of three feature selection algorithms. The hydrophobic residues that build up the AA pairs used by our method are shown to be associated with the formation of transmembrane helices in a few recent studies concerning integral membrane proteins. Our study also indicates that Met and Phe display a certain degree of hydrophobicity, which may be more crucial than their polarity or aromaticity when they occur in the transmembrane segments. This conclusion is supported by a recent study on potential of mean force for membrane protein folding and a study of scales for membrane propensity of amino acids.

4.
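Predicting folding rates is of great significance for elucidating the mechanism of protein folding. In this work, we collected 115 protein samples with currently known folding rates (including two-state, multi-state, and mixed-state proteins). To characterize the primary-structure information of the protein molecules comprehensively, we extracted a total of 9357 features: sequence length, multi-scale amino acid residue composition, k-space features of residue pairs, and geostatistical correlations based on the physicochemical properties of residues. After filtering with an improved binary matrix shuffling filter and multiple rounds of last-place-elimination nonlinear screening, 23 retained features with clear physicochemical meaning were obtained. The resulting nonlinear support vector regression model achieved a correlation coefficient of R = 0.95 under jackknife cross-validation, outperforming results reported in the literature and other reference feature selection methods. The support vector regression interpretation system showed that the nonlinear regression of folding rate on the retained descriptors was highly significant. We analyzed the effect of each retained descriptor on the folding rate, and the results indicate that protein folding rates are closely related to sequence length, medium- and short-range correlation features, and residue-triplet composition features.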

5.
Despite the recent advances in the prediction of protein structures by deep neural networks, the elucidation of protein-folding mechanisms remains challenging. A promising theory for describing protein folding is a coarse-grained statistical mechanical model called the Wako-Saitô-Muñoz-Eaton (WSME) model. The model can calculate the free-energy landscapes of proteins based on a three-dimensional structure with low computational complexity, thereby providing a comprehensive understanding of the folding pathways and the structure and stability of the intermediates and transition states involved in the folding reaction. In this review, we summarize previous and recent studies on protein folding and dynamics performed using the WSME model and discuss future challenges and prospects. The WSME model successfully predicted the folding mechanisms of small single-domain proteins and the effects of amino-acid substitutions on protein stability and folding in a manner that was consistent with experimental results. Furthermore, extended versions of the WSME model were applied to predict the folding mechanisms of multi-domain proteins and the conformational changes associated with protein function. Thus, the WSME model may contribute significantly to solving the protein-folding problem and is expected to be useful for predicting protein folding, stability, and dynamics in basic research and in industrial and medical applications.

6.
One of the most important challenges in computational and molecular biology is to understand the relationship between amino acid sequences and the folding rates of proteins. Recent works suggest that topological parameters, amino acid properties, chain length and the composition index relate well with protein folding rates; however, sequence order information has seldom been considered as a property for predicting protein folding rates. In this study, amino acid sequence order was used to derive an effective method, based on an extended version of the pseudo-amino acid composition, for predicting protein folding rates without any explicit structural information. Using the jackknife cross-validation test, the method was demonstrated on the largest dataset (99 proteins) reported. The method was found to provide a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.81 (highly significant) and the standard error is 2.46. The reported algorithm was found to perform better than several representative sequence-based approaches using the same dataset. The results indicate that sequence order information is an important determinant of protein folding rates.

7.
The prediction of protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analysis of the relationship between protein unfolding rates and the physical-chemical, energetic, and conformational properties of amino acid residues provides valuable information for understanding and predicting the unfolding rates of two- and three-state proteins. We found that the classification of proteins into different structural classes shows an excellent correlation between amino acid properties and unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining the protein unfolding rates. We have formulated three independent linear regression equations for different structural classes of proteins for predicting their unfolding rates from amino acid sequences and obtained an excellent agreement between predicted and experimentally observed unfolding rates of proteins; the correlation coefficients are 0.999, 0.990, and 0.992, respectively, for all-alpha, all-beta, and mixed-class proteins. Further, we have derived a general equation applicable to all structural classes of proteins, which can be used for predicting the unfolding rates for proteins of an unknown structural class. We observed a correlation of 0.987 and 0.930, respectively, for back-check and jack-knife tests. These accuracy levels are better than those of other methods in the literature.

8.
Prediction of protein folding rates from amino acid sequences is one of the most important challenges in molecular biology. In this work, I have related the protein folding rates to the physical-chemical, energetic and conformational properties of amino acid residues. I found that the classification of proteins into different structural classes shows an excellent correlation between amino acid properties and folding rates of two- and three-state proteins, indicating the importance of native state topology in determining the protein folding rates. I have formulated a simple linear regression model for predicting the protein folding rates from amino acid sequences along with structural class information and obtained an excellent agreement between predicted and experimentally observed folding rates of proteins; the correlation coefficients are 0.99, 0.96 and 0.95, respectively, for all-alpha, all-beta and mixed class proteins. This is the first available method capable of predicting the protein folding rates just from the amino acid sequence with the aid of generic amino acid properties and structural class information.

9.
Machine learning algorithms have a wide range of applications in bioinformatics and computational biology, such as prediction of protein secondary structures, solvent accessibility, binding site residues in protein complexes, protein folding rates, stability of mutant proteins, and discrimination of proteins based on their structure and function. In this work, we focus on two aspects of predictions: (i) protein folding rates and (ii) stability of proteins upon mutations. We briefly introduce the concepts of protein folding rates and stability along with available databases, features for prediction methods and measures for prediction performance. Subsequently, the development of structure based parameters and their relationship with protein folding rates will be outlined. The structure based parameters are helpful to understand the physical basis for protein folding and stability. Further, basic principles of major machine learning techniques will be mentioned and their applications for predicting protein folding rates and stability of mutant proteins will be illustrated. The machine learning techniques could achieve the highest accuracy of predicting protein folding rates and stability. In essence, statistical methods and machine learning algorithms are complementing each other for understanding and predicting protein folding rates and the stability of protein mutants. The available online resources on protein folding rates and stability will be listed.

10.
Protein stability, folding and unfolding rates are all determined by the multidimensional folding free energy surface, which in turn is dictated by factors such as size, structure, and amino-acid sequence. Work over the last 15 years has highlighted the role of size and 3D structure in determining folding rates, resulting in many procedures for their prediction. In contrast, unfolding rates are thought to depend on sequence specifics and be much more difficult to predict. Here we introduce a minimalist physics-based model that computes one-dimensional folding free energy surfaces using the number of amino acids (N) and the structural class (α-helical, all-β, or α-β) as the only protein-specific input. In this model N sets the overall cost in conformational entropy and the net stabilization energy, whereas the structural class defines the partitioning of the stabilization energy between local and non-local interactions. To test its predictive power, we calibrated the model empirically and implemented it into an algorithm for the PREdiction of Folding and Unfolding Rates (PREFUR). We found that PREFUR predicts the absolute folding and unfolding rates of an experimental database of 52 proteins with accuracies of ±0.7 and ±1.4 orders of magnitude, respectively (relative to experimental spans of 6 and 8 orders of magnitude). Such prediction uncertainty for proteins vastly varying in size and structure is only two-fold larger than the differences in folding (±0.34) and unfolding rates (±0.7) caused by single-point mutations. Moreover, PREFUR predicts protein stability with an accuracy of ±6.3 kJ mol⁻¹, relative to the 5 kJ mol⁻¹ average perturbation induced by single-point mutations. The remarkable performance of our simplistic model demonstrates that size and structural class are the major determinants of the folding landscapes of natural proteins, whereas sequence variability only provides the final 10-20% tuning. PREFUR is thus a powerful bioinformatic tool for the prediction of folding properties and analysis of experimental data.

11.
Knowledge of structural classes is useful in understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To this end, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of the structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. Our comprehensive comparison shows the superiority of the proposed representation, which results in error rate reductions that range between 14% and 26% when compared with predictions of the best-performing, previously published classifiers on the considered datasets. The study also shows that, for the benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets that include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of the structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.

12.
Understanding the relationship between amino acid sequences and folding rates of proteins is a challenging task similar to the protein folding problem. In this work, we have analyzed the relative importance of protein sequence and structure for predicting the protein folding rates in terms of amino acid properties and contact distances, respectively. We found that the parameters derived from protein sequence (physical-chemical, energetic, and conformational properties of amino acid residues) show very weak correlation (|r| < 0.39) with folding rates of 28 two-state proteins, indicating that the sequence information alone is not sufficient to understand the folding rates of two-state proteins. However, the maximum positive correlation obtained for the properties, number of medium-range contacts, and alpha-helical tendency reveals the importance of local interactions to initiate protein folding. On the other hand, a remarkable correlation (r varies from -0.74 to -0.88) has been obtained between structural parameters (contact order, long-range order, and total contact distance) and protein folding rates. Further, we found that the secondary structure content and solvent accessibility play a marginal role in determining the folding rates of two-state proteins. Multiple regression analysis carried out with the combination of three properties, beta-strand tendency, enthalpy change, and total contact distance, improved the correlation to 0.92 with protein folding rates. The relative importance of existing methods along with the multiple-regression model proposed in this work will be discussed. Our results demonstrate that the native-state topology is the major determinant for the folding rates of two-state proteins.
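
Of the structural parameters this abstract lists, contact order is the simplest to sketch. The snippet below computes relative contact order as the mean sequence separation of contacting residue pairs divided by the chain length. The Cα-only representation, the 8 Å cutoff, the minimum sequence separation of 2 and the function name are all assumptions chosen for illustration; the abstract does not state which contact definition the authors used.

```python
from math import dist


def relative_contact_order(coords, cutoff=8.0, min_sep=2):
    """Relative contact order from one (x, y, z) point per residue.

    Two residues i < j are in contact when they are at least `min_sep`
    positions apart in sequence and within `cutoff` angstroms in space
    (both thresholds are illustrative assumptions).
    """
    n_res = len(coords)
    total_sep = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_sep, n_res):
            if dist(coords[i], coords[j]) <= cutoff:
                total_sep += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    # Mean sequence separation of contacts, normalized by chain length.
    return total_sep / (n_res * n_contacts)


# Hypothetical toy chain: four residues on a straight line, 3.8 angstroms apart.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_rco = relative_contact_order(toy_chain)
```

In the toy chain only the (1, 3) and (2, 4) residue pairs fall within the cutoff, each with a sequence separation of 2, so the value is 4 / (4 × 2) = 0.5. A negative correlation with folding rate, as reported in the abstract, means that chains whose contacts are more distant in sequence tend to fold more slowly.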

13.
This paper proposes an improved simulated annealing (ISA) algorithm for protein structure optimization based on a three-dimensional AB off-lattice model. In the algorithm, we provide a general formula for producing the initial solution and design a multivariable disturbance term, related to the parameters of simulated annealing and a tuned constant, to generate neighborhood solutions. To avoid missing the optimal solution, a storage operation is performed during the search. We applied the algorithm to artificial test protein sequences from the literature and constructed a benchmark dataset consisting of 10 real protein sequences from the Protein Data Bank (PDB). In addition, we generated a Cα space-filling model to represent the protein folding conformation. The results indicate that our algorithm outperforms five previous methods in finding lower energies for the artificial protein sequences. In tests on real proteins, our method can achieve energy conformations with a Cα-RMSD of less than 3.0 Å from the PDB structures. Moreover, the Cα space-filling model may simulate dynamic change of protein folding conformation at the atomic level.
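
The objective that a simulated annealing search minimizes in an AB off-lattice model can be sketched as follows. The abstract does not state the energy function, so this sketch assumes the commonly used two-term form (a bending term plus a species-dependent Lennard-Jones term with coupling 1 for AA, 0.5 for BB and -0.5 for AB pairs) and unit bond lengths. The acceptance rule shown is the standard Metropolis criterion, not the paper's multivariable disturbance scheme, and all function names are hypothetical.

```python
from math import dist, exp


def coupling(a, b):
    """Species-dependent attraction strength (assumed standard AB values)."""
    if a == "A" and b == "A":
        return 1.0
    if a == "B" and b == "B":
        return 0.5
    return -0.5


def ab_energy(seq, coords):
    """Energy of an AB chain; assumes consecutive beads are exactly 1 apart."""
    n = len(seq)
    bonds = [tuple(q - p for p, q in zip(coords[i], coords[i + 1]))
             for i in range(n - 1)]
    # Bending term: with unit bonds, the dot product is the cosine of the
    # angle between successive bond vectors (zero cost for a straight chain).
    bend = 0.0
    for u, v in zip(bonds, bonds[1:]):
        cos_theta = sum(a * b for a, b in zip(u, v))
        bend += 0.25 * (1.0 - cos_theta)
    # Lennard-Jones-like term over all non-bonded pairs.
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = dist(coords[i], coords[j])
            lj += 4.0 * (r ** -12 - coupling(seq[i], seq[j]) * r ** -6)
    return bend + lj


def metropolis_accept(delta_e, temperature, u):
    """Standard Metropolis rule; `u` is a uniform random number in [0, 1)."""
    return delta_e <= 0 or u < exp(-delta_e / temperature)


# Hypothetical toy conformation: a straight three-bead chain of type A.
toy_energy = ab_energy("AAA", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

An annealing loop would repeatedly perturb the coordinates, compare `ab_energy` before and after, and call `metropolis_accept` with a gradually decreasing temperature. Passing the random number `u` in as an argument keeps the acceptance rule deterministic and easy to check.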

14.
Secondary structure motifs in nucleic acid probes generally impair intended hybridization reactions and so efforts to predict and avoid such structures are commonly employed in probe design schemes. Another key facet of probe design that has received much less attention, however, is that secondary structure at targeted probe binding site regions may also impair hybridization. Thus, evaluation of both probe and target site secondary structures together should improve hybridization prediction and design effectiveness. Several challenges confound this goal, including imperfect empirical rules and parameters underlying predictions and the fact that folding algorithms scale poorly with respect to sequence length. Here, we attempt to quantify the consequences of target site structure on predicted hybridization using sequences sampled from the human genome. We also provide a methodology for choosing a reasonable “window size” around target sites that is as small as possible without compromising folding algorithm prediction accuracy.

15.
RNA molecules participate in many important biological processes, and they need to fold into well-defined secondary and tertiary structures to realize their functions. Like the well-known protein folding problem, there is also an RNA folding problem. The folding problem includes two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present a deep reinforcement learning algorithm, 2dRNA-Fold, to study the fastest folding paths of RNA secondary structure. 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step according to a given RNA sequence until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained using a set of RNA molecules from the dataset bpRNA and is used to predict RNA secondary structure. Since the scoring used in 2dRNA-Fold to determine the next step is based on possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding paths determined by free energy according to physical laws.

16.
As for many intrinsically disordered proteins, order–disorder transitions in the N‐terminal oligomerization domain of the multifunctional nucleolar protein nucleophosmin (Npm‐N) are central to its function, with phosphorylation and partner binding acting as regulatory switches. However, the mechanism of this transition and its regulation remain poorly understood. In this study, single‐molecule and ensemble experiments revealed pathways with alternative sequences of folding and assembly steps for Npm‐N. Pathways could be switched by altering the ionic strength. Phosphorylation resulted in pathway‐specific effects, and decoupled folding and assembly steps to facilitate disorder. Conversely, binding to a physiological partner locked Npm‐N in ordered pentamers and counteracted the effects of phosphorylation. The mechanistic plasticity found in the Npm‐N order–disorder transition enabled a complex interplay of phosphorylation and partner‐binding steps to modulate its folding landscape.

17.
The funneled energy landscape theory implies that protein structures are minimally frustrated. Yet, because of the divergent demands between folding and function, regions of frustrated patterns are present at the active site of proteins. To understand the effects of such local frustration in dictating the energy landscape of proteins, here we compare the folding mechanisms of the two alternative spliced forms of a PDZ domain (PDZ2 and PDZ2as) that share a nearly identical sequence and structure, while displaying different frustration patterns. The analysis, based on the kinetic characterization of a large number of site‐directed mutants, reveals that although the late stages for folding are very robust and biased by native topology, the early stages are more malleable and dominated by local frustration. The results are briefly discussed in the context of the energy‐landscape theory.

18.
Only a vanishingly small proportion of the almost infinite number of possible proteins occur in nature. Can this remaining potential of structural and functional diversity be used in the construction of new proteins? Is a “second evolution” of proteins and enzymes about to occur? These questions have suddenly become of interest because the recombinant DNA technique allows the synthesis of any given amino acid sequence. Examples of enzyme models demonstrate clearly that the unusual catalytic properties of enzymes are associated with the presence of a specifically folded polypeptide chain which has a complex three-dimensional form. The critical hurdle in the path of artificial proteins is thus the design of amino acid sequences which are able to fold into tertiary structures. Recent studies on the topology and the mechanism of folding have provided considerable insight into the occurrence of, and the rules governing, the three-dimensional architecture of proteins. Secondary structures apparently play a key role in the folding process; helices and “β-structures” act as nucleation centers directing folding and account for the surprisingly small number of different folding topologies. The problem of secondary structure formation can be investigated directly by means of conformational studies on model peptides. Oligopeptides with tailor-made physicochemical, structural and conformational properties can already be designed. The theoretical and experimental basis for the construction of polypeptides with stable tertiary structures is therefore established. The path to macromolecules with an immense variety of novel properties lies before us.

19.
Intrinsic disorder is relatively common in proteins, plays important roles in numerous cellular activities, and its prevalence has been implicated in various human diseases. However, annotations of the disorder lag behind the rapidly increasing number of known protein chains. The last decade saw the development of a relatively large number of in-silico methods that predict the disorder using the protein sequence as their input. We perform a first-of-its-kind comprehensive empirical evaluation of the disorder predictors, which is characterized by three novel aspects: (1) we evaluate the quality of the disorder predictions at the residue, segment, and chain levels; (2) we consider a large number of published predictors that are accessible to the end user and evaluate them on a relatively big dataset with close to 500 proteins; and (3) we assess the statistical significance of differences between the considered methods. Our study reveals that there is no universally superior predictor and that the top-performing methods are complementary. We show that while recent consensus-based predictors outperform other considered methods for the residue-level predictions, some older methods perform better for the prediction of the disordered segments. Our analysis indicates that certain predictors are biased to under-predict the disorder, while some other solutions tend to over-predict the number of the disordered residues. We also evaluate the utility of the predicted residue-level disorder for prediction of proteins with long disordered segments and prediction of the chain-level disorder content. Lastly, we provide recommendations concerning development of a new generation of consensus-based methods and specialized methods for improved prediction of the disorder content.

20.
Intrinsically disordered proteins (IDPs) are involved in diverse cellular functions. Many IDPs can interact with multiple binding partners, resulting in their folding into alternative ligand‐specific functional structures. For such multi‐structural IDPs, a key question is whether these multiple structures are fully encoded in the protein sequence, as is the case in many globular proteins. To answer this question, here we employed a combination of single‐molecule and ensemble techniques to compare ligand‐induced and osmolyte‐forced folding of α‐synuclein. Our results reveal context‐dependent modulation of the protein's folding landscape, suggesting that the codes for the protein's native folds are partially encoded in its primary sequence, and are completed only upon interaction with binding partners. Our findings suggest a critical role for cellular interactions in expanding the repertoire of folds and functions available to disordered proteins.

