Similar documents
20 similar documents found.
1.
The prediction of protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analysis of the relationship between protein unfolding rates and the physical-chemical, energetic, and conformational properties of amino acid residues provides valuable information for understanding and predicting the unfolding rates of two- and three-state proteins. We found that classifying proteins into structural classes yields an excellent correlation between amino acid properties and the unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein unfolding rates. We formulated three independent linear regression equations, one for each structural class, to predict unfolding rates from amino acid sequences, and obtained excellent agreement between predicted and experimentally observed unfolding rates; the correlation coefficients are 0.999, 0.990, and 0.992 for all-alpha, all-beta, and mixed-class proteins, respectively. Further, we derived a general equation applicable to all structural classes, which can be used to predict the unfolding rates of proteins of unknown structural class. We observed correlations of 0.987 and 0.930 in the back-check and jack-knife tests, respectively. These accuracy levels are better than those of other methods in the literature.

2.
One of the most important challenges in computational and molecular biology is to understand the relationship between amino acid sequences and the folding rates of proteins. Recent work suggests that topological parameters, amino acid properties, chain length, and the composition index relate well to protein folding rates; however, sequence order information has seldom been considered as a property for predicting protein folding rates. In this study, amino acid sequence order was used to derive an effective method, based on an extended version of the pseudo-amino acid composition, for predicting protein folding rates without any explicit structural information. The method was evaluated with the jackknife cross-validation test on the largest dataset reported (99 proteins) and provided a good correlation between predicted and experimental folding rates: the correlation coefficient is 0.81 (highly significant) and the standard error is 2.46. The algorithm performed better than several representative sequence-based approaches on the same dataset. The results indicate that sequence order information is an important determinant of protein folding rates.
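
The pseudo-amino acid composition appends sequence-order correlation factors to the 20 ordinary composition values. The sketch below is the standard type-I (Chou-style) form with a single physicochemical property; it is not the extended version used in this study. The Kyte-Doolittle hydropathy scale, `lam=3`, and the weight `w=0.05` are stand-in assumptions.

```python
"""Type-I pseudo-amino acid composition with one property (illustrative sketch)."""
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy, used here only as a stand-in property scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)

# Standardize the scale to zero mean and unit (population) standard deviation.
_values = list(KD.values())
_mu, _sd = mean(_values), pstdev(_values)
H = {aa: (v - _mu) / _sd for aa, v in KD.items()}


def pseaac(seq, lam=3, w=0.05):
    """Return 20 + lam components: composition, then sequence-order factors."""
    seq = seq.upper()
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / L for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart
    thetas = [
        sum((H[seq[i + k]] - H[seq[i]]) ** 2 for i in range(L - k)) / (L - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The weight `w` controls how strongly sequence order competes with plain composition; with `w = 0` the vector reduces to the ordinary composition plus zeros.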

3.
Understanding the relationship between amino acid sequences and the folding rates of proteins is a challenging task, similar to the protein folding problem itself. In this work, we analyzed the relative importance of protein sequence and structure for predicting protein folding rates, in terms of amino acid properties and contact distances, respectively. We found that parameters derived from the protein sequence (physical-chemical, energetic, and conformational properties of amino acid residues) show very weak correlation (|r| < 0.39) with the folding rates of 28 two-state proteins, indicating that sequence information alone is not sufficient to understand the folding rates of two-state proteins. However, the highest positive correlations, obtained for the number of medium-range contacts and alpha-helical tendency, reveal the importance of local interactions in initiating protein folding. On the other hand, a remarkable correlation (r from -0.74 to -0.88) was obtained between structural parameters (contact order, long-range order, and total contact distance) and protein folding rates. Further, we found that secondary structure content and solvent accessibility play a marginal role in determining the folding rates of two-state proteins. Multiple regression analysis with a combination of three properties (beta-strand tendency, enthalpy change, and total contact distance) improved the correlation with protein folding rates to 0.92. The relative importance of existing methods, along with the multiple regression model proposed in this work, is discussed. Our results demonstrate that native-state topology is the major determinant of the folding rates of two-state proteins.
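
Two of the structural parameters named above can be computed from residue coordinates. The sketch below is a minimal version under stated assumptions: contacts are C-alpha pairs closer than 8 Å and at least two residues apart, relative contact order is the summed sequence separation divided by (chain length × number of contacts), and long-range order counts contacts with |i - j| > 12 per residue. These cutoffs are assumptions (Plaxco's original contact order uses heavy atoms within 6 Å), and total contact distance is omitted because its normalization is not given here.

```python
"""Relative contact order (RCO) and long-range order (LRO) from C-alpha
coordinates. Cutoffs are illustrative assumptions."""
from math import dist


def contacts(coords, cutoff=8.0, min_sep=2):
    """Pairs (i, j), i < j, with j - i >= min_sep and distance < cutoff (angstroms)."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + min_sep, n)
            if dist(coords[i], coords[j]) < cutoff]


def relative_contact_order(coords, **kw):
    """Sum of sequence separations / (chain length * number of contacts)."""
    pairs = contacts(coords, **kw)
    if not pairs:
        return 0.0
    return sum(j - i for i, j in pairs) / (len(coords) * len(pairs))


def long_range_order(coords, min_range=12, **kw):
    """Number of contacts with j - i > min_range, divided by chain length."""
    pairs = contacts(coords, **kw)
    return sum(1 for i, j in pairs if j - i > min_range) / len(coords)
```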

4.
Understanding the relationship between amino acid sequences and the folding rates of proteins is an important task in computational and molecular biology. In this work, we systematically analyzed the composition of amino acid residues in proteins with different ranges of folding rates. We observed that the polar residues Asn, Gln, Ser, and Lys are dominant in fast-folding proteins, whereas the hydrophobic residues Ala, Cys, Gly, and Leu are preferred in slow-folding proteins. Further, we developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted folding rates using leave-one-out cross-validation. Classifying proteins by structural class improved the correlation to 0.98; it is 0.99, 0.98, and 0.96 for all-alpha, all-beta, and mixed-class proteins, respectively. In addition, we used Bayesian classification theory to discriminate two- and three-state proteins, with an accuracy of 90%. We have developed a web server for predicting protein folding rates, available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
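
The composition analysis above starts from per-residue percentages. The sketch below computes them and sums the two residue groups the abstract highlights; it only produces descriptive features and does not reproduce the quadratic response surface model or the Bayesian classifier.

```python
"""Amino acid composition and the fast/slow-folding residue groups named above."""
from collections import Counter

FAST_ENRICHED = set("NQSK")  # Asn, Gln, Ser, Lys
SLOW_ENRICHED = set("ACGL")  # Ala, Cys, Gly, Leu


def composition(seq):
    """Percentage of each residue type present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * c / len(seq) for aa, c in counts.items()}


def group_percentages(seq):
    """Return (% of fast-folding-enriched residues, % of slow-folding-enriched residues)."""
    comp = composition(seq)
    fast = sum(p for aa, p in comp.items() if aa in FAST_ENRICHED)
    slow = sum(p for aa, p in comp.items() if aa in SLOW_ENRICHED)
    return fast, slow
```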

5.
Predicting the change in protein folding rate upon amino acid substitution is an important and challenging problem in protein folding kinetics and design. In this work, we analyzed the relationship between amino acid properties and the folding rate change upon mutation. Our analysis showed that the correlation is not significant for any of the studied properties in a dataset of 476 mutants. Further, we classified the mutants by their location in different secondary structures and by solvent accessibility. For each category, we selected a specific combination of amino acid properties using a genetic algorithm and developed a prediction scheme based on quadratic regression models for predicting the folding rate change upon mutation. Our results showed a 10-fold cross-validation correlation of 0.72 between experimental and predicted changes in protein folding rates; the correlation is 0.73, 0.65, and 0.79 in strand, helix, and coil segments, respectively. The method was further tested on an extended dataset of 621 mutants and a blind dataset of 62 mutants, and we observed good agreement with experiments. We have developed a web server for predicting the folding rate change upon mutation, and it is available at .

6.
Since it was observed that the structural class of a protein is related to its amino acid composition, various composition-based methods have been proposed to predict protein structural classes. Although these methods are effective to some degree, their predictive quality is limited because amino acid composition cannot fully capture the information in protein sequences. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike previous methods, this approach is based on comparing subsequence distributions, so the effect of residue order on protein structure is taken into account. On the same data set, the new approach gives better predictions than the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall correctness rates of the resubstitution test and the jackknife test are 99.4% and 75.02%, respectively, and similar results are obtained on other data sets. All tests demonstrate that residue order along protein sequences plays an important role in recognizing protein structural classes, especially alpha/beta and alpha+beta proteins. The tests also show that the new method is simple and efficient.
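
The abstract does not spell out its information-discrepancy measure, so the sketch below swaps in a symmetrized Kullback-Leibler divergence between smoothed dipeptide (k = 2) distributions and assigns a query to the class whose pooled distribution is closest. The pseudocount, `k`, and the toy class sequences in the usage check are assumptions for illustration only; this is not the authors' measure.

```python
"""Nearest-class assignment by a k-mer discrepancy (symmetrized KL stand-in)."""
from collections import Counter
from itertools import product
from math import log

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def kmer_distribution(seqs, k=2, pseudo=1.0):
    """Smoothed frequency of every length-k word, pooled over the given sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    words = ["".join(w) for w in product(ALPHABET, repeat=k)]
    total = sum(counts[w] for w in words) + pseudo * len(words)
    return {w: (counts[w] + pseudo) / total for w in words}


def discrepancy(p, q):
    """KL(p||q) + KL(q||p) = sum (p - q) * log(p / q) over the shared words."""
    return sum((p[w] - q[w]) * log(p[w] / q[w]) for w in p)


def classify(seq, class_seqs, k=2):
    """Return the class label whose pooled k-mer distribution is closest to seq."""
    p = kmer_distribution([seq], k)
    return min(class_seqs,
               key=lambda c: discrepancy(p, kmer_distribution(class_seqs[c], k)))
```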

7.
Combinatorial protein libraries provide a promising route to investigate the determinants and features of protein folding and to identify novel folding amino acid sequences. A library of sequences built from a pool of different monomer types is screened for folding molecules according to a particular foldability criterion. The number of sequences grows exponentially with the length of the polymer, making both experimental and computational tabulation of sequences infeasible. Herein, a statistical theory is extended to specify the properties of sequences that have particular values of the global energetic quantities specifying their energy landscape. The theory yields the site-specific monomer probabilities. A foldability criterion is derived that characterizes sequences by quantifying the energetic separation of the target state from low-energy states in the unfolded ensemble and the fluctuations of the energies in the unfolded-state ensemble. For a simple lattice model of proteins, excellent agreement is observed between the theory and the results of exact enumeration. The theory may provide a quantitative framework for the design and interpretation of combinatorial experiments.

8.
The ability to predict protein folding rates constitutes an important step in understanding overall folding mechanisms. Although many prediction methods are structure based, successful predictions can also be obtained from the sequence. We developed a novel method, prediction of protein folding rates (PPFR), for predicting protein folding rates from protein sequences. PPFR implements a linear regression model for each of the mainstream folding dynamics, namely two-state, multistate, and mixed-state proteins. The method provides predictions that correlate strongly with experimental folding rates: 0.87 for two- and multistate proteins and 0.82 for mixed-state proteins, evaluated with an out-of-sample jackknife test. Based on in-sample and out-of-sample tests, PPFR's predictions are shown to be better than those of most other sequence-only and structure-based predictors and complementary to those of the most recent sequence-based QRSM method. We show that simultaneously incorporating several characteristics, including the sequence, physicochemical properties of residues, and predicted secondary structure, improves prediction quality. This hybrid prediction model was analyzed to reveal complementary factors that can be used in tandem to predict folding rates. We show that bigger proteins require more time to fold; that higher helix and coil content and the presence of Phe, Asn, and Gln may accelerate folding; that the inclusion of Ile, Val, Thr, and Ser may slow folding down; and that, for two-state proteins, increased beta-strand content may decelerate folding. Finally, PPFR provides strong correlation when predicting sequences with low similarity.
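
PPFR fits one linear regression per folding type on several sequence-derived features. The generic ordinary-least-squares sketch below solves the normal equations for any number of features; the features themselves (for example, chain length or predicted helix content) are hypothetical placeholders, and normal equations are adequate for a handful of features but less numerically stable than a QR-based solver.

```python
"""Ordinary least squares on several features via the normal equations (sketch)."""


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features, y):
    """Coefficients [b0, b1, ..., bp] for y = b0 + sum_j b_j * features[i][j]."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coefs, f):
    """Apply fitted coefficients to one feature vector."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], f))
```

Fitting a separate coefficient vector per folding type, as the abstract describes, amounts to calling `fit_linear` once on each subset of proteins.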

9.
Beta-barrel membrane proteins perform a variety of functions, such as mediating the nonspecific, passive transport of ions and small molecules and selectively passing molecules such as maltose and sucrose, and they are involved in voltage-dependent anion channels. Understanding the structural features of beta-barrel membrane proteins and detecting them in genomic sequences are challenging tasks in structural and functional genomics. In this review, based on a survey of experimentally known amino acid sequences and structures, the characteristic features of amino acid residues in beta-barrel membrane proteins and novel parameters for understanding their folding and stability are described. The development of statistical methods and machine learning techniques for discriminating beta-barrel membrane proteins from globular and membrane proteins of other folding types is explained, along with their relative importance. Further, different methods for predicting the membrane-spanning segments of beta-barrel membrane proteins, including hydrophobicity profiles, rule-based approaches, amino acid properties, neural networks, and hidden Markov models, are discussed. In addition, applications of discrimination techniques for detecting beta-barrel membrane proteins in genomic sequences are outlined. In essence, this comprehensive review provides an overall picture of beta-barrel membrane proteins, from the construction of datasets to genome-wide applications.
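
Hydrophobicity profiles are the simplest of the segment-prediction methods listed above. The sketch below computes a sliding-window mean hydropathy using the Kyte-Doolittle scale as a stand-in; the window length of 9 is an assumption. Because transmembrane beta-strands are amphipathic, with alternating residues facing the lipid, a plain window average is only a rough guide for beta-barrels.

```python
"""Sliding-window hydropathy profile (Kyte-Doolittle scale as a stand-in)."""

KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq.upper()]
    if len(vals) < window:
        return []
    current = sum(vals[:window])
    profile = [current / window]
    for i in range(window, len(vals)):
        current += vals[i] - vals[i - window]  # slide: add new residue, drop oldest
        profile.append(current / window)
    return profile
```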

10.
Discriminating outer membrane proteins from globular and membrane proteins of other folding types is an important problem, both for detecting outer membrane proteins in genomic sequences and for successfully predicting their secondary and tertiary structures. In this work, we systematically analyzed the distribution of amino acid residues in the sequences of globular and outer membrane proteins. We observed that the occurrence of two neighboring aliphatic and polar residues is significantly higher in outer membrane proteins than in globular proteins. Using dipeptide composition, we devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly identified outer membrane proteins with an accuracy of 95% on a training set of 337 proteins. Our method also correctly excluded globular proteins with an accuracy of 79% in a non-redundant dataset of 674 proteins, and it correctly excluded alpha-helical membrane proteins with an accuracy of up to 87%. These accuracy levels are comparable to those of other methods in the literature. The influence of protein size and structural class on discrimination is discussed.
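
The discrimination method above is built on dipeptide composition. The sketch below computes the 400-dimensional feature vector (the fraction of each ordered residue pair among all overlapping neighbouring pairs); the statistical scoring the authors derived from it is not reproduced.

```python
"""400-dimensional dipeptide composition of a protein sequence."""
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among all overlapping residue pairs."""
    seq = seq.upper()
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = Counter(seq[i:i + 2] for i in range(total))
    return {d: counts[d] / total for d in DIPEPTIDES}
```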

11.
Using the pseudo amino acid (PseAA) composition to represent a protein sample can incorporate a considerable amount of sequence-pattern information and thereby improve prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article, the grey modeling approach is introduced, which is particularly efficient in coping with complicated systems such as one consisting of many proteins with different sequence orders and lengths. Based on the grey model, four coefficients derived from each protein sequence are adopted as its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can capture the essence of a protein sequence and better reflect its overall pattern. We demonstrate that introducing the grey-PseAA composition can remarkably enhance the success rates in predicting protein structural class. The concept of grey-PseAA composition is expected to be applicable to predicting many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type.

12.
Protein structures can be classified mainly into four classes according to their chain fold topologies: all-alpha, all-beta, alpha/beta, and alpha+beta. To predict protein structural class, a new algorithm that combines the increment of diversity with quadratic discriminant analysis is presented. Following the concept of the pseudo amino acid composition (Chou, Proteins: Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60), the 400 dipeptide components and the 20 amino acid composition components are, respectively, selected as parameters of the diversity source. A total of 204 nonhomologous proteins constructed by Chou (Chou, Biochem Biophys Res Commun 1999, 264, 216) are used for training and testing the predictive model. The pseudo amino acid approach proposed in this paper remarkably improves the success rates, so the method may play a complementary role to other existing methods for predicting protein structural class.
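
The increment of diversity compares two count vectors, such as composition or dipeptide counts. The sketch uses the common definition D(X) = N log N - sum_i n_i log n_i (with N the total count and 0 log 0 = 0) and ID(X, Y) = D(X + Y) - D(X) - D(Y), which is zero when the two vectors are proportional and grows as they diverge. The subsequent quadratic discriminant step is not shown.

```python
"""Increment of diversity (ID) between two count vectors."""
from math import log


def diversity(counts):
    """D(X) = N log N - sum n_i log n_i, taking 0 log 0 as 0."""
    n = sum(counts)
    return (n * log(n) if n else 0.0) - sum(c * log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)
```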

13.
Predicting protein structural class solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we proposed a scheme to improve the prediction of protein structural classes. Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features to form the combined sequence information. We then added the PSSF to the combined sequence information to predict protein structural classes. The method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. Comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, indicating that the proposed method is more efficient, especially for low-homology amino acid sequences.
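
The reduced alphabet used in the paper is not given here, so the three-group mapping below is hypothetical. The sketch computes, for every length-k word over that alphabet, its relative frequency and its mean start position scaled to [0, 1], which is one plausible reading of "word position features"; the PSSF part is not shown.

```python
"""Word frequency and mean word position on a reduced amino acid alphabet."""
from collections import defaultdict
from itertools import product

# Hypothetical 3-letter reduction (not the grouping used in the paper).
GROUPS = {"AVLIMFWYC": "H", "STNQGP": "P", "DEKRH": "C"}
REDUCE = {aa: g for members, g in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its (hypothetical) group letter."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def word_features(seq, k=2):
    """For every length-k word: (relative frequency, mean start position in [0, 1])."""
    r = reduce_sequence(seq)
    n_words = len(r) - k + 1
    if n_words < 1:
        raise ValueError("sequence shorter than k")
    positions = defaultdict(list)
    for i in range(n_words):
        positions[r[i:i + k]].append(i / max(n_words - 1, 1))
    feats = {}
    for word in ("".join(w) for w in product("HPC", repeat=k)):
        pos = positions.get(word, [])
        feats[word] = (len(pos) / n_words, sum(pos) / len(pos) if pos else 0.0)
    return feats
```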

14.
The efficient synthesis of small molecules with many different molecular skeletons is an unsolved problem in diversity-oriented synthesis (DOS). We describe the development and application of a synthesis strategy that uses common reaction conditions to transform a collection of similar substrates into a collection of products having distinct molecular skeletons. The substrates have different appendages, called sigma-elements, that pre-encode skeletal information. This approach is analogous to the natural process of protein folding, in which different primary sequences of amino acids are transformed into macromolecules having distinct three-dimensional structures under common folding conditions. Like sigma-elements, amino acid sequences pre-encode structural information. An advantage of using folding processes to generate skeletal diversity in DOS is that skeletal information can be pre-encoded into substrates combinatorially, much as protein structural information is pre-encoded combinatorially in polypeptide sequences, making it possible to generate skeletal diversity efficiently. This efficiency was realized in a fully encoded, split-pool synthesis of approximately 1260 compounds potentially representing all possible combinations of building block, stereochemical, and skeletal diversity elements.

15.
Only a vanishingly small proportion of the almost infinite number of possible proteins occurs in nature. Can this remaining potential of structural and functional diversity be used to construct new proteins? Is a “second evolution” of proteins and enzymes about to occur? These questions have suddenly become of interest because the recombinant DNA technique allows the synthesis of any given amino acid sequence. Examples of enzyme models demonstrate clearly that the unusual catalytic properties of enzymes are associated with a specifically folded polypeptide chain that has a complex three-dimensional form. The critical hurdle on the path to artificial proteins is thus the design of amino acid sequences that can fold into tertiary structures. Recent studies on the topology and mechanism of folding have provided considerable insight into the occurrence of, and the rules governing, the three-dimensional architecture of proteins. Secondary structures apparently play a key role in the folding process: helices and “β-structures” act as nucleation centers that direct folding and account for the surprisingly small number of different folding topologies. Secondary structure formation can be investigated directly through conformational studies on model peptides, and oligopeptides with tailor-made physicochemical, structural, and conformational properties can already be designed. The theoretical and experimental basis for constructing polypeptides with stable tertiary structures is therefore established. The path to macromolecules with an immense variety of novel properties lies before us.

16.
A method is presented for the structural characterization of proteins separated by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE). The method comprises separation of a protein mixture by 2D-PAGE, recovery of proteins from the gel spots revealed by copper staining, and analysis of the proteins by triple-stage quadrupole mass spectrometry with an electrospray ionization interface (ESI-TSQMS). Before mass spectrometric analysis, the extracted proteins were passed through a small reversed-phase column (10 × 4.0 mm I.D.) to remove salts and gel-derived contaminants and were then introduced into the mass spectrometer through a reversed-phase capillary column (0.25 mm I.D.). Application of the method to rat cerebellar proteins suggests that molecular masses can be accurately determined from sub-picomole amounts of protein derived from one or two 2D gels. The method was also useful for peptide mapping and for determining the amino acid sequences of proteins micro-prepared from 2D gels. Because 2D-PAGE has excellent resolving power for protein separation and capillary LC-ESI-TSQMS provides structural information from very small amounts of sample, the combined 2D-PAGE and capillary LC-ESI-TSQMS system described here should have wide applications in molecular studies of genes and proteins, such as identifying protein spots on 2D gels, confirming gene/protein sequences, and analyzing post-translational modifications of proteins present naturally in tissue/cell extracts or expressed by recombinant DNA techniques.
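
In ESI spectra, a protein's molecular mass follows from any two adjacent charge-state peaks: if m1 < m2 are the m/z values for charges z + 1 and z, then z = (m1 - p)/(m2 - m1) and M = z(m2 - p), where p is the proton mass. The sketch below applies this standard arithmetic; it is not the data-processing software used in the study, and practical deconvolution averages over the whole charge-state envelope.

```python
"""Molecular mass from two adjacent charge-state peaks in an ESI spectrum."""

PROTON = 1.007276  # proton mass, Da


def charge_and_mass(mz_high_charge, mz_low_charge):
    """Peaks for charges z + 1 (lower m/z) and z (higher m/z) of one protein.

    Returns (z, M), with M the neutral molecular mass in Da.
    """
    if not mz_high_charge < mz_low_charge:
        raise ValueError("the higher-charge peak must have the lower m/z")
    z = round((mz_high_charge - PROTON) / (mz_low_charge - mz_high_charge))
    return z, z * (mz_low_charge - PROTON)
```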

17.
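Using exact enumeration, we studied the conformational properties of two-dimensional compact protein chains with different HP sequences, in particular the properties of folding sequences, that is, sequences with a unique ground state. For compact chains of N monomers, a certain fraction of the sequences were found to be folding sequences. In these folding sequences, hydrophobic residues (H) outnumber hydrophilic residues (P) by 20%; this result was compared with the hydrophobic and hydrophilic residue contents of 200 real protein molecules. When folding sequences are grouped by their number of hydrophobic residues, the relative abundance of each group among all sequences follows a Gaussian distribution. We also studied the sizes of H (or P) clusters in the sequences and their distribution, and found that folding sequences differ from random sequences. Finally, we studied the specific heat of different folding sequences at different chain lengths and found that the transition temperature T_C depends mainly on chain length and not on the particular folding sequence.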
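
Item 17 relies on exact enumeration of HP chains on a two-dimensional square lattice. The sketch below enumerates all self-avoiding conformations (not only the maximally compact ones studied there), removes rotated and mirrored duplicates by fixing the first bond along +x and the first turn toward +y, and assumes the common HP convention of -1 per non-bonded H-H lattice contact. A sequence counts as a folding sequence when exactly one conformation reaches the lowest energy.

```python
"""Exact enumeration of 2D HP lattice chains (symmetry-reduced sketch)."""
from itertools import product

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def conformations(n):
    """Yield coordinate lists of n-site self-avoiding walks on the square lattice.

    The first bond is fixed along +x and the first turn (if any) must go +y,
    which removes rotated and mirrored copies of the same conformation.
    """
    if n == 1:
        yield [(0, 0)]
        return

    def extend(path, turned):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy == -1:
                continue  # mirror image of a walk whose first turn is +y
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # self-avoidance
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    yield from extend([(0, 0), (1, 0)], False)


def hp_energy(seq, coords):
    """-1 per H-H pair that are lattice neighbours but not chain neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up only: each pair once
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def ground_state(seq):
    """Return (lowest energy, whether exactly one conformation reaches it)."""
    energies = [hp_energy(seq, c) for c in conformations(len(seq))]
    e_min = min(energies)
    return e_min, energies.count(e_min) == 1


def folding_sequences(n):
    """All length-n HP sequences with a unique ground state."""
    seqs = ("".join(s) for s in product("HP", repeat=n))
    return [s for s in seqs if ground_state(s)[1]]
```

Because every sequence re-enumerates all conformations, this brute-force version is only practical for short chains (roughly a dozen monomers); caching the conformation list is an easy first optimization.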

18.
De novo design of artificial proteins is an essential approach for elucidating the principles of protein architecture, for understanding the specific functions of natural proteins, and for producing novel molecules for medical and industrial purposes. We designed artificial sequences of 153 amino acids to fit the main-chain framework of the sperm whale myoglobin structure, using knowledge-based energy functions that evaluate the compatibility between protein tertiary structures and amino acid sequences. The synthesized artificial globins bind a single heme per protein molecule, as designed, and show well-defined electrochemical and spectroscopic features characteristic of proteins with a low-spin heme. Redox and ligand-binding reactions of the artificial heme proteins were investigated, and these heme-related functions were found to vary with their structural uniqueness. Relationships between the structural and functional properties are discussed.

19.
The native structure of fast-folding proteins, albeit a deep local free-energy minimum, may involve a relatively small energetic penalty due to nonoptimal, though favorable, contacts between amino acid residues. The weak energetic frustration that such contacts represent varies among proteins and may account for folding behavior not seen in unfrustrated models. Minimalist model proteins with heterogeneous contacts (as represented by lattice heteropolymers consisting of three types of monomers) also give rise to weak energetic frustration in their native structures, and the present study of their equilibrium and nonequilibrium properties reveals some of the breadth of their behavior. To capture this range within a detailed study of only a few proteins, four candidate protein structures (with their cognate sequences) were selected according to a figure of merit called the winding index, which characterizes the number of turns the protein winds about an axis. The temperature-dependent heat capacities reveal a high-temperature collapse transition and an infrequently observed low-temperature rearrangement transition that arises from weak energetic frustration. The simulation results motivate a new measure of folding affinity, defined as a sequence-dependent free energy that is a function of both a reduced stability gap and high accessibility to non-native structures, which correlates strongly with folding rates.

20.
Machine learning algorithms have a wide range of applications in bioinformatics and computational biology, such as predicting protein secondary structure, solvent accessibility, binding-site residues in protein complexes, protein folding rates, and the stability of mutant proteins, and discriminating proteins based on their structure and function. In this work, we focus on two prediction problems: (i) protein folding rates and (ii) protein stability upon mutation. We briefly introduce the concepts of protein folding rates and stability, along with the available databases, features used by prediction methods, and measures of prediction performance. Subsequently, the development of structure-based parameters and their relationship with protein folding rates are outlined; these parameters help explain the physical basis of protein folding and stability. Further, the basic principles of major machine learning techniques are described, and their applications to predicting protein folding rates and the stability of mutant proteins are illustrated. Machine learning techniques can achieve the highest accuracy in predicting protein folding rates and stability. In essence, statistical methods and machine learning algorithms complement each other in understanding and predicting protein folding rates and the stability of protein mutants. The available online resources on protein folding rates and stability are listed.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP filing No. 09084417)