Similar articles
 Found 20 similar articles; search took 640 ms
1.
Protein fold recognition is an essential step in determining the tertiary structure of a protein. In this study, a model termed NiRecor is developed for recognizing protein folds, based on artificial neural networks incorporated in an adaptive heterogeneous particle swarm optimizer. The main contribution of NiRecor is that it is a data-driven, high-performing predictor that does not require manual tuning of control parameters for different data sets. Because evolutionary and structure-based information about amino acid sequences is of great importance in determining the tertiary structure of a protein, NiRecor employs two different feature sets, the position-specific scoring matrix and the secondary structure prediction matrix, to predict the structural classes of protein folds. The experimental results demonstrate that the proposed method predicts protein folds with higher precision than several existing predictors, with improvements of 1.1 to 7.8 percentage points on three benchmark datasets. © 2015 Wiley Periodicals, Inc.

2.
Protein fold recognition   Total citations: 4, self-citations: 0, citations by others: 4
Summary: An important, yet seemingly unattainable, goal in structural molecular biology is to be able to predict the native three-dimensional structure of a protein entirely from its amino acid sequence. Prediction methods based on rigorous energy calculations have not yet been successful, and best results have been obtained from homology modelling and statistical secondary structure prediction. Homology modelling is limited to cases where significant sequence similarity is shared between a protein of known structure and the unknown. Secondary structure prediction methods are not only unreliable, but also do not offer any obvious route to the full tertiary structure. Recently, methods have been developed whereby entire protein folds are recognized from sequence, even where little or no sequence similarity is shared between the proteins under consideration. In this paper we review the current methods, including our own, and in particular offer a historical background to their development. In addition, we also discuss the future of these methods and outline the developments under investigation in our laboratory.

3.
Prediction of protein accessibility from sequence, like prediction of protein secondary structure, is an intermediate step toward predicting the structures and, consequently, the functions of proteins. Most currently used methods are based on single-residue prediction, using either statistical means or evolutionary information, in which the accessibility state of the central residue in a window is predicted. Taking advantage of the expansion of databases of proteins with known 3D structures, we extracted information on pairwise residue types and the conformational states of the pairs simultaneously. To resolve the ambiguity in state prediction that arises when sliding a one-residue window, we used a dynamic programming algorithm to find the path with the maximum score. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test on a dataset of known proteins is more than 65%, which is an improvement on the results of methods based on evolutionary information.
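The dynamic programming step mentioned in this abstract can be illustrated with a short sketch. It assumes (the abstract does not give these details) that each adjacent residue pair has a table of scores indexed by the states of the two residues, and that the best assignment is the state path with the highest total score. The function name `best_state_path` and the layout of `pair_scores` are hypothetical.

```python
from typing import List, Sequence


def best_state_path(pair_scores: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Return the highest-scoring state path for a chain of residues.

    pair_scores[i][s][t] is the (assumed) score for residue i being in
    state s while residue i + 1 is in state t. A chain of L residues
    therefore has L - 1 tables, and the returned path has L entries.
    """
    if not pair_scores:
        return []
    n_states = len(pair_scores[0])
    # best[s] = best total score of any path that ends in state s so far
    best = [0.0] * n_states
    back_pointers = []
    for table in pair_scores:
        new_best = []
        pointers = []
        for t in range(n_states):
            # choose the predecessor state that maximises the running score
            s_best = max(range(n_states), key=lambda s: best[s] + table[s][t])
            new_best.append(best[s_best] + table[s_best][t])
            pointers.append(s_best)
        best = new_best
        back_pointers.append(pointers)
    # trace back from the best final state
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: three residues, two states (0 = buried, 1 = exposed)
toy_scores = [
    [[1.0, 5.0], [0.0, 2.0]],
    [[0.0, 1.0], [3.0, 0.0]],
]
print(best_state_path(toy_scores))
```

Because every pair table couples neighbouring positions, the path is chosen globally rather than residue by residue, which is the kind of ambiguity resolution the abstract attributes to the dynamic programming step.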

4.
The analysis of residue-residue contacts in protein structures can shed some light on our understanding of the folding and stability of proteins. In this paper, we study the statistical properties of long-range and short-range residue-residue contacts of 91 globular proteins using CSU software and analyze the importance of long-range contacts in globular protein structure. There are many short-range and long-range contacts in globular proteins, and it is found that the average number of long-range contacts per residue is 5.63 and the percentage of residue-residue contacts which are involved in long-range ones is 59.4%. In more detail, the distribution of long-range contacts in different residue intervals is investigated and it is found that the residues occurring in the interval range of 4-10 residues apart in the sequence contribute more long-range contacts to the stability of globular protein. The number of long-range contacts per residue, which is a measure of the ability to form residue-residue contacts, is also calculated for 20 different amino acid residues. It is shown that hydrophobic residues (including Leu, Val, Ile, Met, Phe, Tyr, Cys and Trp), which have a large number of long-range contacts, easily form long-range contacts, while the hydrophilic amino acids (including Ala, Gly, Thr, His, Glu, Gln, Asp, Asn, Lys, Ser, Arg, and Pro) form long-range contacts with more difficulty. The relationship between the Fauchere-Pliska hydrophobicity scale (FPH) and the number of short-range and long-range contacts per residue for 20 amino acid residues is also studied. An approximately linear relationship between the Fauchere-Pliska hydrophobicity scale (FPH) and the number of long-range contacts per residue CL is found and can be expressed as

5.
Since it was observed that the structural class of a protein is related to its amino acid composition, various methods based on amino acid composition have been proposed to predict protein structural classes. Though those methods are effective to some degree, their predictive quality is limited because amino acid composition cannot sufficiently capture the information in protein sequences. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike the previous methods, this new approach is based on comparisons of subsequence distributions; therefore, the effect of residue order on protein structure is taken into account. The predictive results of the new approach on the same data set are better than those of the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall correctness rates of the resubstitution test and the jackknife test are 99.4% and 75.02%, respectively, and similar results are obtained on other data sets. All tests demonstrate that the residue order along protein sequences plays an important role in the recognition of protein structural classes, especially for alpha/beta proteins and alpha+beta proteins. In addition, the tests also show that the new method is simple and efficient.

6.
A method is described for the refinement of rough protein models based on finding a selection of structural fragments that match the model. Unlike most fragment-based methods, these are not necessarily contiguous in the sequence and form a tiling (tessellation) that covers most of the structure. The residue positions of the fragments are then used as a target for the model atoms to generate a revised model which is used as the basis of a subsequent pattern definition and search. The method was shown to improve the recognition of the native fold in a series of decoys largely as a result of improved secondary structure representation.

7.
As several structural proteomic projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein functions from protein structures are in urgent need. In this paper, we present a method to explore the clustering patterns of amino acids on the 3-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented using a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.

8.
Computational methods are needed to help characterize the structure and function of protein–protein complexes. To develop and improve such methods, standard test problems are essential. One important test is to identify experimental structures from among large sets of decoys. Here, a flexible docking procedure was used to produce such a large ensemble of decoy complexes. In addition to their use for structure prediction, they can serve as a proxy for the nonspecific, protein–protein complexes that occur transiently in the cell, which are hard to characterize experimentally, yet biochemically important. For 202 homodimers and 41 heterodimers with known X‐ray structures, we produced an average of 1217 decoys each. The structures were characterized in detail. The decoys have rather large protein–protein interfaces, with at least 45 residue–residue contacts for every 100 contacts found in the experimental complex. They have limited intramonomer deformation and limited intermonomer steric conflicts. The decoys thoroughly sample each monomer's surface, with all the surface amino acids being part of at least one decoy interface. The decoys with the lowest intramonomer deformation were analyzed separately, as proxies for nonspecific protein–protein complexes. Their interfaces are less hydrophobic than the experimental ones, with an amino acid composition similar to the overall surface composition. They have a poorer shape complementarity and a weaker association energy, but are no more fragmented than the experimental interfaces, with 2.1 distinct patches of interacting residues on average, compared to 2.6 for the experimental interfaces. The decoys should be useful for testing and parameterizing docking methods and scoring functions; they are freely available as PDB files at http://biology.polytechnique.fr/decoys . © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010  相似文献   

9.
We describe ProteinShop, a new visualization tool that streamlines and simplifies the process of determining optimal protein folds. ProteinShop may be used at different stages of a protein structure prediction process. First, it can create protein configurations containing secondary structures specified by the user. Second, it can interactively manipulate protein fragments to achieve desired folds by adjusting the dihedral angles of selected coil regions using an Inverse Kinematics method. Last, it serves as a visual framework to monitor and steer a protein structure prediction process that may be running on a remote machine. ProteinShop was used to create initial configurations for a protein structure prediction method developed by a team that competed in CASP5. ProteinShop's use accelerated the process of generating initial configurations, reducing the time required from days to hours. This paper describes the structure of ProteinShop and discusses its main features.

10.
A statistical analytical approach has been used to analyze the secondary structure (SS) of amino acids as a function of the sequence of amino acid residues. We have used 306 non-homologous best-resolved protein structures from the Protein Data Bank for the analysis. A sequence region of 32 amino acids on either side of the residue is considered in order to calculate single amino acid propensities, di-amino acid potentials and tri-amino acid potentials. A weighted sum of predictions obtained using these properties is used to suggest a final prediction method. Our method is as good as the best-known SS prediction methods, is the simplest of all the methods, and uses no homologous sequence/family alignment data, yet gives 72% SS prediction accuracy. Since the method did not use many other factors that may increase the prediction accuracy there is scope to achieve greater accuracy using this approach. Received: 4 May 1998 / Accepted: 17 September 1998 / Published online: 10 December 1998  相似文献   

11.
The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposed a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, a sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine-pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with the features utilized in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study improve jackknife-validation balanced accuracy by about 7.39%. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29% and 94.20%, respectively. Altogether, our predictor achieves an improvement of 43.25% based on balanced accuracy compared to the existing NNA-based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.
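The results in this abstract are reported as balanced accuracy. The abstract does not define the metric, so the sketch below assumes the common definition, the mean of sensitivity and specificity. The function name and the counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn)  # fraction of bonding cases recovered
    specificity = tn / (tn + fp)  # fraction of non-bonding cases recovered
    return (sensitivity + specificity) / 2


# Hypothetical, strongly imbalanced counts: few bonding pairs, many non-bonding
tp, fn, tn, fp = 5, 5, 990, 0
plain_accuracy = (tp + tn) / (tp + fn + tn + fp)
print(plain_accuracy, balanced_accuracy(tp, fn, tn, fp))
```

Bonding cysteine pairs are typically far rarer than non-bonding candidate pairs, so plain accuracy can look high even when half of the bonding pairs are missed; balanced accuracy weights the two classes equally.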

12.
We propose a new analytical method for detecting and computing contacts between atoms in biomolecules. It is based on the alpha shape theory and proceeds in three steps. First, we compute the weighted Delaunay triangulation of the union of spheres representing the molecule. In the second step, the Delaunay complex is filtered to derive the dual complex. Finally, contacts between spheres are collected. In this approach, two atoms i and j are defined to be in contact if their centers are connected by an edge in the dual complex. The contact areas between atom i and its neighbors are computed based on the caps formed by these neighbors on the surface of i; the total area of all these caps is partitioned according to their spherical Laguerre Voronoi diagram on the surface of i. This method is analytical and its implementation in a new program BallContact is fast and robust. We have used BallContact to study contacts in a database of 1551 high resolution protein structures. We show that with this new definition of atomic contacts, we generate realistic representations of the environments of atoms and residues within a protein. In particular, we establish the importance of nonpolar contact areas that complement the information represented by the accessible surface areas. This new method bears similarity to the tessellation methods used to quantify atomic volumes and contacts, with the advantage that it does not require the presence of explicit solvent molecules if the surface of the protein is to be considered. © 2012 Wiley Periodicals, Inc.

13.
Monolayer adsorbed water on the beta-cristobalite (100) surface is studied via classical molecular dynamics simulations. The ordered two-dimensional (2D) tessellation ice structure (i.e., four-membered and eight-membered rings appearing alternately) is confirmed at low temperatures in the simulations. The stability of this possible new ice phase is further investigated by heating the system from 5 to 300 K. An order-disorder structural transition is observed between 100 and 200 K, marking the melting of the tessellation ice. This process is characterized by the water oxygen-oxygen radial distribution function, the coordination number, the distance vector between the centers of mass of the oxygen and the hydrogen atoms in water, the mean square displacement of oxygen in water, and the vibrational density of states. These techniques consistently show that the order-disorder transition temperature of the 2D tessellation ice is far below 300 K. The 2D tessellation ice structure is also obtained via density functional calculations with different generalized gradient approximations. By comparing the calculated adsorption and lateral energies between different methods, we find that the melting temperature of this specific 2D ice structure is strongly method dependent. Therefore, further experimental work is needed to confirm this possible new ice phase and probe its stability.

14.
15.
The Protein Digestibility-Corrected Amino Acid Score (PDCAAS) is discussed. PDCAAS is now widely used as a routine assay for protein quality evaluation, replacing the more traditional biological methods [e.g., measurement of the Protein Efficiency Ratio (PER) in rats]. PDCAAS is based on comparison of the essential amino acid content of a test protein with that of a reference essential amino acid pattern and a correction for differences in protein digestibility as determined using a rat assay. Although PDCAAS is a rapid and useful method, it often shows discrepancies when compared to PER values. These discrepancies relate to the following issues: uncertainty about the validity of reference patterns, invalidity of correction for fecal (versus ileal) digestibility, truncation of PDCAAS values to 100%, failure to obtain full biological response after supplementation of the limiting essential amino acid, discrepancies between protein and amino acid digestibility, effects of processing on protein quality, and effects of the presence of antinutritional factors in the matrix containing the protein. Part of the discrepancy between PDCAAS and PER can be overcome by modifications of PDCAAS. This article describes some proposed modifications and puts forward the suggestion that the rat protein fecal digestibility assay be replaced by an in vitro ileal amino acid digestibility assay based on a computer-controlled gastrointestinal model.
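The calculation summarized in this abstract can be sketched as follows. This is a minimal illustration, assuming the usual formulation in which the amino acid score is the lowest ratio of test to reference content (the limiting amino acid) and is then multiplied by digestibility. The function name and every number are hypothetical placeholders, not an official reference pattern or measured data.

```python
from typing import Dict, Tuple


def pdcaas(
    test_mg_per_g: Dict[str, float],
    reference_mg_per_g: Dict[str, float],
    digestibility: float,
    truncate: bool = True,
) -> Tuple[str, float]:
    """Return (limiting amino acid, PDCAAS) under the assumptions above.

    Contents are in mg of amino acid per g of protein; digestibility is a
    fraction between 0 and 1.
    """
    ratios = {
        aa: test_mg_per_g[aa] / reference_mg_per_g[aa]
        for aa in reference_mg_per_g
    }
    limiting = min(ratios, key=ratios.get)  # lowest ratio limits the score
    score = ratios[limiting] * digestibility
    if truncate:
        score = min(score, 1.0)  # the truncation to 100% criticised in the abstract
    return limiting, score


# Hypothetical placeholder values, for illustration only
reference = {"lysine": 50.0, "threonine": 25.0}
test_protein = {"lysine": 25.0, "threonine": 30.0}
print(pdcaas(test_protein, reference, digestibility=0.8))
```

Passing `truncate=False` shows what is lost by truncation: two proteins whose untruncated scores differ above 1.0 receive the same reported value, which is one of the discrepancies with PER that the abstract lists.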

16.
In a fine-grained computational analysis of protein structure, we investigated the relationships between a residue's backbone conformations and its side-chain packing as well as conformations. To produce continuous distributions in high resolution, we ran molecular dynamics simulations over a set of protein folds (dynameome). In effect, the dynameome dataset samples not only the states well represented in the PDB but also the known states that are not well represented in the structural database. In our analysis, we characterized the mutual influence among the backbone φ,ψ angles with the first side-chain torsion angles (χ1) and the volumes occupied by the side-chains. The dependencies of these relationships on side-chain environment and amino acids are further explored. We found that residue volumes exhibit dependency on backbone secondary-structure conformation: side-chains pack more densely in extended β-sheet than in α-helical structures. As expected, residue volumes on the protein surface were larger than those in the interior. The first side-chain torsion angles are found to be dependent on the backbone conformations in agreement with previous studies, but the dynameome dataset provides higher resolution of rotamer preferences based on the backbone conformation. All three gauche−, gauche+, and trans rotamers show different patterns of φ,ψ dependency, and variations in χ1 value are skewed from their canonical values to relieve the steric strains. By demonstrating the utility of dynameomic modeling on the native state ensemble, this study reveals details of the interplay among backbone conformations, residue volumes and side-chain conformations.

17.
Characterizing the structure of transition states (TS) is a first step towards understanding two-state protein folding mechanisms. However, a direct experimental characterization of these states is challenging and indirect information derived from protein engineering methodologies (Φ-value analysis) is often difficult to interpret. We present here a theoretical study on the nature of the transition state ensemble for three representative proteins covering the major structural classes using a mean-field C(α)-based Gō-model. We identify that transition state ensembles are dominated by local contacts, indicating that most non-local contacts form only upon crossing the macroscopic folding free energy barrier. We demonstrate that the mean Φ-value corresponds to the fraction of stabilization energy gained at the barrier-top in two-state-like systems, and that it depends monotonically on the stability conditions. Furthermore, we show that there is a fundamental connection between small destabilization and large Φ-values that in turn depends on the location of the mutated residue in the structure. These results, which are in agreement with recent empirical findings, highlight the importance of local energetics in determining folding mechanisms.

18.
Routine structure prediction of new folds is still a challenging task for computational biology. The challenge is not only in the proper determination of the overall fold but also in building models of acceptable resolution, useful for modeling drug interactions and protein-protein complexes. In this work we propose and test a comprehensive approach to protein structure modeling supported by sparse, and relatively easy to obtain, experimental data. We focus on chemical shift-based restraints from NMR, although other sparse restraints could be easily included. In particular, we demonstrate that combining the typical NMR software with artificial intelligence-based prediction of secondary structure significantly enhances the accuracy of the restraints for molecular modeling. The computational procedure is based on the reduced representation approach implemented in the CABS modeling software, which proved to be a versatile tool for protein structure prediction during the CASP (critical assessment of techniques for protein structure prediction) experiments (see http://predictioncenter/CASP6/org). The method is successfully tested on a small set of representative globular proteins of different size and topology, including the two CASP6 targets, for which the required NMR data already exist. The method is implemented in a semi-automated pipeline applicable to large-scale structural annotation of genomic data. Here, we limit the computations to a relatively small set. This enabled, without loss of generality, a detailed discussion of the various factors determining the accuracy of the proposed approach to protein structure prediction.

19.
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability: some folds are more designable, and many sequences can assume them. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. The unique conformation giving the minimum energy is identified for each sequence, and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that highly designable structures can be distinguished from other, non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
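The enumeration and threading procedure described in this abstract can be sketched on a very small 2D lattice. The sketch makes several assumptions that the abstract does not state: conformations are identified by their contact maps (so rotations and reflections collapse to one entry), the energy is -1 per H-H contact between residues that are lattice neighbours but not chain neighbours, and a sequence "folds" to a conformation only if that conformation is its unique energy minimum. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def compact_contact_maps(width, height):
    """Contact maps of all chain paths that fill a width x height grid."""
    n = width * height
    maps = set()

    def extend(path, visited):
        if len(path) == n:
            index = {cell: i for i, cell in enumerate(path)}
            contacts = set()
            for (x, y), i in index.items():
                # look right and up so each lattice edge is checked once
                for neighbour in ((x + 1, y), (x, y + 1)):
                    j = index.get(neighbour)
                    if j is not None and abs(i - j) > 1:
                        contacts.add((min(i, j), max(i, j)))
            maps.add(frozenset(contacts))
            return
        x, y = path[-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in product(range(width), range(height)):
        extend([start], {start})
    return sorted(maps, key=sorted)


def designability(width, height):
    """Count, per conformation, the HP sequences with it as unique ground state."""
    maps = compact_contact_maps(width, height)
    counts = Counter()
    for seq in product("HP", repeat=width * height):
        # assumed energy model: -1 for every H-H contact
        energies = [
            -sum(seq[i] == "H" and seq[j] == "H" for i, j in contact_map)
            for contact_map in maps
        ]
        lowest = min(energies)
        if energies.count(lowest) == 1:  # require a non-degenerate minimum
            counts[energies.index(lowest)] += 1
    return maps, counts


maps, counts = designability(3, 3)
for idx, n_sequences in counts.most_common(3):
    print(sorted(maps[idx]), n_sequences)
```

Exhaustive enumeration of this kind is only practical for small lattices, since the number of HP sequences doubles with every added residue. Published lattice studies often use different energy parameters, so the counts from this sketch should not be compared with reported designability values.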

20.
A new method is presented to accurately determine the probability of having a deuterium or hydrogen atom on a specific amide position within a peptide after deuterium/hydrogen (D/H) exchange in solution. Amide hydrogen exchange has been proven to be a sensitive probe for studying protein structures and structural dynamics. At the same time, mass spectrometry in combination with physical fragmentation methods is commonly used to sequence proteins based on an amino acid residue specific mass analysis. In the present study it is demonstrated that the isotopic patterns of a series of peptide fragment ions obtained with capillary-skimmer dissociation, as observed with a 9.4 T Fourier transform ion cyclotron resonance (FTICR) mass spectrometer, can be used to calculate the isotopic state of specific amide hydrogens. This calculation is based on the experimentally observed isotopic patterns of two consecutive fragments and on the isotopic binomial distributions of the atoms in the residue constituting the difference between these two consecutive fragments. The applicability of the method is demonstrated by following the sequence-specific D/H exchange rate in solution of single amide hydrogens within some peptides.
