Similar literature
 Found 20 similar documents; search took 15 ms
1.
A statistical analytical approach has been used to analyze the secondary structure (SS) of amino acids as a function of the sequence of amino acid residues. We used 306 non-homologous, best-resolved protein structures from the Protein Data Bank for the analysis. A sequence region of 32 amino acids on either side of the residue is considered in order to calculate single amino acid propensities, di-amino acid potentials and tri-amino acid potentials. A weighted sum of the predictions obtained from these properties gives the final prediction. Our method is as good as the best-known SS prediction methods, is the simplest of all the methods, and uses no homologous sequence/family alignment data, yet gives 72% SS prediction accuracy. Since the method does not use many other factors that may increase prediction accuracy, there is scope to achieve greater accuracy with this approach. Received: 4 May 1998 / Accepted: 17 September 1998 / Published online: 10 December 1998
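The single amino acid propensities mentioned above can be illustrated with a short sketch. The abstract does not give its formula, so the definition used here, P(aa, ss) = (n(aa, ss) / n(aa)) / (n(ss) / N), is an assumed Chou-Fasman-style ratio, and the function name `residue_propensities` and the one-letter state codes are hypothetical.

```python
from collections import Counter


def residue_propensities(sequences, structures):
    """Return {(aa, state): propensity} from aligned sequence/SS strings.

    Assumed definition (not taken from the abstract):
        P(aa, state) = (n(aa, state) / n(aa)) / (n(state) / N)
    A value above 1 means the residue is over-represented in that state.
    """
    pair_counts = Counter()
    aa_counts = Counter()
    state_counts = Counter()
    total = 0
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and SS string must have equal length")
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            aa_counts[aa] += 1
            state_counts[state] += 1
            total += 1
    return {
        (aa, state): (n / aa_counts[aa]) / (state_counts[state] / total)
        for (aa, state), n in pair_counts.items()
    }
```

Pairs that never occur in the data are simply absent from the result; a real predictor would need a rule for such unseen pairs, and the di- and tri-amino acid potentials would need analogous counts over residue pairs and triples.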

2.
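Using "relative entropy" as the optimization function, an effective and fast optimization algorithm for protein fold prediction is proposed. An off-lattice model is used, and the prediction is concerned only with the path of the protein backbone. The method uses only the distances between pairs of consecutive Cα atoms on the backbone and an extended form of the contact potential for the 20 amino acids. The algorithm was tested on several real proteins. The initial structures for prediction were all largely unfolded states, and the predicted conformations have a root-mean-square deviation (RMSD) of 5–7 Å from their native structures. In principle, the method is an improvement on energy optimization.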
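For reference, the relative entropy (Kullback-Leibler divergence) between two discrete distributions can be computed as in the sketch below. How the authors map a protein conformation onto the two distributions is not stated in the abstract, so this shows only the quantity itself; the function name `relative_entropy` is hypothetical.

```python
import math


def relative_entropy(p, q):
    """Return D(p || q) = sum_i p_i * ln(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0 the
    divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total
```

The divergence is zero only when the two distributions coincide, which is what makes it usable as an objective to minimize.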

3.
The recent accumulation of experimentally determined protein 3D structures combined with our ability to computationally model structure from amino acid sequence has resulted in an increased importance of structure-based methods for protein function prediction. Two types of methods for function prediction have been proposed: those that can accurately predict overall biochemical or biological roles of a protein and those that predict its functional residues. Here, we review approaches used for the computational identification of functional residues in protein structures and summarize their applications to a wide variety of problems in functional proteomics, such as the prediction of catalytic residues, post-translational modifications, or nucleic acid-binding sites. We examine four different problems in order to perform a comparison between several recently proposed methods and, finally, conclude by identifying limitations and future challenges in this field.

4.
Predicting protein structures from their amino acid sequences is a problem of global optimization. Global optima (native structures) are often sought using stochastic sampling methods such as Monte Carlo or molecular dynamics, but these methods are slow. In contrast, there are fast deterministic methods that find near-optimal solutions of well-known global optimization problems such as the traveling salesman problem (TSP). But fast TSP strategies have yet to be applied to protein folding, because of fundamental differences in the two types of problems. Here, we show how protein folding can be framed in terms of the TSP, to which we apply a variation of the Durbin-Willshaw elastic net optimization strategy. We illustrate using a simple model of proteins with database-derived statistical potentials and predicted secondary structure restraints. This optimization strategy can be applied to many different models and potential functions, and can readily incorporate experimental restraint information. It is also fast; with the simple model used here, the method finds structures that are within 5–6 Å all-Cα-atom RMSD of the known native structures for 40-mers in about 8 s on a PC; 100-mers take about 20 s. The computer time τ scales as τ ~ n, where n is the number of amino acids. This method may prove to be useful for structure refinement and prediction.

5.
Development of protein 3-D structural comparison methods is essential for understanding protein functions. Some amino acids share structural similarities while others vary considerably. These structures determine the chemical and physical properties of amino acids. Grouping amino acids with similar structures potentially improves the ability to identify structurally conserved regions and increases the global structural similarity between proteins. We systematically studied the effects of amino acid grouping on the numbers of Specific/specific, Common/common, and statistically different keys to achieve a better understanding of protein structure relations. Common keys represent substructures found in all types of proteins and Specific keys represent substructures exclusively belonging to a certain type of proteins in a data set. Our results show that applying amino acid grouping to the Triangular Spatial Relationship (TSR)-based method, while computing structural similarity among proteins, improves the accuracy of protein clustering in certain cases. In addition, applying amino acid grouping facilitates the process of identification or discovery of conserved structural motifs. The results from the principal component analysis (PCA) demonstrate that applying amino acid grouping captures slightly more structural variation than when amino acid grouping is not used, indicating that amino acid grouping reduces structure diversity as predicted. The TSR-based method uniquely identifies and discovers binding sites for drugs or interacting proteins. The binding sites of nsp16 of SARS-CoV-2, SARS-CoV and MERS-CoV that we have defined will aid future antiviral drug design for improving therapeutic outcome. This approach for incorporating the amino acid grouping feature into our structural comparison method is promising and provides a deeper insight into understanding of structural relations of proteins.

6.
The 3D structure of a protein is the main physical support of a protein's biological function; 3D protein folds are primarily maintained through interactions between amino acids. Inter-residue contacts are essential for the stability of protein folds. Therefore, many methodologies in the fields of structure analysis, structure prediction, and structure-function relationships are based on residue contacts. The present study provides a comparative analysis of two approaches for determining contacts: the classical distance-threshold method and an application of Laguerre, or weighted Voronoi tessellation. First, we examined mean contact distributions and their dependence on residue volumes, accessibility and hydrophobicity. In general, the different methods gave concordant results, although the method based on Cα distances showed significant discrepancies with the all-atom tessellation method. We also analyzed preferential contacts between all amino acid species and studied the influence of protein chain length, the proximity of the residues along the sequence, and the secondary structure environment. Interestingly, the discrepancies between methods were occasionally large enough to substantially change the relative preferences of some contacts. Finally, a case study on disulfide bridges demonstrated the importance of the structural environment in determining contacts from tessellation. In conclusion, the tessellation method is more accurate because of its fine adaptation to local protein topology, with far-reaching implications for most contact-based prediction methods of protein folding.
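The classical distance-threshold method named above can be sketched in a few lines. The 8 Å cutoff, the use of one Cα coordinate per residue, and the `min_separation` parameter are illustrative assumptions, not values taken from the study, and the function name `distance_contacts` is hypothetical. The tessellation method is not shown.

```python
import math


def distance_contacts(coords, cutoff=8.0, min_separation=1):
    """Return index pairs (i, j), i < j, whose points lie within `cutoff`.

    coords: one (x, y, z) tuple per residue, e.g. Calpha positions in angstroms.
    min_separation: minimum sequence separation j - i for a pair to count,
    so that trivially close chain neighbours can be excluded.
    """
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts
```

This double loop is quadratic in chain length, which is usually acceptable for single proteins; a spatial grid or k-d tree would be the natural replacement for very large systems.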

7.
8.
Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate and reliable prediction of protein domain linkers and boundaries is often considered to be the initial step of protein tertiary structure and function predictions. In this paper, we introduce CISA as a method for predicting inter-domain linker regions solely from the amino acid sequence information. The method first computes the amino acid compositional index from the protein sequence dataset of domain-linker segments and the amino acid composition. A preference profile is then generated by calculating the average compositional index values along the amino acid sequence using a sliding window. Finally, the protein sequence is segmented into intervals and a simulated annealing algorithm is employed to enhance the prediction by finding the optimal threshold value for each segment that separates domains from inter-domain linkers. The method was tested on two standard protein datasets and showed considerable improvement over the state-of-the-art domain linker prediction methods.
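The sliding-window step described above can be sketched as follows. The per-residue index values, the window size, and the handling of sequence ends (the window is truncated rather than padded) are assumptions made for illustration; the abstract does not specify them, and `preference_profile` is a hypothetical name. The simulated annealing stage is not shown.

```python
def preference_profile(sequence, index, window=3):
    """Average a per-residue index over a centred sliding window.

    sequence: string of one-letter residue codes.
    index: dict mapping each residue code to its compositional index value.
    window: odd window width; windows are truncated at the sequence ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [index[aa] for aa in sequence]
    half = window // 2
    profile = []
    for i in range(len(values)):
        segment = values[max(0, i - half):i + half + 1]
        profile.append(sum(segment) / len(segment))
    return profile
```

A threshold applied to the returned profile would then label each position as domain or linker, which is the decision the abstract says is tuned per segment.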

9.
Understanding and identifying proteins adapted to hypersaline environments is a challenging task that would help in designing stable proteins. Here, we have systematically analyzed the normalized amino acid compositions of 2121 halophilic and 2400 non-halophilic proteins. The results showed that halophilic proteins contained more Asp at the expense of Lys, Ile, Cys and Met, fewer small and hydrophobic residues, and showed a large excess of acidic over basic amino acids. Then, we introduce a support vector machine method to discriminate between halophilic and non-halophilic proteins, using a novel Pearson VII universal function based kernel. Under the three validation methods, it achieved overall accuracies of 97.7%, 91.7% and 86.9% and outperformed other machine learning algorithms. We also examined the influence of protein size on prediction accuracy and found that the worse performance for small proteins may arise because some significant residues (Cys and Lys) were missing from those proteins.
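A Pearson VII universal kernel (PUK) is commonly written as K(x, y) = 1 / [1 + (2 · ||x − y|| · sqrt(2^(1/ω) − 1) / σ)²]^ω. The sketch below implements that commonly cited form; whether the authors used exactly this parameterization is an assumption, as are the default values of `sigma` and `omega` and the function name `puk_kernel`.

```python
import math


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel between two equal-length vectors.

    sigma controls the peak half-width and omega the tailing of the peak.
    The kernel equals 1 for identical inputs and decays with distance.
    """
    if sigma <= 0 or omega <= 0:
        raise ValueError("sigma and omega must be positive")
    distance = math.dist(x, y)
    scaled = 2.0 * distance * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega
```

In this setting the input vectors would be the normalized amino acid compositions of two proteins, and the kernel would be supplied to a support vector machine as a custom similarity function.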

10.
Peptide conformations are the basis for studying peptide properties, and predicting peptide structures is highly important in life science but very complex in practice. Here, thorough searches on the potential energy surfaces of 13 representative dipeptides, considering all possible combinations of the bond rotational degrees of freedom, are performed using density functional theory based methods. Careful analyses of the conformers of the 13 dipeptides and the corresponding amino acids reveal the connections between the structures of dipeptides and amino acids. A method for finding all important dipeptide conformers by optimizing a small number of trial structures generated by suitable superposition of the parent amino acid conformations is thus proposed. Applying the method to another eight dipeptides carefully examined by others shows that the new approach is both highly efficient and reliable, providing the most complete ensembles of dipeptide conformers and much improved agreement between the theoretical and experimental IR spectra. The method opens the door to determining the stable structures of all dipeptides with a manageable amount of effort. A preliminary result on the applicability of the method to tripeptide structure determination is also presented. The results are the first step towards proving Anfinsen's hypothesis by revealing the relationships between the structures of the simplest peptide and its constituent amino acids. It implies that the structures of peptides are not only determined by their amino acid sequences, but also closely linked with the amino acid conformations. © 2009 Wiley Periodicals, Inc. J Comput Chem 2009

11.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman’s subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.

13.
This paper compares the performance of two clustering methods, DPClus graph clustering and hierarchical clustering, for classifying volatile organic compounds (VOCs) using a fingerprint-based similarity measure between chemical structures. The clustering results from each method were compared to determine the degree of cluster overlap and how well each method classified the chemical structures of VOCs into clusters. We also point out the advantages and limitations of both clustering methods. In conclusion, chemical similarity measures can be used to predict the biological activities of a compound, and this can be applied in the medical, pharmaceutical and agrotechnology fields.

14.
Molecular dynamics (MD) simulations are extensively used in the study of the structures and functions of proteins. Ab initio protein structure prediction is one of the most important subjects in computational biology, and many trials have been performed using MD simulation so far. Since the results of MD simulations largely depend on the force field, reliable force field parameters are indispensable for the success of MD simulation. In this work, we have modified atom charges in a standard force field on the basis of water-phase quantum chemical calculations. The modified force field turned out to be appropriate for ab initio protein structure prediction by MD simulation with the generalized Born method. Detailed analysis was performed in terms of the conformational stability of amino acid residues, the stability of the secondary structure of proteins, and the accuracy of protein tertiary structure prediction, comparing the modified force field with a standard one. The energy balance between alpha-helix and beta-sheet structures was significantly improved by the modification of charge parameters.

15.
Currently, much effort is being directed to the determination of the three-dimensional structure of proteins. Two classes of research are of interest: spectrometric techniques, which include Fourier transform infrared (FT-IR) spectrometry, and non-spectrometric prediction schemes. The spectra obtained using FT-IR spectrometry are analyzed to determine the percentages of alpha-helices, beta-pleated sheets, and non-structured coils in a protein. Unfortunately, FT-IR, as well as other spectrometric techniques, cannot be used to determine the exact secondary structure of a protein reliably. Non-spectrometric prediction methods yield information on the exact secondary structure, but are not always accurate. Most prediction methods relate the primary amino acid sequence to the secondary structure of a protein, allowing sequential secondary structure information for the protein examined to be obtained. The goal of this research is to incorporate FT-IR with a prediction method, resulting in an improvement in the accuracy of the prediction.

16.
We have developed a soft energy function, termed GEMSCORE, for protein structure prediction, which is one of the emerging issues in computational biology. GEMSCORE consists of the van der Waals potential, the hydrogen-bonding potential and the solvent potential, with 12 parameters that are optimized using a generic evolutionary method. GEMSCORE successfully identifies 86 native proteins among 96 target proteins on six decoy sets from more than 70,000 near-native structures. For these six benchmark datasets, the predictive performance of GEMSCORE, based on native structure ranking and Z-scores, was superior to eight other energy functions. Our method is based solely on a simple and linear function and thus is considerably faster than other methods that rely on additional complex calculations. In addition, GEMSCORE recognized 17 and 2 native structures as the first and the second rank, respectively, among 21 targets in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction). These results suggest that GEMSCORE is fast and performs well in discriminating between native and nonnative structures among thousands of protein structure candidates. We believe that GEMSCORE is robust and should be a useful energy function for protein structure prediction.
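The two evaluation measures named above, native structure rank and Z-score, can be computed from a list of energies as in this sketch. It assumes that lower energy means a better structure and that the Z-score is taken against the decoy energies only, using the population standard deviation; the abstract does not state these conventions, and `native_rank_and_zscore` is a hypothetical name.

```python
import statistics


def native_rank_and_zscore(native_energy, decoy_energies):
    """Return (rank, z) of the native structure within a decoy set.

    rank: 1 + number of decoys with strictly lower energy (1 is best).
    z: (E_native - mean(E_decoys)) / std(E_decoys); more negative means the
    native structure is better separated from the decoys.
    """
    if not decoy_energies:
        raise ValueError("at least one decoy energy is required")
    spread = statistics.pstdev(decoy_energies)
    if spread == 0.0:
        raise ValueError("decoy energies have zero spread")
    rank = 1 + sum(1 for energy in decoy_energies if energy < native_energy)
    z = (native_energy - statistics.fmean(decoy_energies)) / spread
    return rank, z
```

Counting how many targets obtain rank 1 over a benchmark gives figures of the kind reported in the abstract, such as 86 of 96.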

17.
The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposed a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, a sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine-pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with those utilized in the existing nearest neighbor algorithm (NNA) method shows that the features used here improve jackknife validation balanced accuracy by about 7.39%. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29% and 94.20%, respectively. Altogether, our predictor achieves an improvement of 43.25% based on balanced accuracy compared to the existing NNA based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.
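Balanced accuracy, the metric quoted above, is the mean of sensitivity and specificity, which keeps the score meaningful when bonding and non-bonding cysteines are unevenly represented. The sketch below assumes binary labels with 1 for bonding and 0 for non-bonding; the label coding and the function name `balanced_accuracy` are illustrative choices.

```python
def balanced_accuracy(y_true, y_pred):
    """Return (sensitivity + specificity) / 2 for binary labels 0 and 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = tn = fp = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1
        elif truth == 0 and guess == 0:
            tn += 1
        elif truth == 0 and guess == 1:
            fp += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)
```

A classifier that always predicts the majority class scores 0.5 on this measure regardless of class imbalance, unlike plain accuracy.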

18.
19.
Computational methods are needed to help characterize the structure and function of protein–protein complexes. To develop and improve such methods, standard test problems are essential. One important test is to identify experimental structures from among large sets of decoys. Here, a flexible docking procedure was used to produce such a large ensemble of decoy complexes. In addition to their use for structure prediction, they can serve as a proxy for the nonspecific, protein–protein complexes that occur transiently in the cell, which are hard to characterize experimentally, yet biochemically important. For 202 homodimers and 41 heterodimers with known X‐ray structures, we produced an average of 1217 decoys each. The structures were characterized in detail. The decoys have rather large protein–protein interfaces, with at least 45 residue–residue contacts for every 100 contacts found in the experimental complex. They have limited intramonomer deformation and limited intermonomer steric conflicts. The decoys thoroughly sample each monomer's surface, with all the surface amino acids being part of at least one decoy interface. The decoys with the lowest intramonomer deformation were analyzed separately, as proxies for nonspecific protein–protein complexes. Their interfaces are less hydrophobic than the experimental ones, with an amino acid composition similar to the overall surface composition. They have a poorer shape complementarity and a weaker association energy, but are no more fragmented than the experimental interfaces, with 2.1 distinct patches of interacting residues on average, compared to 2.6 for the experimental interfaces. The decoys should be useful for testing and parameterizing docking methods and scoring functions; they are freely available as PDB files at http://biology.polytechnique.fr/decoys . © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

20.
Supersecondary structures (SSSs) are the building blocks of protein 3D structures. Accurate prediction of SSSs can be one important step toward building a tertiary structure from the specified secondary structure. How to improve the accuracy of SSS prediction by effectively incorporating sequence order effects is an important and challenging problem. Based on a different form of Chou's pseudo amino acid composition, a novel approach for feature representation of SSSs is proposed. Amino acid basic compositions, dipeptide components, and amino acid composition distribution are incorporated to represent the compositional features of proteins. Each supersecondary structural motif is characterized as a vector of 36 dimensions. In addition, we propose a novel prediction system using SVM and the IDQD algorithm as classifiers. Our method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The highest overall accuracies for the training dataset and the independent testing dataset are 77.7% and 69.4%, respectively. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2011
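Two of the compositional features listed above, amino acid composition and dipeptide composition, can be computed as in the sketch below. It returns the plain 20 + 400 frequencies and does not reproduce the 36-dimensional vector of the paper, whose exact construction is not given in the abstract; the function name `composition_features` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(sequence):
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies.

    Dipeptides are counted over overlapping adjacent residue pairs, so a
    sequence of length n contributes n - 1 pairs.
    """
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    aa_freqs = [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    dp_freqs = [pairs.count(dp) / len(pairs) for dp in DIPEPTIDES]
    return aa_freqs + dp_freqs
```

Because the dipeptide counts depend on residue order, they carry some of the sequence order information that plain amino acid composition discards.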
