Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Prediction of protein structural classes and subcellular locations   (Cited by: 1 total; self-citations: 0; citations by others: 1)
Structural class and subcellular location are two important features of proteins that are closely related to their biological functions. With the rapid increase in new protein sequences entering data banks, it is highly desirable to develop a fast and accurate method for predicting these attributes. This can expedite the functional determination of new proteins and the process of prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design. Various prediction methods have been developed during the last two decades. This review presents a systematic introduction to, and comparison of, the existing methods with respect to prediction algorithm and classification scheme. Attention is focused on the state of the art, which is characterized by the recently developed covariant-discriminant algorithm, as well as some new classification schemes for protein structural classes and subcellular locations. The physicochemical foundation of the existing prediction methods and the reasons why the covariant-discriminant algorithm is so powerful are also addressed.

2.
Peptide surfactants are a newly emerged class of functional materials with a variety of applications, such as building nanoarchitectures, stabilizing membrane proteins and controlling drug release. In the present study, we report the modelling and prediction of the critical aggregation concentration (CAC), an important parameter that characterizes the self-assembling behaviour of peptide surfactants, through the use of statistical modelling and quantitative structure–property relationship (QSPR) approaches. In order to accurately describe the structural and physicochemical properties of the highly flexible peptide molecules, a new method called molecular dynamics-based hydrophobic cross-field (MD-HCF) is proposed to capture both the hydrophobic profile and the dynamic features of 32 surface-active peptides of known structure. A number of statistical models are then developed using partial least squares (PLS) regression with or without improvement by a genetic algorithm (GA). We demonstrate that MD-HCF performs much better than the widely used CODESSA method in both predictability and interpretability. We also highlight the importance of dynamic hydrophobic properties for accurate prediction and reasonable explanation of peptide self-assembling behaviour in solution, although they are more expensive to compute than properties derived directly from the static peptide structure. To the best of our knowledge, this study is the first to computationally model and predict the self-assembling behaviour of peptide surfactants.

3.
Since it was observed that the structural class of a protein is related to its amino acid composition, various methods based on amino acid composition have been proposed to predict protein structural classes. Though those methods are effective to some degree, their predictive quality is limited because amino acid composition cannot sufficiently capture the information in protein sequences. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike the previous methods, this new approach is based on comparisons of subsequence distributions, so the effect of residue order on protein structure is taken into account. The predictive results of the new approach on the same data set are better than those of the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall correctness rates of the resubstitution test and the jackknife test are 99.4% and 75.02%, respectively, and similar results are obtained on other data sets. All tests demonstrate that the residue order along protein sequences plays an important role in the recognition of protein structural classes, especially for alpha/beta and alpha+beta proteins. In addition, the tests show that the new method is simple and efficient.
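
The contrast drawn in this abstract between amino acid composition and subsequence distributions can be illustrated with a minimal sketch. It is not the paper's information-discrepancy measure; the two toy sequences and the helper names `composition` and `kmer_distribution` are assumptions made up for illustration. It only shows that two sequences can share a composition yet differ in their dipeptide distribution, which is where residue-order information lives.

```python
from collections import Counter


def composition(seq):
    """Relative frequency of each residue; independent of residue order."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def kmer_distribution(seq, k=2):
    """Relative frequency of each overlapping length-k subsequence."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy sequences: same residues, different order.
seq_a = "AALLKK"
seq_b = "ALKALK"

same_composition = composition(seq_a) == composition(seq_b)
same_dipeptides = kmer_distribution(seq_a) == kmer_distribution(seq_b)
print(same_composition, same_dipeptides)
```

A composition-only method cannot tell these two sequences apart, whereas any method that compares subsequence distributions can.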

4.
Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we propose a novel scheme to improve the prediction of protein structural classes. Given an amino acid sequence, we first transform it into a reduced amino acid sequence and calculate its word frequencies and word position features to form the combined sequence information. We then add the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. Comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.

5.
A modified Sammon algorithm was developed to display relationships between proteins based on their amino acid composition. In the first stage of the method, a 19-dimensional compositional space of representative proteins was mapped into a two-dimensional (2D) space using the original Sammon projection, creating a contour map. In the second stage, this contour map was used as a reference for new proteins projected into 2D. Data analysis showed that proteins belonging to the same structural class formed characteristic and distinct clusters, which could be useful in the prediction of protein structural classes. However, we observed significant overlap between the clusters, which may explain the limited success of previous protein folding predictions based solely on amino acid composition. Regardless, the modified Sammon projections can generate a unique index for each individually projected protein related to its amino acid composition, which may be a useful tool in the exploratory classification of proteins. ©1999 John Wiley & Sons, Inc. J Comput Chem 20: 1049–1059, 1999

6.
A new algorithm to predict protein-protein binding sites using conservation of both protein surface structure and physical-chemical properties in structurally similar proteins is developed. Binding-site residues in proteins are known to be more conserved than the rest of the surface, and finding local surface similarities by comparing a protein to its structural neighbors can potentially reveal the location of binding sites on this protein. This approach, which has previously been used to predict binding sites for small ligands, is now extended to predict protein-protein binding sites. Examples of binding-site predictions for a set of proteins, which have previously been studied for sequence conservation in protein-protein interfaces, are given. The predicted binding sites and the actual binding sites are in good agreement. Our algorithm for finding conserved surface structures in a set of similar proteins is a useful tool for the prediction of protein-protein binding sites.

7.
Understanding the relationship between amino acid sequences and folding rates of proteins is an important task in computational and molecular biology. In this work, we have systematically analyzed the composition of amino acid residues for proteins with different ranges of folding rates. We observed that the polar residues Asn, Gln, Ser, and Lys are dominant in fast-folding proteins, whereas the hydrophobic residues Ala, Cys, Gly, and Leu are preferred in slow-folding proteins. Further, we have developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted protein folding rates using leave-one-out cross-validation. Classifying the proteins by structural class improved the correlation to 0.98, with values of 0.99, 0.98, and 0.96 for all-alpha, all-beta, and mixed-class proteins, respectively. In addition, we have used Bayesian classification theory to discriminate two- and three-state proteins, with an accuracy of 90%. We have developed a web server for predicting protein folding rates, available at http://bioinformatics.myweb.hinet.net/foldrate.htm.

8.
Newly synthesized proteins have an intrinsic signal sequence, functioning as an "address tag" or "zip code", that is essential for guiding them to wherever they are needed. Owing to this unique function, protein signals have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. However, to use protein signals effectively as a vehicle in the field of proteomics, the first requirement is a fast and powerful method to identify the "address tag" or "zip code" entity. Although all signal sequences contain a hydrophobic core region, they show great variation in both overall length and amino acid sequence. It is this variation that makes it possible to deliver thousands of proteins to many different cellular locations by a variety of modes. It is also this variation that makes it very difficult to formulate a general algorithm to predict signal sequences. Nevertheless, various prediction models and algorithms have been developed during the past 17 years. This Review summarizes the development in this area, from the pioneering methods to neural network approaches and sub-site coupling approaches. The future challenges in this area, as well as some promising avenues for further improving the prediction quality, are also briefly addressed.

9.
Using the pseudo amino acid (PseAA) composition to represent a protein sample can incorporate a considerable amount of sequence pattern information and so improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. This article introduces the grey modeling approach, which is particularly efficient in coping with complicated systems such as one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted as its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introducing the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type, among many others.

10.
This review summarizes three new QSAR (quantitative structure-activity relationship) methods recently developed in our group and their applications to drug design. Based on more solid theoretical models and advanced mathematical techniques, the conventional QSAR technique has been recast in the following three aspects. (1) In fragment-based two-dimensional QSAR, abbreviated as FB-QSAR, the molecular structures in a family of drug candidates are divided into several fragments according to the substituents being investigated. The bioactivities of drug candidates are correlated with physicochemical properties of the molecular fragments through two sets of coefficients: one for the physicochemical properties and the other for the molecular fragments. (2) In multiple-field three-dimensional QSAR, or MF-3D-QSAR, more molecular potential fields are integrated into comparative molecular field analysis (CoMFA) through two sets of coefficients: one for the potential fields and the other for the Cartesian three-dimensional grid points. (3) In AABPP (amino acid-based peptide prediction), the bioactivities of peptides or proteins are correlated with the physicochemical properties of all or some of the residues of the sequence through two sets of coefficients: one for the physicochemical properties of amino acids and the other for the weight factors of the residues. An iterative double least square (IDLS) technique is developed for solving the two sets of coefficients in a training dataset alternately and iteratively. Using the two sets of coefficients, one can predict the bioactivity of a query peptide, protein, or drug candidate. Compared with the old methods, the new QSAR approaches summarized in this review possess machine learning ability, can remarkably enhance the prediction power, and provide more structural information. The future challenges and possible developments in this area are also briefly addressed.

11.
Structural and theoretical analyses of proteins are central to the understanding of complex molecular mechanisms and are fundamental to the drug discovery process. Computational techniques yield useful insights into an ever-wider range of biomolecular systems. Protein three-dimensional structures and molecular functions can be predicted in some circumstances, while experimental structures can be analyzed in depth via such computational approaches. Non-covalent binding of biomolecules can be understood by considering structural, thermodynamic and kinetic issues, and theoretical simulations of such events can be attempted. The central role of electrostatic interactions with regard to protein function, structure and stability has been investigated and some electrostatic properties can be modeled theoretically. Computer methods thus help to prioritize, design, analyze and rationalize biochemical experiments. Cardiovascular diseases and associated blood coagulation disorders are leading causes of death worldwide. Blood coagulation involves more than 30 proteins that interact specifically with various degrees of affinity. Many of these molecules can also bind transiently to phospholipid surfaces. Numerous point mutations in the genes of coagulation proteins and regulators have been identified. Understanding the coagulation cascade, its regulation and the impact of mutations is required for the development of new therapies and diagnostic tools. In this review, we describe concepts and methods pertaining to the field of structural bioinformatics. We provide examples of applications of these approaches to blood coagulation proteins and show that such studies can give insights about molecular mechanisms contributing to cardiovascular disease susceptibility.

12.
Proteins are the macromolecules responsible for almost all biological processes in a cell. With a large number of protein sequences available from different sequencing projects, the challenge for scientists is to characterize their functions. As wet-lab methods are time-consuming and expensive, many computational methods such as FASTA, PSI-BLAST, DNA microarray clustering, and Nearest Neighborhood classification on protein–protein interaction networks have been proposed. The support vector machine (SVM) is one such method that has been used successfully for several problems, such as protein fold recognition and protein structure prediction. Cai et al. in 2003 used SVM to classify proteins into different functional classes and to predict their function. They used the physico-chemical properties of proteins to represent the protein sequences. In this paper, a model comprising feature subset selection followed by a multiclass Support Vector Machine is proposed to determine the functional class of a newly generated protein sequence. To train the model and test its performance, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. It is observed that, of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes with an accuracy ranging from 91% to 94%. In comparison, OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the most significant features affecting protein function.

13.
14.
The prediction of protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analysis of the relationship between protein unfolding rates and the physical-chemical, energetic, and conformational properties of amino acid residues provides valuable information for understanding and predicting the unfolding rates of two- and three-state proteins. We found that classifying proteins into different structural classes yields an excellent correlation between amino acid properties and unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein unfolding rates. We have formulated three independent linear regression equations for different structural classes of proteins to predict their unfolding rates from amino acid sequences and obtained excellent agreement between predicted and experimentally observed unfolding rates; the correlation coefficients are 0.999, 0.990, and 0.992, respectively, for all-alpha, all-beta, and mixed-class proteins. Further, we have derived a general equation applicable to all structural classes of proteins, which can be used for predicting the unfolding rates of proteins of unknown structural class. We observed correlations of 0.987 and 0.930, respectively, for back-check and jack-knife tests. These accuracy levels are better than those of other methods in the literature.

15.
Protein fold recognition   (Cited by: 4 total; self-citations: 0; citations by others: 4)
Summary: An important, yet seemingly unattainable, goal in structural molecular biology is to be able to predict the native three-dimensional structure of a protein entirely from its amino acid sequence. Prediction methods based on rigorous energy calculations have not yet been successful, and the best results have been obtained from homology modelling and statistical secondary structure prediction. Homology modelling is limited to cases where significant sequence similarity is shared between a protein of known structure and the unknown. Secondary structure prediction methods are not only unreliable, but also do not offer any obvious route to the full tertiary structure. Recently, methods have been developed whereby entire protein folds are recognized from sequence, even where little or no sequence similarity is shared between the proteins under consideration. In this paper we review the current methods, including our own, and in particular offer a historical background to their development. We also discuss the future of these methods and outline the developments under investigation in our laboratory.

16.
Methods for predicting structural features in 1D are a useful tool for understanding the folding, classification, and function of proteins and, in particular, for 3D structure prediction. Among the structural aspects characterizing a protein, solvent accessibility has received great attention in recent years. The methods proposed so far for predicting accessibility have never combined the results of different methods into a consensus prediction able to provide more reliable results. A consensus approach that increases prediction accuracy using three high-performance methods is described. The results of our method for three different protein data sets show that an improvement of up to 3.0% in the prediction accuracy of solvent accessibility may be obtained by a consensus approach. The improvement also extends to the correlation coefficient. Applying our consensus approach to accessibility prediction using only three prediction methods gives better results than any of the single methods combined to form the consensus. Currently, the scarcity of predictors that use similar parameters to define solvent accessibility hinders the testing of other methods in our consensus procedure.
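
The abstract does not state how the three predictors are combined, so the following is only a minimal sketch of one common choice, a per-residue majority vote. The function name `consensus`, the two-state labels (`e` for exposed, `b` for buried), and the three prediction strings are all assumptions made up for illustration.

```python
from collections import Counter


def consensus(predictions):
    """Per-residue majority vote over equal-length prediction strings."""
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    # zip(*predictions) yields one tuple of votes per residue position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*predictions)
    )


# Hypothetical outputs of three predictors for a five-residue protein.
method_1 = "eebbe"
method_2 = "ebbbe"
method_3 = "eebee"

print(consensus([method_1, method_2, method_3]))
```

With an odd number of predictors and two states, every position has a strict majority, so no tie-breaking rule is needed in this sketch.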

17.
Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationship (3D-QSAR) techniques, which do not require knowledge of the receptor structure, were historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however, they are highly dependent on training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent of training sets and have a sufficient level of accuracy to allow effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand-based and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.

18.
The preferential occurrence of certain disulphide-bridge topologies in proteins has prompted us to design a method and a program, KNOT-MATCH, for their classification. The program has been applied to a database of proteins with less than 65% homology and more than two disulphide bridges. We have investigated whether there are topological preferences that can be used to group proteins and whether these can be applied to gain insight into the structural or functional relationships among them. The classification has been performed by Density Search and Hierarchical Clustering Techniques, yielding thirteen main protein classes from the superimposition and clustering process. It is noteworthy that, besides the disulphide bridges, regular secondary structures and loops frequently become correctly aligned. Although the lack of significant sequence similarity among some clustered proteins precludes the easy establishment of evolutionary relationships, the program allows us to identify important structural or functional residues by superimposing two apparently unrelated protein structures. The derived classification can be very useful for finding relationships among proteins that would escape detection by current sequence or topology-based analytical algorithms.

19.
The classification of patterns of the three-dimensional folding of a covalently crosslinked polypeptide chain can be used to introduce long-range interactions into the theoretical search for the native conformation of a protein. This classification into Spatial Geometric Arrangements of Loops (SGAL) was proposed earlier (H. Meirovitch and H. A. Scheraga, Macromolecules 14, 1250, 1981). It is based on the subdivision of the protein molecule into closed loops, defined by covalent crosslinks (such as disulfide bonds). Various SGAL classes correspond to the presence or absence of mutual penetration of loops, called entanglements or thrustings. A systematic and objective method is developed here to enumerate all theoretically possible SGALs for a protein, based only on its covalent structure, i.e., the pattern of disulfide bonds or other crosslinks, regardless of whether the three-dimensional structure is known. This information can be of use in structural predictions of folding patterns. Using a modification of the method, it is also possible to determine the SGAL class to which a protein of known structure belongs. Out of 18 proteins with known three-dimensional structure and containing more than two disulfide bonds, five have a native structure with at least one entanglement or thrusting. Thus, threaded SGALs represent a significant structural feature of native proteins. All five involve neighboring loops in the sequences. Their presence in a protein can suggest restrictions on the possible ways of folding the protein.

20.
Nearly all enzymes are proteins. They are the biological catalysts that accelerate cellular reactions. According to the type of reaction they catalyze, they are divided into six classes: oxidoreductases (EC-1), transferases (EC-2), hydrolases (EC-3), lyases (EC-4), isomerases (EC-5), and ligases (EC-6). Prediction of enzyme classes is of great importance in identifying which enzyme class a protein belongs to. Since the number of enzyme sequences increases day by day, using data mining techniques rather than experimental analysis to predict the class of a newly found enzyme sequence is very useful and time-saving. In this paper, two kinds of simple minimum distance-based classifier methods are proposed. These methods and the well-known K-nearest neighbor (KNN) classification algorithm have been applied to classify enzymes according to their amino acid composition. Performance measurements and the elapsed time to execute the algorithms have been compared. In addition, the equality of the two proposed approaches under a special condition has been proved, as a guide for researchers.
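
The abstract does not specify its two minimum distance-based classifiers, so the sketch below shows only a generic nearest-centroid classifier over amino acid composition with Euclidean distance. The training sequences are made-up toy strings, not real enzymes, and the helper names `composition`, `centroid`, `train`, and `classify` are assumptions introduced for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional vector of residue frequencies for one sequence."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def train(labelled):
    """Map each class label to the mean composition of its sequences."""
    return {
        label: centroid([composition(seq) for seq in seqs])
        for label, seqs in labelled.items()
    }


def classify(seq, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    vec = composition(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical toy training data; the labels reuse the EC class names only.
training = {
    "EC-1": ["AAAAGG", "AAAGGG"],
    "EC-3": ["KKKKLL", "KKLLLL"],
}
centroids = train(training)
print(classify("AAGGGA", centroids), classify("KLKLKL", centroids))
```

Unlike KNN, which compares a query against every training sequence, this variant compares it against one centroid per class, which is the usual reason minimum-distance classifiers run faster.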
