Similar articles
20 similar articles found.
1.
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task, both for identifying OMPs in genomic sequences and for successfully predicting their secondary and tertiary structures. We have developed a method based on radial basis function networks and position-specific scoring matrix (PSSM) profiles generated by PSI-BLAST against the non-redundant protein database. Using PSSM profiles, our approach correctly predicted OMPs with a cross-validated accuracy of 96.4% on a set of 1251 proteins comprising 206 OMPs, 667 globular proteins and 378 alpha-helical inner membrane proteins. Furthermore, we applied our method to a dataset of 114 OMPs, 187 transmembrane helical (TMH) proteins and 195 globular proteins with less than 20% sequence identity and obtained a cross-validated accuracy of 95%. This accuracy in discriminating OMPs is higher than that of other methods in the literature, and our method could be used as an effective tool for dissecting OMPs from genomic sequences. We have developed a prediction server, TMBETADISC-RBF, which is available at http://rbf.bioinfo.tw/~sachen/OMP.html.
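
The abstract does not give the network details. As a rough illustration of how Gaussian radial basis functions turn a fixed-length profile vector into class scores, the sketch below assumes each protein has already been reduced to a feature vector (for example, averaged PSSM columns), uses hypothetical class prototypes as RBF centres with output weights fixed at 1, and predicts the class with the larger summed activation. It is a simplification, not the authors' trained network, which would also learn centres, widths and output weights.

```python
import math
from typing import Dict, List, Sequence

def gaussian_rbf(x: Sequence[float], centre: Sequence[float], width: float) -> float:
    """Gaussian basis function exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_classify(x: Sequence[float],
                 centres: Dict[str, List[List[float]]],
                 width: float = 1.0) -> str:
    """Return the label whose centres give the largest summed RBF activation.

    All output weights are fixed at 1 (an assumption for illustration);
    a trained RBF network would fit centres, widths and weights.
    """
    scores = {label: sum(gaussian_rbf(x, c, width) for c in cs)
              for label, cs in centres.items()}
    return max(scores, key=scores.get)

# Hypothetical 2-dimensional vectors standing in for PSSM-derived features.
CENTRES = {"OMP": [[1.0, 0.0], [0.9, 0.1]], "non-OMP": [[0.0, 1.0]]}
print(rbf_classify([0.8, 0.2], CENTRES))  # OMP
```

Real PSSM features would be higher-dimensional (20 or 400 values), and the width would need tuning on the training data.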

2.
Prediction of membrane-spanning segments in β‐barrel outer membrane proteins (OMPs) and of their topology is an important problem in structural and functional genomics. In this work, we propose a method based on radial basis function networks for predicting the number of β‐strands in OMPs and identifying their membrane-spanning segments. Our method showed a leave‐one‐out cross-validation accuracy of 96% on a set of 28 OMPs containing 8 to 22 β‐strand segments. The β‐strand segments in OMPs and the residues in membrane-spanning segments are correctly predicted with accuracies of 96% and 87%, respectively. We have developed a web server, TMBETAPRED‐RBF, for predicting transmembrane β‐strands from the amino acid sequence; it is available at http://rbf.bioinfo.tw/~sachen/tmrbf.html. We suggest that our method could be an effective tool for predicting the membrane-spanning regions and topology of β‐barrel membrane proteins. © 2009 Wiley Periodicals, Inc. J Comput Chem 2010

3.
We have developed a novel approach for dissecting transmembrane beta-barrel proteins (TMBs) in genomic sequences. Its features include (i) identification of TMBs using the preference of residue pairs in globular, transmembrane helical (TMH) and TMB proteins, (ii) elimination of globular/TMH proteins that show more than 70% sequence identity, over a coverage of 80% of residues, with proteins of known structure, (iii) elimination of globular/TMH proteins that have more than 60% sequence identity with known sequences in SWISS-PROT, and (iv) exclusion of TMH proteins using SOSUI, a prediction system for TMH proteins. Our approach flagged 7% of the proteins in all the genomes considered as TMBs. A comparison between the TMBs identified in the E. coli genome and the available experimental data demonstrated that the new approach correctly identified all 11 known TMBs whose crystal structures are available. Further, it revealed 19 TMBs with homology to known structures, 60 TMBs similar to well-annotated sequences, and 54 TMBs with high sequence similarity to Escherichia coli beta-barrel proteins deposited in the Transport Classification Database (TCDB). Interestingly, the present approach identified TMBs from all 15 families in TCDB. In the human genome, the occurrence of TMBs varies from 0 to 3% across chromosomes. We suggest that our approach could be a step forward in structural and functional genomics.

4.
Prediction of transmembrane beta-strands in outer membrane proteins (OMPs) is an important problem in computational chemistry and biology. In this work, we propose a neural-network-based method for identifying membrane-spanning beta-strands. We introduce the concept of "residue probability" for assigning residues to transmembrane beta-strand segments. The performance of our method is evaluated with single-residue accuracy, correlation, specificity and sensitivity. Our predicted segments agree well with experimental observations, with an accuracy of 73% from amino acid sequence information alone. Further, we discuss the predictive power of the N- and C-terminal residues in each segment, the number of segments in each protein, and the influence of the cutoff probability on identifying membrane-spanning beta-strands. We have developed a web server for predicting transmembrane beta-strands from the amino acid sequence; the prediction results are available at http://psfs.cbrc.jp/tmbeta-net/.
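
To make the cutoff idea concrete, here is a minimal sketch, assuming per-residue strand probabilities are already available (for example, from a neural network). Residues at or above a hypothetical cutoff are marked, consecutive marked residues are merged into segments, and runs shorter than a hypothetical minimum length are discarded. The cutoff and minimum length are illustrative values, not those of the paper.

```python
from typing import List, Optional, Sequence, Tuple

def segments_from_probabilities(probs: Sequence[float],
                                cutoff: float = 0.5,
                                min_len: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) 0-based inclusive index pairs of predicted strands."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= cutoff and start is None:
            start = i                          # open a new run
        elif p < cutoff and start is not None:
            if i - start >= min_len:           # keep the run only if long enough
                segments.append((start, i - 1))
            start = None
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs) - 1))   # run reaching the C-terminus
    return segments

probs = [0.1] * 3 + [0.8] * 6 + [0.2] * 2 + [0.9] * 2 + [0.7] * 5
print(segments_from_probabilities(probs))  # [(3, 8), (11, 17)]
```

Raising the cutoff shortens or removes segments, which is the kind of trade-off the abstract says is discussed.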

5.
Beta-barrel membrane proteins perform a variety of functions, such as mediating non-specific, passive transport of ions and small molecules and selectively passing molecules such as maltose and sucrose, and they are involved in voltage-dependent anion channels. Understanding the structural features of beta-barrel membrane proteins and detecting them in genomic sequences are challenging tasks in structural and functional genomics. In this review, based on a survey of experimentally known amino acid sequences and structures, we describe the characteristic features of amino acid residues in beta-barrel membrane proteins and novel parameters for understanding their folding and stability. We explain the development of statistical methods and machine learning techniques for discriminating beta-barrel membrane proteins from other folding types of globular and membrane proteins, along with their relative importance. Further, we discuss different methods for predicting the membrane-spanning segments of beta-barrel membrane proteins, including hydrophobicity profiles, rule-based approaches, amino acid properties, neural networks and hidden Markov models. In addition, we outline the applications of discrimination techniques for detecting beta-barrel membrane proteins in genomic sequences. In essence, this comprehensive review provides an overall picture of beta-barrel membrane proteins, from the construction of datasets to genome-wide applications.

6.
A novel method is developed to model and predict the transmembrane regions of beta-barrel membrane proteins. It is based on a hidden Markov model (HMM) whose architecture obeys the construction principles of these proteins. The HMM is trained and tested with a jack-knife procedure on a non-redundant set of the 11 beta-barrel membrane proteins known to date at atomic resolution. As a result, the method correctly locates 97% of 172 transmembrane beta-strands. Of the 11 proteins, the barrel size is correctly predicted for ten and the overall topology for seven. Additionally, it successfully assigns the entire topology of two new beta-barrel membrane proteins that have no significant sequence homology to the 11 proteins. The predicted topology of two candidates for beta-barrel structure in the outer mitochondrial membrane is also presented.
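
The abstract does not describe the HMM's states or parameters. To show how an HMM assigns a state path (and hence transmembrane strands) to a sequence, the sketch below is a generic log-space Viterbi decoder run on a hypothetical two-state model ("M" membrane strand, "L" loop) over a reduced two-letter alphabet ("h" hydrophobic, "p" polar). All probabilities are invented for illustration; the published architecture is far richer.

```python
import math
from typing import Dict, List, Sequence

def viterbi(obs: Sequence[str],
            states: Sequence[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Most probable state path, computed with log probabilities."""
    # best[s] = log-probability of the best path ending in state s
    best = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back: List[Dict[str, str]] = []
    for symbol in obs[1:]:
        pointers: Dict[str, str] = {}
        new_best: Dict[str, float] = {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + math.log(trans[r][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(trans[prev][s])
                           + math.log(emit[s][symbol]))
        back.append(pointers)
        best = new_best
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical toy model: strands favour hydrophobic symbols, loops polar ones.
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "L": {"h": 0.2, "p": 0.8}}

print("".join(viterbi("pppphhhhhpppp", STATES, START, TRANS, EMIT)))  # LLLLMMMMMLLLL
```

Working in log space avoids numerical underflow on long sequences.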

7.
β-Barrel membrane proteins are found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts. They are important for pore formation, membrane anchoring and enzyme activity, and they are often responsible for bacterial virulence. Owing to difficulties in experimental structure determination, they are sparsely represented in the Protein Data Bank. We have developed a computational method for predicting the structures of the transmembrane (TM) domains of β-barrel membrane proteins. Because it is based on physical principles, our method can predict structures of the TM domains of β-barrel membrane proteins with novel topology, including those from eukaryotic mitochondria. The method is based on a model of physical interactions, a discrete conformational state space, an empirical potential function, and a model that accounts for interstrand loop entropy. We are able to construct three-dimensional atomic structures of the TM domains from sequence for a set of 23 nonhomologous proteins (resolution 1.8–3.0 Å). For TM domains containing 75–222 residues, the median main-chain rmsd between predicted and measured structures is 3.9 Å. In addition, stability determinants and protein-protein interaction sites can be predicted. Such predictions for the eukaryotic mitochondrial outer membrane proteins Tom40 and VDAC are confirmed by independent mutagenesis and chemical cross-linking studies. These results suggest that our model captures key components of the organization principles of β-barrel membrane protein assembly.

8.
Prediction of protein domain boundaries is an important step in predicting three-dimensional structure. We have developed a simple method, PDP, for predicting the number and positions of domain boundaries in multi-domain proteins from the amino acid sequence alone. The method uses an optimized scale based on the statistics of occurrence of amino acid residues at domain boundaries. Our method shows promising results compared with other methods that do not use homologous sequences. On the set of protein targets from CASP6 (Critical Assessment of Techniques for Protein Structure Prediction), our program correctly assigned the number of domains for approximately 80% of single-domain proteins and approximately 50% of two-domain proteins. Our method offers three main advantages: it is very simple, it is fast, and it uses fewer parameters than other methods.

9.
Understanding the relationship between amino acid sequences and protein folding rates is an important task in computational and molecular biology. In this work, we systematically analyzed the amino acid composition of proteins with different ranges of folding rates. We observed that the polar residues Asn, Gln, Ser and Lys are dominant in fast-folding proteins, whereas the hydrophobic residues Ala, Cys, Gly and Leu prefer slow-folding proteins. Further, we developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted folding rates in leave-one-out cross-validation. Classifying proteins by structural class improved the correlation to 0.98; it is 0.99, 0.98 and 0.96 for all-alpha, all-beta and mixed-class proteins, respectively. In addition, we used Bayesian classification theory to discriminate two- and three-state proteins, with an accuracy of 90%. We have developed a web server for predicting protein folding rates, available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
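
As a sketch of the composition analysis described above, the percentage of each standard residue type can be computed per protein and then averaged within a group (for example, fast folders versus slow folders). The grouping by folding rate is assumed to be done beforehand, and the example sequences in the usage note are invented.

```python
from collections import Counter
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> Dict[str, float]:
    """Percentage of each standard amino acid in a one-letter sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)   # non-standard letters ignored
    return {a: 100.0 * counts[a] / total for a in AMINO_ACIDS}

def mean_composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Average composition over a group of proteins."""
    comps = [composition(s) for s in seqs]
    return {a: sum(c[a] for c in comps) / len(comps) for a in AMINO_ACIDS}
```

For instance, `mean_composition(["AA", "AC"])["A"]` gives 75.0. Comparing group means for Asn, Gln, Ser and Lys against Ala, Cys, Gly and Leu is then a direct read-off. A sequence with no standard residues would raise `ZeroDivisionError`.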

10.
Several computational methods exist for identifying transmembrane beta-barrel proteins (TMBs) from sequence. Some of these methods also provide the transmembrane (TM) boundaries of the putative TMBs. The aims of this study are (1) to derive the propensities of TM residues to be exposed to the lipid bilayer and (2) to predict the exposure status (i.e. exposed to the bilayer or buried in the protein structure) of TMB residues. Three novel propensity scales, BTMC, BTMI and HTMI, were derived, respectively, for TMB residues in the hydrophobic core region of the outer membrane (OM), for TMB residues in the lipid-water interface regions of the OM, and for residues of helical membrane proteins (HMPs) in the lipid-water interface regions of the inner membrane (IM). Separate propensity scales were derived for monomeric and functionally oligomeric TMBs. The derived propensities reflect the differing physico-chemical properties of the respective membrane bilayer regions and were used in a computational method for predicting the exposure status of TMB residues. Based on these propensities, the conservation indices and the frequency profiles of the residues, the transmembrane residues were classified as buried or exposed with accuracies of 77.91% and 80.42% for residues in the membrane core and interface regions, respectively. The correlations of the derived scales with different physico-chemical properties from the AAIndex database are also discussed. Knowledge of residue propensities and burial status will be useful in annotating putative TMBs of unknown structure.
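
The abstract does not spell out the propensity formula. A common definition, assumed here, is the ratio of a residue type's frequency among exposed positions to its frequency among all positions in the same region. Separate scales (core versus interface, monomeric versus oligomeric) would simply be computed on separate residue sets.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

def exposure_propensity(residues: Sequence[Tuple[str, bool]]) -> Dict[str, float]:
    """Propensity P(a) = f(a | exposed) / f(a) for each residue type a.

    `residues` holds (one-letter code, is_exposed) pairs pooled over the
    membrane region of interest. Values > 1 mean the residue type is
    enriched at lipid-exposed positions. Requires at least one exposed residue.
    """
    total = Counter(aa for aa, _ in residues)
    exposed = Counter(aa for aa, is_exp in residues if is_exp)
    n_total = len(residues)
    n_exposed = sum(exposed.values())
    return {aa: (exposed[aa] / n_exposed) / (total[aa] / n_total)
            for aa in total}
```

With three Leu (two exposed) and three Ser (one exposed), Leu scores 4/3 and Ser 2/3.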

11.
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77% to 80% over the past decade. Here, we developed a multistep neural-network algorithm that couples secondary structure prediction with the prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an accuracy (Q3) of 82.0% based on 10-fold cross-validation. That SPINE X exceeds 81% accuracy is further confirmed on an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins and 117 CASP 9 targets (Critical Assessment of Structure Prediction techniques), with accuracies of 81.3%, 82.3% and 81.8%, respectively. The prediction accuracy on the dataset of 2640 proteins improves further to 83.8% if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. SPINE X is compared with the popular PSIPRED and with CASP-winning structure prediction techniques. SPINE X predicts the number of helices and sheets correctly for 21.0% of the 1833 proteins, compared with 17.6% for PSIPRED. The comparison further shows that SPINE X consistently makes more accurate predictions for helical residues (by 6%) without over-prediction, whereas PSIPRED makes more accurate predictions for coil residues (by 3–5%) but over-predicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
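
For readers unfamiliar with the metric, Q3 is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below computes Q3 and a per-state recall of the kind behind the helix and coil comparison. It assumes the 8-state DSSP labels have already been reduced to H/E/C; the reduction scheme varies between studies.

```python
from typing import Dict

def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def per_state_accuracy(predicted: str, observed: str) -> Dict[str, float]:
    """Per-state recall: % of observed H/E/C residues predicted as that state."""
    result: Dict[str, float] = {}
    for state in "HEC":
        idx = [i for i, o in enumerate(observed) if o == state]
        if idx:
            hits = sum(predicted[i] == state for i in idx)
            result[state] = 100.0 * hits / len(idx)
    return result

print(q3("HHHCCEEE", "HHHCCEEC"))  # 87.5
```

Over-prediction of a state shows up as that state being predicted more often than observed, which per-state recall alone does not capture.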

12.
Protein structure modelling offers a way of obtaining three-dimensional information that can be tested and used to plan mutagenesis experiments when a crystallographically determined structure is not available. At its simplest, a model may consist of little more than a secondary structure prediction coupled with a determination of the likely regions of transmembrane, membrane-surface or globular configuration. These methods can yield an informative topology map of the protein, which places the residues in their likely positions with respect to, for example, the membrane interface. If the protein is a member of a large family of related proteins, aligned protein sequences can be used to predict the residues that have an important function, as these will be largely conserved in the alignments. Using all these methods, a model can be constructed (using, for example, the Nicholson Molecular Modelling Kit) to visualize the proposed structure in three dimensions following the premise of good design, that is, avoiding obvious steric clashes, packing helices in a realistic manner, observing the correct H-bond lengths, etc. In this latter exercise, Chothia's review of the principles of protein structure (Annu. Rev. Biochem. 53, 537–572, 1984) is particularly helpful, as it clearly sets out how proteins pack and their preferred configurations. There is a wealth of information about individual amino acid conformational preferences and observed frequencies of occurrence in known protein structures, which can help decide how the residues in the model should be oriented.
In this article, we collate the various protein models of the bacterial light-harvesting complexes and present our own model, which is a synthesis of the available biophysical data and theoretical predictions, and we show its performance in explaining recent results from site-directed mutants of the LH1 and LH2 light-harvesting complexes of Rhodobacter sphaeroides.

13.
A computational model, IMP-TYPE, is proposed for classifying five types of integral membrane proteins from protein sequence. The model aims not only to provide accurate predictions but, most importantly, to incorporate interesting and transparent biological patterns. Compared with the best-performing existing models, IMP-TYPE reduces their error rates by 19% and 34% in two out-of-sample tests on benchmark datasets. Our empirical evaluations also show that the proposed method provides even larger improvements, i.e. 29% and 45% error rate reductions, when predictions are made for sequences that share low (40%) identity with sequences in the training dataset. We also show that IMP-TYPE can be used in standalone mode, i.e. it reproduces a significant majority of the correct predictions made by other leading methods while providing additional correct predictions for sequences that the other methods misclassify. Our method computes predictions with a Support Vector Machine classifier that takes a feature-based encoding of the sequence as input. The input feature set includes hydrophobic AA pairs, which were selected using a consensus of three feature selection algorithms. Several recent studies of integral membrane proteins have associated the hydrophobic residues that make up these AA pairs with the formation of transmembrane helices. Our study also indicates that Met and Phe display a certain degree of hydrophobicity, which may be more crucial than their polarity or aromaticity when they occur in transmembrane segments. This conclusion is supported by a recent study on the potential of mean force for membrane protein folding and by a study of membrane propensity scales for amino acids.
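
The abstract does not list which residue pairs or gaps IMP-TYPE encodes. Purely as an illustration, the sketch below assumes a feature is the normalized count of residue pairs (a, b) separated by a fixed gap, restricted to a hypothetical hydrophobic set. A vector like this could then be passed to an SVM; the residue set and the normalization are assumptions, not the paper's encoding.

```python
from itertools import product
from typing import Dict, Tuple

HYDROPHOBIC = "AILMFVW"  # hypothetical choice of hydrophobic residues

def pair_features(seq: str, gap: int = 1) -> Dict[Tuple[str, str], float]:
    """Frequency of hydrophobic residue pairs (seq[i], seq[i + gap]).

    gap=1 means adjacent residues. Counts are divided by the number of
    positions i for which seq[i + gap] exists.
    """
    seq = seq.upper()
    n_positions = len(seq) - gap
    counts = {pair: 0 for pair in product(HYDROPHOBIC, repeat=2)}
    for i in range(max(n_positions, 0)):
        pair = (seq[i], seq[i + gap])
        if pair in counts:               # skip pairs with a non-hydrophobic residue
            counts[pair] += 1
    return {pair: (c / n_positions if n_positions > 0 else 0.0)
            for pair, c in counts.items()}
```

The result always has 49 entries (7 × 7 ordered pairs), so sequences of different lengths map to vectors of the same size.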

14.
The method proposed for evaluating statistical weights in paper I and the three-state model [alpha-helical (alpha), extended (epsilon) and other (c) states] formulated in paper II have been used to develop a procedure for predicting the backbone conformations of proteins, based on the concept that short-range interactions play the predominant role in determining protein conformation. Conformational probability profiles, in which the probabilities of formation of three consecutive alpha-helical conformations (triads) and of four consecutive extended conformations (tetrads) are defined relative to their average values over the whole molecule, are calculated for 19 proteins, 16 of which had been used in paper I to evaluate the set of statistical weights of the 20 naturally occurring amino acids. Comparing these conformational probability profiles with experimental x-ray observations gives the following results: 80% of the alpha-helical regions and 72% of the extended regions are predicted correctly for the 19 proteins. In the two-state models [alpha-helical (alpha) and other (c) states; extended (epsilon) and other (c) states], the percentage of residues predicted correctly for the 19 proteins ranges from 53 to 90% for the alpha-helical conformation and from 63 to 88% for the extended conformation. In the three-state model, it ranges from 47 to 77% for the 19 proteins. These results suggest that the assumption of the dominance of short-range interactions, on which the predictive scheme is based, is a reasonable one. The present predictive method is compared with those of other authors.
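
The original profiles come from statistical weights of short-range interactions, which this sketch does not reproduce. As a simplified illustration of "probability of consecutive conformations relative to the molecule average", it assumes per-residue probabilities for one state are given and treats residues as independent, so a window's score is the product of its members. Each score is then divided by the chain mean (window=3 for helical triads, 4 for extended tetrads).

```python
from typing import List, Sequence

def window_profile(probs: Sequence[float], window: int) -> List[float]:
    """Relative probability of `window` consecutive residues in one state.

    Each window score is the product of per-residue probabilities (an
    independence assumption), divided by the mean score over the chain,
    so values > 1 flag above-average regions.
    """
    scores = []
    for i in range(len(probs) - window + 1):
        s = 1.0
        for p in probs[i:i + window]:
            s *= p
        scores.append(s)
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    return [s / mean for s in scores] if mean > 0 else [0.0] * len(scores)
```

For per-residue probabilities `[0.5, 0.5, 0.5, 1.0]`, the triad profile is about `[0.667, 1.333]`, so the second window would be flagged as helix-favouring.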

15.
Protein structure determination has been one of the most challenging problems in molecular biology for the past 60 years. Here we present an ab initio protein tertiary-structure prediction method assisted by contact maps predicted by SPOT-Contact and dihedral angles predicted by SPIDER 3. These predicted properties are then fed to the Crystallography and NMR System (CNS) for restrained structure modeling. The resulting structures are first evaluated by the potential energy calculated by CNS and then by the dDFIRE energy function for model selection. The method, called SPOT-Fold, has been tested on 241 CASP targets of 67 to 670 amino acid residues and on 60 randomly selected globular proteins of fewer than 100 amino acids. The method has accuracy comparable to that of other contact-map-based modeling techniques. © 2019 Wiley Periodicals, Inc.

16.
Understanding and identifying proteins adapted to hypersaline environments is a challenging task, and it would help in designing stable proteins. Here, we systematically analyzed the normalized amino acid compositions of 2121 halophilic and 2400 non-halophilic proteins. The results showed that halophilic proteins contain more Asp at the expense of Lys, Ile, Cys and Met, have fewer small and hydrophobic residues, and show a large excess of acidic over basic amino acids. We then introduce a support vector machine method that discriminates halophilic from non-halophilic proteins using a novel kernel based on the Pearson VII universal function. Under three validation methods, it achieved overall accuracies of 97.7%, 91.7% and 86.9% and outperformed other machine learning algorithms. We also address the influence of protein size on prediction accuracy and found that the poorer performance on small proteins may be because some significant residues (Cys and Lys) are missing from them.
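
The abstract names the Pearson VII universal kernel (PUK) but not its parameters. The sketch below uses the commonly cited PUK form with width `sigma` and tailing factor `omega`; the default values are placeholders, not those of the paper. A useful property of this form is that the kernel equals 0.5 at distance sigma/2 for any omega, which the test checks.

```python
import math
from typing import Sequence

def puk_kernel(x: Sequence[float], y: Sequence[float],
               sigma: float = 1.0, omega: float = 1.0) -> float:
    """Pearson VII universal kernel:

    K = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2)**omega
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    term = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + term ** 2) ** omega
```

Filling a Gram matrix with this function over composition vectors gives an SVM a kernel whose shape ranges from Lorentzian-like (omega = 1) toward Gaussian-like (large omega).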

17.
Owing to advances in structural biology, an increasing number of protein structures of unknown function have been deposited in the Protein Data Bank (PDB). These proteins usually have novel structures and sequences, and conventional comparative methods (such as sequence alignment, structure comparison or template search) cannot determine their function. It is therefore important, though not easy, to identify a protein's function directly from its structure. One strategy is to analyze whether there are distinctive structure-derived features associated with functional residues; if so, the functional residues may be identified directly from a single structure. Recently, we have shown that the protein weighted contact number is related to atomic thermal fluctuations and can be used to derive motional correlations in proteins. In this report, we analyze the weighted contact-number profiles of catalytic and non-catalytic residues for a dataset of 760 structures. We found that catalytic residues have weighted contact-number distributions distinct from those of non-catalytic residues. Using this feature, we can effectively differentiate catalytic residues from other residues with a single optimized threshold value. Our method is simple to implement and compares favourably with other, more sophisticated methods. In addition, we discuss the physics behind the relationship between catalytic residues and their contact numbers, as well as other features (such as residue centrality or B-factors) associated with catalytic residues.
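
A minimal sketch, assuming the weighted contact number takes its commonly used form WCN_i = Σ_{j≠i} 1/r_ij², with r_ij the distance between residue representative atoms (for example, C-alpha). The threshold step is illustrative only. Dividing by the chain mean and flagging values at or above the threshold are both assumptions, since the abstract gives neither the normalization nor the direction of the cut.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def weighted_contact_numbers(coords: Sequence[Coord]) -> List[float]:
    """WCN_i = sum over j != i of 1 / r_ij^2 (coordinates must be distinct)."""
    wcn = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(ci, cj) ** 2
        wcn.append(total)
    return wcn

def flag_candidates(wcn: Sequence[float], threshold: float) -> List[int]:
    """Indices whose WCN divided by the chain mean is >= `threshold` (assumed rule)."""
    mean = sum(wcn) / len(wcn)
    return [i for i, w in enumerate(wcn) if w / mean >= threshold]
```

For three collinear atoms 1 Å apart, the WCNs are `[1.25, 2.0, 1.25]`; the middle, more "buried" residue has the highest value. `math.dist` requires Python 3.8 or later.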

18.
The growth and spread of drug resistance in bacteria are well established in both humans and animals and are thus a serious public health concern. Because of increasing drug resistance, controlling infectious diseases such as diarrhea and pneumonia is becoming more difficult. Hence, it is crucial to understand the mechanisms underlying drug resistance and to devise novel solutions to this problem. Multidrug And Toxin Extrusion (MATE) proteins, first characterized as bacterial drug transporters, are present in almost all species. They play a very important role in the secretion of cationic drugs across the cell membrane. In this work, we propose an SVM-based method for predicting MATE proteins. The training dataset consists of 189 non-redundant protein sequences, divided into a positive set (63 sequences) from the MATE family and a negative set (126 sequences) comprising sequences from other transporter families and random protein sequences taken from NCBI; the test set contains 120 protein sequences (8 positive and 112 negative). The model was derived using Position Specific Scoring Matrix (PSSM) composition and achieved an overall accuracy of 92.06%. Five-fold cross-validation was used to optimize the SVM parameters and select the best model. The prediction algorithm is implemented as a freely available web server, MATEPred, which will assist in the rapid identification of MATE proteins.
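
"PSSM composition" is not defined in the abstract. A widely used variant, assumed here, builds a 400-dimensional vector. For each of the 20 residue types, the PSSM rows at positions where that residue occurs are summed, and the 20 blocks are concatenated and divided by the sequence length. The PSSM itself would come from a PSI-BLAST run and is treated here as an already parsed list of 20-column rows.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pssm_composition(seq: str, pssm: Sequence[Sequence[float]]) -> List[float]:
    """400-dimensional PSSM composition vector (assumed definition).

    `pssm` must have one 20-column row per residue, with columns in
    AMINO_ACIDS order. Rows for non-standard residues are skipped.
    """
    if len(seq) != len(pssm) or not seq:
        raise ValueError("PSSM must have one row per residue")
    blocks: Dict[str, List[float]] = {a: [0.0] * 20 for a in AMINO_ACIDS}
    for aa, row in zip(seq.upper(), pssm):
        if aa in blocks:
            for k, value in enumerate(row):
                blocks[aa][k] += value
    n = len(seq)
    return [v / n for a in AMINO_ACIDS for v in blocks[a]]
```

The fixed length (400) is what lets proteins of any size be fed to the same SVM.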

19.
This paper describes an automated method for sequence-specific NMR assignment of the aliphatic resonances of protein side chains in small- and medium-sized globular proteins in aqueous solution. The method requires the recording of a five-dimensional (5D) automated projection spectroscopy (APSY-) NMR experiment and the subsequent analysis of the APSY peak list with the algorithm ALASCA (Algorithm for local and linear assignment of side chains from APSY data). The 5D APSY-HC(CC-TOCSY)CONH experiment yields 5D chemical shift correlations of aliphatic side chain C-H moieties with the backbone atoms H(N), N, and C'. A simultaneous variation of the TOCSY mixing times and the projection angles in this APSY-type TOCSY experiment gives access to all aliphatic C-H moieties in the 20 proteinogenic amino acids. The correlation peak list resulting from the 5D APSY-HC(CC-TOCSY)CONH experiment together with the backbone assignment of the protein under study is the sole input for the algorithm ALASCA that assigns carbon and proton resonances of protein side chains. The algorithm is described, and it is shown that the aliphatic parts of 17 of the 20 common amino acid side chains are assigned unambiguously, whereas the remaining three amino acids are assigned with a certainty of above 95%. The overall feasibility of the approach is demonstrated with the globular 116-residue protein TM1290, for which reference assignments are known. For this protein, 97% of the expected side chain carbon atoms and 87% of the expected side chain protons were detected with the 5D APSY-HC(CC-TOCSY)CONH experiment in 24 h of spectrometer time, and all these resonances were correctly assigned by ALASCA. Based on the experience with TM1290, we expect that the approach presented in this work is routinely applicable to globular proteins with sizes up to at least 120 amino acids.

20.
The prediction of protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analysis of the relationship between protein unfolding rates and the physical-chemical, energetic and conformational properties of amino acid residues provides valuable information for understanding and predicting the unfolding rates of two- and three-state proteins. We found that classifying proteins into different structural classes yields an excellent correlation between amino acid properties and the unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein unfolding rates. We formulated three independent linear regression equations for different structural classes of proteins to predict their unfolding rates from amino acid sequences and obtained excellent agreement between predicted and experimentally observed unfolding rates; the correlation coefficients are 0.999, 0.990 and 0.992 for all-alpha, all-beta and mixed-class proteins, respectively. Further, we derived a general equation applicable to all structural classes, which can be used to predict the unfolding rates of proteins of unknown structural class. We observed correlations of 0.987 and 0.930 for the back-check and jack-knife tests, respectively. These accuracy levels are better than those of other methods in the literature.
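
The jack-knife test mentioned above can be sketched for a single hypothetical predictor x (for example, one averaged amino acid property per protein) and observed log unfolding rates y. Each protein's rate is predicted from a line fitted to all the other proteins, and the Pearson correlation between these predictions and the observations is the jack-knife correlation. The published equations combine several properties per structural class, which this one-variable version does not reproduce.

```python
import math
from typing import List, Sequence, Tuple

def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def jackknife_predictions(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Predict each y[i] from a line fitted without point i (leave-one-out)."""
    preds = []
    for i in range(len(x)):
        xs = [xj for j, xj in enumerate(x) if j != i]
        ys = [yj for j, yj in enumerate(y) if j != i]
        a, b = fit_line(xs, ys)
        preds.append(a + b * x[i])
    return preds
```

The back-check (resubstitution) correlation instead uses a single line fitted to all points, which is why it is typically higher than the jack-knife value, as in the 0.987 versus 0.930 reported above.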
