Similar articles
20 similar articles found.
1.
Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. By combining novel sequence information with predicted secondary structural features (PSSF), we propose a new scheme to improve the prediction of protein structural classes. Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features, which together form the combined sequence information. We then added the PSSF to this combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. Comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.
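The reduced-alphabet word features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-group reduction (hydrophobic "H", polar "P", charged "C"), the word length `k`, and the helper names `reduce_sequence` and `word_features` are all hypothetical, since the abstract does not specify the reduction scheme or how word positions are encoded.

```python
from __future__ import annotations

from itertools import product

# Hypothetical 3-group reduced alphabet (hydrophobic / polar / charged);
# the abstract does not say which reduction scheme the authors used.
REDUCTION = {
    **{aa: "H" for aa in "AVLIMFWC"},
    **{aa: "P" for aa in "GSTYNQPH"},
    **{aa: "C" for aa in "DEKR"},
}
ALPHABET = "HPC"


def reduce_sequence(seq: str) -> str:
    """Map each residue to its reduced-alphabet letter (unknown residues are skipped)."""
    return "".join(REDUCTION[aa] for aa in seq.upper() if aa in REDUCTION)


def word_features(reduced: str, k: int = 2) -> list[float]:
    """Return normalized k-word frequencies followed by normalized mean word positions."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    n_windows = max(len(reduced) - k + 1, 1)
    counts = {w: 0 for w in words}
    positions: dict[str, list[int]] = {w: [] for w in words}
    for i in range(len(reduced) - k + 1):
        w = reduced[i:i + k]
        counts[w] += 1
        positions[w].append(i)
    freqs = [counts[w] / n_windows for w in words]
    # Mean start position of each word divided by the number of windows,
    # so values lie in [0, 1); 0.0 if the word never occurs.
    mean_pos = [
        (sum(positions[w]) / len(positions[w])) / n_windows if positions[w] else 0.0
        for w in words
    ]
    return freqs + mean_pos
```

With `k = 2` and a three-letter alphabet this gives 9 frequency features plus 9 position features; in the described scheme the PSSF values would then be appended to this vector before classification.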

2.
Efforts to use computers to predict the secondary structure of proteins from primary structure information alone started over a quarter of a century ago [1-3]. Although the initial results were encouraging, the accuracy of the pioneering methods generally did not reach the level required to use predicted secondary structures reliably in modelling the three-dimensional topology of proteins. During the last decade, however, the introduction of new computational techniques and the use of multiple sequence information have led to a dramatic increase in the success rate of prediction methods, such that successful 3D modelling based on predicted secondary structure has become feasible [e.g., Ref 4]. This review presents an overview of the scale of the secondary structure prediction problem and its associated pitfalls, as well as the history of the development of computational prediction methods. Because recent successful strategies for secondary structure prediction all rely on multiple sequence information, some methods for accurate protein multiple sequence alignment are also described. While the main focus is on prediction methods for globular proteins, the prediction of transmembrane segments in membrane proteins is also briefly summarised. Finally, an integrated iterative approach that ties secondary structure prediction to multiple alignment is introduced [5].

3.
Modern protein secondary structure prediction methods are based on exploiting evolutionary information contained in multiple sequence alignments. Critical steps in the secondary structure prediction process are (i) the selection of a set of sequences that are homologous to a given query sequence, (ii) the choice of the multiple sequence alignment method, and (iii) the choice of the secondary structure prediction method. Because of the close relationship between these three steps and their critical influence on the prediction results, secondary structure prediction has received increased attention from the bioinformatics community over the last few years. In this treatise, we discuss recent developments in computational methods for protein secondary structure prediction and multiple sequence alignment, focus on the integration of these methods, and provide some recommendations for state-of-the-art secondary structure prediction in practice.

4.
Structural Chemistry - Determining protein structures plays an important role in the field of drug design. Currently, the machine learning methods including artificial neural network (ANN) and...

5.
RNA-binding proteins (RBPs) perform fundamental and diverse functions within the cell. Approximately 15% of protein sequences are annotated as RNA-binding, but because a significant number of proteins lack functional annotation, many RBPs are yet to be identified. Some uncharacterised proteins can be annotated by transferring functional information from proteins that share significant sequence homology. However, genomes contain a significant number of orphan open reading frames (ORFs) that share no significant sequence similarity with other ORFs but correspond to functional proteins. Hence, methods for protein function annotation that go beyond sequence homology are essential. One such method is the identification of ligands that bind to proteins through the characterisation of binding-site residues. In the current work, RNA-binding residues (RBRs) are characterised in terms of their evolutionary conservation and the patterns they form in sequence space, and the potential for such characteristics to be used to identify RBPs from sequence is then evaluated. The conservation of residues in 261 RBPs is compared for (a) RBRs vs. non-RBR surface residues, and (b) specific vs. non-specific RBRs. The analysis shows that RBRs are more conserved than other surface residues, and that RBRs hydrogen-bonded to the RNA backbone are more conserved than those making hydrogen bonds to RNA bases. This observed conservation of RBRs was then used to inform the construction of RBR sequence patterns from known protein–RNA structures. A series of RBR patterns was generated for a case-study protein, aspartyl-tRNA synthetase bound to tRNA, and used to differentiate between RNA-binding and non-RNA-binding protein sequences. Six sequence patterns achieved precision values above 80% and recall values seven times that of a homology search.
When the method was expanded to the complete dataset of 261 proteins, many patterns had poor predictive value because they had not been tailored on a family-specific basis. However, two patterns with precision values ≥85% were used to make function predictions for a set of hypothetical proteins. This revealed a number of potential RBPs that require experimental verification.
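Scoring a sequence pattern by precision and recall, as described above, can be sketched like this. It is an illustrative sketch only: the pattern `R.[KR]..G` (PROSITE-style `R-x-[KR]-x(2)-G` written as a Python regular expression) and the toy labelled sequences are hypothetical, not patterns or data from the study, and `evaluate_pattern` is an invented helper name.

```python
from __future__ import annotations

import re


def evaluate_pattern(pattern: str, labelled: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (precision, recall) of a regex pattern used as an RNA-binding classifier.

    `labelled` holds (sequence, is_rna_binding) pairs; a sequence is predicted
    to be RNA-binding if the pattern occurs anywhere in it.
    """
    regex = re.compile(pattern)
    tp = fp = fn = 0
    for seq, is_rbp in labelled:
        hit = regex.search(seq) is not None
        if hit and is_rbp:
            tp += 1
        elif hit and not is_rbp:
            fp += 1
        elif not hit and is_rbp:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two RNA-binding and two non-binding sequences.
toy = [("MRAKLLG", True), ("MRSRAAGK", False), ("MKKKK", True), ("AAAA", False)]
print(evaluate_pattern(r"R.[KR]..G", toy))
```

Here one true positive, one false positive and one false negative give a precision and recall of 0.5 each; the study's family-specific patterns would be evaluated the same way against curated RNA-binding and non-binding sets.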

6.
To predict solvent accessibility from the amino acid sequence of proteins, we use a logistic function trained on a non-redundant protein database. Using principal component analysis, we find that the prediction can be considered, to a good approximation, as a monofactorial problem: a crossed effect of the burial propensity of amino acids and of their locations at positions flanking the amino acid of interest. Complementary effects depend on the presence of certain amino acids (mostly P, G and C) at given positions. We refined the predictive model (1) by adding supplementary input data, (2) by using a prediction-correction strategy and (3) by adapting the decision rules to the amino acid type. We obtain a best score of 77.6% correct predictions for a relative accessibility threshold of 9%. However, compared with a trivial strategy based only on the frequencies of buried or exposed residues, the gain is less than 4%. Received: 4 June 1998 / Accepted: 17 September 1998 / Published online: 10 December 1998
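The 9% relative-accessibility cut-off and the "trivial" frequency baseline mentioned above can be made concrete with a short sketch. It assumes one plausible reading of the baseline (predict, for each residue type, whichever state that type most often adopts); the abstract does not define the baseline precisely, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict

THRESHOLD = 0.09  # relative accessibility cut-off (9%) quoted in the abstract


def two_state_labels(rsa: list[float], threshold: float = THRESHOLD) -> list[str]:
    """Label each residue 'b' (buried) if its relative accessibility < threshold, else 'e'."""
    return ["b" if r < threshold else "e" for r in rsa]


def residue_type_baseline(seq: str, labels: list[str]) -> float:
    """Accuracy of predicting, for every residue type, its most frequent state.

    This is one assumed form of a frequency-based baseline; it is scored on the
    same data it is fitted on, so it is optimistic.
    """
    if not seq:
        return 0.0
    by_type: dict[str, Counter] = defaultdict(Counter)
    for aa, lab in zip(seq, labels):
        by_type[aa][lab] += 1
    majority = {aa: c.most_common(1)[0][0] for aa, c in by_type.items()}
    correct = sum(majority[aa] == lab for aa, lab in zip(seq, labels))
    return correct / len(seq)
```

Any learned predictor, such as the logistic model in the abstract, should be compared against a baseline of this kind; the reported gain over such a baseline was below 4%.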

7.
Solvent accessibility prediction from amino acid sequences has been pursued by several researchers. Such a prediction typically starts by transforming the amino acid category (or type) information into numerical representations. All twenty amino acids can be completely and uniquely represented by 20-dimensional vectors. Here, we investigate whether the amino acid space defined in this way really requires twenty dimensions, and we try to develop corresponding representations in fewer dimensions. We developed a neural-network-based method for searching for optimal codification schemes in a space of arbitrary dimension. The method is used to obtain optimal encodings of amino acids at various levels of dimensionality and is applied to optimize the amino acid codification for predicting protein solvent accessibility with feed-forward neural networks. The traditional 20-dimensional codification appears to be redundant for the solvent accessibility prediction problem, since a 1-dimensional codification achieves almost the same accuracy as the 20-dimensional codification. Optimal codings in far fewer dimensions can therefore predict accessible surface area almost as accurately as a fully unique 20-dimensional coding. The 1-dimensional amino acid codification for solvent accessibility prediction, obtained in a purely mathematical way with neural networks, is highly correlated with a physical property of the amino acids, namely their average solvent accessibility. The method developed to find the optimal codification is general, although the codification it produces depends on the property being estimated.
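The contrast between a 20-dimensional orthogonal code and a 1-dimensional code can be sketched as window encodings for a feed-forward network input. This is an illustration under explicit assumptions: the paper learned its own 1-D values, which the abstract does not give, so the Kyte-Doolittle hydropathy scale is used here purely as a stand-in scalar per residue; `encode_window` is a hypothetical helper.

```python
from __future__ import annotations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Stand-in 1-D scale: Kyte-Doolittle hydropathy. The paper derived its own 1-D
# code with a neural network; those values are not given in the abstract.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def one_hot(aa: str) -> list[float]:
    """20-dimensional orthogonal code; an unknown residue or padding is all zeros."""
    return [1.0 if aa == x else 0.0 for x in AMINO_ACIDS]


def encode_window(seq: str, centre: int, half: int, dims: int) -> list[float]:
    """Encode residues centre-half .. centre+half as network input.

    dims=20 uses one-hot vectors; dims=1 uses one scalar per position.
    Positions outside the sequence are padded with zeros.
    """
    features: list[float] = []
    for i in range(centre - half, centre + half + 1):
        aa = seq[i] if 0 <= i < len(seq) else "-"
        if dims == 20:
            features.extend(one_hot(aa))
        elif dims == 1:
            features.append(KD.get(aa, 0.0))
        else:
            raise ValueError("dims must be 1 or 20")
    return features
```

For a window of 2 × `half` + 1 residues, the input layer shrinks twentyfold when switching from the one-hot code to the scalar code, which is the kind of reduction the abstract reports as costing almost no accuracy.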

8.
We have developed an efficient and reliable methodology for crystal structure prediction that merges ab initio total-energy calculations with a specifically devised evolutionary algorithm. This method allows one to predict the most stable crystal structure and a number of low-energy metastable structures for a given compound at any P-T conditions without requiring any experimental input. An extremely high (nearly 100%) success rate has been observed in the few tens of tests done so far, including ionic, covalent, metallic, and molecular structures with up to 40 atoms in the unit cell. We have been able to resolve some important problems in high-pressure crystallography and report a number of new high-pressure crystal structures (stable phases: epsilon-oxygen and a new phase of sulphur; new metastable phases of carbon, sulphur and nitrogen; and stable and metastable phases of CaCO3). Physical reasons for the success of this methodology are discussed.

9.
All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, alignment positions showing a gap in the top sequence are deleted prior to prediction, which shrinks the alignment and loses position-specific information. In this paper we investigate the effect of this loss of information on secondary structure prediction accuracy. To this end, we designed SymSSP, an algorithm that post-processes the predicted secondary structures of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) re-introducing most of the information that would otherwise be lost. The post-processed information is then passed to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each sequence in the multiple alignment. We tested our method with the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET, using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy is consistently better at improving the segmentation quality of the predictions than the commonly used majority voting technique. In addition, we applied several weighting schemes from the literature to our new consensus-deriving dynamic programming routine. Finally, we investigated the level of noise that prediction errors introduce into the consensus and show that, for all four tested prediction methods, the predicted edges of helices and strands are wrong about half of the time.
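The majority voting baseline that the abstract compares SymSSP against can be sketched as below. Note the scope: this is only the simple column-wise vote, not SymSSP's dynamic programming consensus, whose details are not given in the abstract. The helper names and the H/E/C input format (one ungapped prediction string per aligned sequence) are assumptions.

```python
from __future__ import annotations

from collections import Counter


def project_to_alignment(aligned: str, ss: str) -> list[str | None]:
    """Place an ungapped per-residue prediction onto alignment columns (None at gaps)."""
    it = iter(ss)
    return [None if c == "-" else next(it) for c in aligned]


def majority_consensus(aligned_seqs: list[str], predictions: list[str]) -> list[str]:
    """Column-wise majority vote of H/E/C predictions, mapped back onto each sequence.

    Gapped positions cast no vote; on ties, Counter.most_common returns the
    state encountered first in the column.
    """
    columns = [project_to_alignment(a, p) for a, p in zip(aligned_seqs, predictions)]
    consensus: list[str | None] = []
    for j in range(len(aligned_seqs[0])):
        votes = Counter(col[j] for col in columns if col[j] is not None)
        consensus.append(votes.most_common(1)[0][0] if votes else None)
    # Give each sequence the consensus state at each of its non-gap residues.
    return [
        "".join(consensus[j] for j, c in enumerate(a) if c != "-")
        for a in aligned_seqs
    ]
```

Because each column is decided independently, this vote can produce very short or fragmented segments; the segmentation problem at helix and strand edges is what the dynamic programming consensus described above is designed to address.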

10.
The literature contains over fifty years of accumulated methods for predicting the secondary structures of proteins in silico. A large part of this collection consists of artificial neural network-based approaches; neural networks are a branch of artificial intelligence and machine learning that is gaining popularity in many application areas. The primary objective of this paper is to summarize important works that are scattered over time, so that new researchers can gain a clear view of the domain in a single place. An informative introduction to protein secondary structure and artificial neural networks is also included for context. This review should be valuable in designing future methods to improve protein secondary structure prediction accuracy. The neural network methods found in this problem domain employ varying architectures and feature spaces, and a handful stand out because of significant improvements in prediction. Neural networks with a larger feature scope and higher architectural complexity have been found to produce better protein secondary structure predictions. Current prediction accuracy lies around the 84% mark, leaving much room for further improvement in the in silico prediction of secondary structures. The estimated limit of 88% prediction accuracy has not yet been reached, so further research is timely.

11.
12.
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77% to 80% over the past decade. Here, we developed a multistep neural-network algorithm that couples secondary structure prediction with the prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an accuracy (Q3) of 82.0% based on 10-fold cross-validation. That SPINE X surpasses 81% accuracy is further confirmed on an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins, and 117 CASP 9 targets (Critical Assessment of Structure Prediction techniques), with accuracies of 81.3%, 82.3% and 81.8%, respectively. The prediction accuracy for the dataset of 2640 proteins further improves to 83.8% if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. SPINE X is compared with the popular PSIPRED and with CASP-winning structure-prediction techniques. SPINE X predicts the numbers of helices and sheets correctly for 21.0% of the 1833 proteins, compared with 17.6% for PSIPRED. The comparison further shows that SPINE X consistently makes more accurate predictions for helical residues (by 6%) without overprediction, whereas PSIPRED makes more accurate predictions for coil residues (by 3-5%) but overpredicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
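The Q3 figures quoted above are per-residue three-state accuracies. A minimal sketch of how Q3 is computed from DSSP assignments is shown below; it assumes one common 8-to-3 state reduction (H, G, I to helix; E, B to strand; everything else to coil), which may differ from the reduction used for SPINE X, and the function names are hypothetical.

```python
from __future__ import annotations

# One common DSSP 8-to-3 state reduction; other conventions exist
# (e.g. mapping only H to helix and only E to strand).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Map 8-state DSSP letters to H/E/C; anything else (T, S, blank, ...) becomes coil."""
    return "".join(DSSP_TO_3.get(s, "C") for s in dssp)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must be the same length")
    if not observed:
        raise ValueError("empty sequences")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

Because Q3 depends on the reference assignment as well as on the predictor, swapping DSSP for a consensus assignment (as in the 83.8% figure above) can change the score without any change to the predictions themselves.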

13.
Predicting RNA secondary structure using evolutionary history can be carried out with an alignment of related RNA sequences that share a conserved structure. Accurately determining evolutionary substitution rates for base pairs and single-stranded nucleotides is a concern for methods based on this type of approach. These rates are hard to determine reliably without a large and accurate initial alignment, which ideally also has structural annotation. Hence, one must often apply rates extracted from other RNA families with trusted alignments and structures. Here, we investigate this problem by applying rates derived from tRNA and rRNA to the prediction of the much more rapidly evolving 5'-region of HIV-1. We find that the HIV-1 prediction agrees with experimental data, even though the relative evolutionary rate between A and G is significantly increased in both stem and loop regions. In addition, we obtained an alignment of the 5' HIV-1 region that is more consistent with the structure than the alignment currently in the database. We added randomized noise to the original rate values to investigate the stability of the predictions to deviations in the rate matrix. We find that changes within a fairly large range still produce reliable predictions, and conclude that using rates from a limited set of RNA sequences is valid over a broader range of sequences.

14.
NHS-biotin modification, used as a specific lysine probe coupled to mass spectrometry detection, has been increasingly used in recent years to assess the amino acid accessibility of proteins or complexes when well-established methods are challenged. We present a strategy based on the parallel use of three commercially available reagents (Sulfo-NHS-biotin, Sulfo-NHS-LC-biotin, and Sulfo-NHS-LC-LC-biotin) to efficiently assess the solvent accessibility of amino acids using MALDI-TOF mass spectrometry. The same qualitative pattern of reactivity was observed for all three reagents on the THUMPalpha protein at four reagent/polypeptide molar ratios (2 : 1, 6 : 1, 13 : 1, and 26 : 1). Peptide assignment of the detected ions is more accurate because the three reagents provide triple redundancy through their specific monoisotopic mass increments. These reagents are a good alternative to isotope labeling when only a single MALDI-TOF mass spectrometer is available. We observed that the hydroxyl groups of serine and tyrosine residues were also modified by these Sulfo-NHS-biotin reagents. The small amount of protein required and the simplicity of the method make this procedure accessible and affordable for obtaining topological information on proteins that are difficult to purify. This method was used to identify two lysine residues of the TrmG10 methyltransferase from Pyrococcus abyssi that were differentially reactive: modified in the free protein but not in the tRNA-protein complex.
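The triple redundancy described above comes from the fixed mass added by each reagent. The sketch below uses standard monoisotopic values computed from elemental formulas (biotinyl group C10H14N2O2S, 226.0776 Da; each aminohexanoyl "LC" spacer C6H11NO, 113.0841 Da); these numbers are not stated in the abstract, and the matching tolerance and helper names are hypothetical choices for illustration.

```python
from __future__ import annotations

# Monoisotopic mass added to a modified residue by each reagent, in Da:
# the biotinyl group (C10H14N2O2S) plus 0, 1 or 2 aminohexanoyl "LC" spacers (C6H11NO).
BIOTINYL = 226.0776
LC_SPACER = 113.0841
INCREMENTS = {
    "Sulfo-NHS-biotin": BIOTINYL,
    "Sulfo-NHS-LC-biotin": BIOTINYL + LC_SPACER,
    "Sulfo-NHS-LC-LC-biotin": BIOTINYL + 2 * LC_SPACER,
}


def expected_mh(peptide_mh: float, n_labels: int = 1) -> dict[str, float]:
    """Expected [M+H]+ of a peptide carrying n_labels modifications from each reagent."""
    return {name: round(peptide_mh + n_labels * inc, 4) for name, inc in INCREMENTS.items()}


def matches_all_three(observed: list[float], peptide_mh: float, tol: float = 0.05) -> bool:
    """True if a singly labelled form of the peptide is observed with every reagent.

    `tol` (in Da) is a hypothetical matching tolerance, not a value from the study.
    """
    return all(
        any(abs(m - target) <= tol for m in observed)
        for target in expected_mh(peptide_mh).values()
    )
```

Requiring a hit at all three expected masses, which are spaced 113.0841 Da apart, makes a chance assignment much less likely than matching a single mass shift.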

15.
We propose a method for predicting RNA base pairing that imposes no restrictions on the order of base pairs, allows for pseudoknots, and runs in O(mN²) time for N base pairs and m iterations. It employs a self-consistent mean field method in which all base pairs are possible, but with each iteration the most energetically favoured base pairs become more likely, as long as they are consistent with their neighbours. Performance was compared against three other programs using three test sets. Sensitivity varied from 20% to 74% and specificity from 44% to 77%; in general, the method predicts too many base pairs, which yields good sensitivity but worse specificity. The predicted structures have excellent energies, suggesting that the method performs well algorithmically but that the classic literature energy models may not be appropriate when pseudoknots are permitted. The website and source code for the simulations are available at http://cardigan.zbh.uni-hamburg.de/~rnascmf. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010

16.
The identification of RNA secondary structure has been an important tool for the characterization of nucleic acids. Computational structure prediction has been an effective approach toward this end, but improvement of established methods is often slow and reliant on redundant methodology. Here we present a novel consensus scoring approach that incorporates inputs from an array of established methods, with the goal of producing outputs that contain the structures these programs agree on. This method is implemented in RNAdemocracy, a Python program capable of competing with existing methods. The ensemble approach is limited by commonalities among the established methods, such as shared parameter sources, which can lead to agreement errors; this is unavoidable given the limited RNA structure datasets available. The modular construction of RNAdemocracy makes it easy to upgrade and customize to suit users' needs. Although RNAdemocracy is capable of accurate predictions, it is best suited to guiding users to regions of the sequence space that exhibit agreement, rather than serving as a predictor to be relied on entirely. It can also grade predictions for potential accuracy by reporting the percentage of consensus between contributing methods in the final structure.
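A consensus of the kind described above can be sketched as base-pair voting over dot-bracket predictions. This is not RNAdemocracy's actual scoring scheme, which the abstract does not specify; it is a hypothetical minimal version that keeps base pairs predicted by at least `min_votes` methods and reports each kept pair's agreement percentage.

```python
from __future__ import annotations

from collections import Counter


def pairs_from_dot_bracket(db: str) -> set[tuple[int, int]]:
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) base pairs."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def consensus_pairs(predictions: list[str], min_votes: int) -> tuple[set, dict]:
    """Keep pairs predicted by >= min_votes methods and report each kept pair's agreement (%).

    With a strict majority (min_votes > len(predictions) / 2), any two kept pairs
    co-occur in at least one input structure, so the result is itself a valid
    pseudoknot-free structure; lower thresholds can yield conflicting pairs.
    """
    votes = Counter(p for db in predictions for p in pairs_from_dot_bracket(db))
    n = len(predictions)
    kept = {p for p, v in votes.items() if v >= min_votes}
    agreement = {p: 100.0 * votes[p] / n for p in kept}
    return kept, agreement


# Hypothetical outputs of three prediction programs for the same 6-nt sequence.
kept, agreement = consensus_pairs(["((..))", "((..))", "(....)"], min_votes=2)
```

In this toy case the outer pair (0, 5) is supported by all three inputs (100%), while the inner pair (1, 4) is supported by two of three (about 66.7%), which mirrors the per-structure consensus percentage the abstract describes.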

17.
DNA-templated organic synthesis enables the translation, selection, and amplification of DNA sequences encoding synthetic small-molecule libraries. As the size of DNA-templated libraries increases, the possibility of forming intramolecularly base-paired structures within templates that impede templated reactions increases as well. To achieve uniform reactivity across many template sequences and to computationally predict and remove any problematic sequences from DNA-templated libraries, we have systematically examined the effects of template sequence and secondary structure on DNA-templated reactivity. By testing a series of template sequences computationally designed to contain different degrees of internal secondary structure, we observed that high levels of predicted secondary structure involving the reagent binding site within a DNA template interfere with reagent hybridization and impair reactivity, as expected. Unexpectedly, we also discovered that templates containing virtually no predicted internal secondary structure also exhibit poor reaction efficiencies. Further studies revealed that a modest degree of internal secondary structure is required to maximize effective molarities between reactants, possibly by compacting intervening template nucleotides that separate the hybridized reactants. Therefore, ideal sequences for DNA-templated synthesis lie between two undesirable extremes of too much or too little internal secondary structure. The relationship between effective molarity and intervening nucleic acid secondary structure described in this work may also apply to nucleic acid sequences in living systems that separate interacting biological molecules.

18.
We developed a method, called RNA Assembler using Secondary Structure Information Effectively (RASSIE), for predicting RNA tertiary structures using known secondary structure information. We adopted a fragment assembly-based approach that uses a secondary structure-based fragment library. For several typical target structures, such as stem-loops, bulge-loops, and 2-way junctions, our method generated numerous good-quality candidate structures in less computational time than previously proposed methods. Using a high-resolution potential energy function, we were able to select good predicted structures from the candidates. This combination of efficient conformational search and detailed structure evaluation with a high-resolution potential is potentially useful for the tertiary structure prediction of RNA.

19.

The stabilization energy of the secondary structures of wild-type and mutant hammerhead ribozymes has been calculated under different salt conditions and temperatures using the thermodynamic parameters for RNA structure prediction. The most stable structure under each condition was searched for, and the resulting secondary structure was compared with the structure suggested phylogenetically or experimentally. The results indicate that the hammerhead-type secondary structure of the ribozyme and its reactivity correlate with each other. In particular, the multibranched loop containing the self-cleavage site of the ribozyme appears to be a key structure in the hammerhead ribozyme reaction. The predicted secondary structures also suggest that the reactivity of the hammerhead ribozyme should be much lower at 10°C than at 37°C.

20.
Unlike all-helical membrane proteins, beta-barrel membrane proteins cannot be successfully discriminated from other proteins, especially from all-beta soluble proteins. This paper analyzes the amino acid composition of the membrane parts of 12 beta-barrel membrane proteins versus the beta-strands of 79 all-beta soluble proteins. The mean and variance of the amino acid composition in these two classes are calculated. Amino acids most likely to be associated with the classification, such as Gly, Asn and Val, are selected based on Fisher's discriminant ratio. A linear classifier built on the composition of these selected amino acids in observed beta-strands achieves 100% classification accuracy for the 12 membrane proteins and 79 soluble proteins in a four-fold cross-validation experiment. Since the accuracy of secondary structure prediction is currently quite high, a promising method for identifying beta-barrel membrane proteins is presented based on the linear classifier coupled with predicted secondary structure. Applied to 241 beta-barrel membrane proteins and 3855 soluble proteins with various structures, the method achieves 85.48% (206/241) sensitivity and 92.53% (3567/3855) specificity.
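The feature selection step described above can be sketched with the usual form of Fisher's discriminant ratio for a single feature, (μ_a − μ_b)² / (σ_a² + σ_b²), applied to per-protein amino acid compositions. The exact normalization used in the paper is not stated in the abstract, so this form, the use of population variances, and the helper names are assumptions.

```python
from __future__ import annotations

from statistics import mean, pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> dict[str, float]:
    """Fraction of each amino acid in a (concatenated) strand segment."""
    n = len(segment)
    return {aa: segment.count(aa) / n for aa in AMINO_ACIDS}


def fisher_ratios(class_a: list[str], class_b: list[str]) -> dict[str, float]:
    """Fisher discriminant ratio (mu_a - mu_b)^2 / (var_a + var_b) for each amino acid.

    If both classes have zero variance, the ratio is 0.0 for equal means and
    infinity for different means (perfect separation on this feature).
    """
    comp_a = [composition(s) for s in class_a]
    comp_b = [composition(s) for s in class_b]
    ratios: dict[str, float] = {}
    for aa in AMINO_ACIDS:
        xa = [c[aa] for c in comp_a]
        xb = [c[aa] for c in comp_b]
        diff2 = (mean(xa) - mean(xb)) ** 2
        denom = pvariance(xa) + pvariance(xb)
        if denom > 0:
            ratios[aa] = diff2 / denom
        else:
            ratios[aa] = 0.0 if diff2 == 0 else float("inf")
    return ratios
```

Amino acids would then be ranked by ratio and the top few used as inputs to the linear classifier. The reported sensitivity and specificity are plain ratios: 206/241 ≈ 85.48% and 3567/3855 ≈ 92.53%.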
