20 similar documents found (search time: 0 ms)
1.
Secondary structure motifs in nucleic acid probes generally impair the intended hybridization reactions, so efforts to predict and avoid such structures are common in probe design schemes. Another key facet of probe design that has received much less attention, however, is that secondary structure at the targeted probe binding sites may also impair hybridization. Evaluating probe and target site secondary structures together should therefore improve hybridization prediction and design effectiveness. Several challenges confound this goal, including the imperfect empirical rules and parameters underlying the predictions and the poor scaling of folding algorithms with sequence length. Here, we attempt to quantify the consequences of target site structure on predicted hybridization using sequences sampled from the human genome. We also provide a methodology for choosing a reasonable “window size” around target sites that is as small as possible without compromising the prediction accuracy of the folding algorithm.
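As a minimal sketch of the windowing step, the hypothetical helper `target_window` below (not taken from the paper) cuts a target site plus a symmetric flank out of a longer sequence, clipping at the sequence ends. It also returns the site's offsets inside the window so that folding output computed on the window can be mapped back to the site. How large the flank must be is exactly what the methodology above addresses; the values used here are illustrative only.

```python
def target_window(genome: str, start: int, end: int, flank: int) -> tuple[str, int, int]:
    """Return the target site genome[start:end] plus `flank` nt on each side.

    The window is clipped to the sequence bounds. The second and third return
    values are the site's start/end offsets inside the returned window.
    """
    if not 0 <= start < end <= len(genome):
        raise ValueError("target site lies outside the sequence")
    lo = max(0, start - flank)
    hi = min(len(genome), end + flank)
    return genome[lo:hi], start - lo, end - lo
```

For example, `target_window("AAAACCCCGGGGTTTT", 6, 10, 3)` returns the window `"ACCCCGGGGT"` with the site at offsets 3 to 7. Folding windows of increasing `flank` and checking when the predicted structure at the site stops changing is one plausible way to probe the window-size question.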
2.
Glycoproteins are important biomolecules with a diverse array of structural and signaling functions in biology. Determining glycoprotein secondary structure is becoming increasingly important for understanding how these molecules function in biological environments and in disease. Furthermore, glycoproteins such as mucins are being evaluated in various nano-engineering processes that require knowledge of how their underlying secondary structure might change in different target environments. We have developed an analytical procedure for predicting glycoprotein secondary structure using ATR-FTIR on dry films. Using bovine submaxillary mucin (BSM) as a model glycoprotein, we empirically determined the additive infrared spectral pattern of the acetyl amino sugars and amino acids that could contribute to absorbance in the Amide I band of BSM. We show how subtracting these spectra reveals the absorbance pattern of the protein backbone, from which glycoprotein secondary structure can be predicted. After carbohydrate and amino acid spectral subtraction, the analysis predicted a predominance of random coil, beta sheet and beta turn secondary structure for BSM, in agreement with other methods. Our relatively simple approach can be applied to predict secondary structure in other glycoproteins.
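The subtraction step can be sketched as point-by-point removal of weighted reference spectra from the sample spectrum. This is a minimal illustration under stated assumptions: all spectra share one wavenumber grid, each component's scaling weight is already known, and the function names and the 1600 to 1700 cm⁻¹ window used for the Amide I band are choices of this sketch rather than the authors' procedure.

```python
def amide_i_region(wavenumbers, absorbance, lo=1600.0, hi=1700.0):
    """Keep (wavenumber, absorbance) points inside an assumed Amide I window (cm^-1)."""
    return [(w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi]


def subtract_components(sample, components, weights):
    """Subtract weighted reference spectra from a sample spectrum, point by point.

    All spectra must share the same wavenumber grid; `weights` maps each
    component name (e.g. a sugar or amino acid) to its assumed scaling factor.
    """
    residual = list(sample)
    for name, spectrum in components.items():
        if len(spectrum) != len(residual):
            raise ValueError(f"{name}: spectrum length does not match sample")
        w = weights[name]
        residual = [r - w * s for r, s in zip(residual, spectrum)]
    return residual
```

The residual after subtraction is what would then be analyzed (for example, by band fitting) to estimate backbone secondary structure.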
3.
Nucleic acid secondary structure models usually exclude pseudoknots because these non-nested structures are difficult to treat efficiently in structure prediction and partition function algorithms. Here, the standard secondary structure energy model is extended to include the most physically relevant pseudoknots. We describe an O(N^5) dynamic programming algorithm, where N is the length of the strand, for computing the partition function and minimum energy structure over this class of secondary structures. It is therefore possible to determine the probability of sampling the lowest energy structure, or any other structure of particular interest. This capability motivates the use of the partition function for designing DNA or RNA molecules for bioengineering applications.
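To illustrate how a partition function turns into structure probabilities, here is a deliberately simplified sketch: a Nussinov-style O(N³) recursion over pseudoknot-free structures in which every base pair has the same hypothetical free energy. It is not the paper's O(N^5) pseudoknot algorithm and not a nearest-neighbor energy model; the uniform pair energy, the minimum hairpin loop of 3 and kT ≈ 0.6163 kcal/mol (about 37 °C) are assumptions of the sketch.

```python
import math

# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, kT=0.6163, min_loop=3):
    """Nussinov-style partition function over pseudoknot-free structures.

    Every base pair contributes the same hypothetical free energy `pair_energy`
    (kcal/mol). Z[i][j] is the partition function of the half-open slice seq[i:j].
    """
    n = len(seq)
    w = math.exp(-pair_energy / kT)               # Boltzmann weight of one pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty intervals have Z = 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = Z[i][j - 1]                   # last base j-1 left unpaired
            for k in range(i, j - 1 - min_loop):  # k pairs with j-1, loop >= min_loop
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n]


def structure_probability(energy, Z, kT=0.6163):
    """Boltzmann probability of a single structure with free energy `energy`."""
    return math.exp(-energy / kT) / Z
```

Because the unfolded chain has energy 0 in this toy model, `structure_probability(0.0, partition_function(seq))` gives its equilibrium probability; the same ratio applies to any structure whose energy is known, which is the capability the abstract describes.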
4.
We propose a method for predicting RNA base pairing that imposes no restrictions on the order of base pairs, allows for pseudoknots and runs in O(mN^2) time for N base pairs and m iterations. It employs a self‐consistent mean field method in which all base pairs are possible, but with each iteration the most energetically favored base pairs become more likely, as long as they are consistent with their neighbors. Performance was compared against three other programs using three test sets. Sensitivity varied from 20% to 74% and specificity from 44% to 77%. In general, the method predicts too many base pairs, leading to good sensitivity but worse specificity. The predicted structures have excellent energies, suggesting that the method performs well algorithmically but that the classic literature energy models may not be appropriate when pseudoknots are permitted. The website and source code for the simulations are available at http://cardigan.zbh.uni-hamburg.de/~rnascmf. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
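The sensitivity/specificity trade-off described above can be made concrete with a short sketch that compares predicted and reference base-pair sets. As an assumption, "specificity" is computed here as TP / (TP + FP), the positive predictive value that is common in RNA structure benchmarks; the abstract does not state which definition was used. Pairs are stored as sets of index tuples rather than dot-bracket strings so that pseudoknotted pairs can be represented.

```python
def pair_metrics(predicted, reference):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs.

    sensitivity = TP / |reference|, ppv = TP / |predicted|.
    Pair orientation is normalised, so (10, 1) and (1, 10) are the same pair.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    tp = len(pred & ref)
    sensitivity = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv
```

Predicting one extra pair on top of two correct ones, as in `pair_metrics({(1, 10), (2, 9), (3, 8)}, {(1, 10), (2, 9)})`, keeps sensitivity at 1.0 but lowers the positive predictive value to 2/3, which is the over-prediction pattern the abstract reports.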
5.
In this paper, a support vector machine was trained to learn the relationship between pair-coupled amino acid composition and the content of protein secondary structural elements, including α-helix, 3₁₀-helix, π-helix, β-strand, β-bridge, turn, bend and the remaining random coil. Self-consistency and cross-validation tests were performed to assess the performance of the method. The results are superior to or competitive with those of popular theoretical and experimental methods.
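A feature vector of the kind fed to such a classifier can be sketched as the normalised frequencies of the 400 ordered residue pairs. This is a simplified stand-in, not necessarily the paper's exact pair-coupled composition: pairs are taken between residues `gap` positions apart (adjacent residues by default), and the function name and `gap` parameter are assumptions of the sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq, gap=1):
    """Frequencies of residue pairs (seq[i], seq[i + gap]) over the 400 ordered pairs.

    Pairs containing non-standard residues are skipped. Returns a 400-element
    list ordered like DIPEPTIDES (AA, AC, ..., YY).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

A regression model (for instance a support vector regressor per structural element) would then map this vector to the fraction of residues in each state.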
6.
The literature contains over fifty years of accumulated methods for predicting protein secondary structure in silico. A large part of this collection consists of artificial neural network-based approaches; neural networks are a field of artificial intelligence and machine learning that is gaining popularity in many application areas. The primary objective of this paper is to summarize important works that are scattered over time, giving new researchers a clear view of the domain in a single place. An informative introduction to protein secondary structure and artificial neural networks is also included for context. This review should be valuable in designing future methods to improve protein secondary structure prediction accuracy. The neural network methods in this problem domain employ varying architectures and feature spaces, and a handful stand out for significant improvements in prediction. Neural networks with a larger feature scope and higher architectural complexity have been found to produce better protein secondary structure predictions. Current prediction accuracy lies at around 84%, leaving much room for improvement in the in silico prediction of secondary structure. The estimated limit of 88% prediction accuracy has not yet been reached, so further research is timely.
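A common way such networks see a protein is through a sliding window of one-hot encoded residues centred on the residue being classified. The sketch below is a generic illustration rather than any specific published architecture: the window half-width of 6 (13 residues in total) and the extra 21st bit for positions beyond the chain ends are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(seq, center, half_width=6):
    """One-hot encode the residues in a window centred on `center`.

    Each window position gets 21 bits: 20 for the standard amino acids and a
    final bit for positions beyond either terminus (or unknown residues).
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        bits = [0] * 21
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            bits[AMINO_ACIDS.index(seq[pos])] = 1
        else:
            bits[20] = 1
        features.extend(bits)
    return features
```

Richer feature spaces, which the review associates with better accuracy, typically replace the one-hot bits with profile values (for example, from a position-specific scoring matrix) while keeping the same windowed layout.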
7.
RNA structure comparison is a fundamental problem in structural biology, structural chemistry, and bioinformatics. It can be used to analyze RNA energy landscapes and conformational switches, and to facilitate RNA structure prediction. Our integrated tool RNACluster has two purposes: to provide a platform for computing and comparing different distances between RNA secondary structures, and to identify clusters that yield useful information about RNA structure ensembles, using a minimum spanning tree (MST) based clustering algorithm. RNACluster identifies clusters from an MST representation of the RNA ensemble data and currently supports six distance measures between RNA secondary structures. It provides a user-friendly graphical interface that allows users to compare different structural distances, analyze structure ensembles, and visualize predicted structural clusters.
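MST-based clustering can be sketched with just one of the possible distances, the base-pair distance between dot-bracket structures. This is a minimal illustration, not RNACluster's code: it builds the MST with Kruskal's algorithm and treats every MST edge longer than a user-chosen threshold as cut, which yields single-linkage clusters. The function names and the threshold are assumptions, and the input structures are assumed to be valid, pseudoknot-free dot-bracket strings for the same sequence.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) pairs encoded by a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs present in exactly one structure."""
    return len(base_pairs(a) ^ base_pairs(b))


def mst_clusters(structures, threshold):
    """Clusters obtained by cutting MST edges longer than `threshold`."""
    n = len(structures)
    edges = sorted(
        (bp_distance(structures[i], structures[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal's algorithm; edges above the threshold would be cut, so stop there.
    for d, i, j in edges:
        if d > threshold:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With the ensemble `["((((...))))", "(((.....)))", "....(...)..", "...((...))."]` and a threshold of 1, the two hairpin-like structures and the two short-stem structures fall into separate clusters, `[[0, 1], [2, 3]]`.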
8.
Predicting protein structural class solely from sequence is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. We proposed a novel scheme that improves protein structural class prediction by combining novel sequence information with predicted secondary structural features (PSSF). Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features to form the combined sequence information. We then added the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. Comparison with existing methods shows overall improvements ranging from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.
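The reduced-sequence step can be sketched as mapping residues onto a small group alphabet and then computing, for every k-letter word, its frequency and its mean relative position along the chain. The five-group reduction below is hypothetical and chosen only for illustration; the paper's actual reduced alphabet, word length and position features may differ.

```python
from itertools import product

# Hypothetical 5-group reduced alphabet, for illustration only:
# h = aliphatic/hydrophobic, a = aromatic, p = polar/small, n = acidic, b = basic.
REDUCTION = {
    **dict.fromkeys("AVLIMC", "h"),
    **dict.fromkeys("FWY", "a"),
    **dict.fromkeys("STNQGP", "p"),
    **dict.fromkeys("DE", "n"),
    **dict.fromkeys("KRH", "b"),
}
REDUCED_ALPHABET = "hapnb"


def reduce_sequence(seq):
    """Map each standard residue to its group letter, skipping unknown symbols."""
    return "".join(REDUCTION[aa] for aa in seq if aa in REDUCTION)


def word_features(reduced, k=2):
    """Word frequencies plus mean relative word positions for all k-letter words."""
    words = ["".join(w) for w in product(REDUCED_ALPHABET, repeat=k)]
    positions = {w: [] for w in words}
    for i in range(len(reduced) - k + 1):
        positions[reduced[i:i + k]].append(i)
    n_words = max(len(reduced) - k + 1, 1)
    span = max(len(reduced) - k, 1)  # last possible start index
    freq = [len(positions[w]) / n_words for w in words]
    # Mean start position scaled to [0, 1]; 0.0 when the word is absent.
    mean_pos = [
        sum(positions[w]) / len(positions[w]) / span if positions[w] else 0.0
        for w in words
    ]
    return freq + mean_pos
```

In the scheme described above, predicted secondary structural features would then be appended to this vector before classification.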
9.
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77% to 80% over the past decade. Here, we developed a multistep neural-network algorithm that couples secondary structure prediction with prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved 82.0% accuracy (Q3) based on 10-fold cross-validation. That SPINE X surpasses 81% accuracy is further confirmed on an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins, and 117 CASP 9 targets (critical assessment of structure prediction techniques), with accuracies of 81.3%, 82.3% and 81.8%, respectively. Prediction accuracy improves further to 83.8% for the dataset of 2640 proteins if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. SPINE X is compared with the popular PSIPRED and with CASP-winning structure-prediction techniques. SPINE X predicts the number of helices and sheets correctly for 21.0% of the 1833 proteins, compared with 17.6% for PSIPRED. The comparison further shows that SPINE X consistently makes more accurate predictions for helical residues (by 6%) without over-prediction, while PSIPRED makes more accurate predictions for coil residues (by 3-5%) but over-predicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
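Q3 is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below computes it and includes one common convention for collapsing the eight DSSP states to three (H, G, I to H; E, B to E; everything else to C). That mapping is an assumption: reduction schemes differ between studies, which is one reason the choice of assignment method shifts reported accuracies, as the abstract notes.

```python
# One common 8-to-3 reduction of DSSP states; other conventions exist.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp):
    """Collapse a DSSP string to H/E/C; unlisted states (T, S, '-', ...) become C."""
    return "".join(EIGHT_TO_THREE.get(c, "C") for c in dssp)


def q3(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty strings")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

For example, `q3("HHEC", to_three_state("HHTC"))` compares against `"HHCC"` and returns 0.75.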
10.
11.
Aloy P, Mas JM, Martí-Renom MA, Querol E, Avilés FX, Oliva B. Journal of Computer-Aided Molecular Design, 2000, 14(1): 83-92
Knowledge-based energy profiles combined with secondary structure prediction have been applied to molecular modelling refinement. To check the procedure, three different models of human procarboxypeptidase A2 (hPCPA2) were built using the 3D structures of procarboxypeptidase A1 (pPCPA1) and bovine procarboxypeptidase A (bPCPA) as templates. The results of the refinement can be tested against the recently determined X-ray structure of hPCPA2. Mis-modelled regions in the activation segment of hPCPA2 were detected by means of pseudo-energies using Prosa II and then modified according to the secondary structure prediction. Models obtained by automated methods such as COMPOSER, MODELLER and distance restraints were also compared, and the best model could be identified by means of pseudo-energies. Two general conclusions can be drawn from this work: (1) from a given set of putative models, it is possible to identify the one closest to the crystallographic structure, and (2) within a given structure, pseudo-energies can locate the regions that have been poorly modelled.
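Locating poorly modelled regions from pseudo-energies usually involves smoothing a per-residue energy profile over a sliding window and flagging windows with unfavourable (high) values. The sketch below assumes per-residue pseudo-energies have already been computed by some external tool; the window size, the threshold of 0.0 and the function names are illustrative assumptions and do not reproduce Prosa II's scoring.

```python
def smoothed_profile(energies, window=10):
    """Average per-residue pseudo-energies over a sliding window (one value per start)."""
    if not 0 < window <= len(energies):
        raise ValueError("window must be between 1 and the profile length")
    running = sum(energies[:window])
    out = [running / window]
    for i in range(window, len(energies)):
        running += energies[i] - energies[i - window]  # slide by one residue
        out.append(running / window)
    return out


def flag_regions(energies, window=10, threshold=0.0):
    """Return (start, end) residue ranges whose windowed energy exceeds threshold."""
    profile = smoothed_profile(energies, window)
    return [(i, i + window) for i, v in enumerate(profile) if v > threshold]
```

Overlapping flagged windows would typically be merged into contiguous segments before deciding which parts of a model to rebuild.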
12.
The identification of RNA secondary structure has been an important tool for the characterization of nucleic acids. Computational structure prediction has been an effective approach to this end, but improvement of established methods is often slow and reliant on redundant methodology. Here we present a novel consensus scoring approach that incorporates inputs from an array of established methods, with the goal of producing outputs that contain the structures these programs share. This method is implemented in RNAdemocracy, a Python program capable of competing with existing methods. The ensemble approach was limited by commonalities among the established methods, such as parameter sourcing, which may lead to agreement error, an unavoidable outcome given the limited RNA structure datasets available. The modular construction of RNAdemocracy allows it to be easily upgraded and customized to suit users’ needs. Although capable of accurate predictions, RNAdemocracy is best suited to guiding users to regions of the sequence space that exhibit agreement, rather than serving as a wholly relied-upon predictor of structure. It can also grade predictions for potential accuracy by reporting the percentage of consensus among the contributing methods in the final structure.
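A consensus of this kind can be sketched as a vote over base pairs. The version below is a hypothetical majority-vote scheme, not RNAdemocracy's actual scoring: each method contributes one set of (i, j) pairs, pairs predicted by more than half of the methods are kept, and the mean agreement of the kept pairs is reported as a consensus percentage.

```python
from collections import Counter


def consensus_pairs(predictions, min_fraction=0.5):
    """Keep base pairs predicted by more than `min_fraction` of the methods.

    `predictions` holds one set of (i, j) pairs per method. Returns a dict
    mapping each kept pair to the fraction of methods that predicted it.
    If every method returns a valid structure (each base paired at most once),
    a strict majority can never contain two pairs that share a base.
    """
    n = len(predictions)
    votes = Counter(pair for pred in predictions for pair in set(pred))
    return {pair: c / n for pair, c in votes.items() if c / n > min_fraction}


def consensus_percentage(consensus):
    """Mean agreement of the pairs in the consensus structure, as a percentage."""
    if not consensus:
        return 0.0
    return 100.0 * sum(consensus.values()) / len(consensus)
```

The strict-majority rule is what keeps the output a valid structure without an extra conflict-resolution step; lowering `min_fraction` below 0.5 would reintroduce possible conflicts.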
13.
14.
15.
A statistical analytical approach has been used to analyze the secondary structure (SS) of amino acids as a function of the sequence of amino acid residues. We have used 306 non-homologous best-resolved protein structures from the Protein Data Bank for the analysis. A sequence region of 32 amino acids on either side of the residue is considered in order to calculate single amino acid propensities, di-amino acid potentials and tri-amino acid potentials. A weighted sum of the predictions obtained using these properties is used to suggest a final prediction method. Our method is as good as the best-known SS prediction methods, is the simplest of all the methods, and uses no homologous sequence/family alignment data, yet gives 72% SS prediction accuracy. Since the method did not use many other factors that may increase prediction accuracy, there is scope to achieve greater accuracy using this approach.
Received: 4 May 1998 / Accepted: 17 September 1998 / Published online: 10 December 1998
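The simplest ingredient above, the single amino acid propensity, is commonly defined as P(state | residue) / P(state), so that values above 1 mark residue types over-represented in a state. The sketch below computes that ratio from aligned sequence and structure strings; it does not reproduce the paper's ±32-residue context or its di- and tri-amino acid potentials, and the function name is an assumption.

```python
from collections import Counter


def propensities(sequences, structures):
    """Single-residue propensity P(state | residue) / P(state).

    `sequences` and `structures` are parallel lists of equal-length strings,
    e.g. "MKTAYIA" and "CHHHHHC". Values above 1 mean the residue type is
    over-represented in that state.
    """
    joint, residue, state = Counter(), Counter(), Counter()
    total = 0
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, s in zip(seq, ss):
            joint[aa, s] += 1
            residue[aa] += 1
            state[s] += 1
            total += 1
    return {
        (aa, s): (c / residue[aa]) / (state[s] / total)
        for (aa, s), c in joint.items()
    }
```

A simple predictor could assign each residue the state with the largest (possibly window-averaged) propensity; the paper instead combines several such scores in a weighted sum.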
16.
The presence of repetitive or non-unique DNA persisting over sizable regions of a eukaryotic genome can hinder successful de novo assembly of the genome from short reads: ambiguities in assigning genome locations to the non-unique subsequences can cause premature termination of contigs and thus overfragmented assemblies. Fungal mitochondrial (mtDNA) genomes are compact (typically less than 100 kb), yet often contain short non-unique sequences that can be shown to impede their successful de novo assembly in silico. Such repeats can also confuse cellular processes in vivo. A well-studied example is ectopic (out-of-register, illegitimate) recombination between repeat pairs, which can delete functionally important genes located between the repeats. Repeats that remain conserved over micro- or macroevolutionary timescales despite such risks may indicate functionally or structurally (e.g., for replication) important regions. This principle could form the basis of a mining strategy for accelerating the discovery of function in genome sequences. Here we present a screen of a sample of 11 fully sequenced fungal mitochondrial genomes for exact k-mer repeats occurring several times; initial analyses motivated us to focus on 17-mers occurring more than three times. Based on the diverse repeats observed, we propose that such screening can serve as an efficient way to gain a rapid but representative first insight into the repeat landscapes of sparsely characterized mitochondrial chromosomes. Matching the flagged repeats to previously reported regions of interest supports the idea that systems of persisting, non-trivial repeats in genomes can often highlight features meriting further attention.
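The screening criterion described above (exact 17-mers occurring more than three times) can be sketched with a single pass over the sequence that records every k-mer's start positions. This minimal version scans only the given strand and ignores wraparound of the circular mitochondrial chromosome; handling reverse complements or circularity would be extensions beyond what the sketch shows.

```python
from collections import defaultdict


def repeated_kmers(genome, k=17, min_count=4):
    """Exact k-mers occurring at least `min_count` times, with their start positions.

    min_count=4 corresponds to "more than three times". Only the given strand
    is scanned, and the sequence is treated as linear.
    """
    seq = genome.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) >= min_count}
```

With a toy input, `repeated_kmers("ACGTACGTACGTACGT", k=4)` returns `{"ACGT": [0, 4, 8, 12]}`, since the other 4-mers occur only three times each.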
17.
SSThread: Template‐free protein structure prediction by threading pairs of contacting secondary structures followed by assembly of overlapping pairs
Kevin J. Maurice. Journal of Computational Chemistry, 2014, 35(8): 644-656
Acquiring the three‐dimensional structure of a protein from its amino acid sequence alone is still an unsolved problem, despite a great deal of work and significant progress on the subject. SSThread, a new template‐free algorithm described here, makes several predictions of contacting pairs of α‐helices and β‐strands derived from a database of experimental structures, using a knowledge‐based potential, secondary structure prediction, and contact map prediction. Overlapping pair predictions are then assembled to create an ensemble of core structure predictions, whose loops are subsequently predicted. On a set of seven CASP10 targets, SSThread outperformed each of the two leading methods on two targets. The targets were all β‐strand‐containing structures, and most of them have a high relative contact order, which demonstrates the advantages of SSThread. Based on sets of 74 and 21 test cases, the primary bottlenecks are the pair prediction and loop prediction stages. © 2014 Wiley Periodicals, Inc.
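Relative contact order, mentioned above as a property of the targets, is usually defined as RCO = Σ|j − i| / (L · N) over the N residue contacts of a chain of length L, so it grows when contacts link residues far apart in sequence. The sketch below computes it from a contact list and, as an assumption for illustration, derives contacts from Cα coordinates with an 8 Å cutoff; the classic definition counts heavy-atom contacts, and near neighbours in sequence are often excluded via `min_separation`.

```python
import math


def contacts_from_ca(coords, cutoff=8.0, min_separation=1):
    """Residue pairs (i, j) with j - i >= min_separation and Cα-Cα distance <= cutoff (Å)."""
    return [
        (i, j)
        for i in range(len(coords))
        for j in range(i + min_separation, len(coords))
        if math.dist(coords[i], coords[j]) <= cutoff
    ]


def relative_contact_order(contacts, length):
    """RCO = sum(|j - i|) / (L * N) over N contacts in a chain of L residues."""
    contacts = list(contacts)
    if not contacts:
        return 0.0
    return sum(abs(j - i) for i, j in contacts) / (length * len(contacts))
```

For instance, two contacts (0, 9) and (2, 5) in a 10-residue chain give (9 + 3) / (10 × 2) = 0.6.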
18.
PCASSO: A fast and efficient Cα‐based method for accurately assigning protein secondary structure elements
Sean M. Law, Aaron T. Frank, Charles L. Brooks III. Journal of Computational Chemistry, 2014, 35(24): 1757-1761
Proteins are often characterized in terms of their primary, secondary, tertiary, and quaternary structure. Algorithms such as define secondary structure of proteins (DSSP) can automatically assign protein secondary structure based on the backbone hydrogen‐bonding pattern. However, assigning secondary structure elements (SSEs) becomes a challenge when only the Cα coordinates are available. In this work, we present protein C‐alpha secondary structure output (PCASSO), a fast and accurate program for assigning protein SSEs using only the Cα positions. PCASSO achieves ~95% accuracy with respect to DSSP and takes ~0.1 s on a single processor to analyze a 1000‐residue system with multiple chains. Our approach was compared with current state‐of‐the‐art Cα‐based methods and outperformed all of them in both speed and accuracy. A practical application is also presented and discussed. © 2014 Wiley Periodicals, Inc.
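Cα-only assignment relies on local geometry: distances between Cα(i) and Cα(i + k) for small k are short in helices and long in extended strands. The sketch below computes such per-residue distance features; the offsets (2, 3, 4) are an assumption, and this is only the kind of input a classifier could use, not PCASSO's actual feature set or model. It also treats the coordinates as one continuous chain, so chain breaks in multi-chain systems would need separate handling.

```python
import math


def ca_distance_features(coords, offsets=(2, 3, 4)):
    """Per-residue distances from Cα(i) to Cα(i + k) for each k in `offsets`.

    `coords` is a list of (x, y, z) tuples for one continuous chain; entries
    past the chain end are None.
    """
    n = len(coords)
    features = []
    for i in range(n):
        row = []
        for k in offsets:
            j = i + k
            row.append(math.dist(coords[i], coords[j]) if j < n else None)
        features.append(row)
    return features
```

A classifier trained against DSSP labels would then map each residue's feature row (often together with those of its neighbours) to a secondary structure state.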
19.
At present, the rate at which tertiary structures are determined lags far behind the rate at which primary structures are discovered. Predicting protein structural class with machine learning techniques can help reduce this gap. The Structural Classification of Proteins – extended (SCOPe 2.07) is the latest and largest dataset currently available. Protein sequences with less than 40% identity to each other are used to predict the α, β, α/β and α + β SCOPe classes. Sensitive features are extracted from the primary and secondary structure representations of proteins. Features are extracted experimentally from secondary structure with respect to its frequency, pitch and spatial arrangement. Primary structure based features contain species information for a protein sequence. The species parameters are further validated against the UniRef100 dataset using TaxId. Protein tertiary structure is known to be a manifestation of function, and functional differences are observed between species. Hence, species are expected to correlate strongly with structural class, a correlation discovered in the current work that enhances prediction accuracy by 7%–10%. A subset of SCOPe 2.07 is used to train a Random Forest classifier on a 65-dimensional feature vector. Testing on the rest of the set gives a consistent accuracy of better than 95%. The accuracies achieved on the benchmark datasets ASTRAL 1.73, 25PDB and FC699 are better than 86%, 91% and 97%, respectively, the best reported to our knowledge.
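Frequency-type features from a secondary structure string can be sketched by splitting it into runs of identical states and summarising each state's runs. The features below (fraction of residues, number of segments and mean segment length per state) are hypothetical examples of this family; they do not reproduce the paper's 65-dimensional vector or its pitch and spatial-arrangement features.

```python
from itertools import groupby


def ss_segment_features(ss, states="HEC"):
    """Per-state fraction, segment count and mean segment length of an SS string."""
    n = len(ss)
    segments = [(state, len(list(run))) for state, run in groupby(ss)]
    features = {}
    for st in states:
        lengths = [length for state, length in segments if state == st]
        features[f"frac_{st}"] = sum(lengths) / n if n else 0.0
        features[f"nseg_{st}"] = len(lengths)
        features[f"meanlen_{st}"] = sum(lengths) / len(lengths) if lengths else 0.0
    return features
```

For `"HHHCCEEHH"` this yields two helix segments with a mean length of 2.5 and a helix fraction of 5/9, alongside the corresponding strand and coil values.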
20.
Perczel A, Jákli I, Csizmadia IG. Chemistry (Weinheim an der Bergstrasse, Germany), 2003, 9(21): 5332-5342
Different protein architectures show strong similarities regardless of their amino acid composition: the backbone folds of the different secondary structural elements exhibit nearly identical geometries. To investigate the principles of folding and stability, oligopeptide models (that is, HCO-(NH-L-CHR-CO)ₙ-NH₂) have been studied. Previously, ab initio structure determinations have provided a small amount of information on the conformational building units of di- and tripeptides. A maximum of nine differently folded backbone types is available for any natural α-amino acid residue, with the exception of proline. All of these conformers have different relative energies. The present study compiles an ab initio database of optimized HCO-(L-Xxx)ₙ-NH₂ structures, where 1