Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
According to the three classifications of nucleotides, we introduce a binary coding method for RNA secondary structures. On the basis of this representation, we can reduce an RNA secondary structure to three binary digit sequences. We also propose coding rules based on the exclusive‐OR operation. Using the proposed coding rules, we can identify mutations between bases, or between a base and a base pair, and readily perform sequence alignment. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
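
The abstract does not give the coding table, so the following is only a minimal sketch of the general idea, assuming the three standard nucleotide classifications (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding). The bit assignments and the function names `encode` and `xor_diff` are hypothetical, and the handling of paired bases described in the paper is not reproduced here.

```python
# Hypothetical bit assignments: the paper's actual coding table is not given
# in the abstract. Each classification maps a base to 1 if it is a member.
CLASSES = {
    "purine": set("AG"),  # versus pyrimidine (C, U)
    "amino": set("AC"),   # versus keto (G, U)
    "weak": set("AU"),    # versus strong hydrogen bonding (G, C)
}


def encode(seq):
    """Return three binary digit sequences, one per classification."""
    return {
        name: [1 if base in members else 0 for base in seq]
        for name, members in CLASSES.items()
    }


def xor_diff(seq_a, seq_b):
    """XOR the binary sequences of two equal-length RNA sequences.

    A 1 marks a position where the two sequences fall into different
    classes under that classification.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    enc_a, enc_b = encode(seq_a), encode(seq_b)
    return {
        name: [x ^ y for x, y in zip(enc_a[name], enc_b[name])]
        for name in CLASSES
    }
```

Under these assumed assignments, a C-to-U substitution leaves the purine/pyrimidine sequence unchanged and flips the other two, so the XOR pattern indicates which kind of substitution occurred.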

2.
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.

3.
Constrained sequence alignment has been studied extensively in the past. Different forms of constraints have been investigated, where a constraint can be a subsequence, a regular expression, or a probability matrix of symbols and positions. However, constrained structural alignment has been investigated to a much lesser extent. In this paper, we present an efficient method for constrained structural alignment and apply the method to detecting conserved secondary structures, or structural motifs, in a set of RNA molecules. The proposed method combines both sequence and structural information of RNAs to find an optimal local alignment between two RNA secondary structures, one of which is a query and the other is a subject structure in the given set. The method allows a biologist to annotate conserved regions, or constraints, in the query RNA structure and incorporates these regions into the alignment process to obtain biologically more meaningful alignment scores. A statistical measure is developed to assess the significance of the scores. Experimental results based on detecting internal ribosome entry sites in the RNA molecules of hepatitis C virus and Trypanosoma brucei demonstrate the effectiveness of the proposed method and its superiority over existing techniques.

4.
RNA structure is hierarchical. Secondary structure contacts, i.e. the canonical base pair contacts, are generally stronger and form faster than the tertiary structure. Therefore, RNA secondary structures can be predicted independently of tertiary structure prediction. Furthermore, the stability of a given RNA secondary structure can be quantified using nearest neighbor free energy parameters. These parameters are the basis of a number of free energy minimization algorithms that predict RNA secondary structure for either a single sequence or multiple sequences. This article reviews the progress of RNA secondary structure prediction by free energy minimization and describes many of the algorithms that have been developed.

5.
RNA molecules participate in many important biological processes, and they need to fold into well-defined secondary and tertiary structures to realize their functions. Like the well-known protein folding problem, there is also an RNA folding problem. The folding problem includes two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present a deep reinforcement learning algorithm, 2dRNA-Fold, to study the fastest folding paths of RNA secondary structure. 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step for a given RNA sequence until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained using a set of RNA molecules from the bpRNA dataset and is used to predict RNA secondary structure. Because the scoring that determines the next step in 2dRNA-Fold is based on possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding paths determined by free energy according to physical laws.

6.
Rift Valley fever virus (RVFV) is a potent human and livestock pathogen endemic to sub-Saharan Africa and the Arabian Peninsula that has potential to spread to other parts of the world. Although there is no proven effective and safe treatment for RVFV infections, a potential therapeutic target is the virally encoded nucleocapsid protein (N). During the course of infection, N binds to viral RNA, and perturbation of this interaction can inhibit viral replication. To gain insight into how N recognizes viral RNA specifically, we designed an algorithm that uses a distance matrix and multidimensional scaling to compare the predicted secondary structures of known N-binding RNAs, or aptamers, that were isolated and characterized in a previous in vitro evolution experiment. These aptamers did not exhibit overt sequence or predicted structure similarity, so we employed bioinformatic methods to propose novel aptamers based on analysis and clustering of secondary structures. We screened and scored the predicted secondary structures of novel randomly generated RNA sequences in silico and selected several of these putative N-binding RNAs whose secondary structures were similar to those of known N-binding RNAs. We found that overall the in silico generated RNA sequences bound well to N in vitro. Furthermore, introduction of these RNAs into cells prior to infection with RVFV inhibited viral replication in cell culture. This proof of concept study demonstrates how the predictive power of bioinformatics and the empirical power of biochemistry can be jointly harnessed to discover, synthesize, and test new RNA sequences that bind tightly to RVFV N protein. The approach would be easily generalizable to other applications.

7.
We propose a 4-D representation of RNA secondary structures. The four-dimensional representation resolves the degeneracy of structures and avoids both loss of information and the limitation that different structures correspond to the same plot set (or presentation). RNA pseudoknots can also be represented in four dimensions. Based on this representation, we outline an approach to compute the similarities between six RNA secondary structures to illustrate its utility.

8.
Consider the network of all secondary structures of a given RNA sequence, where nodes are connected when the corresponding structures have base pair distance one. The expected degree of the network is the average number of neighbors, where the average may be computed with respect to either the uniform or the Boltzmann probability. Here, we describe the first algorithm, RNAexpNumNbors, that can compute the expected number of neighbors, or expected network degree, of an input sequence. For RNA sequences from the Rfam database, the expected degree is significantly less than the degree of the constrained minimum free energy structure, defined to have minimum free energy (MFE) over all structures consistent with the Rfam consensus structure. The expected degree of structural RNAs, such as purine riboswitches, paradoxically appears to be smaller than that of random RNA, yet the difference between the degree of the MFE structure and the expected degree is larger than that of random RNA. Expected degree does not seem to correlate with standard structural diversity measures of RNA, such as positional entropy and ensemble defect. The program RNAexpNumNbors is written in C, runs in cubic time and quadratic space, and is publicly available at http://bioinformatics.bc.edu/clotelab/RNAexpNumNbors. © 2014 Wiley Periodicals, Inc.
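
To make the definition of expected degree concrete, here is a brute-force sketch for very short sequences under the uniform probability only. It is not the RNAexpNumNbors algorithm (which, per the abstract, runs in cubic time); it simply enumerates every structure. The pairing rules (Watson-Crick plus G-U wobble), the minimum hairpin loop of three unpaired bases, and the function names are assumptions made for illustration.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def structures(seq, min_loop=3):
    """Enumerate all non-crossing secondary structures of seq.

    Each structure is a frozenset of base pairs (i, j) with j - i > min_loop.
    Exponential in len(seq); intended for toy inputs only.
    """
    def rec(i, j):
        if i >= j:
            yield frozenset()
            return
        yield from rec(i + 1, j)  # position i is unpaired
        for k in range(i + min_loop + 1, j + 1):  # position i pairs with k
            if (seq[i], seq[k]) in PAIRS:
                for inner in rec(i + 1, k - 1):
                    for outer in rec(k + 1, j):
                        yield inner | outer | {(i, k)}

    return list(rec(0, len(seq) - 1))


def expected_degree_uniform(seq, min_loop=3):
    """Average number of neighbors at base pair distance one."""
    structs = structures(seq, min_loop)
    total = 0
    for s in structs:
        # Two structures are neighbors when they differ by exactly one pair.
        total += sum(1 for t in structs if len(s ^ t) == 1)
    return total / len(structs)
```

Because removing any single base pair from a valid structure yields another valid structure, the uniform expected degree computed this way equals twice the mean number of base pairs per structure, which gives a quick sanity check on small inputs.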

9.
We developed a method, called RNA Assembler using Secondary Structure Information Effectively (RASSIE), for predicting RNA tertiary structures using known secondary structure information. We attempted a fragment assembly-based method that uses a secondary structure-based fragment library. For several typical target structures such as stem-loops, bulge-loops, and 2-way junctions, our method provided numerous good quality candidate structures in less computational time than previously proposed methods. By using a high-resolution potential energy function, we were able to select good predicted structures from candidate structures. This method of efficient conformational search and detailed structure evaluation using high-resolution potential is potentially useful for the tertiary structure prediction of RNA.

10.
Based on their characterization, RNA secondary structures are transformed into elementary sequences, called characteristic sequences of RNA secondary structures, by representing the bases A, U, G, C in A-U/G-C pairs as A′, U′, G′, C′. Based on this representation, three recurrences for mapping RNA secondary structures into a 1-D graph, a 2-D graph, and a 3-D graph are given. Furthermore, a frequency-based method for RNA secondary structures is given in terms of the 1-D graph.
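
The primed-symbol idea can be sketched as follows, assuming the structure is supplied in dot-bracket notation (an input format chosen here for illustration; the abstract does not specify one). The function names are hypothetical, the recurrences for the 1-D, 2-D, and 3-D graphs are not reproduced, and the plain symbol frequencies below are not claimed to be the paper's frequency-based method.

```python
from collections import Counter


def characteristic_sequence(seq, dot_bracket):
    """Mark every paired base with a prime, e.g. G -> G'.

    Assumption: a base counts as paired when its dot-bracket symbol
    is '(' or ')'; unpaired bases are marked with '.'.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    return [
        base + "'" if symbol in "()" else base
        for base, symbol in zip(seq, dot_bracket)
    ]


def symbol_frequencies(symbols):
    """Relative frequency of each symbol in a characteristic sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return {symbol: count / total for symbol, count in counts.items()}
```

For the hairpin `GGAAACC` with structure `((...))`, the two G and two C bases are primed and the three loop A bases are left unprimed.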

11.
In this article, we propose a 3D representation of RNA secondary structures. Based on this representation, we outline an approach that constructs a 3-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with the RNA secondary structure. The examination of similarities/dissimilarities among the secondary structures at the 3′-terminus of different viruses illustrates the utility of the approach.

12.
Because of the branching arising from partial self-complementarity, long single-stranded (ss) RNA molecules are significantly more compact than linear arrangements (e.g., denatured states) of the same sequence of monomers. To elucidate the dependence of compactness on the nature and extent of branching, we represent ssRNA secondary structures as tree graphs, which we treat as ideal branched polymers, and use a theorem of Kramers for evaluating their root-mean-square radius of gyration, $\hat{R}_g = \sqrt{\langle R_g^2 \rangle}$. We consider two sets of sequences, random and viral, with nucleotide sequence lengths ($N$) ranging from 100 to 10,000. The RNAs of icosahedral viruses are shown to be more compact (i.e., to have smaller $\hat{R}_g$) than the random RNAs. For the random sequences we find that $\hat{R}_g$ varies as $N^{1/3}$. These results are contrasted with the scaling of $\hat{R}_g$ for ideal randomly branched polymers ($N^{1/4}$), and with that from recent modeling of (relatively short, $N \le 161$) RNA tertiary structures ($N^{2/5}$).
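
Kramers' theorem expresses the mean-square radius of gyration of an ideal tree polymer as a sum over bonds: cutting a bond splits the tree into parts of n1 and n2 nodes, and the result is b² · Σ(n1 · n2) / n², where n is the total number of nodes and b the bond length. The sketch below applies this to a tree given as an edge list; the mapping from an RNA secondary structure to a tree graph is not shown, and the function name and input format are assumptions.

```python
from collections import defaultdict


def kramers_rg2(edges, bond_length=1.0):
    """Mean-square radius of gyration of an ideal tree polymer.

    edges: list of (u, v) node pairs forming a tree.
    Implements b^2 * sum_over_bonds(n1 * n2) / n^2 (Kramers' theorem).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)
    if n == 0 or len(edges) != n - 1:
        raise ValueError("edges must form a tree")

    # Iterative depth-first traversal: parents appear before children.
    root = next(iter(adj))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    if len(order) != n:
        raise ValueError("edges must form a connected tree")

    # Accumulate subtree sizes from the leaves upward; each non-root node
    # corresponds to the bond joining it to its parent.
    size = {u: 1 for u in order}
    total = 0
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            total += size[u] * (n - size[u])
            size[p] += size[u]
    return bond_length ** 2 * total / n ** 2
```

For a linear chain of n nodes this reduces to the familiar b²(n² − 1)/(6n), and a star with the same number of nodes gives a smaller value, in line with the statement that branching makes the molecule more compact.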

13.
Methods of artificial evolution such as SELEX and in vitro selection have made it possible to isolate RNA and DNA motifs with a wide range of functions from large random sequence libraries. Once the primary sequence of a functional motif is known, the sequence space around it can be comprehensively explored using a combination of random mutagenesis and selection. However, methods to explore the sequence space of a secondary structure are not as well characterized. Here we address this question by describing a method to construct libraries in a single synthesis which are enriched for sequences with the potential to form a specific secondary structure, such as that of an aptamer, ribozyme, or deoxyribozyme. Although interactions such as base pairs cannot be encoded in a library using conventional DNA synthesizers, it is possible to modulate the probability that two positions will have the potential to pair by biasing the nucleotide composition at these positions. Here we show how to maximize this probability for each of the possible ways to encode a pair (in this study defined as A-U or U-A or C-G or G-C or G.U or U.G). We then use these optimized coding schemes to calculate the number of different variants of model stems and secondary structures expected to occur in a library for a series of structures in which the number of pairs and the extent of conservation of unpaired positions is systematically varied. Our calculations reveal a tradeoff between maximizing the probability of forming a pair and maximizing the number of possible variants of a desired secondary structure that can occur in the library. They also indicate that the optimal coding strategy for a library depends on the complexity of the motif being characterized. 
Because this approach provides a simple way to generate libraries enriched for sequences with the potential to form a specific secondary structure, we anticipate that it should be useful for the optimization and structural characterization of functional nucleic acid motifs.
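
The pairing probability discussed in this abstract can be illustrated with a short calculation, assuming the two positions are synthesized independently with given nucleotide compositions. The set of allowed pairs follows the definition in the abstract (A-U, U-A, C-G, G-C, G.U, U.G); the function name and the example compositions are hypothetical and are not the optimized coding schemes from the paper.

```python
# Pairs as defined in the abstract: Watson-Crick pairs plus G.U wobble pairs.
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_probability(comp_5, comp_3):
    """Probability that two independently randomized positions can pair.

    comp_5, comp_3: dicts mapping nucleotide -> fraction at each position.
    """
    return sum(
        comp_5.get(x, 0.0) * comp_3.get(y, 0.0) for x, y in ALLOWED
    )


# Fully random positions: 6 allowed combinations out of 16.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}

# A biased scheme: fixing G at one position and mixing C/U at the other
# always allows a pair, but permits only two sequence variants.
fixed_g = {"G": 1.0}
c_or_u = {"C": 0.5, "U": 0.5}
```

Comparing `pair_probability(uniform, uniform)` with `pair_probability(fixed_g, c_or_u)` shows the tradeoff described above: the biased scheme raises the pairing probability while reducing the number of distinct variants that can occur at those positions.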

14.
DNA/RNA chromatography presents a versatile platform for the analysis of nucleic acids. Although the mechanism of separation of double stranded (ds) DNA fragments is largely understood, the mechanism by which RNA is separated appears more complicated. To further understand the separation mechanisms of RNA using ion pair reverse phase liquid chromatography, we have analysed a number of dsRNA and single stranded (ss) RNA fragments. High-resolution separation of dsRNA was observed, in a similar manner to dsDNA under non-denaturing conditions. Moreover, high-resolution separation of ssRNA was observed at high temperatures (75 °C), in contrast to ssDNA. It is proposed that duplex regions/secondary structures within the RNA persist at such temperatures, resulting in high-resolution RNA separations. The retention time of the nucleic acids reflects the relative hydrophobicity, through contributions of the nucleic acid sequence and the degree of secondary structure present. In addition, the analysis of RNA using such approaches was extended to enable the discrimination of bacterial 16S rRNA fragments and as an aid to conformational analysis of RNA. RNA:RNA interactions of the human telomerase RNA component (hTR) were analysed in conjunction with the incorporation of Mg²⁺ during chromatography. This novel chromatographic procedure permits analysis of the temperature-dependent formation of dimeric RNA species.

15.
Based on the concepts of cell and system of graphical representation, a class of 2D graphical representations of RNA secondary structures is given in terms of classifications of bases of nucleic acids. The representations can completely avoid the loss of information associated with crossing and overlapping of the corresponding curve. As an application, we make quantitative comparisons for a set of RNA secondary structures at the 3′-terminus of different viruses based on the graphical representations. The examination of similarities/dissimilarities illustrates the utility of the approach.

16.
E Unus pluribum, or "Of One, Many", may be at the root of decoding the RNA sequence-structure-function relationship. RNAs embody the large majority of genes in higher eukaryotes and fold in a sequence-directed fashion into three-dimensional structures that perform functions conserved across all cellular life forms, ranging from regulating to executing gene expression. While it is the most important determinant of the RNA structure, the nucleotide sequence is generally not sufficient to specify a unique set of secondary and tertiary interactions due to the highly frustrated nature of RNA folding. This frustration results in folding heterogeneity, a common phenomenon wherein a chemically homogeneous population of RNA molecules folds into multiple stable structures. Often, these alternative conformations constitute misfolds, lacking the biological activity of the natively folded RNA. Intriguingly, a number of RNAs have recently been described as capable of adopting multiple distinct conformations that all perform, or contribute to, the same function. Characteristically, these conformations interconvert slowly on the experimental timescale, suggesting that they should be regarded as distinct native states. We discuss how rugged folding free energy landscapes give rise to multiple native states in the Tetrahymena Group I intron ribozyme, hairpin ribozyme, sarcin-ricin loop, ribosome, and an in vitro selected aptamer. We further describe the varying degrees to which folding heterogeneity impacts function in these RNAs, and compare and contrast this impact with that of heterogeneities found in protein folding. Embracing that one sequence can give rise to multiple native folds, we hypothesize that this phenomenon imparts adaptive advantages on any functionally evolving RNA quasispecies.

17.
The graphical representation of biological sequences is an important subject in the area of genome studies. We propose a novel visual representation for RNA secondary structures. Some symmetry properties, as well as information on base distribution and composition, are intuitively reflected by the projection graphs of the points corresponding to the RNA secondary structures. Our method is then applied to compute the similarity of 12 classical samples and 11 real RNA secondary structures. The results indicate that our method not only analyzes the similarity between RNA secondary structures effectively but also shows high consistency with results reported in the literature. Moreover, our method needs only the geometrical center of the characteristic curve of the RNA secondary structure to compute the similarity matrix, which means a low computational complexity. © 2011 Wiley Periodicals, Inc. Int J Quantum Chem, 2011

18.
As more and more RNA secondary structures accumulate, the need to compare different RNA secondary structures often arises in function prediction and evolutionary analysis. Numerous efficient algorithms have been developed for comparing RNA secondary structures, but challenges remain. In this article, a new statistical measure that extends the notion of relative entropy, based on the proposed stochastic model, is evaluated for RNA secondary structures. The results obtained from several experiments on real datasets show the effectiveness of the proposed approach. Moreover, the time complexity of our method compares favorably with that of existing methods that solve similar problems.

19.
20.
Deep learning methods for RNA secondary structure prediction have shown higher performance than traditional methods, but there is still much room for improvement. RNAs vary widely in length, as do their secondary structures. However, current deep learning methods all use length-independent models, which makes it difficult for them to learn very different secondary structures. Here, we propose a length-dependent model obtained by further training the length-independent model on different length ranges of RNAs through transfer learning. 2dRNA, a coupled deep learning neural network for RNA secondary structure prediction, is used to do this. Benchmarking shows that the length-dependent model performs better than the usual length-independent model.
