Similar literature
 20 similar documents found (search time: 31 ms)
1.
A DNA sequence can be regarded as a discrete-time Markov chain. Based on k-step transition probabilities, we construct a series of 4 × 4 k-step transition matrices to characterize DNA primary sequences. According to the properties of Markov chains, we obtain the distributions of A, T, C and G, and analyze the changes among them from yesterday to tomorrow. We can also calculate the probabilities of nucleotide triples in DNA primary sequences. Finally, we introduce a correlation between transition matrices of this kind and use it as an invariant to analyze the similarities/dissimilarities of DNA sequences.
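The k-step transition matrix described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the k-step transition probability from base a to base b is estimated as the relative frequency of b appearing k positions after a, and the function name `k_step_transition_matrix` and the row/column order `ATCG` are hypothetical choices.

```python
from collections import Counter

BASES = "ATCG"  # assumed row/column order; the abstract does not fix one


def k_step_transition_matrix(seq, k=1):
    """Estimate P(base at position i+k is b | base at position i is a).

    Probabilities are relative frequencies of the pairs (seq[i], seq[i+k]).
    Rows for bases that never start a pair are left as all zeros.
    """
    counts = Counter((seq[i], seq[i + k]) for i in range(len(seq) - k))
    matrix = []
    for a in BASES:
        row_total = sum(counts[(a, b)] for b in BASES)
        row = [counts[(a, b)] / row_total if row_total else 0.0 for b in BASES]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in k_step_transition_matrix("ATATAT", k=1):
        print(row)
```

For the toy sequence `ATATAT`, every A is followed by T one step later, so the A row of the 1-step matrix puts all its mass on T, while the 2-step matrix puts it on A.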

2.
On the basis of the Huffman coding method, we propose a new graphical representation of DNA sequences. The representation avoids degeneracy and loss of information in the transfer of data from a DNA sequence to its graphical representation. A multicomponent vector derived from the representation is then introduced to quantitatively characterize DNA sequences. The components of the vector are derived from the graphical representation of the DNA primary sequence. The examination of similarities and dissimilarities among the complete coding sequences of the β-globin gene of 11 species and six ND6 proteins shows the utility of the scheme.

3.
Based on a classification of the 20 amino acids, we reduce a protein primary sequence to six (0,1) sequences. For each of them, two so-called normalized relative entropies are calculated, and thus a 12-D vector is constructed to describe the protein primary sequence. The examination of similarities/dissimilarities among eight different proteins illustrates the utility of the approach.

4.
We present recent data from our Monte Carlo computer simulation study of properties of AB-copolymer globules, which depend strongly on the primary sequence of A and B monomeric units. Different primary sequences have been studied: random, random-block, regular, and ones designed using a particular spatial conformation of a homopolymer chain (we have compared here three models: proteinlike copolymers, AB-copolymers modeling membrane proteins, and ABC-copolymers modeling proteins with an active enzymatic center). We have found several pieces of evidence that an AB-copolymer chain with a primary sequence prepared on the basis of a particular conformation of a homopolymer chain by some “coloring” procedure preserves the “memory” of its “parent” spatial conformation. Analyzing the power spectra of AB-sequences, we find long-range power-law correlations for the copolymers with specially designed primary sequences.

5.
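DNA sequence encoding and similarity calculation   Total citations: 4 (self-citations: 0, citations by others: 4)
The primary sequence of a DNA, such as the first exon of the β-globin gene, is converted into a molecular "structure graph", from which graph invariants such as molecular connectivity indices are extracted. With the graph invariants as independent variables, similarity is then computed using a similarity formula or a distance formula; the magnitude of the similarity indicates how closely related the different species are. This method was applied to compute the similarity of the first exon of the β-globin gene for 8 species, including human, monkey, and mouse, and the results agree well with the evolutionary tree in biology.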

6.
We consider a 6-D representation of triplets of nucleotide bases of DNA sequences. Based on this representation, we outline an approach that constructs a 3-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with the triplets derived from DNA sequences. The examination of similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of different species illustrates the utility of the approach.

7.
The use of DNA methylation to predict chronological age has shown promising potential for obtaining additional information in forensic investigations. To date, several studies have reported age prediction models based on DNA methylation in body fluids with high DNA content. However, it is often difficult to apply these existing methods in practice due to the low amount of DNA present in stains of body fluids that are part of a trace material. In this study, we present a sensitive and rapid test for age prediction with bloodstains based on pyrosequencing and random forest regression. This assay requires only 0.1 ng of genomic DNA and the entire procedure can be completed within 10 h, making it practical for forensic investigations that require a short turnaround time. We examined the methylation levels of 46 CpG sites from six genes using bloodstain samples from 128 males and 113 females aged 10–79 years. A random forest regression model was then used to construct an age prediction model for males and females separately. The final age prediction models were developed with seven CpG sites (three for males and four for females) based on the performance of the random forest regression. The mean absolute deviation was less than 3 years for each model. Our results demonstrate that DNA methylation-based age prediction using pyrosequencing and random forest regression has potential applications in forensics to accurately predict the biological age of a bloodstain donor.

8.
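Graphical representation and similarity calculation of DNA coding sequences   Total citations: 1 (self-citations: 0, citations by others: 1)
DNA coding sequences are converted into graphs, and a graph invariant, the molecular connectivity index, is then calculated. The resulting topological indices are used to compare the similarity of DNA coding sequences in order to confirm their homology, and good results are obtained.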

9.
10.
Methods of artificial evolution such as SELEX and in vitro selection have made it possible to isolate RNA and DNA motifs with a wide range of functions from large random sequence libraries. Once the primary sequence of a functional motif is known, the sequence space around it can be comprehensively explored using a combination of random mutagenesis and selection. However, methods to explore the sequence space of a secondary structure are not as well characterized. Here we address this question by describing a method to construct libraries in a single synthesis which are enriched for sequences with the potential to form a specific secondary structure, such as that of an aptamer, ribozyme, or deoxyribozyme. Although interactions such as base pairs cannot be encoded in a library using conventional DNA synthesizers, it is possible to modulate the probability that two positions will have the potential to pair by biasing the nucleotide composition at these positions. Here we show how to maximize this probability for each of the possible ways to encode a pair (in this study defined as A-U or U-A or C-G or G-C or G.U or U.G). We then use these optimized coding schemes to calculate the number of different variants of model stems and secondary structures expected to occur in a library for a series of structures in which the number of pairs and the extent of conservation of unpaired positions is systematically varied. Our calculations reveal a tradeoff between maximizing the probability of forming a pair and maximizing the number of possible variants of a desired secondary structure that can occur in the library. They also indicate that the optimal coding strategy for a library depends on the complexity of the motif being characterized. 
Because this approach provides a simple way to generate libraries enriched for sequences with the potential to form a specific secondary structure, we anticipate that it should be useful for the optimization and structural characterization of functional nucleic acid motifs.
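The pairing probability discussed in this abstract can be illustrated with a short calculation. The sketch below assumes that the nucleotides at the two library positions are drawn independently from per-position compositions, and uses the pair definition given in the abstract (A-U, U-A, C-G, G-C, G.U, U.G); the function name `pair_probability` and the example compositions are hypothetical and are not taken from the paper.

```python
# Ordered pairs counted as able to pair, as defined in the abstract.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_probability(comp1, comp2):
    """Probability that two independently drawn positions can pair.

    comp1 and comp2 map nucleotides to their fractions at each position
    (each should sum to 1); missing nucleotides count as fraction 0.
    """
    return sum(comp1.get(x, 0.0) * comp2.get(y, 0.0) for x, y in PAIRS)


if __name__ == "__main__":
    uniform = {n: 0.25 for n in "ACGU"}
    print(pair_probability(uniform, uniform))                    # unbiased positions
    print(pair_probability({"G": 1.0}, {"C": 0.5, "U": 0.5}))    # biased positions
```

With unbiased positions, 6 of the 16 equally likely combinations can pair, giving 0.375. Fixing one position to G and splitting the other between C and U raises the probability to 1.0 while still allowing two variants, which is the kind of tradeoff between pairing probability and variant count that the abstract describes.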

11.
New 2D graphical representation of DNA sequences   Total citations: 5 (self-citations: 0, citations by others: 5)
We consider a 2D graphical representation of DNA sequences that avoids the loss of information associated with crossing and overlapping of the corresponding curve. We outline an approach based on the construction of a three-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with DNA. The examination of similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of different species illustrates the utility of the approach.

12.
Aptamers are DNA oligonucleotides capable of binding different classes of targets with high affinity and selectivity. They are particularly attractive as affinity probes in multiplexed quantitative analysis of proteins. Aptamers are typically selected from large libraries of random DNA sequences in a general approach termed systematic evolution of ligands by exponential enrichment (SELEX). SELEX involves repetitive rounds of two processes: (i) partitioning of aptamers from non-aptamers by an affinity method and (ii) amplification of aptamers by the polymerase chain reaction (PCR). New partitioning methods, which are characterized by exceptionally high efficiency of partitioning, have been recently introduced. For the overall SELEX procedure to be efficient, the high efficiency of new partitioning methods has to be matched by high efficiency of PCR. Here we present the first detailed study of PCR amplification of random DNA libraries used in aptamer selection. With capillary electrophoresis as an analytical tool, we found fundamental differences between PCR amplification of homogeneous DNA templates and that of large libraries of random DNA sequences. Product formation for a homogeneous DNA template proceeds until primers are exhausted. For a random DNA library as a template, product accumulation stops when PCR primers are still in excess of the products. The products then rapidly convert to by-products and virtually disappear after only 5 additional cycles of PCR. The yield of the products decreases with increasing length of the DNA molecules in the library. We also showed that the initial number of DNA molecules in the PCR mixture has no effect on by-product formation, while increasing the Taq DNA polymerase concentration in the PCR mixture selectively increases the yield of PCR products.
Our findings suggest that standard procedures of PCR amplification of homogeneous DNA samples cannot be transferred to PCR amplification of random DNA libraries: to ensure efficient SELEX, PCR has to be optimized for the amplification of random DNA libraries.

13.
The intriguing structural diversity in folded topologies available to guanine-rich nucleic acid repeat sequences has made four-stranded G-quadruplex structures the focus of both basic and applied research, from cancer biology and novel therapeutics through to nanoelectronics. Distributed widely in the human genome as targets for regulating gene expression and chromosomal maintenance, they offer unique avenues for future cancer drug development. In particular, the recent advances in chemical and structural biology have enabled the construction of bespoke selective DNA-based aptamers to be used as novel therapeutic agents and access to detailed structural models for structure-based drug discovery. In this critical review, we will explore the important underlying characteristics of G-quadruplexes that make them functional, stable, and predictable nanoscaffolds. We will review the current structural database of folding topologies, molecular interfaces and novel interaction surfaces, with consideration of their future exploitation in drug discovery, molecular biology, supramolecular assembly and aptamer design. In recent years the number of potential applications for G-quadruplex motifs has rapidly grown, so in this review we aim to explore the many future challenges and highlight where possible successes may lie. We will highlight the similarities and differences between DNA and RNA folded G-quadruplexes in terms of stability, distribution, and exploitability as small molecule targets. Finally, we will provide a detailed review of basic G-quadruplex geometry, experimental tools used, and a critical evaluation of the application of high-resolution structural biology and its ability to provide meaningful and valid models for future applications (255 references).

14.
Condensed representation of DNA primary sequences   Total citations: 6 (self-citations: 0, citations by others: 6)
With rapid reporting of DNA sequences derived from automated DNA sequencing techniques, the problem of reviewing and ordering such information has become acute. We have introduced a condensed representation of primary sequences of DNA that offers an alternative method of registering DNA. The advantage of the condensed codes for DNA is that they not only offer fast, qualitative comparisons of DNA but also allow quantitative comparisons of DNA from different sources. The approach is outlined for a particular human beta globin sequence extract. Using the condensed representation of the primary DNA sequences, comparisons are made between primary sequences for Exon 1 of human beta globin and seven other beta globins.

15.
On the similarity of DNA primary sequences   Total citations: 3 (self-citations: 0, citations by others: 3)
We consider numerical characterization of graphical representations of DNA primary sequences. In particular we consider graphical representation of DNA of beta-globins of several species, including human, on the basis of the approach of A. Nandy in which nucleic bases are associated with a walk over integral points of a Cartesian x, y-coordinate system. With a so-generated graphical representation of DNA, we associate a distance/distance matrix, the elements of which are given by the quotient of the Euclidean and the graph theoretical distances, that is, through the space and through the bond distances for pairs of bases of graphical representation of DNA. We use eigenvalues of so-constructed matrices to characterize individual DNA sequences. The eigenvalues are used to construct numerical sequences, which are subsequently used for similarity/dissimilarity analysis. The results of such analysis have been compared and combined with similarity tables based on the frequency of occurrence of pairs of bases.
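The distance/distance matrix described in this abstract can be sketched as below. Several details are assumptions made only for illustration: the mapping of bases to unit steps (G: +x, A: −x, C: +y, T: −y), the choice to record one point after each base rather than the origin, and the use of the number of steps along the chain, |i − j|, as the graph-theoretical distance. The name `dd_matrix` is hypothetical.

```python
import math

# Assumed step directions for the walk; the abstract does not list them.
STEPS = {"G": (1, 0), "A": (-1, 0), "C": (0, 1), "T": (0, -1)}


def walk_points(seq):
    """Return the lattice point reached after each base of the sequence."""
    x, y = 0, 0
    points = []
    for base in seq:
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def dd_matrix(seq):
    """Quotient of Euclidean and along-chain distance for each pair of bases."""
    pts = walk_points(seq)
    n = len(pts)
    return [
        [0.0 if i == j else math.dist(pts[i], pts[j]) / abs(i - j) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in dd_matrix("GCA"):
        print([round(v, 4) for v in row])
```

Entries equal 1 where the walk runs straight between two bases and fall below 1 where it bends or folds back. The eigenvalues mentioned in the abstract would then be computed from this symmetric matrix with a numerical linear algebra routine.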

16.
From the perspective of neighboring dual nucleotides, we introduce a novel 2D graphical representation of DNA sequences based on the magic circle, which corresponds to the 16 dual nucleotides. We can thus reduce a DNA sequence to a plot set in two‐dimensional space and obtain a two‐component vector relative to the introduced covariance matrix. The utility of our approach is illustrated by the examination of similarities/dissimilarities among the complete coding sequences of the β‐globin gene belonging to 11 species. © 2008 Wiley Periodicals, Inc. Int J Quantum Chem, 2009

17.
Index-based search algorithms are an important part of genomic search, and how the index is constructed is the key to an index-based search algorithm that computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It uses little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing the Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. The results of our experiments show that the algorithm has a faster run time than other heuristic algorithms based on index structures. The algorithm is also as accurate as those heuristic algorithms.
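The window, Hamming-distance, and frequency-vector steps named in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's index structure: windows are non-overlapping and have the same length as the query, the distance threshold `max_dist` is a hypothetical parameter, and the positional information mentioned in the abstract is omitted.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def windows(seq, size):
    """Partition seq into non-overlapping windows of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def frequency_vector(sub, alphabet="ACGT"):
    """Map a subsequence to its vector of character counts."""
    return [sub.count(c) for c in alphabet]


def candidate_windows(seq, query, max_dist):
    """Indices and frequency vectors of windows close to the query."""
    return [
        (i, frequency_vector(w))
        for i, w in enumerate(windows(seq, len(query)))
        if hamming(w, query) <= max_dist
    ]


if __name__ == "__main__":
    print(candidate_windows("ACGTACGAACGT", "ACGT", max_dist=0))
```

In the toy call, only the first and third windows match the query exactly, so they are the only candidates whose frequency vectors are returned.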

18.
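Zhang Baohua, Wang Haishui, Xu Lu. Acta Chimica Sinica (《化学学报》), 2006, 64(22): 2221-2224
In the primary sequence of deoxyribonucleic acid (DNA), the four bases can be combined pairwise in 16 ways (for example, ct, ag, cg, tc), and the different combinations occur with different frequencies in a given DNA sequence; among them, cg occurs very rarely. In this paper, the positions and frequency of cg are treated as a mathematical quantity, the molecular connectivity indices of each species are calculated, and these indices are used as parameters for similarity analysis among species. On this basis, 10 species are further grouped and classified.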
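The raw quantities this abstract starts from, namely the counts of the 16 base pairs and the positions of cg, can be obtained with a short sketch like the one below. It assumes overlapping adjacent pairs and 0-based positions, and the function names are hypothetical; the molecular connectivity index built on these quantities in the paper is not reproduced here.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count every overlapping pair of adjacent bases (lowercased)."""
    seq = seq.lower()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def pair_positions(seq, pair="cg"):
    """0-based start positions at which the given pair occurs."""
    seq = seq.lower()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == pair]


if __name__ == "__main__":
    s = "acgtcgcg"
    print(dinucleotide_counts(s)["cg"])
    print(pair_positions(s))
```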

19.
In order to extend the results obtained with minimal lattice models to more realistic systems, we study a model where proteins are described as a chain of 20 kinds of structureless amino acids moving in a continuum space and interacting through a contact potential controlled by a 20 × 20 quenched random matrix. The goal of the present work is to design and characterize amino acid sequences folding to the SH3 conformation, a 60-residue recognition domain common to many regulatory proteins. We show that a number of sequences can fold, starting from a random conformation, to within a distance root-mean-square deviation between 2.6 and 4.0 Å from the native state. Good folders are those sequences displaying in the native conformation an energy lower than a sequence-independent threshold energy.

20.
On the basis of a class of 2D graphical representations of DNA sequences, sensitivity analysis has been performed, showing the high capability of the proposed representations to take into account small modifications of the DNA sequences. The sensitivity analysis also indicates that the absolute differences of the leading eigenvalues of the L/L matrices associated with DNA increase with the number of base mutations. In addition, we conclude that the similarity analysis method based on correlation angles eliminates the effects of the lengths of DNA sequences better than the method using Euclidean distances. As an application, the examination of similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of different species has been performed by our method, and the reasonable results verify the validity of our method.
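The contrast drawn in this abstract between correlation angles and Euclidean distances can be illustrated with a small sketch. It assumes the correlation angle is the angle between two descriptor vectors, computed from their normalized dot product; the vectors used are made-up numbers, not data from the paper.

```python
import math


def correlation_angle(u, v):
    """Angle in radians between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def euclidean_distance(u, v):
    """Straight-line distance between the end points of two vectors."""
    return math.dist(u, v)


if __name__ == "__main__":
    u = (1.0, 2.0, 3.0)
    v = (2.0, 4.0, 6.0)  # same direction as u, twice the magnitude
    print(correlation_angle(u, v), euclidean_distance(u, v))
```

Because `v` is a scaled copy of `u`, the angle between them is essentially zero while their Euclidean distance is about 3.74. If descriptor magnitudes grow with sequence length, this shows why an angle-based measure can be less sensitive to length than a distance-based one.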
