Similar Documents
1.
A DNA sequence can be identified with a word over the alphabet N = {A, C, G, T}. The characteristic sequences of a DNA sequence are defined in terms of classifications of the nucleic acid bases. Using these characteristic sequences, we construct a set of 2 × 2 matrices to represent DNA primary sequences, based on counting the frequency of occurrence of all (0,1) triplets in the characteristic sequences. The leading eigenvalues of these matrices are then computed and used as invariants of the DNA primary sequences. A similarity/dissimilarity analysis based on the characteristic sequences is given for the exon-1 sequences of the beta-globin genes of eight species.
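A minimal Python sketch of the first steps, assuming the three usual bi-classifications of the bases (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) with "1" for the first class of each pair. It builds the characteristic sequences and counts the overlapping (0,1) triplets. The abstract does not say how the eight triplet counts are arranged into a 2 × 2 matrix, so `hypothetical_matrix` (start bit × end bit, middle bit summed out) is purely illustrative and is not the paper's construction.

```python
import math
from collections import Counter
from itertools import product

# Usual bi-classifications of the bases; "1" marks the first class of each
# pair (this bit assignment is an assumption of the sketch).
CLASSES = {
    "purine/pyrimidine": {"A", "G"},  # R = {A, G} vs Y = {C, T}
    "amino/keto": {"A", "C"},         # M = {A, C} vs K = {G, T}
    "weak/strong": {"A", "T"},        # W = {A, T} vs S = {C, G}
}


def characteristic_sequence(dna: str, ones: set) -> str:
    """Map each base to '1' if it is in `ones`, otherwise to '0'."""
    return "".join("1" if base in ones else "0" for base in dna.upper())


def triplet_counts(bits: str) -> dict:
    """Occurrences of all eight (0,1) triplets in overlapping windows."""
    counts = Counter(bits[i:i + 3] for i in range(len(bits) - 2))
    return {"".join(t): counts["".join(t)] for t in product("01", repeat=3)}


def hypothetical_matrix(counts: dict) -> list:
    """HYPOTHETICAL 2x2 layout: entry (i, j) totals the triplets that start
    with bit i and end with bit j (the middle bit is summed out)."""
    return [[sum(c for t, c in counts.items() if t[0] == i and t[2] == j)
             for j in "01"] for i in "01"]


def leading_eigenvalue_2x2(m: list) -> float:
    """Largest eigenvalue of a 2x2 matrix; real whenever b*c >= 0,
    which holds for the nonnegative count matrices used here."""
    (a, b), (c, d) = m
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * c)


if __name__ == "__main__":
    toy = "ATGGTGCATCTGACT"  # arbitrary toy fragment, not real data
    for name, ones in CLASSES.items():
        bits = characteristic_sequence(toy, ones)
        lam = leading_eigenvalue_2x2(hypothetical_matrix(triplet_counts(bits)))
        print(name, bits, round(lam, 3))
```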

2.
In this paper, we (1) introduce a logical representation (LR) for DNA primary sequences; (2) show the relations between the LR and several other representations, including the characteristic sequences of a DNA sequence, Randic's 2-D and 4-D representations, and the Z-curve (a 3-D graphical representation); and (3) outline the construction of the S/S matrix for a logical sequence and of its 2 × 2 condensed matrix.
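Of the representations listed, the Z-curve has a standard closed form that makes its link to the characteristic sequences concrete: each coordinate is a running sum of one ±1 characteristic sequence. A minimal sketch follows; the paper's logical representation itself is not specified in the abstract and is not reproduced here.

```python
# Unit steps (dx, dy, dz) per base for the three Z-curve axes:
# x: purine (+1) vs pyrimidine (-1); y: amino (+1) vs keto (-1);
# z: weak H-bond (+1) vs strong H-bond (-1).
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(dna: str) -> list:
    """Return the points (x_n, y_n, z_n), n = 1..N, where
    x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n),
    z_n = (A_n + T_n) - (G_n + C_n), with A_n etc. the base counts in the
    first n positions."""
    x = y = z = 0
    points = []
    for base in dna.upper():
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    print(z_curve("ACGT"))  # -> [(1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)]
```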

3.
Given a bi-classification of nucleotides, a primary DNA sequence can be reduced to a binary sequence. This binary sequence will inevitably retain some biological information and lose the rest. Here we ask what kind of biological information, and how much, an individual binary sequence carries. Three classifications of nucleotides are explored. Phylogenetic trees are built from these binary sequences with the Neighbor-Joining (NJ) method, using evolutionary distances evaluated from a symbolic sequence complexity. We find that, for all data sets studied, binary sequences reduced by the purine/pyrimidine classification give a reliable phylogeny (almost the same as that from the primary sequences), whereas the other two classifications lead to discrepancies of varying degree. Possible reasons, together with a simple model of sequence evolution, are presented to interpret this phenomenon.
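The abstract names a "symbolic sequence complexity" without defining it. The sketch below assumes Lempel-Ziv (1976) complexity, a common choice for binary symbolic sequences, together with one normalized distance built from the complexities of the concatenated sequences. Both the complexity measure and the distance formula are assumptions, not the paper's stated method. Pairwise distances computed this way would then be passed to any Neighbor-Joining implementation, which is not shown.

```python
def to_binary(dna: str) -> str:
    """Purine/pyrimidine reduction: A, G -> '1'; C, T -> '0' (assumed bits)."""
    return "".join("1" if base in "AG" else "0" for base in dna.upper())


def lz76_complexity(s: str) -> int:
    """Lempel-Ziv (1976) complexity: the number of components in the
    exhaustive-history parsing, where each component is the shortest segment
    starting at i that cannot be copied from the preceding text."""
    i, c, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it still occurs in s[:i + length - 1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def lz_distance(s: str, q: str) -> float:
    """ASSUMED normalized distance: extra components needed to parse one
    sequence after the other, divided by the larger concatenation complexity."""
    cs, cq = lz76_complexity(s), lz76_complexity(q)
    csq, cqs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(csq - cs, cqs - cq) / max(csq, cqs)


if __name__ == "__main__":
    a = to_binary("ATGGTGCATCTGACTCCTGAGGAG")  # toy sequences
    b = to_binary("ATGGTGCACCTGACTGATGCTGAG")
    print(lz76_complexity(a), lz76_complexity(b), round(lz_distance(a, b), 3))
```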

4.
In basic biological research, one of the major tasks is to compare biological sequences in order to infer evolutionary relationships among them. In this paper, taking into account the positions and numbers of occurrences of each k-word as well as a random background, we propose a novel characteristic vector of a DNA sequence for genetic sequence comparison and phylogenetic analysis. The elements of the vector characterize the relative difference between a DNA sequence and a sequence generated by a (k − 2)th-order Markov process. Finally, we reconstruct the phylogenetic trees of 48 HEV (Hepatitis E virus) sequences and 20 Eutherian mammals. The results show that the new method captures more information about k-words and improves the efficiency of sequence comparison.
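A sketch of the random-background part only. It computes the expected count of each k-word under a (k − 2)th-order Markov model with the usual estimator E(w) = N(w1..w_{k-1}) · N(w2..wk) / N(w2..w_{k-1}), then reports the relative difference (observed − expected)/expected. The position information that the paper also uses is not modelled, and raw counts are used without length normalization. Both are simplifications made here.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-word counts."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_relative_difference(seq: str, k: int) -> dict:
    """For each observed k-word w = w1..wk, compare the observed count with the
    count expected under a (k-2)th-order Markov model,
        E(w) = N(w1..w_{k-1}) * N(w2..wk) / N(w2..w_{k-1}),
    and return (observed - expected) / expected. Raw counts are used without
    length normalization, a simplification."""
    if k < 3:
        raise ValueError("k must be at least 3")
    nk, nk1, nk2 = (kmer_counts(seq, k), kmer_counts(seq, k - 1),
                    kmer_counts(seq, k - 2))
    result = {}
    for word, observed in nk.items():
        # Every sub-word of an observed word is itself observed, so the
        # denominator and the expectation are both positive here.
        expected = nk1[word[:-1]] * nk1[word[1:]] / nk2[word[1:-1]]
        result[word] = (observed - expected) / expected
    return result


if __name__ == "__main__":
    print(markov_relative_difference("AAAC", 3))
```

A word whose value is near zero occurs about as often as its sub-words predict. Large positive or negative values flag over- or under-represented k-words relative to the Markov background.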

5.
Recently, we established a robust method for detecting hybridization events using a DNA microarray deposited on a nanoporous membrane. In this follow-up study, we demonstrate the performance of the approach on a larger set of LNA-modified oligoprobes and genomic DNA sequences. Twenty-six different LNA-modified 7-mer oligoprobes were hybridized to a set of 66 randomly selected human genomic DNA clones spotted on a nanoporous membrane slide. Assay sensitivity was then analyzed using receiver operating characteristic (ROC) curves. A comparison of LNA-modified heptamers with DNA heptamers revealed that the LNA modification clearly improved the sensitivity and specificity of the hybridization experiments. Clustering analysis was applied to test the practical performance of hybridization experiments with LNA-modified oligoprobes in recognizing the similarity of genomic DNA sequences. Comparing the results with the theoretical sequence clusters, we conclude that LNA-modified oligoprobes allow reliable clustering of DNA sequences that reflects the underlying sequence homology. Our results show that LNA-modified oligoprobes can be used effectively to reveal the similarity of DNA sequences and thus to characterize the content of unknown DNA libraries.
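The ROC analysis mentioned above can be illustrated with a short sketch. It assumes each probe/clone pair has a hybridization signal score and a known label: "positive" if the clone contains the probe's target, "negative" otherwise. The scores in the usage example are hypothetical, not data from the study. The area under the curve is computed through its rank interpretation, the probability that a positive outscores a negative.

```python
def roc_points(pos: list, neg: list) -> list:
    """(FPR, TPR) pairs obtained by sweeping a threshold over all observed scores."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def roc_auc(pos: list, neg: list) -> float:
    """Area under the ROC curve as the probability that a random positive
    scores above a random negative (ties count one half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical signal intensities, not data from the study.
    matched = [0.91, 0.78, 0.66, 0.52]
    unmatched = [0.60, 0.35, 0.22, 0.10]
    print(roc_points(matched, unmatched))
    print(roc_auc(matched, unmatched))
```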

6.
The conformational properties of some nucleotide sequences underlie their ability to bind particular ligands specifically or to be recognized by specific proteins. To investigate how the conformational behavior of the DNA duplex depends on nucleotide sequence, we analyzed the interaction energy of nucleic acid bases as a function of conformational parameters and base sequence. Extended regions of minimum energy were found for different sequences. Although these regions (valleys) largely overlap, each one is specific to a particular sequence. This suggests that each sequence has its own pathway of changes in conformational parameters. These changes may be accompanied by considerable shifts (2–3 Å) of the atomic positions with only a slight variation (1–2 kcal/mol) in energy, whereas even small shifts in other directions can cause a drastic increase in energy. For some nucleotide sequences the energetically preferred conformations are B-like (e.g., ApA, TpA), whereas for others A-like conformations are preferred (e.g., GpG, ApT). In general, Pyr-Pur sequences tend to have a larger τ and smaller H and D than Pur-Pyr sequences. The results obtained explain a large body of experimental data on nucleic acid structure in fibers and in solution.

7.
A multilayered feed-forward artificial neural network (ANN) trained with the error-back-propagation (EBP) algorithm has been developed to predict whether a given nucleotide sequence is a mycobacterial promoter. Owing to the high prediction accuracy (97%) of the network model, it was further used together with the caliper randomization (CR) approach to determine structurally/functionally important regions in the promoter sequences. The results indicate that four regions are important for mycobacterial promoters: (i) the region upstream of the −35 box, (ii) the −35 region, (iii) the spacer region, and (iv) the −10 box. The CR approach also suggests that the −38 to −29 region plays a significant role in determining whether a given sequence is a mycobacterial promoter. In essence, the study establishes ANNs as a tool for predicting mycobacterial promoter sequences and for identifying structurally/functionally important sub-regions within them.
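The abstract does not spell out the caliper randomization procedure. As we read it, a fixed-width window of positions is randomized in every sequence, and the resulting drop in the trained network's accuracy marks that window as important. The sketch below assumes exactly that. `predict` is a hypothetical stand-in for any trained classifier; the network itself is not reproduced.

```python
import random

BASES = "ACGT"


def caliper_randomize(seqs, start, end, rng):
    """Replace positions [start, end) of every sequence with random bases."""
    out = []
    for s in seqs:
        window = "".join(rng.choice(BASES) for _ in range(end - start))
        out.append(s[:start] + window + s[end:])
    return out


def importance_profile(predict, seqs, labels, width, seed=0):
    """Accuracy drop for each window of `width` positions when that window is
    randomized; `predict` stands in for any trained classifier (sequence -> 0/1)."""
    rng = random.Random(seed)

    def accuracy(batch):
        return sum(predict(s) == y for s, y in zip(batch, labels)) / len(labels)

    baseline = accuracy(seqs)
    length = min(len(s) for s in seqs)
    return [baseline - accuracy(caliper_randomize(seqs, i, i + width, rng))
            for i in range(length - width + 1)]


if __name__ == "__main__":
    def toy_model(s):
        """Toy stand-in classifier: 'promoter' iff positions 2-3 read 'TA'."""
        return int(s[2:4] == "TA")

    seqs = ["GGTAGG", "CCTACC", "GGGGGG", "CCCCCC"]
    print(importance_profile(toy_model, seqs, [1, 1, 0, 0], width=2))
```

Windows that the toy model ignores (positions 0 to 1 and 4 to 5) show no accuracy drop. Windows overlapping positions 2 to 3 usually do.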

8.
9.
On a four-dimensional representation of DNA primary sequences (total citations: 1; self-citations: 0; citations by others: 1)
We consider a four-dimensional representation of DNA primary sequences in which each of the four nucleotide bases A, T, G, and C is assigned a direction along one of four orthogonal coordinate axes. Advantages and limitations of this novel representation are discussed, and its use is illustrated by constructing novel sequence invariants based on the entries of derived sequence matrices restricted to a band of selected width along the main diagonal. These invariants are applied to a set of eight short DNA sequences corresponding to the first exon of beta-globin in eight species, including human, and the resulting similarity/dissimilarity values are compared with those obtained from 2-D and 3-D representations.
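A sketch of the 4-D walk and a banded invariant. The abstract only says that the invariants come from entries of derived sequence matrices restricted to a band along the main diagonal. Using Euclidean distances between walk points as the matrix entries, the leading eigenvalue as the invariant, and division by sequence length as a normalization are all assumptions made here for illustration.

```python
import math

AXES = {"A": 0, "T": 1, "G": 2, "C": 3}  # one orthogonal axis per base


def walk_4d(dna):
    """Cumulative 4-D walk: each base adds a unit step along its own axis."""
    pos = [0, 0, 0, 0]
    points = []
    for base in dna.upper():
        pos[AXES[base]] += 1
        points.append(tuple(pos))
    return points


def banded_distance_matrix(points, width):
    """Euclidean distances between walk points, kept only within `width`
    of the main diagonal; all other entries, and the diagonal, are zero."""
    n = len(points)
    return [[math.dist(points[i], points[j]) if 0 < abs(i - j) <= width else 0.0
             for j in range(n)] for i in range(n)]


def leading_eigenvalue(m, iters=1000):
    """Largest eigenvalue of a symmetric nonnegative matrix via power iteration
    on M + I (the shift avoids oscillation when -lambda_1 is also an
    eigenvalue), reported as the Rayleigh quotient of M."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))


if __name__ == "__main__":
    seq = "ATGGTGCACCTGACTCCTGAGGAG"  # toy sequence
    pts = walk_4d(seq)
    lam = leading_eigenvalue(banded_distance_matrix(pts, width=3))
    print(lam / len(seq))  # hypothetical normalization by sequence length
```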

10.
A new type of molecular representation is introduced that is based on activity-class characteristic substructures extracted from random fragment populations. Mapping of characteristic substructures is used to determine atom match rates in active molecules. Comparison of the match rates of bonded atoms defines a hierarchical molecular fragmentation scheme. Active compounds are encoded as fragmentation pathways isolated from core trees. These paths are amenable to biological sequence alignment methods in combination with substructure-based scoring functions. From multiple core path alignments, consensus fragment sequences are derived that represent compound activity classes. Consensus fragment sequences weighted by increasing structural specificity can also be used to map molecules and to search databases for active compounds.
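To show how fragment paths can be aligned like biological sequences, here is a sketch of Needleman-Wunsch global alignment over lists of fragment tokens with a pluggable scoring function. `fragment_score` is hypothetical: it uses plain string containment on SMILES-like strings as a crude stand-in for substructure matching. It is not the substructure-based scoring function used in the paper.

```python
def global_align(a, b, score, gap=-1.0):
    """Needleman-Wunsch global alignment score of two token sequences,
    with a linear gap penalty and a caller-supplied substitution score."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]


def fragment_score(x, y):
    """HYPOTHETICAL score on fragment strings: identical fragments score 2,
    a fragment string contained in the other scores 1, anything else -1."""
    if x == y:
        return 2.0
    if x in y or y in x:
        return 1.0
    return -1.0


if __name__ == "__main__":
    path1 = ["c1ccccc1", "c1ccccc1C", "c1ccccc1C(=O)O"]  # toy fragment paths
    path2 = ["c1ccccc1", "c1ccccc1C(=O)O"]
    print(global_align(path1, path2, fragment_score))
```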

11.
New 2D graphical representation of DNA sequences (total citations: 5; self-citations: 0; citations by others: 5)
We consider a 2D graphical representation of DNA sequences that avoids the loss of information associated with crossing and overlapping of the corresponding curve. We outline an approach based on the construction of a three-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with the DNA sequence. An examination of the similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of different species illustrates the utility of the approach.
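A sketch of an L/L matrix (Euclidean distance between two points of the curve divided by the length of the curve between them) for one non-crossing 2-D curve. The specific curve used in the paper is not given in the abstract. The one here (x advances by one per base, y is a per-base level with hypothetical values) is an assumption chosen only because it cannot cross or overlap itself. Normalizing the leading eigenvalue by the sequence length is also an assumption.

```python
import math

LEVEL = {"A": 1, "C": 2, "G": 3, "T": 4}  # hypothetical y-level for each base


def curve_2d(dna):
    """Non-self-intersecting 2-D curve: x advances by 1 per base and y is the
    base's level, so the curve never crosses or overlaps itself."""
    return [(i + 1, LEVEL[base]) for i, base in enumerate(dna.upper())]


def ll_matrix(points):
    """L/L matrix: Euclidean distance between points i and j divided by the
    length of the curve between them (zero on the diagonal)."""
    n = len(points)
    arc = [0.0]  # arc[k] = curve length from point 0 to point k
    for k in range(n - 1):
        arc.append(arc[-1] + math.dist(points[k], points[k + 1]))
    return [[0.0 if i == j else math.dist(points[i], points[j]) / abs(arc[j] - arc[i])
             for j in range(n)] for i in range(n)]


def leading_eigenvalue(m, iters=1000):
    """Power iteration on M + I for a symmetric nonnegative matrix."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))


if __name__ == "__main__":
    seq = "ATGGTGCACCTGACTCCTGAGGAG"  # toy sequence
    lam = leading_eigenvalue(ll_matrix(curve_2d(seq)))
    print(lam / len(seq))  # hypothetical normalization
```

Because a straight-line distance never exceeds the path length, every off-diagonal L/L entry lies in (0, 1]. Entries equal 1 only where the curve is straight.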

12.

Background  

Although it is generally agreed that topography is more conserved than sequence, proteins sharing the same fold can have different functions, and members of some protein families share only low sequence similarity. An alternative method for profiling the characteristic conserved positions of motifs within 3D structures may therefore be needed for the functional annotation of protein sequences. Using the quantitative structure-activity relationship (QSAR) approach, we propose a new algorithm that postulates functional mechanisms on the basis of pattern similarity and the average side-chain property values of segments within sequences. The approach was used to search for functional sites in proteins of the lysozyme and cystatin families.
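A sketch of segment-averaged side-chain properties and a simple profile comparison. The abstract does not name the property scales or the similarity measure. The Kyte-Doolittle hydropathy index and Pearson correlation are used here only as stand-ins.

```python
import math

# Kyte-Doolittle hydropathy index, used here as one example side-chain property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_averages(seq, window):
    """Average property value of every segment of `window` consecutive residues."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def profile_similarity(p, q):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return cov / (sp * sq)


if __name__ == "__main__":
    p1 = segment_averages("AILKDEGWVA", 3)  # toy segments, not real proteins
    p2 = segment_averages("VLIRDDGYIA", 3)
    print(round(profile_similarity(p1, p2), 3))
```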

13.
Based on a classification of the 20 amino acids, we reduce a protein primary sequence to six (0,1) sequences. For each of them, two so-called normalized relative entropies are calculated, giving a 12-D vector that describes the protein primary sequence. An examination of the similarities/dissimilarities among eight different proteins illustrates the utility of the approach.
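The abstract defines neither the six amino acid classifications nor the "normalized relative entropy". The sketch below is therefore a hypothetical stand-in. It reduces a protein with one assumed binary class (a hydrophobic set) and computes, for k = 1 and k = 2, the Kullback-Leibler divergence of the k-bit word frequencies from a uniform background, divided by its maximum value k. This yields two numbers per (0,1) sequence, matching the count in the abstract but not necessarily its definition.

```python
import math
from collections import Counter

HYDROPHOBIC = set("AVLIMFWC")  # hypothetical class mapped to "1"


def reduce_protein(seq, ones):
    """Map each residue to '1' if it belongs to `ones`, otherwise to '0'."""
    return "".join("1" if aa in ones else "0" for aa in seq.upper())


def normalized_relative_entropy(bits, k):
    """Kullback-Leibler divergence (bits) of the observed k-bit word
    frequencies from the uniform distribution over 2**k words, divided by its
    maximum possible value k, so the result lies in [0, 1]."""
    words = Counter(bits[i:i + k] for i in range(len(bits) - k + 1))
    total = sum(words.values())
    d = sum((c / total) * math.log2((c / total) * 2 ** k) for c in words.values())
    return d / k


if __name__ == "__main__":
    bits = reduce_protein("MKTAYIAKQRQISFVKSHFSRQ", HYDROPHOBIC)  # toy peptide
    print(bits, [normalized_relative_entropy(bits, k) for k in (1, 2)])
```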

14.
This study presents a detailed bioinformatics analysis of fungal and chloride-dependent α-amylases from family GH13. In total, 268 α-amylase sequences were retrieved from subfamilies GH13_1 (39 sequences), GH13_5 (35), GH13_15 (28), GH13_24 (23), GH13_32 (140) and GH13_42 (3). The eight conserved sequence regions (CSRs) characteristic of family GH13 were identified in all sequences, and the corresponding sequence logos were analysed to identify sequence features unique to each subfamily. The main emphasis was placed on subfamily GH13_32, since it contains both fungal α-amylases and their bacterial chloride-activated counterparts. In addition to an in silico analysis of the potential ability to bind the chloride anion (a property typical mainly of the animal α-amylases from subfamilies GH13_15 and GH13_24), attention was also paid to the possible presence of the so-called secondary surface-binding sites (SBSs) identified in complexed crystal structures of some α-amylases from the studied subfamilies. The α-amylases from Aspergillus niger (GH13_1), Bacillus halmapalus, Bacillus paralicheniformis and Halothermothrix orenii (all GH13_5) and Homo sapiens (saliva; GH13_24) served as template enzymes with experimentally determined SBSs. Evolutionary relationships between GH13 fungal and chloride-dependent α-amylases were demonstrated by two evolutionary trees: one based on an alignment of the segment spanning almost the entire catalytic TIM-barrel domain, and the other based on an alignment of the eight extracted CSRs. Both trees showed a closer evolutionary relatedness of subfamily GH13_1 with GH13_42 (and, in a wider sense, also with GH13_5), as well as among subfamilies GH13_32, GH13_15 and GH13_24, although subtle differences in the clustering of particular α-amylases can be observed.
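The sequence logos used to compare CSRs scale each alignment column by its information content. Below is a minimal sketch of that quantity for a protein alignment: log2(20) minus the column's Shannon entropy. Gaps are ignored, and no small-sample correction is applied, which logo tools usually add. The aligned segments in the usage example are toy strings, not real CSRs.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column: log2(alphabet_size)
    minus the Shannon entropy of its residue frequencies. Gaps are ignored
    and no small-sample correction is applied."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(alignment):
    """Total stack height of each column in a sequence logo."""
    return [column_information(col) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_csr = ["GFRLDA", "GFRIDA", "GWRLDA", "GFRVDT"]  # toy aligned segments
    print([round(h, 2) for h in logo_heights(toy_csr)])
```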

15.
The authors propose a novel approach to designing and evaluating sequences for zero-field NMR spectra in high field (ZFHF) using amplitude- and phase-modulated rf sequences. ZFHF spectra show sharp peaks for the dipolar interaction between two nuclear spins even when the molecular orientations are distributed. The internuclear distance $r$ can be obtained directly from the peak position, which is proportional to $r^{-3}$. Numerous ZFHF sequences are obtained, and one of them is selected by systematic evaluation. The new ZFHF sequence is less affected by chemical shift anisotropy (CSA) than previous sequences, so it can be used for systems with large CSA, such as dipolar-coupled 13C pairs, at realistically high fields. 13C ZFHF spectra of 13C2-labeled diammonium succinate and 13C2-labeled diammonium oxalate were recorded at 9.4 T.
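The $r^{-3}$ dependence comes from the dipolar coupling constant $b = (\mu_0/4\pi)\,\gamma_1\gamma_2\hbar / r^3$. The abstract does not give the factor that relates the ZFHF peak position to $b$, so this sketch stops at the coupling constant (in Hz) and its inversion to a distance. The constants are standard SI values for a homonuclear 13C pair.

```python
import math

MU0_OVER_4PI = 1e-7        # mu_0 / (4 pi), T m A^-1 (SI, to ~1e-10 relative)
HBAR = 1.054571817e-34     # reduced Planck constant, J s
GAMMA_13C = 6.728284e7     # 13C gyromagnetic ratio, rad s^-1 T^-1


def dipolar_coupling_hz(r, g1=GAMMA_13C, g2=GAMMA_13C):
    """Dipolar coupling constant (mu0/4pi) g1 g2 hbar / r^3, converted to Hz."""
    return MU0_OVER_4PI * g1 * g2 * HBAR / (2 * math.pi * r ** 3)


def distance_from_coupling(b_hz, g1=GAMMA_13C, g2=GAMMA_13C):
    """Invert the r^-3 law to recover the internuclear distance in metres."""
    return (MU0_OVER_4PI * g1 * g2 * HBAR / (2 * math.pi * b_hz)) ** (1 / 3)


if __name__ == "__main__":
    b = dipolar_coupling_hz(1.54e-10)  # a typical C-C single-bond length
    print(round(b), "Hz")  # roughly 2.1 kHz for a directly bonded 13C pair
    print(distance_from_coupling(b))
```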

16.
For a DNA sequence with $n$ bases, one can always associate an $n \times n$ nonnegative real symmetric matrix whose diagonal entries are zero. Once such a matrix is given, its leading eigenvalue is usually calculated and used as an invariant to characterize the DNA sequence. Let $M$ be such a matrix and $\lambda_1$ its leading eigenvalue. Then $\frac{1}{n}\|M\|_{m_1} \le \lambda_1 \le \sqrt{\frac{n-1}{n}}\,\|M\|_F$, where $\|M\|_{m_1}$ is the sum of the absolute values of the entries of $M$ and $\|M\|_F$ is its Frobenius norm. Since the arithmetic mean of these two bounds approximates $\lambda_1$ and is simpler to calculate, it can serve as an alternative invariant to characterize the DNA sequence. The utility of the new parameter is illustrated on the DNA sequences of five species: human, chimpanzee, mouse, rat, and Gallus.
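The two bounds follow from short arguments. The lower bound is the Rayleigh quotient of the all-ones vector, $\mathbf{1}^\top M \mathbf{1}/n$, which equals $\|M\|_{m_1}/n$ for a nonnegative matrix. For the upper bound, the zero diagonal gives $\operatorname{tr} M = 0$, so $\lambda_1 = -(\lambda_2 + \dots + \lambda_n)$. Cauchy-Schwarz then yields $\lambda_1^2 \le (n-1)(\|M\|_F^2 - \lambda_1^2)$, that is, $\lambda_1 \le \sqrt{(n-1)/n}\,\|M\|_F$. A minimal sketch that computes both bounds and their mean, the proposed invariant:

```python
import math


def eigenvalue_bounds(m):
    """Bounds on the leading eigenvalue of a nonnegative symmetric matrix
    with zero diagonal, and their arithmetic mean:
      lower = (1/n) * sum of |m_ij|      (Rayleigh quotient of the all-ones vector)
      upper = sqrt((n-1)/n) * ||M||_F    (zero trace plus Cauchy-Schwarz)"""
    n = len(m)
    m1 = sum(abs(x) for row in m for x in row)
    frob = math.sqrt(sum(x * x for row in m for x in row))
    lower = m1 / n
    upper = math.sqrt((n - 1) / n) * frob
    return lower, upper, (lower + upper) / 2


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # leading eigenvalue is sqrt(2)
    print(eigenvalue_bounds(path), math.sqrt(2))
```

For this toy matrix the bounds are 4/3 and about 1.633, and their mean (about 1.483) lies close to $\sqrt{2} \approx 1.414$.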

17.
We have developed a novel approach for identifying transmembrane beta-barrel proteins (TMBs) in genomic sequences. Its features include (i) the identification of TMBs using residue-pair preferences in globular, transmembrane helical (TMH) and TMB proteins, (ii) the elimination of globular/TMH proteins that share more than 70% sequence identity, over at least 80% of their residues, with proteins of known structure, (iii) the elimination of globular/TMH proteins that share more than 60% sequence identity with known sequences in SWISS-PROT, and (iv) the exclusion of TMH proteins using SOSUI, a prediction system for TMH proteins. Our approach picked up 7% of proteins as TMBs in all the genomes considered. A comparison between the TMBs identified in the E. coli genome and the available experimental data showed that the new approach correctly identified all 11 known TMBs with available crystal structures. Further, it revealed 19 TMBs with homology to known structures, 60 TMBs similar to well-annotated sequences, and 54 TMBs with high sequence similarity to Escherichia coli beta-barrel proteins deposited in the Transport Classification Database (TCDB). Interestingly, the approach identified TMBs from all 15 families in TCDB. In the human genome, the occurrence of TMBs varies from 0 to 3% across chromosomes. We suggest that our approach could be a step forward for structural and functional genomics.
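The four filtering steps can be sketched as a single pass over precomputed per-protein annotations. Everything about the data layout is hypothetical. The `Candidate` fields, the residue-pair score scale, and its cutoff are placeholders. The structure/SWISS-PROT homology searches and the SOSUI run are assumed to have been done elsewhere. "Coverage of 80% residues" is read as at least 80% coverage.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """HYPOTHETICAL precomputed annotations for one protein."""
    name: str
    tmb_score: float           # residue-pair preference score (placeholder scale)
    pdb_identity: float        # best % identity to a globular/TMH known structure
    pdb_coverage: float        # % of residues covered by that alignment
    swissprot_identity: float  # best % identity to a globular/TMH SWISS-PROT entry
    sosui_tmh: bool            # True if SOSUI predicts transmembrane helices


def filter_tmbs(candidates, score_cutoff):
    """Apply steps (i)-(iv) in order and return the names of retained TMBs."""
    kept = []
    for c in candidates:
        if c.tmb_score < score_cutoff:                     # (i) pair preference
            continue
        if c.pdb_identity > 70 and c.pdb_coverage >= 80:   # (ii) known structures
            continue
        if c.swissprot_identity > 60:                      # (iii) SWISS-PROT
            continue
        if c.sosui_tmh:                                    # (iv) SOSUI TMH filter
            continue
        kept.append(c.name)
    return kept


if __name__ == "__main__":
    demo = [Candidate("toyA", 0.9, 20, 50, 30, False),
            Candidate("toyB", 0.9, 75, 85, 10, False)]
    print(filter_tmbs(demo, score_cutoff=0.5))
```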

18.
An approach is presented for designing modulated rf sequences that, under sample spinning, decouple or recouple a specific nuclear-spin interaction in solid-state NMR. The Euler angles of the spin rotation caused by a general rf field are constrained to satisfy the symmetry principles that select the interaction of interest. Modulated rf sequences are then obtained directly from the Euler angles with a large degree of freedom, and high-performance sequences can be selected from them by numerically optimizing the rf sequence parameters. As an example, an amplitude- and phase-modulated rf sequence that recouples the chemical-shift anisotropy (CSA) and is robust to rf inhomogeneity is developed. Two-dimensional (2D) experiments with this rf sequence under on- and off-magic-angle spinning (MAS) provide one-dimensional and 2D powder patterns, respectively. The latter allow the CSA principal values to be determined more accurately, even for signals that overlap in MAS spectra. The effectiveness of this modulated rf sequence is demonstrated experimentally on [15N]-N-acetyl-D,L-alanine by determining the 15N and 13CO CSA principal values.
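Once the three CSA principal values have been read off a powder pattern, they are commonly summarized as isotropic shift, anisotropy and asymmetry. The sketch below uses the Haeberlen convention, which is our choice here; the abstract does not state which convention the authors report. The input values in the usage line are toy numbers, not measured data.

```python
def haeberlen(d11, d22, d33):
    """Isotropic shift, anisotropy and asymmetry of a CSA tensor in the
    Haeberlen convention, from its three principal values (ppm):
      |dzz - iso| >= |dxx - iso| >= |dyy - iso|,
      aniso = dzz - iso,  eta = (dyy - dxx) / aniso."""
    iso = (d11 + d22 + d33) / 3
    # Sorting by distance from iso gives (dyy, dxx, dzz) in that order.
    dyy, dxx, dzz = sorted((d11, d22, d33), key=lambda d: abs(d - iso))
    aniso = dzz - iso
    eta = (dyy - dxx) / aniso if aniso else 0.0
    return iso, aniso, eta


if __name__ == "__main__":
    print(haeberlen(240.0, 150.0, 30.0))  # toy principal values, not measured data
```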

19.
We consider a 6-D representation of nucleotide-base triplets of DNA sequences. Based on this representation, we outline an approach that constructs a three-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with the triplets derived from DNA sequences. An examination of the similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of different species illustrates the utility of the approach.

20.