Similar literature
1.
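A study of the fractal characteristics of coding sequences in DNA   Total citations: 1, self-citations: 0, citations by others: 1
As genome databases grow ever larger, extracting useful information from these vast databases has become an urgent problem for scientists worldwide. This paper uses the grid dimension to characterize the fractal features of the coding regions of 60 human gene sequences. The results show that, within the same gene, the dimension of the exons is generally larger than that of the entire protein-coding sequence; a comparison with the grid dimension of random sequences confirms this conclusion. Combining fractal theory with power-spectrum analysis, it can be concluded that in genes with fewer exons the exons contain more genetic information, whereas for genes with more exons the opposite holds and genetic information may be stored in the introns. These conclusions have theoretical significance and practical value for research on intron function and on the complexity of DNA sequences.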

2.
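Informatics of selenoproteins in genomes   Total citations: 4, self-citations: 0, citations by others: 4
Humanity has entered the "post-genomic" era, and the focus of genome research will shift from determining the DNA sequences of genes to interpreting all the genetic information of life, studying biological function at the whole-molecule level, and exploring the mysteries of human health and disease at the molecular level. Selenoprotein genes are an important class of protein genes found across genomes, and searching genomes for new selenoprotein genes is of great importance for exploring the biological functions of selenoproteins. This paper briefly reviews the structural features of the selenocysteine insertion sequence (SECIS) element, bioinformatics methods for finding selenoproteins in genomes, and progress in this research, and offers an outlook on future trends.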

3.
A point mutation of a nucleotide within a single gene can have a profound effect on a specific organ and/or the entire human body. DNA sequences associated with human diseases may differ from the corresponding normal sequences by single nucleotide mutations or by large alterations such as deletions, insertions, duplications, or translocations of DNA segments or entire chromosomes. As a result of the heterogeneity of DNA alterations and genetic mutations, various screening approaches are required to detect these alterations. However, methods which facilitate the detection of large mutations in the genome are typically insensitive to point mutations, whereas methods which detect point mutations are not appropriate to detect large alterations within the genome. Since there is no single perfect method to screen for unknown mutations, combinations of these methods may be necessary for accurate genetic diagnosis. The applications of polymerase chain reaction (PCR) technology to genomic screening have made rapid and accurate genetic diagnosis possible. Furthermore, recent developments in the technology of DNA microarrays have opened the way for high throughput sequence analysis by hybridization, which shows great potential in both molecular biology and medicine in the near future.

4.
Based on the relative ratios of di- and tri-nucleotides in the DNA sequences, the profiles of 164 genome sequences from 152 representative microbial organisms were computed. By comparing the profiles of the genomes with those of their substrings of length 500 bp, the fluctuations of the relative abundances of di- and tri-nucleotides of these genomic sequences were analyzed. A new method to discriminate the origins of orphan DNA sequences was proposed, and the origins of 17 uncultured bacterium sequences from a bacterial community in the human gut were postulated and discussed.
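
As a rough illustration of the kind of profile entry 4 describes, the Python sketch below computes dinucleotide relative abundances as f_xy / (f_x * f_y) and compares a sequence with its 500 bp windows. The odds-ratio formula, the distance measure, and all function names are assumptions made for illustration; the abstract does not specify them, and trinucleotides are omitted for brevity.

```python
from collections import Counter

BASES = "ACGT"

def dinucleotide_profile(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for a DNA sequence.

    Assumed formula (not given in the abstract). Characters other than
    A/C/G/T count towards the length but get no profile entry.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    profile = {}
    for x in BASES:
        for y in BASES:
            fx, fy = mono[x] / n, mono[y] / n
            fxy = di[x + y] / (n - 1)
            # A base that never occurs gives an undefined ratio; report 0.0.
            profile[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return profile

def profile_distance(p, q):
    """Mean absolute difference between two profiles (hypothetical measure)."""
    return sum(abs(p[k] - q[k]) for k in p) / len(p)

def window_profiles(seq, width=500):
    """Yield the profile of each non-overlapping window of `width` bases."""
    for start in range(0, len(seq) - width + 1, width):
        yield dinucleotide_profile(seq[start:start + width])
```

A fragment of unknown origin could then be scored against each candidate genome with `profile_distance`, with the window-to-genome distances indicating how much fluctuation to expect for a fragment of that size.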

5.
We analyze publicly available data on Affymetrix microarray spike-in experiments on the human HGU133 chipset in which sequences are added in solution at known concentrations. The spike-in set contains sequences of bacterial, human, and artificial origin. Our analysis is based on a recently introduced molecular-based model (Carlon, E.; Heim, T. Physica A 2006, 362, 433) that takes into account both probe-target hybridization and target-target partial hybridization in solution. The hybridization free energies are obtained from the nearest-neighbor model with experimentally determined parameters. The molecular-based model suggests a rescaling that should result in a "collapse" of the data at different concentrations into a single universal curve. We indeed find such a collapse, with the same parameters as obtained previously for the older HGU95 chip set. The quality of the collapse varies according to the probe set considered. Artificial sequences, chosen by Affymetrix to be as different as possible from any other human genome sequence, generally show a much better collapse and thus a better agreement with the model than all other sequences. This suggests that the observed deviations from the predicted collapse are related to the choice of probes or have a biological origin rather than being a problem with the proposed model.

6.
Tandem repeats of short DNA sequences are commonly found in human DNA. These simple sequence repeats or microsatellites are highly polymorphic in the human genome. Since the anti-tumour agent cisplatin preferentially forms DNA adducts at runs of consecutive guanine nucleotides (poly(G)), the position and frequency of occurrence of poly(G) sequences in the updated human genome were investigated. There are more runs of consecutive guanines than would be expected by random chance. This is especially true for poly(G) sequences longer than approximately n = 9. A plot of poly(G) length against log(observed/expected) frequency produced a straight line for n > 9. A similar observation was also made for poly(A) DNA sequence repeats. These data implied that the increase in observed/expected frequency is directly related to the length of the DNA repeat. It was proposed that long runs of consecutive guanine nucleotides could be a sensitive sensor of cellular DNA damage since a number of DNA damaging agents cause lesions at poly(G) sequences.
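
The observed/expected comparison in entry 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the null model (independent bases, edge effects ignored, so that the expected number of maximal runs of exactly length n is roughly L * (1 - p)^2 * p^n) and the function names are not taken from the paper, which may have used a different expectation.

```python
import math
import re
from collections import Counter

def run_length_counts(seq, base="G"):
    """Count maximal runs of `base` in `seq`, keyed by run length."""
    pattern = re.escape(base.upper()) + "+"
    return Counter(len(m.group()) for m in re.finditer(pattern, seq.upper()))

def expected_runs(seq_len, p, n):
    """Approximate expected number of maximal runs of exactly length n.

    Assumes independent bases with frequency p for the base of interest
    and ignores sequence ends (an assumed null model).
    """
    return seq_len * (1 - p) ** 2 * p ** n

def log_ratio(seq, base="G"):
    """Return {run length: log10(observed / expected)} for observed runs."""
    seq = seq.upper()
    if not seq:
        return {}
    p = seq.count(base.upper()) / len(seq)
    if p == 0.0 or p == 1.0:
        return {}  # the ratio is undefined when the base is absent or universal
    observed = run_length_counts(seq, base)
    return {
        n: math.log10(count / expected_runs(len(seq), p, n))
        for n, count in sorted(observed.items())
    }
```

Plotting the values returned by `log_ratio` against run length is the kind of plot the abstract describes; passing `base="A"` gives the poly(A) counterpart.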

7.
Loci containing tandem repeats of short sequences are sometimes associated with a high level of polymorphism due to variations in the number of repeats. The different variants can be easily characterized by Southern blotting when the repeats span a range from a few hundred bases to a few kilobases, and probes derived from such tandem repeats constitute convenient genetic markers. These structures, usually called minisatellites, are best documented in the human genome, where their number has been estimated to be at least 1500. However, their role and mode of evolution are poorly understood. We are developing tools to evaluate the number of such redundant sequences in a genome and to gain access to new polymorphic loci. Our strategy is based on the use of polymers of oligonucleotides as DNA probes for hybridization on Southern blots. In a previous report, we made polymers with random units of 14 bp and showed that they detect multiple polymorphic loci on human genomic DNA. At present, we are testing the effect of an increase in the complexity of the polymer, as obtained by the use of a longer random unit, and the effect of slight sequence modifications to a particular tandem repeat sequence. In addition, some of these synthetic probes can detect a single polymorphic locus and directly provide new genetic markers.

8.
Matching peptide tandem mass spectra to their cognate amino acid sequences in databases is a key step in proteomics. It is usually performed by assigning a score to a spectrum-sequence combination. De novo sequencing or partial de novo sequencing is useful for organisms without a sequenced genome or for peptides with unexpected modifications. Here we use a very large, high accuracy proteomic dataset to investigate how much peptide sequence information is present in tandem mass spectra generated in a linear ion trap (LTQ). More than 400,000 identified tandem mass spectra from a single human cancer cell line project were assigned to 26,896 distinct peptide sequences. The average absolute fragment mass accuracy is 0.102 Da. There are on average about four complementary b- and y-ions; both series are equally represented but y-ions are 2- to 3-fold more intense up to mass 1000. Half of all spectra contain uninterrupted b- or y-ion series of at least six amino acids and combining b- and y-ion information yields on average seven amino acid sequences. These sequences are almost always unique in the human proteome, even without using any precursor or peptide sequence tag information. Thus, optimal de novo sequencing algorithms should be able to obtain substantial sequence information in at least half of all cases.

9.
High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell--the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry are described. After identification of the proteins the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterization of these directly from gel-separated proteins are described and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated.

10.
Protein motifs, which are specific, conserved regions, are found by comparing multiple protein sequences. These conserved regions in general play an important role in protein functions and protein folds, for example, for their binding properties or enzymatic activities. The aim here is to find the existence correlations of protein motifs. The knowledge of protein motif/domain sharing should be important in shedding new light on the biological functions of proteins and offering a basis for analyzing evolution in the human genome or other genomes. The protein sequences used here are obtained from the PIR-NREF database and the protein motifs are retrieved from the PROSITE database. We apply a data mining approach to discover the occurrence correlations of motifs in protein sequences. The motif correlations mined can be used in evolution analyses and protein structure prediction. We discuss the latter, i.e., protein structure prediction, in this study. The correlations mined are stored and maintained in a database system. The database is now available at http://bioinfo.csie.ncu.edu.tw/ProMotif/.

11.
High-throughput DNA sequencing has resulted in increasing input in protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, the cellular functions are determined by the set of proteins expressed in the cell – the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry are described. After identification of the proteins the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterization of these directly from gel-separated proteins are described and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated. Received: 16 December 1999 / Accepted: 17 December 1999

12.
We have developed a novel approach for dissecting transmembrane beta-barrel proteins (TMBs) in genomic sequences. The features include (i) the identification of TMBs using the preference of residue pairs in globular, transmembrane helical (TMH) and TMBs, (ii) elimination of globular/TMH proteins that show sequence identity of more than 70% for the coverage of 80% residues with known structures, (iii) elimination of globular/TMH proteins that have sequence identity of more than 60% with known sequences in SWISS-PROT, and (iv) exclusion of TMH proteins using SOSUI, a prediction system for TMH proteins. Our approach picked up 7% TMBs in all the considered genomes. The comparison between the identified TMBs in the E. coli genome and available experimental data demonstrated that the new approach could correctly identify all the 11 known TMBs whose crystal structures are available. Further, it revealed the presence of 19 TMBs with homology to known structures, 60 TMBs similar to well annotated sequences, and 54 TMBs that have high sequence similarity with Escherichia coli beta-barrel proteins deposited in the Transport Classification Database (TCDB). Interestingly, the present approach identified TMBs from all 15 families in TCDB. In the human genome, the occurrence of TMBs varies from 0 to 3% in different chromosomes. We suggest that our approach could lead to a step forward in the advancement of structural and functional genomics.

13.
In an era that has been dominated by Structural Biology for the last 30-40 years, a dramatic change of focus towards sequence analysis has spurred the advent of the genome projects and the resultant diverging sequence/structure deficit. The central challenge of Computational Structural Biology is therefore to rationalize the mass of sequence information into biochemical and biophysical knowledge and to decipher the structural, functional and evolutionary clues encoded in the language of biological sequences. In investigating the meaning of sequences, two distinct analytical themes have emerged: in the first approach, pattern recognition techniques are used to detect similarity between sequences and hence to infer related structures and functions; in the second, ab initio prediction methods are used to deduce 3D structure, and ultimately to infer function, directly from the linear sequence. In this article, we attempt to provide a critical assessment of what one may and may not expect from the biological sequences and to identify major issues yet to be resolved. The presentation is organized under several subtitles such as protein sequences, pattern recognition techniques, protein tertiary structure prediction, membrane protein bioinformatics, human proteome, protein-protein interactions, metabolic networks, potential drug targets based on simple sequence properties, disordered proteins, the sequence-structure relationship and chemical logic of protein sequences.

14.
MicroRNAs are important negative regulators of gene expression in higher eukaryotes. The miRNA repertoire of the closest animal relative of humans, the chimpanzee (Pan troglodytes), is largely unknown. In this study, we focused on a computational search for novel miRNA homologs in chimpanzee. We have searched and analyzed the chimp homologs of the human pre-miRNA and mature miRNA sequences. Based on a homology search of the chimpanzee genome with human miRNA precursor sequences as queries, we identified 639 chimp miRNA genes, including 529 novel chimp miRNAs. 91.8% of chimp mature miRNAs and 60.3% of precursors are 100% identical to their human orthologs. The pre-miRNA secondary structures, miRNA families, and clusters are also highly conserved. We also found certain sequence differences in pre-miRNAs and even mature miRNAs that occurred after the divergence of the two species. Some of these differences (especially in mature miRNAs) could have caused species-specific changes in the expression levels of their target genes which in turn could have resulted in phenotypic variation between human and chimp.

15.
Mycobacterium tuberculosis is the infectious agent giving rise to human tuberculosis. The entire genome of M. tuberculosis, comprising approximately 4000 open reading frames, has been sequenced. The huge amount of information released from this project has facilitated proteome analysis of M. tuberculosis. Two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) was applied to fractions derived from M. tuberculosis culture filtrate, cell wall, and cytosol, resulting in the resolution of 376, 413, and 395 spots, respectively, in silver-stained gels. By microsequencing and immunodetection, 38 culture filtrate proteins were identified and mapped, of which 12 were identified for the first time. In the same manner, 23 cell wall proteins and 19 cytosol proteins were identified and mapped, with 9 and 10, respectively, being novel proteins. One of the novel proteins was not predicted in the genome project, and for four of the identified proteins alternative start codons were suggested. Fourteen of the culture filtrate proteins were proposed to possess signal sequences. Seven of these proteins were microsequenced and the N-terminal sequences obtained confirmed the prediction. The data presented here are an important complement to the genetic information, and the established 2-D PAGE maps (also available at: www.ssi.dk/publichealth/tbimmun) provide a basis for comparative studies of protein expression.

16.
Kulski JK, Ward BK. Electrophoresis 2000, 21(5): 896-903
A goat genomic library was screened by Southern blot hybridization at reduced stringency with a bovine papillomavirus type 5 (BPV 5) DNA probe in order to identify potential cellular and viral sequences related to the papillomavirus genome. A recombinant clone with an 8.5 kb genomic insert was found to contain a 1.3 kb PstI subfragment (designated as P1-1) that hybridized with the DNA of BPV 5, two murine papillomaviruses and human papillomavirus types 5 and 8, but not with DNA from another eight human and bovine papillomavirus types. Southern blot hybridization of the goat P1-1 DNA probe was restricted to a single 1.0 kb subfragment within the E1 open reading frame (ORF) of BPV 5 but produced multiple bands ranging between 1.0 and 9.0 kb when hybridized under stringent conditions with PstI-digested DNA obtained from different goat tissues. The genomic sequence of P1-1 has direct repeats of 10 and 13 nucleotides flanking 153 nucleotides, and 889 nucleotides of sequence, respectively, and an inverted repeat sequence of 11 nucleotides flanking a major ORF potentially coding for 244 residues. Potential splice acceptor and donor sites capable of joining with upstream and downstream exons are present within the major ORF. Sequence similarity between P1-1 and BPV 5 DNA at the nucleotide and amino acid level was limited to a stretch of 58 nucleotides which includes an oligopurine/pyrimidine tract. This region of similarity contains a predicted glutamic acid-rich domain. The P1-1 sequence is a novel repetitive element within the goat genome that is unrelated in sequence to papillomavirus DNA and to genomic sequences of mouse and man.

17.
The Human Genome Project (HGP) is the most ambitious and important effort in the history of biology. It has provided a complete genetic blueprint for human life, and will provide important insights into human health and development. HGP involves a huge amount of data that is stored on computers all over the world. More than just vast amounts of DNA sequences, the project is about developing sets of integrated maps that involve genetic, physical, and sequence data. The data can be sorted, annotated and organized in many different ways using different types of database software, different analysis algorithms and different forms of interfaces. The genomic sequence of the human and substantial portions of the mouse genome are expected to be finished by 2005. Analytical chemists took the opportunity to address the problem of achieving a high throughput with good sensitivity. This paper discusses how analytical chemists saved the Human Genome Project or at least gave it a helping hand.

18.
19.
The recent success of the human genome project and the continued accomplishment in obtaining DNA sequences for a vast array of organisms are providing an unprecedented wealth of information. Nevertheless, an abundance of the proteome contains hypothetical proteins or proteins of unknown function, where high throughput approaches for genome-wide functional annotation (functional genomics) have evolved as the necessary next step. Nuclear magnetic resonance spectroscopy is playing an important role in functional genomics by providing information on the structure of protein and protein-ligand complexes, from metabolite fingerprinting and profiling, from the analysis of the metabolome, and from ligand affinity screens to identify chemical probes.

20.
Guanine-rich sequences of DNA can assemble into tetrastranded structures known as G-quadruplexes. It has been suggested that these secondary DNA structures could be involved in the regulation of several key biological processes. In the human genome, guanine-rich sequences with the potential to form G-quadruplexes exist in the telomere as well as in promoter regions of certain oncogenes. The identification of these sequences as novel targets for the development of anticancer drugs has sparked great interest in the design of molecules that can interact with quadruplex DNA. While most reported quadruplex DNA binders are based on purely organic templates, numerous metal complexes have more recently been shown to interact effectively with this DNA secondary structure. This Review provides an overview of the important roles that metal complexes can play as quadruplex DNA binding molecules, highlighting the unique properties metals can confer to these molecules.
