Similar literature
 20 similar articles found (search time: 31 ms)
1.
The revolution in genome sequencing technologies has enabled the comprehensive detection of genomic variations in human cells, including inherited germline polymorphisms, de novo mutations, and postzygotic mutations. When these technologies are combined with techniques for isolating and expanding single-cell DNA, the landscape of somatic mosaicism in an individual body can be systematically revealed at a single-cell resolution. Here, we summarize three strategies (whole-genome amplification, microdissection of clonal patches in the tissue, and in vitro clonal expansion of single cells) that are currently applied for single-cell mutational analyses. Among these approaches, in vitro clonal expansion, particularly via adult stem cell-derived organoid culture technologies, yields the most sensitive and precise catalog of somatic mutations in single cells. Moreover, because it produces living mutant cells, downstream validation experiments and multiomics profiling are possible. Through the synergistic combination of organoid culture and genome sequencing, researchers can track genome changes at a single-cell resolution, which will lead to new discoveries that were previously impossible.
Subject terms: Genomic analysis, Genomics, Adult stem cells, Genetic techniques

2.
De novo assembly of bacterial genomes from next-generation sequencing (NGS) data allows a reference-free discovery of single nucleotide polymorphisms (SNPs). However, substantial rates of errors in genomes assembled by this approach remain a major barrier for the reference-free analysis of genome variations in medically important bacteria. The aim of this report was to improve the quality of SNP identification in bacterial genomes without closely related references. We developed a bioinformatics pipeline (SnpFilt) that constructs an assembly using SPAdes and then removes unreliable regions based on the quality and coverage of re-aligned reads at neighbouring regions. The performance of the pipeline was compared against reference-based SNP calling for Illumina HiSeq, MiSeq and NextSeq reads from a range of bacterial pathogens including Salmonella, which is one of the most common causes of food-borne disease. The SnpFilt pipeline removed all false SNPs in all test NGS datasets consisting of paired-end Illumina reads. We also showed that for reliable and complete SNP calls, at least 40-fold coverage is required. Analysis of bacterial isolates associated with epidemiologically confirmed outbreaks using the SnpFilt pipeline produced results consistent with previously published findings. The SnpFilt pipeline improves the quality of de novo assembly and the precision of SNP calling in bacterial genomes by removing regions of the assembly that may contain assembly errors. SnpFilt is available from https://github.com/LanLab/SnpFilt.
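
The 40-fold coverage requirement reported above can be made concrete with a short calculation. The sketch below is not part of SnpFilt; it only applies the usual definition of average depth (total sequenced bases divided by genome length). The genome size and read length are illustrative assumptions, not values taken from the abstract.

```python
import math


def fold_coverage(n_reads, read_length, genome_size):
    """Average depth = total sequenced bases / genome length."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads whose total bases reach the target depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative assumptions: a 4.8 Mb bacterial genome and 150 bp reads.
GENOME_SIZE = 4_800_000
READ_LENGTH = 150

print(fold_coverage(1_600_000, READ_LENGTH, GENOME_SIZE))
print(reads_needed(40, READ_LENGTH, GENOME_SIZE))
```

For paired-end data, each read of a pair counts separately in `n_reads`.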

3.
To examine copy number variations among the Korean population, we compared individual genomes with the Korean reference genome assembly using the publicly available Korean HapMap SNP 50K chip data from 90 individuals. Korean individuals exhibited 123 copy number variation regions (CNVRs) covering 27.2 Mb, equivalent to 1.0% of the genome, in the copy number variation (CNV) analysis using the combined criteria of P value (P < 0.01) and standard deviation of copy numbers (SD ≥ 0.25) among study subjects. In contrast, when compared to the Affymetrix reference genome assembly from multiple ethnic groups, considerably more CNVRs (n = 643) were detected in a larger proportion (5.0%) of the genome, covering 135.1 Mb, even by more stringent criteria (P < 0.001 and SD ≥ 0.25), reflecting ethnic diversity of structural variations between Korean and other populations. Some CNVRs were validated by the quantitative multiplex PCR of short fluorescent fragment (QMPSF) method, and then copy number invariant regions were detected among the study subjects. These copy number invariant regions could serve as good internal controls for further CNV studies. Lastly, we demonstrated that the CNV information could stratify even a single ethnic population with a proper reference genome assembly from multiple heterogeneous populations.

4.
The global incidence of early-onset colorectal cancer (EO-CRC) is rapidly rising. However, the reason for this rise in incidence as well as the genomic characteristics of EO-CRC remain largely unknown. We performed whole-exome sequencing in 47 cases of EO-CRC and targeted deep sequencing in 833 cases of CRC. Mutational profiles of EO-CRC were compared with previously published large-scale studies. EO-CRC and The Cancer Genome Atlas (TCGA) data were further investigated according to copy number profiles and mutation timing. We classified colorectal cancer into three subgroups: the hypermutated group consisted of mutations in POLE and mismatch repair genes; the whole-genome doubling group had early functional loss of TP53 that led to whole-genome doubling and focal oncogene amplification; the genome-stable group had mutations in APC and KRAS, similar to conventional colon cancer. Among non-hypermutated samples, whole-genome doubling was more prevalent in early-onset than in late-onset disease (54% vs 38%, Fisher’s exact P = 0.04). More than half of non-hypermutated EO-CRC cases involved early TP53 mutation and whole-genome doubling, which led to notable differences in mutation frequencies between age groups. Alternative carcinogenesis involving genomic instability via loss of TP53 may be related to the rise in EO-CRC.
Subject terms: Medical genomics, Colon cancer

5.
Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. Modern sequencers generate a huge number of small sequence fragments called reads, where the read length and the per-base sequencing cost depend on the technology used. To date, many hybrid genome assembly algorithms have been developed that can take reads from multiple read sources to reconstruct the original genome. However, rigorous investigation of the feasibility conditions for complete genome reconstruction and the optimal sequencing strategy for minimizing the sequencing cost has been conspicuously missing. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, opening up the possibility of optimally combining the sources to minimize the sequencing cost while ensuring accurate genome reconstruction. In this paper, we derive the conditions for whole genome reconstruction from multiple read sources at a given confidence level and also introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which simultaneously satisfies the feasibility conditions for genome reconstruction and minimizes the sequencing cost, can be effectively predicted through constrained discrete optimization. Through extensive evaluations based on several genomes and different read sets, we verify the derived feasibility conditions and demonstrate the performance of the proposed optimal hybrid sequencing and assembly strategy.

6.
The presence of repetitive or non-unique DNA persisting over sizable regions of a eukaryotic genome can hinder the genome's successful de novo assembly from short reads: ambiguities in assigning genome locations to the non-unique subsequences can result in premature termination of contigs and thus overfragmented assemblies. Fungal mitochondrial (mtDNA) genomes are compact (typically less than 100 kb), yet often contain short non-unique sequences that can be shown to impede their successful de novo assembly in silico. Such repeats can also confuse processes in the cell in vivo. A well-studied example is ectopic (out-of-register, illegitimate) recombination associated with repeat pairs, which can lead to deletion of functionally important genes that are located between the repeats. Repeats that remain conserved over micro- or macroevolutionary timescales despite such risks may indicate functionally or structurally (e.g., for replication) important regions. This principle could form the basis of a mining strategy for accelerating discovery of function in genome sequences. We present here our screening of a sample of 11 fully sequenced fungal mitochondrial genomes by observing where exact k-mer repeats occurred several times; initial analyses motivated us to focus on 17-mers occurring more than three times. Based on the diverse repeats we observe, we propose that such screening may serve as an efficient expedient for gaining a rapid but representative first insight into the repeat landscapes of sparsely characterized mitochondrial chromosomes. Our matching of the flagged repeats to previously reported regions of interest supports the idea that systems of persisting, non-trivial repeats in genomes can often highlight features meriting further attention.
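
The screening criterion described above (exact 17-mers occurring more than three times) can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the function name `repeated_kmers` is hypothetical, only the forward strand of a linear sequence is scanned, and reverse complements and circular wrap-around are deliberately ignored.

```python
from collections import Counter


def repeated_kmers(sequence, k=17, min_count=4):
    """Return {kmer: count} for exact k-mers seen at least min_count times.

    min_count=4 corresponds to "more than three times".
    Assumption: forward strand only, sequence treated as linear.
    """
    sequence = sequence.upper()
    # Slide a window of width k over the sequence and tally each k-mer.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy input (not real mtDNA): a short tandem repeat, screened with k=4.
toy = "ACGT" * 5
print(repeated_kmers(toy, k=4))
```

On a real genome the default `k=17` would be used, and the returned k-mers could then be mapped back to their positions to see which annotated features they overlap.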

7.
A compound marker consists of two different types of genetic markers, such as a deletion/insertion polymorphism and a single nucleotide polymorphism within a genomic region of 200 bp, and a microhaplotype consists of a series of closely linked single nucleotide polymorphisms in a small DNA segment (<300 bp); both show great potential for human identification and mixture analysis. In this study, we initially selected 23 novel genetic markers comprising 10 microhaplotypes and 13 compound markers according to previously reported single nucleotide polymorphism or deletion/insertion polymorphism loci. Genetic distributions of these 23 loci in different continental populations showed that they could be used as valuable loci for forensic human identification. In addition, high informativeness values (>0.1) were observed in six loci, which could be further employed for forensic ancestry analyses. Finally, 18 loci were successfully developed into a multiplex panel and detected by next generation sequencing (NGS) technology. Further analyses of these 18 loci in the studied Shaanxi Han population showed that 15 loci exhibited relatively high expected heterozygosities (>0.5). The cumulative power of discrimination (0.999999999994835) of these 18 loci revealed that the multiplex panel could also be utilized for human identification in the studied Shaanxi Han population.
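
The cumulative power of discrimination quoted above is conventionally obtained by combining per-locus values as CPD = 1 - Π(1 - PD_i), which assumes the loci are independent. The sketch below illustrates that formula only; the per-locus PD values are hypothetical placeholders, since the abstract does not list them.

```python
import math


def cumulative_power_of_discrimination(pd_values):
    """CPD = 1 - prod(1 - PD_i), assuming independent loci."""
    for pd in pd_values:
        if not 0.0 <= pd <= 1.0:
            raise ValueError(f"PD must lie in [0, 1], got {pd}")
    return 1.0 - math.prod(1.0 - pd for pd in pd_values)


# Hypothetical per-locus PD values for an 18-locus panel (not from the study).
example_pd = [0.80] * 18
print(cumulative_power_of_discrimination(example_pd))
```

When the result is as close to 1 as the value reported in the abstract, it is often more informative to report the product term itself (the combined random match probability), because `1.0 - x` loses precision in floating point for very small `x`.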

8.
Matching peptide tandem mass spectra to their cognate amino acid sequences in databases is a key step in proteomics. It is usually performed by assigning a score to a spectrum-sequence combination. De novo sequencing or partial de novo sequencing is useful for organisms without a sequenced genome or for peptides with unexpected modifications. Here we use a very large, high accuracy proteomic dataset to investigate how much peptide sequence information is present in tandem mass spectra generated in a linear ion trap (LTQ). More than 400,000 identified tandem mass spectra from a single human cancer cell line project were assigned to 26,896 distinct peptide sequences. The average absolute fragment mass accuracy is 0.102 Da. There are on average about four complementary b- and y-ions; both series are equally represented but y-ions are 2- to 3-fold more intense up to mass 1000. Half of all spectra contain uninterrupted b- or y-ion series of at least six amino acids, and combining b- and y-ion information yields sequences of, on average, seven amino acids. These sequences are almost always unique in the human proteome, even without using any precursor or peptide sequence tag information. Thus, optimal de novo sequencing algorithms should be able to obtain substantial sequence information in at least half of all cases.
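
The b- and y-ion series discussed above can be computed directly from a peptide sequence. The sketch below is a generic illustration rather than the method of the study: it assumes singly charged fragments, unmodified residues, and standard monoisotopic residue masses, and the function name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is b(i+1), built from the N terminus;
    y_ions[i] is y(i+1), built from the C terminus.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # N-terminal prefixes
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # C-terminal suffixes
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

A b-ion and a y-ion are complementary when together they cover the whole peptide, in which case their singly charged m/z values sum to the neutral peptide mass plus two protons; this is the relationship that lets the two series be combined into one longer sequence.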

9.
Fagerquist CK, Yee E, Miller WG. The Analyst, 2007, 132(10): 1010-1023
Protein biomarkers observed in the matrix-assisted laser desorption/ionization time-of-flight mass spectra (MALDI-TOF-MS) of cell lysates of three strains of Campylobacter coli, two strains of C. lari and one strain of C. concisus have been identified by 'bottom-up' proteomic techniques. The significant findings are as follows. First, the protein biomarkers identified were: PhnA-related protein, 4-oxalocrotonate tautomerase (DmpI)-related protein, NifU-like protein, cytochrome c, DNA-binding protein HU, 10 kDa chaperonin, thioredoxin, as well as several conserved hypothetical and ribosomal proteins. Second, variations in the biomarker ion m/z in MALDI-TOF-MS spectra across species and strains are the result of variations in the amino acid sequence of the protein due to non-synonymous mutations of the biomarker gene. Third, the most common post-translational modifications (PTMs) were the removal of the N-terminal methionine and N-terminal signal peptides. However, in the case of the NifU protein (an iron-sulfur cluster transport protein), post-translational cleavage occurred from the C-terminus. Fourth, only the genomes of the C. coli strain RM2228 and C. lari strain RM2100 have been sequenced; thus, proteomic identification of the proteins of the other strains in this study relied upon sequence homology to the genomic sequence of these strains as well as the genome sequences of other Campylobacter strains. In some cases, the determination of the full amino acid sequence of a protein biomarker from a genomically non-sequenced strain was accomplished by combining non-overlapping partial sequences from proteomic identifications of genomically-sequenced strains that were of the same species (or of a different species) as the non-sequenced strain. The accuracy of this composite sequence was confirmed by both MS and MS/MS.
It was necessary, in some cases, to perform de novo sequencing on 'gaps' in the composite sequence that were not homologous to any genomically-sequenced strain. In order to validate the composite sequence approach, composite sequences were further confirmed by subsequent DNA sequencing of the biomarker gene. Thus, using the composite sequence approach, it was possible to determine the full amino acid sequence of an unknown protein from a genomically non-sequenced bacterial strain without the necessity of either sequencing the biomarker gene or performing full de novo MS/MS sequencing. The sequence obtained could then be used as a strain-specific biomarker for analysis by 'top-down' proteomics techniques.

10.
Advances in DNA sequencing technology over the past decade have increased the volume of raw sequenced genomic data available for further assembly and analysis. While there exist many algorithms for assembly of sequenced genomic material, they often experience difficulties in constructing complete genomic sequences. Instead, they produce long genomic subsequences (scaffolds), which then become subject to scaffold assembly aimed at reconstructing their order along genome chromosomes. A satisfactory balance between reliability and cost for scaffold assembly has not yet been reached, which motivates the search for new approaches to this problem. We present a new method for scaffold assembly based on the analysis of gene orders and genome rearrangements in multiple related genomes (some or even all of which may be fragmented). Evaluation of the proposed method on artificially fragmented mammalian genomes demonstrates its high reliability. We also apply our method to incomplete anophelinae genomes, which exhibit high fragmentation, and further validate the assembly results with reference-based scaffolding. While the two methods demonstrate consistent results, the proposed method is able to identify more assembly points than the reference-based scaffolding.

11.
Motivation: Cheap and fast next generation sequencing (NGS) technologies greatly facilitate research on de novo assembly. The reliability of contigs is critical for constructing reliable scaffolds. However, contigs generated by most assemblers contain errors because of the limitations of assembly strategies and computational complexity. Among all these errors, the misassembly error is one of the most harmful types. Results: In this paper, we propose a new method named “PECC” to identify and correct misassembly errors in contigs based on the paired-end read distribution. PECC extracts sequence regions with low paired-end read support and verifies them based on the distribution of paired-end support. To validate the effectiveness of PECC, we applied PECC to the contigs produced by five popular assemblers on four real datasets, and we also carried out experiments to analyze the influence of PECC on scaffolding. The results show that PECC can reduce misassembly errors and improve the performance of scaffolding, which demonstrates the promising applications of PECC in de novo assembly.

12.
Matrix-assisted laser desorption ionization (MALDI), Peptide Mass Fingerprinting (PMF) and MALDI-MS/MS ion search (using MASCOT) have become the preferred methods for high-throughput identification of proteins. Unfortunately, PMF can be ambiguous, mainly when the genome of the organism under investigation is unknown and the quality of the spectra generated is poor and does not allow confident identification. The post-source decay (PSD) fragmentation of singly charged tryptic peptide ions generated by MALDI-TOF/TOF typically results in low fragmentation efficiency and/or complex spectra, including backbone fragmentation ions (series b and y), internal fragmentation etc. Interpreting these data manually and/or using de novo sequencing software can frequently be a challenge. To overcome this limitation when studying the proteome of adult Angiostrongylus costaricensis, a nematode with an unknown genome, we have used chemical N-terminal derivatization of the tryptic peptides with 4-sulfophenyl isothiocyanate (SPITC) prior to MALDI-TOF/TOF MS. This methodology has recently been reported to enhance the quality of MALDI-TOF/TOF-PSD data, making it possible to obtain the complete sequence of most of the peptides and thus facilitating de novo peptide sequencing. Our approach, consisting of SPITC derivatization along with manual spectra interpretation and Blast analysis, was able to positively identify 76% of analyzed samples, whereas MASCOT analysis of derivatized samples, MASCOT analysis of nonderivatized samples and PMF of nonderivatized samples yielded only 35, 41 and 12% positive identifications, respectively. Moreover, de novo sequencing of SPITC-modified peptides resulted in protein sequences not available in the NCBInr database, paving the way to the discovery of new protein molecules.

13.
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing software package, PEAKS, to extract amino acid sequence information without the use of databases. PEAKS uses a new model and a new algorithm to efficiently compute the best peptide sequences whose fragment ions can best interpret the peaks in the MS/MS spectrum. The output of the software gives amino acid sequences with confidence scores for the entire sequences, as well as an additional novel positional scoring scheme for portions of the sequences. The performance of PEAKS is compared with Lutefisk, a well-known de novo sequencing software, using quadrupole-time-of-flight (Q-TOF) data obtained for several tryptic peptides from standard proteins.

14.
15.

16.
17.
De novo analysis of the protein N-terminal sequence is important for identifying N-terminal proteolytic processing, such as N-terminal methionine or signal peptide removal, and for the genome annotation of uncharacterized proteins. We introduce a de novo sequencing method for the protein N terminus that uses matrix-assisted laser desorption/ionization (MALDI) signal-enhancing picolinamidination with a bromine isotopic tag incorporated at the N terminus. The doublet signature of bromine in the tandem mass (MS/MS) spectrum distinguished the N-terminal ion series from the C-terminal ion series, facilitating de novo N-terminal sequencing of proteins. The dual advantage of MALDI signal enhancement by the basic picolinamidine and b-ion selection aided by the Br signature is demonstrated using a variety of peptides. The N-terminal sequences of myoglobin and hemoglobin as model proteins were determined by incorporating the Br tag at the N terminus of the proteins and obtaining a series of b-ions with the Br signature by MS/MS analysis after chymotryptic digestion of the tagged proteins. The N-terminal peptide was selected for MS/MS analysis from the chymotryptic digest based on the Br signature in the mass spectrum. Identification of the phosphorylation site as well as N-terminal sequencing of a phosphopeptide was straightforward.
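
The bromine doublet mentioned above arises because 79Br and 81Br occur in nearly equal abundance and differ by about 1.998 Da, so every Br-tagged fragment appears as two peaks of similar height. The sketch below shows one simple way such pairs could be flagged in a peak list. It is an illustration only, not the authors' procedure; the function name, tolerances, and example peaks are all assumptions.

```python
BR_SPACING = 1.99795  # mass difference between 81Br and 79Br, Da


def find_bromine_doublets(peaks, mz_tol=0.02, ratio_range=(0.7, 1.4), charge=1):
    """Return (mz_light, mz_heavy) pairs that look like Br isotope doublets.

    peaks: iterable of (mz, intensity) tuples.
    mz_tol and ratio_range are illustrative thresholds, not published values.
    """
    peaks = sorted(peaks)
    spacing = BR_SPACING / charge
    doublets = []
    for i, (mz_a, int_a) in enumerate(peaks):
        if int_a <= 0:
            continue
        for mz_b, int_b in peaks[i + 1:]:
            delta = mz_b - mz_a
            if delta > spacing + mz_tol:
                break  # peaks are sorted, so later ones are even farther away
            if abs(delta - spacing) <= mz_tol:
                if ratio_range[0] <= int_b / int_a <= ratio_range[1]:
                    doublets.append((mz_a, mz_b))
    return doublets


# Hypothetical peak list: only the first pair has Br-like spacing and ratio.
example_peaks = [
    (300.100, 100.0), (302.098, 95.0),   # spacing ~1.998, ratio ~1
    (450.200, 80.0), (451.200, 40.0),    # spacing 1.0: ordinary isotope peak
    (500.000, 50.0), (501.998, 10.0),    # right spacing, wrong ratio
]
print(find_bromine_doublets(example_peaks))
```

Peaks that pass this filter would be candidates for the tagged N-terminal (b-type) series, while unpaired peaks would be attributed to the untagged C-terminal series.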

18.
Much effort has focused on methods for detecting various genetic differences in individuals, including single nucleotide polymorphisms (SNPs). An SNP can be characterized as a substitution, insertion, or deletion at a single base position on a DNA strand. There is expected to be on average one SNP for every 1000 bases of the human genome, and some variations located in genes are suspected to alter both the protein structure and the expression level. Therefore, highly sensitive techniques with a simple procedure would be desirable for high-throughput screening of the millions of SNPs widely dispersed throughout the human genome. In this short review, we consider recently reported unique techniques for genotyping in a homogeneous solution, and organize them in terms of the chemical and physical processes accelerated on DNA.

19.
Because protein identifications rely on matches with sequence databases, high-throughput proteomics is currently largely restricted to those species for which comprehensive sequence databases are available. The identification of proteins derived from organisms with unsequenced genomes mainly depends on homology searching. Here, we report the use of a simplified, gel-based, chemical derivatization strategy for de novo sequence analysis using a MALDI-TOF/TOF mass spectrometer. This approach allows the determination of de novo peptide sequences of up to 20 amino acid residues in length. The protocol was applied to a proteomic study of 2-D PAGE-separated proteins from Halorhodospira halophila, an extremophilic eubacterium with an as-yet unsequenced genome. Using three different homology-based search algorithms, we were able to identify more than 30 proteins from this organism using subpicomole quantities of protein.

20.
Metagenomic studies suggest that only a small fraction of the viruses that exist in nature have been identified and studied. Characterization of unknown viral genomes is hindered by the many genomes populating any virus sample. A new method is reported that integrates drop‐based microfluidics and computational analysis to enable the purification of any single viral species from a complex mixed virus sample and the retrieval of complete genome sequences. By using this platform, the genome sequence of a 5243 bp dsDNA virus that was spiked into wastewater was retrieved with greater than 96 % sequence coverage and more than 99.8 % sequence identity. This method holds great potential for virus discovery since it allows enrichment and sequencing of previously undescribed viruses as well as known viruses.
