Similar Articles
20 similar articles found; search time: 31 ms
2.
Accurate clustering of cells from single-cell RNA sequencing (scRNA-seq) data is an essential step in biological analyses such as putative cell-type identification. However, scRNA-seq data are high-dimensional and highly sparse, which makes it hard for traditional clustering methods to capture the similarity between cells. Because gene networks fundamentally define cell function, and deep learning has shown strong advantages in network representation learning, we propose ScGSLC, a novel scRNA-seq clustering framework based on graph similarity learning. ScGSLC integrates scRNA-seq data and a protein-protein interaction network into graphs. It then uses a graph convolutional network to embed the graphs and clusters the cells according to the computed similarity between graphs. Unsupervised clustering results on nine public datasets show that ScGSLC outperforms state-of-the-art methods.
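The abstract names two generic ingredients: a graph convolutional network (GCN) that embeds a graph, and a similarity computed between graph embeddings. The following is a minimal plain-Python sketch of those two ingredients only. It assumes a single GCN layer with the common symmetric normalization ReLU(D^-1/2 (A + I) D^-1/2 X W), fixed untrained weights, cosine similarity, and per-cell graphs that share one gene-interaction topology and differ only in their expression features. How ScGSLC actually builds its graphs from the protein-protein interaction network, trains the network, and turns similarities into clusters is not reproduced; all function names and the toy data are hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj: n x n symmetric 0/1 adjacency matrix (lists of lists)
    feats: n x f node-feature matrix X; weight: f x h weight matrix W
    """
    n = len(adj)
    # Add self-loops, then normalize symmetrically by node degree.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Linear transform X W.
    xw = [[sum(feats[i][k] * weight[k][c] for k in range(len(weight)))
           for c in range(len(weight[0]))] for i in range(n)]
    # Aggregate over neighbors, then apply ReLU.
    return [[max(0.0, sum(norm[i][j] * xw[j][c] for j in range(n)))
             for c in range(len(weight[0]))] for i in range(n)]


def graph_embedding(adj, feats, weight):
    """Concatenate node embeddings (valid because every cell graph has the same gene nodes)."""
    return [v for row in gcn_layer(adj, feats, weight) for v in row]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy example: three genes on a path g0 - g1 - g2,
# one expression value per gene and per cell.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
weight = [[1.0, 0.5]]  # 1 input feature -> 2 hidden units (fixed, untrained)
cells = {
    "a": [[5.0], [1.0], [0.0]],
    "b": [[4.0], [1.5], [0.0]],
    "c": [[0.0], [1.0], [6.0]],
}
emb = {name: graph_embedding(adj, x, weight) for name, x in cells.items()}
sim_ab = cosine_similarity(emb["a"], emb["b"])
sim_ac = cosine_similarity(emb["a"], emb["c"])
print(round(sim_ab, 3), round(sim_ac, 3))
```

Cells a and b, which express the same gene program, score higher than cells a and c; a clustering step would then operate on the resulting cell-by-cell similarity matrix.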

3.
Single-cell RNA sequencing technologies have revolutionized biomedical research by providing an effective means of profiling gene expression in individual cells. One of the first fundamental steps in the in-depth analysis of single-cell sequencing data is cell type classification and identification. Computational methods such as clustering algorithms are widely used and increasingly popular because they can save considerable resources and time compared with experimental validation. Although selecting the optimal features (i.e., genes) is essential for obtaining accurate and reliable single-cell clustering results, computational complexity and dropout events, which introduce zero-inflated noise, make this process very challenging. In this paper, we propose an effective single-cell clustering algorithm based on ensemble feature selection and similarity measurements. We first identify a set of potential features, then measure cell-to-cell similarity on subsets of these potential features obtained through multiple feature-sampling approaches. We construct an ensemble network based on the cell-to-cell similarity. Finally, we apply a network-based clustering algorithm to obtain single-cell clusters. We evaluate the proposed algorithm through multiple assessments on real-world single-cell RNA sequencing datasets with known cell types. The results show that the proposed algorithm produces accurate and consistent single-cell clusters. Moreover, because the algorithm takes relative expression as input, it can easily be adopted by existing analysis pipelines. The source code is publicly available at https://github.com/jeonglab/scCLUE.
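The pipeline described in this abstract (candidate features, similarity on sampled feature subsets, an ensemble network, network-based clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not scCLUE's implementation: the candidate-gene step is taken as already done, Pearson correlation stands in for the similarity measure, a single random-subset sampler stands in for the "multiple feature sampling approaches", and thresholded connected components stand in for the unspecified network-based clustering algorithm. All names, parameters (number of rounds, subset size, threshold), and data are hypothetical.

```python
import random


def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = (sum(a * a for a in du) * sum(b * b for b in dv)) ** 0.5
    return num / den if den else 0.0


def ensemble_similarity(expr, n_rounds=50, subset_size=3, seed=0):
    """Average cell-to-cell correlation over random gene subsets.

    expr: cells x genes, already restricted to the candidate ("potential") genes.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(expr), len(expr[0])
    sim = [[1.0 if i == j else 0.0 for j in range(n_cells)] for i in range(n_cells)]
    for _ in range(n_rounds):
        genes = rng.sample(range(n_genes), subset_size)
        sub = [[row[g] for g in genes] for row in expr]
        for i in range(n_cells):
            for j in range(i + 1, n_cells):
                s = pearson(sub[i], sub[j]) / n_rounds
                sim[i][j] += s
                sim[j][i] += s
    return sim


def connected_components(sim, threshold=0.5):
    """Cluster the ensemble network: link cells whose averaged similarity exceeds threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over edges above the threshold
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical toy data: two cell types with opposite expression programs;
# the second cell of each type is a scaled copy (e.g. a more deeply sequenced cell).
expr = [
    [10, 8, 6, 1, 2, 3],
    [20, 16, 12, 2, 4, 6],
    [1, 2, 3, 10, 8, 6],
    [3, 6, 9, 30, 24, 18],
]
sim = ensemble_similarity(expr)
labels = connected_components(sim)
print(labels)  # [0, 0, 1, 1]
```

Averaging over many random gene subsets is what makes the similarity an ensemble: one unlucky subset dominated by dropout zeros contributes only a 1/n_rounds share of the final score.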

4.
Microorganisms are found in every corner of nature, and a vast number of them are difficult to cultivate with classical microbiological techniques. The advent of metagenomics has revolutionized microbial biotechnology. Metagenomics allows genetic material to be recovered directly from environmental niches without any cultivation. Metagenomic tools are currently widely used to isolate and identify enzymes with novel biocatalytic activities from the uncultivable members of microbial communities. Applying next-generation sequencing to metagenomics has generated large sequence data sets from diverse environments such as soil, the human body, and ocean water. This review describes state-of-the-art techniques and tools in metagenomics and discusses the potential of metagenomic approaches for bioprospecting industrial enzymes from various environmental samples. We also describe unusual novel enzymes discovered via metagenomic approaches and discuss future prospects for metagenome technologies.

5.
Variations in DNA copy number carry important information about genome evolution and the regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology makes it possible to explore gene expression heterogeneity among single cells, thus providing important information on cancer cell evolution. Single-cell DNA/RNA sequencing data usually have low genome coverage, so an extra amplification step is needed to obtain enough material. However, amplification introduces large biases and makes bioinformatics analysis challenging. Accurately modeling the distribution of sequencing data and effectively suppressing the influence of this bias are key to successful variation analysis. Recent work indicates that the technical noise introduced by amplification is better described by a negative binomial distribution, an overdispersed generalization of the Poisson distribution. We therefore formulate CNV detection as a quadratic optimization problem with two constraints, in which the underlying signal is corrupted by Poisson-distributed noise. By imposing sparsity and smoothness constraints, the read-depth signals reconstructed from single-cell sequencing data are expected to fit CNV patterns more accurately. An efficient numerical solution based on the classical alternating direction method of multipliers (ADMM) is tailored to solve the proposed model. We demonstrate the advantages of the proposed method on both synthetic and empirical single-cell sequencing data. Our experimental results show that the proposed method achieves excellent performance and holds considerable promise for single-cell sequencing data.
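The ADMM solution strategy can be illustrated on a simpler relative of the proposed model. The sketch below performs total-variation denoising of a read-depth profile: it swaps the paper's Poisson-noise formulation for a squared-error data term and uses a single ℓ1 penalty on first differences (which favors piecewise-constant, CNV-like segments) in place of the paper's two constraints. The function names, λ (`lam`), ρ (`rho`), iteration count, and toy profile are assumptions for illustration.

```python
import math
import random


def soft_threshold(values, kappa):
    """Proximal operator of kappa * ||v||_1 (elementwise soft-thresholding)."""
    return [math.copysign(max(abs(v) - kappa, 0.0), v) for v in values]


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower/upper have length n - 1)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tv_denoise_admm(y, lam=1.0, rho=1.0, n_iter=500):
    """Minimize 0.5 * ||y - x||^2 + lam * ||D x||_1 by ADMM, D = first differences.

    With the splitting z = D x (scaled dual u) the updates are
      x <- (I + rho D^T D)^-1 (y + rho D^T (z - u))
      z <- soft_threshold(D x + u, lam / rho)
      u <- u + D x - z
    """
    n = len(y)
    if n < 2:
        return list(y)
    # I + rho * D^T D is tridiagonal: diagonal 1 + rho * (1, 2, ..., 2, 1), off-diagonal -rho.
    diag = [1.0 + rho * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = [-rho] * (n - 1)
    z = [0.0] * (n - 1)
    u = [0.0] * (n - 1)
    x = list(y)
    for _ in range(n_iter):
        w = [zi - ui for zi, ui in zip(z, u)]
        # (D^T w)_j = w_{j-1} - w_j, with out-of-range terms treated as zero.
        rhs = [y[j] + rho * ((w[j - 1] if j > 0 else 0.0) - (w[j] if j < n - 1 else 0.0))
               for j in range(n)]
        x = solve_tridiagonal(off, diag, off, rhs)
        dx = [x[i + 1] - x[i] for i in range(n - 1)]
        z = soft_threshold([a + b for a, b in zip(dx, u)], lam / rho)
        u = [ui + a - b for ui, a, b in zip(u, dx, z)]
    return x


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical read-depth profile: a copy-number gain in the middle segment plus Gaussian noise.
rng = random.Random(0)
truth = [2.0] * 50 + [3.0] * 50 + [2.0] * 50
noisy = [t + rng.gauss(0.0, 0.3) for t in truth]
estimate = tv_denoise_admm(noisy, lam=2.0)
print(mse(noisy, truth), mse(estimate, truth))
```

The x-update is cheap because I + ρDᵀD is tridiagonal, so each ADMM iteration costs O(n); a Poisson data term, as in the paper, would change the x-update but keep the same z- and u-updates.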

8.
Over the past decade, there have been remarkable advances in understanding the signaling pathways involved in cancer development. It is well established that cancer is caused by the dysregulation of cellular pathways involved in proliferation, the cell cycle, apoptosis, cell metabolism, migration, cell polarity, and differentiation. In addition, growing evidence indicates that extracellular matrix signaling, cell surface proteoglycans, and angiogenesis can contribute to cancer development. Given the genetic instability and vast intratumoral heterogeneity revealed by single-cell sequencing of tumor cells, current approaches cannot eliminate the mutating cancer cells. Moreover, the polyclonal expansion of tumor-infiltrating lymphocytes in response to tumor neoantigens cannot elicit anti-tumor immune responses because of the immunosuppressive tumor microenvironment. Nevertheless, single-cell sequencing data from immune cells can provide valuable insights into the expression of inhibitory immune checkpoints and related signaling factors in immune cells, which can be used to select immune checkpoint inhibitors and adjust their dosage. Indeed, integrating single-cell sequencing data from immune cells with immune checkpoint inhibitor therapy can increase response rates, reduce immune-related adverse events, and facilitate tumor cell elimination. This study reviews key pathways involved in tumor development and sheds light on single-cell sequencing. It also addresses the shortcomings of immune checkpoint inhibitors, namely their variable response rates among cancer patients and increased risk of autoimmunity, by applying data from single-cell sequencing of immune cells.

10.
Limited proteolysis is an important and widely used method for analyzing the tertiary structure of proteins and determining their domain boundaries. Here we describe a novel method for determining the N- and C-terminal boundary sequences of products of limited proteolysis with semi-specific and/or non-specific enzymes, using mass spectrometry as the only analytical tool. The method is founded on the recognition that cleavage of proteins by non-specific proteases is not random but patterned. On this basis, the sequence of each proteolytic fragment can be determined by extracting the common association between data sets containing the multiple potential sequences derived from two or more different mass-spectral molecular weight measurements. Sequences of proteolytic products from specific and non-specific enzymes can be determined accurately without resorting to conventional, time-consuming, and laborious SDS-PAGE and N-terminal sequencing analysis. Because of the sensitivity of mass spectrometry, multiple transient proteolysis intermediates can also be identified and analyzed with this method, making it possible to monitor the progression of proteolysis and thereby gain insight into protein structure.
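The starting point of this approach, turning one measured molecular weight into the set of sequences it could correspond to, can be sketched as below. The sketch assumes unmodified peptides, monoisotopic masses, and a fixed tolerance; it enumerates every contiguous span of the parent sequence whose mass matches, which shows why a single measurement often yields several candidate sequences. The paper's step of extracting the common association across two or more measurements is not reproduced, and the function names and example sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def candidate_fragments(protein, measured_mass, tol=0.02):
    """All contiguous spans [start, end) whose mass lies within tol Da of measured_mass."""
    hits = []
    for start in range(len(protein)):
        mass = WATER
        for end in range(start + 1, len(protein) + 1):
            mass += RESIDUE_MASS[protein[end - 1]]
            if abs(mass - measured_mass) <= tol:
                hits.append((start, end))
            elif mass > measured_mass + tol:
                break  # masses only grow as the span extends
    return hits


# Hypothetical example: one measured mass matches two different spans.
protein = "GASPGA"
print(candidate_fragments(protein, peptide_mass("GA")))  # [(0, 2), (4, 6)]
```

Leucine and isoleucine have identical residue masses, so spans that differ only by L/I can never be separated by mass alone; combining candidate sets from independent measurements is what narrows such ambiguity.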

11.
We describe the application of ensemble methods to binary classification problems on two pharmaceutical compound data sets. Several variants of single and ensemble k-nearest-neighbor classifiers and support vector machines (SVMs), as well as single ridge regression models, are compared. All methods classify robustly even when there are more features than observations. On two data sets concerning specific properties of drug-like substances (cytochrome P450 inhibition and "Frequent Hitters", i.e., unspecific protein inhibition), we achieve classification rates above 90%. For the Frequent Hitters problem, we halve the cross-validated misclassification rate compared with previous results obtained on the same data set with different modeling techniques.
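The abstract does not say how the ensembles were built; bagging (bootstrap aggregation) is one common way to turn a k-nearest-neighbor classifier into an ensemble, and it is what this minimal sketch assumes. Each member is trained on a bootstrap resample of the training set and the ensemble predicts by majority vote. The toy two-dimensional data, k, ensemble size, and function names are hypothetical; molecular descriptors of real compounds would replace the coordinates.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points nearest to query (Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def bagged_knn_predict(train_x, train_y, query, n_models=15, k=3, seed=0):
    """Bagging: each k-NN member sees a bootstrap resample; the ensemble takes a majority vote."""
    rng = random.Random(seed)
    n = len(train_x)
    votes = Counter()
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        model_x = [train_x[i] for i in idx]
        model_y = [train_y[i] for i in idx]
        votes[knn_predict(model_x, model_y, query, k)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 0 near the origin, class 1 near (5, 5).
X = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (-0.3, 0.1), (0.2, -0.4),
     (5.0, 5.0), (5.4, 4.8), (4.7, 5.3), (5.1, 5.6), (4.6, 4.9)]
y = [0] * 5 + [1] * 5
print(bagged_knn_predict(X, y, (0.2, 0.1)), bagged_knn_predict(X, y, (4.8, 5.1)))  # 0 1
```

An odd ensemble size and an odd k avoid tied votes in the binary case.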

14.
The revolution in genome sequencing technologies has enabled the comprehensive detection of genomic variations in human cells, including inherited germline polymorphisms, de novo mutations, and postzygotic mutations. When these technologies are combined with techniques for isolating and expanding single-cell DNA, the landscape of somatic mosaicism within an individual's body can be systematically revealed at single-cell resolution. Here, we summarize three strategies currently applied for single-cell mutational analyses: whole-genome amplification, microdissection of clonal patches in tissue, and in vitro clonal expansion of single cells. Among these approaches, in vitro clonal expansion, particularly via adult stem cell-derived organoid culture technologies, yields the most sensitive and precise catalog of somatic mutations in single cells. Moreover, because it produces living mutant cells, downstream validation experiments and multiomics profiling are possible. By combining organoid culture with genome sequencing, researchers can track genome changes at single-cell resolution, which will lead to new discoveries that were previously impossible.
Subject terms: Genomic analysis, Genomics, Adult stem cells, Genetic techniques

17.
Motivation: Sequencing-based methods for examining fundamental features of the genome, such as gene expression and chromatin structure, rely on inferences from the abundance and distribution of reads derived from Illumina sequencing. Drawing sound inferences from such experiments requires appropriate mathematical methods to model the distribution of reads along the genome, which has been challenging because of the scale and nature of these data.
Results: We propose a new framework (SRSFseq) for analyzing Illumina sequencing data, based on shape analysis with square root slope functions (SRSFs). In this approach, the basic unit of information is the density of mapped reads over a region of interest on the known reference genome. The densities are interpreted as shapes, and a new shape analysis model is proposed. An analogue of the Fisher test is used to quantify the significance of shape differences in read distribution patterns between groups of density functions from different experimental conditions. We evaluated the new framework on RNA-seq data at the exon level, where it detected variation in read distributions and abundances between experimental conditions that other methods did not detect. The method is therefore a suitable supplement to state-of-the-art count-based techniques. The variety of density representations and the flexibility of the mathematical design allow the model to be adapted easily to other data types or problems in which the distribution of reads is to be tested. The functional interpretation and the SRSF phase-amplitude separation technique provide an efficient noise-reduction procedure that improves the sensitivity and specificity of the method.
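In SRSF shape analysis a curve f is represented by q(t) = sign(f'(t))·sqrt(|f'(t)|), and two curves are compared through the L2 distance between their SRSFs. The sketch below computes both quantities for read densities sampled on a uniform grid. Assumptions: a uniform grid, simple finite-difference derivatives, and Gaussian-shaped toy densities; the phase-amplitude separation mentioned in the abstract (which needs elastic alignment over warping functions) and the group-level test statistic are not reproduced, and the function names are hypothetical.

```python
import math


def srsf(f, dt):
    """Square-root slope function q(t) = sign(f'(t)) * sqrt(|f'(t)|).

    f is sampled on a uniform grid with spacing dt; f' is estimated with central
    differences in the interior and one-sided differences at the two ends.
    """
    n = len(f)
    q = []
    for i in range(n):
        if i == 0:
            d = (f[1] - f[0]) / dt
        elif i == n - 1:
            d = (f[-1] - f[-2]) / dt
        else:
            d = (f[i + 1] - f[i - 1]) / (2 * dt)
        q.append(math.copysign(math.sqrt(abs(d)), d))
    return q


def srsf_distance(f1, f2, dt):
    """L2 distance between the SRSFs of two sampled curves (Riemann-sum approximation)."""
    q1, q2 = srsf(f1, dt), srsf(f2, dt)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) * dt)


def bump(grid, center, width=0.1):
    """Hypothetical read density: a Gaussian-shaped peak (unnormalized)."""
    return [math.exp(-((t - center) ** 2) / (2 * width ** 2)) for t in grid]


dt = 0.01
grid = [i * dt for i in range(101)]  # positions along one exon, rescaled to [0, 1]
cond_a1 = bump(grid, 0.30)
cond_a2 = bump(grid, 0.32)  # nearly the same read pile-up
cond_b = bump(grid, 0.70)   # pile-up shifted toward the other end of the exon
print(srsf_distance(cond_a1, cond_a2, dt), srsf_distance(cond_a1, cond_b, dt))
```

Because the SRSF depends only on the slope, adding a constant offset to a density leaves q unchanged, so this distance compares shape rather than overall level.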

18.
We present an efficient scheme for parametrizing complex molecule–surface force fields from ab initio data. The cost of producing a sufficient fitting library is mitigated by a 2D-periodic embedded slab model, made possible by the quantum mechanics/molecular mechanics (QM/MM) scheme in CP2K. These results were then used with genetic algorithm (GA) methods to optimize the large parameter sets needed to describe such systems. The derived potentials closely reproduce the adsorption geometries and adsorption energies calculated with density functional theory. Finally, we discuss the challenges of creating a sufficient fitting library, of determining whether the GA optimization has converged, and of transferring such force fields to similar molecules. © 2015 Wiley Periodicals, Inc.
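The genetic-algorithm step can be illustrated on a deliberately tiny problem. The sketch below fits the two parameters of a 12-6 Lennard-Jones pair potential to synthetic reference energies generated from known parameters, which stand in for the DFT data; real molecule-surface force fields have far larger parameter sets. The operators (tournament selection, blend crossover, Gaussian mutation, elitism), parameter bounds, and population settings are common textbook choices assumed here, not the authors' settings, and the function names are hypothetical.

```python
import math
import random


def lj_energy(r, eps, sigma):
    """12-6 Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def loss(params, data):
    """Root-mean-square error between model energies and reference energies."""
    eps, sigma = params
    return math.sqrt(sum((lj_energy(r, eps, sigma) - e) ** 2 for r, e in data) / len(data))


def genetic_fit(data, bounds, pop_size=40, n_gen=150, mut_scale=0.05, seed=0):
    """Real-coded GA: tournament selection, blend crossover, Gaussian mutation, elitism.

    Returns the best parameter vector and the best loss seen in each generation.
    """
    rng = random.Random(seed)

    def clip(p):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(p, bounds)]

    def tournament(pop, scores):
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] < scores[j] else pop[j]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        scores = [loss(p, data) for p in pop]
        best = min(range(pop_size), key=scores.__getitem__)
        history.append(scores[best])
        children = [pop[best]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            w = rng.random()
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]  # blend crossover
            child = [v + rng.gauss(0.0, mut_scale * (hi - lo))  # Gaussian mutation
                     for v, (lo, hi) in zip(child, bounds)]
            children.append(clip(child))
        pop = children
    scores = [loss(p, data) for p in pop]
    best = min(range(pop_size), key=scores.__getitem__)
    history.append(scores[best])
    return pop[best], history


# Synthetic "reference" energies from known parameters (eps = 0.2, sigma = 3.0),
# standing in for DFT adsorption energies.
reference = [(r, lj_energy(r, 0.2, 3.0)) for r in (3.0 + 0.25 * i for i in range(13))]
best, history = genetic_fit(reference, bounds=[(0.05, 0.5), (2.5, 3.5)])
print(best, history[-1])
```

Because the elite is carried over, `history` never increases; stopping once it has plateaued for many generations is one simple (assumed) convergence rule, which bears on the question of deciding when a GA optimization has finished.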

20.
In this work, we start from the octahedron with its eight triangular faces, proceed to the icosahedron with 20 triangular faces, and then to the icosidodecahedron with 32 faces: the 20 triangular faces inherited from the icosahedron and 12 pentagons. We describe two important methods by which this can be done. Both methods are used to present the techniques needed in many practical chemical applications. The underlying methods are derived here. Through these manipulations, we also describe transformations through other chemically interesting geometries commonly encountered in group-theoretical discussions.
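One standard coordinate-based route through this octahedron, icosahedron, icosidodecahedron sequence is to place the icosahedron's 12 vertices on the octahedron's 12 edges at golden-ratio points, then rectify the icosahedron (take its edge midpoints as new vertices). The abstract does not specify the paper's two methods, so this sketch is not claimed to be either of them; it only shows a construction whose vertex, edge, and face counts can be checked with Euler's formula. Function names are hypothetical, and treating the shortest vertex-to-vertex distances as edges is an assumption that holds for the two solids built here.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_vertices():
    """Cyclic permutations of (0, +-1, +-PHI): the 12 vertices of a regular icosahedron.

    Each vertex lies on an edge of the octahedron |x| + |y| + |z| = PHI**2 and
    divides that edge in the golden ratio.
    """
    verts = []
    for s1 in (-1.0, 1.0):
        for s2 in (-1.0, 1.0):
            base = (0.0, s1, s2 * PHI)
            for k in range(3):
                verts.append(base[k:] + base[:k])  # cyclic shift of the coordinates
    return verts


def shortest_edges(points, tol=1e-9):
    """Index pairs separated by the minimum pairwise distance."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    dist = {p: math.dist(points[p[0]], points[p[1]]) for p in pairs}
    d_min = min(dist.values())
    return [p for p in pairs if abs(dist[p] - d_min) < tol]


def rectify(points, edges):
    """Rectification: the new vertices are the midpoints of the old edges."""
    return [tuple((a + b) / 2 for a, b in zip(points[i], points[j])) for i, j in edges]


ico = icosahedron_vertices()
ico_edges = shortest_edges(ico)
ido = rectify(ico, ico_edges)  # icosidodecahedron
ido_edges = shortest_edges(ido)
for name, v, e in [("icosahedron", len(ico), len(ico_edges)),
                   ("icosidodecahedron", len(ido), len(ido_edges))]:
    print(name, v, e, 2 - v + e)  # faces from Euler's formula V - E + F = 2
# icosahedron 12 30 20
# icosidodecahedron 30 60 32
```

The 32 faces of the icosidodecahedron split as described in the abstract: each of the 20 icosahedron faces shrinks to a smaller triangle, and each of the 12 icosahedron vertices is cut off to leave a pentagon.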


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417