Similar literature
1.
Clustering cancer samples based on biomolecular data has become an important tool for cancer classification, and recognizing cancer types is of great importance for treatment. In this paper, to improve the accuracy of cancer recognition, we propose using Laplacian regularized Low-Rank Representation (LLRR) to cluster cancer samples based on genomic data. In the LLRR method, the high-dimensional genomic data are treated approximately as samples drawn from a combination of several low-rank subspaces. The purpose of LLRR is to seek the lowest-rank representation matrix with respect to a dictionary. Because a manifold-based Laplacian regularization is introduced, LLRR captures not only the global geometric structure, as the Low-Rank Representation (LRR) method does, but also the intrinsic local structure of high-dimensional observation data. Moreover, in LLRR the original data themselves are used as the dictionary, so the lowest-rank representation is in effect an expression of similarity between samples. Samples with high similarity in the low-rank representation matrix are therefore considered to come from the same subspace and are grouped into one class. Experimental results on real genomic data illustrate that LLRR, compared with LRR and MLLRR, is more robust to noise, learns the inherent subspace structure of the data better, and achieves remarkable performance in clustering cancer samples.

2.
Pathway-based drug discovery can give full consideration to the efficacy of compounds in the systemic physiological environment. Recently emerged approaches for identifying drug-pathway associations have gained popularity owing to their potential to decipher the mechanism of action and the targets of compounds. In this study, we propose a novel drug-pathway association identification method: Integrative Graph regularized Matrix Factorization (IGMF). It employs graph regularization to encode the geometrical information of the data and to prevent possible overfitting in prediction. Furthermore, it achieves a parts-based and sparse data representation by imposing L1-norm regularization on the objective function. Empirical studies demonstrate that IGMF has strong advantages over its peer methods in identifying new drug-pathway associations. It also shows a good capability to unveil the intrinsic structure of the data. As an effective drug-pathway discovery method, it may inspire new analytics methods in this subfield.

3.
Microarrays have been widely used to identify differentially expressed genes. One related problem is to estimate the proportion of differentially expressed genes. For some complex diseases, the number of differentially expressed genes may be relatively small, and these genes may be only subtly differentially expressed. For such microarray data, it is generally difficult to estimate the proportion of differentially expressed genes efficiently. In this study, I propose a likelihood-based method coupled with an expectation-maximization (E-M) algorithm for estimating this proportion. The proposed method performs favorably if either (i) the P values of differentially expressed genes are homogeneously distributed or (ii) the proportion of differentially expressed genes is relatively small. In both situations, I showed through simulations that the proposed method performed satisfactorily when compared with other existing methods. As applications, these methods were applied to two microarray gene expression data sets generated from different platforms.

4.
It has been shown that generalized F-statistics can perform satisfactorily in identifying differentially expressed genes from microarray data. However, for some complex diseases, a high proportion of false positives may still be identified because of the modest differential expression of disease-related genes and the systematic noise of microarrays. The main purpose of this study is to develop statistical methods for Affymetrix microarray gene expression data so that the impact of non-expressed genes on false positives can be reduced. I propose two novel generalized F-statistics for identifying differentially expressed genes and a novel approach for estimating adjusting factors. The proposed statistical methods systematically combine the filtering of non-expressed genes with the identification of differentially expressed genes. For comparison, the discussed statistical methods were applied to an experimental data set from a type 2 diabetes study. In both two- and three-sample analyses, the proposed statistics improved the control of false positives.

5.
Identifying patterns of association or dependency among high-dimensional biological datasets with sparse precision matrices remains a challenge. In this paper, we introduce a weighted sparse Gaussian graphical model that can incorporate prior knowledge to infer the structure of the network of trace element concentrations, including essential elements as well as toxic metals and metalloids, measured in human placentas. We present a weighted L1-penalized regularization procedure for estimating the sparse precision matrix in the setting of Gaussian graphical models. First, we use simulation models to demonstrate that the proposed method yields a better estimate of the precision matrix than procedures that fail to account for prior knowledge of the network structure. Then, we apply the method to estimate sparse element concentration matrices of placental biopsies from the New Hampshire Birth Cohort Study. The chemical architecture of the elements is complex; thus, the proposed method was applied to infer the dependency structures of the elements using prior knowledge of their biological roles.

6.
This article develops a Bayesian method for fault detection and isolation using a sparse reconstruction framework. The normal/training data are assumed to follow a signal‐plus‐noise model, and an indicator matrix is used to show whether the test data come from a faulty process. The distribution of the indicator matrix is modeled by a Laplacian distribution, which forces the indicator matrix to be sparse, and a Gibbs sampler is derived to estimate/reconstruct the indicator matrix, the unobserved signals, and other parameters such as the signal mean, covariance, and noise variance. The faulty variables can then be detected and isolated by inspecting whether the corresponding rows of the indicator matrix are zero. The proposed Bayesian approach is data driven and allows for simultaneous fault detection and isolation. A simulation study and an industrial case study are used to test the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.

7.
8.
Gao HT, Li TH, Chen K, Li WG, Bi X. Talanta, 2005, 66(1): 65-73
Non-negative matrix factorization (NMF), with its constraint of non-negativity, has recently been proposed for multivariate data analysis. Because it allows only additive, not subtractive, combinations of the original data, NMF is capable of producing region- or parts-based representations of objects. It has been used for image analysis and text processing. Unlike those of PCA, the resolutions of NMF are non-negative and can be interpreted and understood directly. Because it admits multiple solutions, the original NMF algorithm [D.D. Lee, H.S. Seung, Nature 401 (1999) 788] is not suitable for resolving chemical mixed signals, and NMF has in fact never been applied to that task. It must be modified according to the characteristics of the chemical signals, such as smoothness of spectra, unimodality of chromatograms, and sparseness of mass spectra. We have used the modified NMF algorithm to narrow the feasible solution region for resolving chemical signals, and found that it could produce reasonable and acceptable results for certain experimental errors, especially for overlapping chromatograms and sparse mass spectra. Simulated two-dimensional (2-D) data and real GUJINGGONG alcohol liquor GC-MS data have been resolved soundly by the NMF technique. Butyl caproate and its isomeric compound (butyric acid, hexyl ester) have been identified from the overlapping spectra. The result of NMF is preferable to that of heuristic evolving latent projections (HELP). This shows that NMF is a promising chemometric resolution method for complex samples.
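
As a rough illustration of the additive, non-negative factorization described above, the following sketch uses the widely known multiplicative update rules for the squared-error objective. It is not the modified algorithm of the abstract: the smoothness, unimodality, and sparseness constraints are omitted, and the function name `nmf_multiplicative` and its parameters (`rank`, `n_iter`, `eps`, `seed`) are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0.

    Plain multiplicative updates for the squared error ||V - W H||^2.
    No chemical constraints are imposed (hypothetical, minimal sketch).
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start; the updates keep the signs unchanged.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each factor is rescaled elementwise; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In a chromatography setting one might read the columns of `W` as elution profiles and the rows of `H` as spectra, but that interpretation depends on the extra constraints the abstract mentions, which this sketch does not include.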

9.
Lists of detected differentially expressed genes (DEGs) often show low reproducibility, even in technical replicate experiments. The reproducibility is even lower for real cancer data with large biological variation and a limited number of samples. Since existing methods for identifying differentially expressed genes treat each gene separately, they cannot circumvent the problem of low reproducibility. Considering the correlation structure of genes may help to mitigate the effect of errors in individual gene estimates and thus yield more reliable lists of DEGs. We borrowed information from a large amount of existing microarray data to define the expression dependencies among genes, and we use this prior knowledge to adjust the significance rank of DEGs. We applied our method and four popular ranking algorithms, namely mean fold change (FC), SAM, the t‐statistic, and the Wilcoxon rank‐sum test, to two cancer microarray datasets. Our method achieved higher reproducibility than the other methods across a range of sample sizes. Furthermore, our method obtained higher accuracy than the other methods, especially when the sample size was small. The results demonstrate that considering the dependencies among genes helps to adjust the significance rank of genes and to find the truly differentially expressed genes.

10.
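A feature selection method based on disjoint principal component analysis (Disjoint PCA) and a genetic algorithm (GA) was developed and used to identify differentially expressed genes from gene expression profile data. In this method, disjoint PCA evaluates how well a group of genes discriminates between two classes of samples, the GA searches for the gene group with the strongest discriminating power, and the chance correlation of the identified genes is assessed statistically. Because the method accounts for the cooperative action among genes, which is closer to the biological processes in which genes take part, the identified genes show better differential expression. The method was applied to gene chip data from hepatocellular carcinoma (HCC) samples. The results show that the identified genes have strong discriminating power and that the method outperforms the commonly used significance analysis of microarrays (SAM) method.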

11.
Neighborhood preserving embedding (NPE) is a useful tool for learning the manifold of high‐dimensional data. As a linear approximation of the nonlinear locally linear embedding, NPE can be applied to dimensionality reduction while preserving neighborhoods. However, the original NPE algorithm is an unsupervised method, which extracts features without any reference to the output information. In this paper, a supervised NPE framework is proposed for output‐related feature extraction in soft sensor applications. In the supervised NPE framework, the output information is used to guide the construction of the adjacency graph and the calculation of the weight matrix, with which the intrinsic structure of the data can be better described. To evaluate the proposed method, experiments on a numerical example and an industrial debutanizer column process are carried out. The results show the effectiveness of the proposed framework.

12.
Noise in cancer sequencing data is a problem that cannot be ignored, and reducing it effectively is an important issue in the analysis of gene co-expression networks. In this paper, we apply a sparse and low-rank method, Robust Principal Component Analysis (RPCA), to reduce the noise in integrated multi-cancer data from The Cancer Genome Atlas (TCGA). We then build the gene co-expression network from the integrated data after noise reduction. Finally, we mine nodes and pathways in the denoised networks. The experiments show that, after denoising by RPCA, the gene expression data are more orderly than before, and the constructed networks contain more pathway enrichment information than those built from unprocessed data. Moreover, using the betweenness centrality of the nodes in the denoised network, we find some abnormally expressed genes and pathways that have been shown to be associated with many cancers. The experimental results indicate that our method is reasonable and effective, and we also find some candidate genes that may be linked to multiple cancers.

13.
14.
Our ability to detect differentially expressed genes in a microarray experiment can be hampered when the number of biological samples of interest is limited. In this situation, we propose using information from self-self hybridizations to sharpen our inference of differential expression. A unified modelling strategy is developed to allow better estimation of the error variance. This principle is similar to the use of a pooled variance estimate in the two-sample t-test. The results from real dataset examples suggest that we can detect more differentially expressed genes with the combined models. Our simulation study provides evidence that this method increases sensitivity compared with using the information from comparative hybridizations alone, given the same control of the false discovery rate. The largest increase in sensitivity occurs when the amount of information in the comparative hybridization is limited.
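
For reference, the pooled variance estimate of the standard two-sample t-test, to which the abstract compares its principle, can be written as follows. The symbols (sample sizes n_1 and n_2, sample variances s_1^2 and s_2^2, sample means \bar{x}_1 and \bar{x}_2) are notation introduced here, not taken from the abstract, and the unified model of the abstract itself is not shown.

```latex
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
```

Pooling gives the variance estimate n_1 + n_2 - 2 degrees of freedom rather than those of either group alone, which is the sense in which extra hybridizations can stabilize the error variance.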

15.
Two fractal dimensions and the Liapunov exponent (LE) have been applied to detect noisy output signals from UV spectrophotometer (UV), thermogravimetric analyzer (TGA) and differential scanning calorimeter (DSC) measurements of the ionic liquid 1-ethyl-3-methylimidazolium ethylsulfate ([emim][EtSO4]). The data collected from these three pieces of equipment were classified before calculating the LE and the regularization (RD) and box (BD) dimensions. The RD and the LE are each able to detect and quantify noisy output signals with a mean error of less than 5% in all cases tested. Given that the LE can be calculated using a very simple method, this chaotic parameter has been selected as the most suitable for detecting noise in signals from these apparatus.

16.
With the proliferation of related microarray studies by independent groups, a natural approach to analysis is to combine the results across studies. In this article, we present a meta-analysis of gene expression data on imatinib resistance in chronic myelogenous leukemia (CML). First, an analysis of the overlap among 6 published studies revealed that only 3 genes coincided between 2 studies. A later reprocessing of 4 publicly available datasets using different methods revealed that 2 additional genes overlapped between two sets. The poor overlap in both cases may be due to large differences in the sample source and the microarray platforms used, and to the small difference in gene expression between imatinib non-responder and responder patients. A search for common genes within the 4 public datasets yielded 404 well-defined genes. Nevertheless, this necessary condition for meta-analysis caused the loss of many genes of possible interest. The expression signals of the common genes in the four datasets were reanalyzed using three summary statistical methods for combining quantitative information: Fisher, Stouffer and effect-size. Taking the three methods together and using an FDR < 0.10 threshold, a gene list with 33 differentially expressed genes was found. Considering all the reanalysis approaches used in this work, a final gene list with 38 differentially expressed genes is reported. Despite the important limitations of this microarray meta-analysis, the presented procedures and integrated gene list may have some potential value with regard to imatinib resistance in CML patients, since this is the first attempt to integrate evidence about gene lists in this area.
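
Two of the combination rules named above, Fisher and Stouffer, can be sketched in a few lines. This is a minimal, unweighted illustration for one gene with independent one-sided p-values, one per study; the function names are hypothetical, and the abstract does not say whether the authors used weights or two-sided tests.

```python
import math
from statistics import NormalDist

def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df under H0."""
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Chi-square survival function in closed form, valid for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

def stouffer_combined(pvalues):
    """Unweighted Stouffer method: average the z-scores of one-sided p-values."""
    nd = NormalDist()
    # Each p must lie strictly between 0 and 1 for inv_cdf to be defined.
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return z, 1.0 - nd.cdf(z)
```

Calling `fisher_combined([0.04, 0.20, 0.11, 0.07])` would return the combined statistic and p-value for one gene measured in four datasets (the numbers are made up for illustration); a multiple-testing correction such as the FDR threshold mentioned in the abstract would still be applied across genes afterwards.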

17.
Impedance spectroscopy is a powerful characterization method for evaluating the performance of electrochemical systems. However, overlapping signals in the resulting impedance spectra often cause misinterpretation of the data. The distribution of relaxation times (DRT) method overcomes this problem by transferring the impedance data from the frequency domain into the time domain, which yields DRT spectra with increased resolution. Unfortunately, determining the DRT is an ill-posed problem, and appropriate mathematical regularization becomes necessary to find suitable solutions. The Tikhonov algorithm is a widespread method for computing DRT data, but it leads to unlikely spectra due to necessary boundaries. Therefore, we introduce three alternative algorithms (Gold, Richardson Lucy, Sparse Spike) for determining stable DRT solutions and compare their performance. As the promising Sparse Spike deconvolution has a limited scope when a single regularization parameter is used, we furthermore replaced the scalar regularization parameter with a vector. The resulting method is able to calculate well-resolved DRT spectra.
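
To make the role of the regularization parameter concrete, here is a generic sketch of Tikhonov regularization for a linear ill-posed problem A x ≈ b, solved through the regularized normal equations. It is not a DRT implementation: the kernel that maps relaxation times to impedance, any non-negativity constraint, and the vector-valued parameter of the Sparse Spike variant are all left out, and the name `tikhonov_solve` is an assumption made for this example.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the normal equations.

    lam >= 0 is the scalar regularization parameter; a larger lam gives a
    smaller-norm, smoother solution at the cost of a larger residual.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    # (A^T A + lam I) x = A^T b has a unique solution whenever lam > 0.
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```

The trade-off is visible by solving the same system for several values of `lam` and comparing the solution norm with the residual, which is one common way such a parameter is chosen.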

18.
Regularized random-sampling high dimensional model representation (RS-HDMR)   Cited by: 1 in total (self-citations: 0, citations by others: 1)
High Dimensional Model Representation (HDMR) is under active development as a set of quantitative model assessment and analysis tools for capturing high-dimensional input–output system behavior. HDMR is based on a hierarchy of component functions of increasing dimensions. The Random-Sampling High Dimensional Model Representation (RS-HDMR) is a practical approach to HDMR that uses random sampling of the input variables. To reduce the sampling effort, the RS-HDMR component functions are approximated in terms of a suitable set of basis functions, for instance, orthonormal polynomials. When the sample size is not large, the resulting orthonormal polynomial expansion can oscillate and produce interpolation error, especially at the boundary of the input domain. To reduce this error, a regularization method is introduced. After regularization, the resulting RS-HDMR component functions are smoother and have better prediction accuracy, especially for small sample sizes (e.g., often a few hundred). The regularized RS-HDMR is tested on the ignition time of a homogeneous H2/air combustion system within the ranges of initial temperature 1000 < T0 < 1500 K, pressure 0.1 < P < 100 atm, and H2/O2 equivalence ratio 0.2 < R < 10.

19.
20.