Similar literature
1.
Generalized F-statistics have been shown to perform satisfactorily in identifying differentially expressed genes from microarray data. However, for some complex diseases, a high proportion of false positives can still arise because disease-related genes show only modest differential expression and microarrays carry systematic noise. The main purpose of this study is to develop statistical methods for Affymetrix microarray gene expression data that reduce the contribution of non-expressed genes to false positives. I propose two novel generalized F-statistics for identifying differentially expressed genes and a novel approach for estimating adjusting factors. The proposed methods systematically combine filtering of non-expressed genes with identification of differentially expressed genes. For comparison, the methods discussed were applied to an experimental data set from a type 2 diabetes study. In both two- and three-sample analyses, the proposed statistics improved the control of false positives.

2.
Lists of differentially expressed genes (DEGs) often show low reproducibility, even in technical replicate experiments. Reproducibility is lower still for real cancer data, which have large biological variation and a limited number of samples. Because existing methods for identifying differentially expressed genes treat each gene separately, they cannot circumvent the problem of low reproducibility. Considering the correlation structure of genes may help to mitigate the effect of errors in individual gene estimates and thus yield more reliable lists of DEGs. We borrowed information from a large amount of existing microarray data to define the expression dependencies among genes, and we used this prior knowledge to adjust the significance rank of DEGs. We applied our method and four popular ranking algorithms, namely mean fold change (FC), SAM, the t-statistic and the Wilcoxon rank-sum test, to two cancer microarray datasets. Our method achieved higher reproducibility than the other methods across a range of sample sizes. Furthermore, it obtained higher accuracy than the other methods, especially when the sample size was small. The results demonstrate that considering the dependencies among genes helps to adjust the significance rank of genes and to find the truly differentially expressed genes.

3.
With the rapid development of DNA microarray technology and next-generation technology, a large amount of genomic data has been generated, and extracting more differentially expressed genes from these data has become an urgent problem. Because Low-Rank Representation (LRR) performs well in studying low-dimensional subspace structures, it has attracted considerable attention in recent years. However, it does not take into account the intrinsic geometric structure of the data. In this paper, a new method named Laplacian regularized Low-Rank Representation (LLRR), which introduces graph regularization into LRR, is proposed and applied to genomic data. By taking full advantage of graph regularization, the LLRR method can capture the intrinsic non-linear geometric information in the data. The LLRR method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix by solving an optimization problem. Because the significant genes can be considered sparse signals, the differentially expressed genes are viewed as sparse perturbation signals and can therefore be selected according to the sparse matrix. Finally, we use the GO tool to analyze the selected genes and compare the P-values with those of other methods. The results on simulated data and two real genomic datasets illustrate that this method outperforms some other methods in differentially expressed gene selection.

4.
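A feature-variable selection method based on disjoint principal component analysis (Disjoint PCA) and a genetic algorithm (GA) was developed and used to identify differentially expressed genes from gene expression profile data. In this method, disjoint PCA evaluates how well a group of genes discriminates between two classes of samples, the GA searches for the gene group with the strongest discriminating power, and the chance correlation of the identified genes is assessed statistically. Because the method takes the cooperative action among genes into account, and is therefore closer to the biological processes in which genes take part, the identified genes show better differential expression. The method was applied to microarray data from hepatocellular carcinoma (HCC) samples. The results show that the identified genes have strong discriminating power and that the method outperforms the commonly used significance analysis of microarrays (SAM) method.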

5.
With the proliferation of related microarray studies by independent groups, a natural approach to analysis is to combine the results across studies. In this article, we present a meta-analysis of gene expression data on imatinib resistance in chronic myelogenous leukemia (CML). First, an analysis of the overlap among 6 published studies revealed that only 3 genes coincided between 2 studies. A later reprocessing of 4 publicly available datasets using different methods revealed that 2 additional genes overlapped between two sets. The poor overlap in both cases may be due to large differences in the sample source and the microarray platforms used, and to the small difference in gene expression between imatinib non-responder and responder patients. A search for genes common to the 4 public datasets yielded 404 well-defined genes. Nevertheless, this necessary condition for meta-analysis caused the loss of many genes of possible interest. The expression signals of the common genes in the four datasets were reanalyzed using three summary statistical methods for combining quantitative information: Fisher, Stouffer and effect size. Taking the three methods together and using an FDR < 0.10 threshold, a gene list of 33 differentially expressed genes was found. Considering all the reanalysis approaches used in this work, a final gene list of 38 differentially expressed genes is reported. Despite the important limitations of this microarray meta-analysis, the procedures presented and the integrated gene list may have some potential value with regard to imatinib resistance in CML patients, since this is the first attempt to integrate evidence about gene lists in this area.
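
Two of the combination rules named in this abstract, Fisher and Stouffer, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes independent, one-sided p-values strictly between 0 and 1, uses an unweighted Stouffer statistic, and the function names `fisher_combine` and `stouffer_combine` are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for an even number of
    # degrees of freedom (2k): exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_combine(p_values):
    """Stouffer's method (unweighted): Z = sum(z_i) / sqrt(k), z_i = Phi^-1(1 - p_i)."""
    norm = NormalDist()
    z = sum(norm.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - norm.cdf(z)


# Hypothetical per-study p-values for one gene across four datasets.
gene_p = [0.01, 0.02, 0.03, 0.04]
combined_fisher = fisher_combine(gene_p)
combined_stouffer = stouffer_combine(gene_p)
```

The combined p-values would still need a multiple-testing correction across genes before applying a threshold such as the FDR < 0.10 cut-off mentioned in the abstract.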

6.
The use of DNA microarrays for gene expression analysis has become common in many research laboratories and in industry. Several target-labeling techniques have been devised to reduce the amount of RNA required for microarray experiments. To facilitate comparison and sharing of microarray data across laboratories, it is crucial to determine the relative effects of these different sample-labeling techniques on the final results obtained from these experiments. We compared two labeling methods designed for small RNA samples, an enzyme-based tyramide method (TSA) and a nucleic acid-based dendrimer method, with a more typical direct-labeling method that requires larger amounts of RNA. We observed comparable levels of reproducibility between replicate spots with all the techniques. The dendrimer method gave the smallest number of spots (0.08%) that showed differential labeling due to a bias in the dyes used, but it also gave the highest background, with only 71.4% of the spots measurable (above background) compared with 93.3% for the TSA technique and 79.7% for the direct-labeling method. The results from differential labeling experiments showed that the dendrimer method performed better than the TSA method in detecting the same set of differentially expressed genes as observed with the direct method. Overall, our results show that the dendrimer method performs better than the TSA method. Differential labeling experiments using the TSA method show non-linearity in the data at high intensities, which skews a portion of the data.

7.
Our ability to detect differentially expressed genes in a microarray experiment can be hampered when the number of biological samples of interest is limited. In this situation, we propose using information from self-self hybridizations to sharpen our inference of differential expression. A unified modelling strategy is developed to allow better estimation of the error variance. This principle is similar to the use of a pooled variance estimate in the two-sample t-test. The results from real dataset examples suggest that the combined models can detect more differentially expressed genes. Our simulation study provides evidence that this method increases sensitivity compared with using the information from comparative hybridizations alone, given the same control of the false discovery rate. The largest increase in sensitivity occurs when the amount of information in the comparative hybridization is limited.
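
The pooled variance estimate that the abstract uses as an analogy can be sketched as follows. This is only the textbook two-sample t-statistic, not the unified model proposed in the abstract; the function name `pooled_t_statistic` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t-statistic with a pooled variance (equal-variance assumption)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: average of the two sample variances, weighted by
    # their degrees of freedom, so both groups inform the error estimate.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return t, sp2


# Hypothetical log-expression values for one gene in two groups.
t_stat, pooled_var = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Pooling gives the variance estimate more degrees of freedom than either group provides alone, which is the same motivation the abstract gives for borrowing information from self-self hybridizations.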

8.
When microarray data are used to study a complex disease such as cancer, it is common practice to normalize the data so that all arrays have the same distribution of probe intensities regardless of the biological group of the samples. The assumption underlying such normalization is that in a disease the majority of genes are not differentially expressed (DE) genes and that the numbers of up- and down-regulated genes are roughly equal. However, accumulating evidence suggests that gene expression may be widely altered in cancer, so the sensitivity of biological discoveries to violation of the normalization assumption needs to be evaluated. Here, we analyzed 7 large Affymetrix datasets of pair-matched normal and cancer samples collected in the NCBI GEO database. We showed that in 6 of these 7 datasets the medians of perfect match (PM) probe intensities increased in the cancer state, and the increases were significant in three datasets, suggesting that the assumption that all arrays have the same median probe intensity regardless of the biological group of the samples might be misleading. We then evaluated the effects of the three most widely used normalization algorithms (RMA, MAS5.0 and dChip) on the selection of DE genes by comparing them with LVS, which relies less on the above-mentioned assumption. The results showed that using RMA, MAS5.0 and dChip may produce many false down-regulated DE genes while missing many up-regulated DE genes. At least for cancer studies, normalizing all arrays to have the same distribution of probe intensities regardless of the biological group of the samples might be misleading. Thus, most current normalizations, being based on unreliable assumptions, may distort biological differences between normal and cancer samples. The LVS algorithm might perform relatively well because it relies less on the above-mentioned assumption. Our results also indicate that genes may be widely up-regulated in most human cancers.
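
The normalization assumption discussed in this abstract can be made concrete with a minimal quantile-normalization sketch in Python. This is a generic textbook procedure, not the RMA, MAS5.0, dChip or LVS implementation; the function name `quantile_normalize` and the toy intensities are hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Force every array to share one intensity distribution.

    arrays: list of equal-length lists of probe intensities, one list per array.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # each probe keeps its rank, takes the shared value
        normalized.append(out)
    return normalized


# Toy example: every probe is doubled in the "cancer" array.
normal = [1.0, 2.0, 3.0]
cancer = [2.0, 4.0, 6.0]
result = quantile_normalize([normal, cancer])
# Both arrays become [1.5, 3.0, 4.5]: the global up-shift is erased.
```

In the toy example the global increase in the cancer array disappears after normalization, which illustrates how a genuine, widespread up-regulation could be masked when all arrays are forced to the same distribution.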

9.
The aim of this study was to identify molecular markers associated with oncogenic differentiation in hepatocellular carcinoma (HCC). Gene expression profiles of HCC tissues (T) and corresponding non-tumor tissues (NT) from 40 patients were analyzed using an unsupervised clustering method with a cDNA microarray. Of a total of 217 genes, 72 were expressed preferentially in HCC tissues. Among 186 differentially regulated genes, there were molecular chaperone and tumor suppressor gene clusters in the Edmondson grades I and II (GI/II) subclass compared with the liver cirrhosis (LC) subclass. The Edmondson grades III and IV (GIII/IV) subclass, which had poor survival (P=0.0133), contained 122 differentially regulated genes, with a cluster containing various metastasis- and invasion-related genes, compared with the GI/II subclass. Immunohistochemical analysis revealed that ANXA2, one of the 72 genes preferentially expressed in HCC, was over-expressed in the sinusoidal endothelium and in malignant hepatocytes in HCC. The genes identified in the HCC subclasses will be useful molecular markers for the genesis and progression of HCC. In addition, ANXA2 might be a novel marker for tumor angiogenesis in HCC.

10.
Pathway analysis has become a popular tool for gaining insight into the underlying biology of differentially expressed genes and proteins. Although many sub-pathway analysis methods have been proposed, the function of these sub-pathways is generally implicit. In this paper, we propose a function sub-pathway analysis (FSPA) method that includes all nodes reaching a specific function node downstream in a pathway. The perturbation degree of a sub-pathway is defined as the negative of the log p-value of the sub-pathway. The proposed FSPA allows the differentially expressed genes in a sub-pathway to be analyzed in relation to disease at an explicit functional level. Results from six datasets of colorectal cancer, lung cancer and pancreatic cancer show that the proposed FSPA could identify more cancer-associated pathways. More importantly, it could identify which sub-pathways lead to a specific abnormal function and to what extent they affect that function. Furthermore, the proposed perturbation degree could also be used to analyze the imbalance of some functions involved in a biological process. The results of FSPA are helpful for elucidating the underlying mechanisms of cancers and for designing therapeutic strategies.

11.
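Screening and cluster analysis of differentially expressed genes during the differentiation of rat BMSCs induced by components of the Shuanglong formula   Total citations: 2, self-citations: 1, citations by others: 1
Gene chips were used to screen for genes that were differentially expressed while the active components of the Shuanglong formula (total ginsenosides and total salvianolic acids) induced rat bone marrow mesenchymal stem cells (BMSCs) to differentiate into cardiomyocyte-like cells, and cluster analysis was performed on these genes, so that the effect of the Shuanglong formula components on rat BMSC differentiation could be studied at the gene level. Rat BMSCs were cultured in groups, and cell samples were collected at 10, 20, 30 and 40 d. tRNA was extracted and analyzed on gene chips; the genes differentially expressed as the BMSCs changed were screened and subjected to bioinformatics analysis, and the samples were clustered hierarchically on the basis of the differentially expressed genes. During BMSC differentiation, 179 differentially expressed genes were identified, and analysis showed that they are closely related to several classes of genes, including those for energy metabolism and signal transduction. Cluster analysis grouped the samples into two major classes: the 10 and 20 d samples formed one cluster, and the 30 and 40 d samples formed the other. This suggests that the BMSCs may undergo a marked change between 20 and 30 d.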

12.
In recent years, differentially expressed small RNAs have been widely used to identify the composition of forensically relevant biological samples, and a vast number of such RNA candidates have been proposed. Nevertheless, when the expression levels of target small RNAs are assessed using relative quantitative analysis methods, credible internal controls are usually required for reliable data normalization. Therefore, the identification of optimal reference genes is an important task. In this study, the expression profiles of 18 small RNA reference genes were characterized in the Chinese Han population using TaqMan real-time quantitative PCR. These candidate genes were evaluated systematically on the basis of their expression levels and stability in several common types of body fluid (i.e., venous blood, menstrual blood, saliva, semen, and vaginal secretions). Results from the ΔCq method, BestKeeper, NormFinder, and geNorm were integrated by RefFinder to rank and compare the candidates in each type of body fluid. Among all the candidates, miR-191 was identified as the most suitable reference gene because it had a favorable ranking value in all tested samples. In addition, miR-423, miR-93, miR-484, and let-7i were also shown to be applicable reference genes. Overall, this study provides detailed assessment results for these candidate genes in different body fluids and could therefore serve as a guide for selecting reference genes according to their performance in the sample of choice.

13.
Qi Shen  Wei-Min Shi  Bao-Xian Ye 《Talanta》2007,71(4):1679-1683
In the analysis of gene expression profiles, the number of tissue samples with gene expression levels available is usually small compared with the number of genes. This can lead either to overfitting or even to a complete failure in the analysis of microarray data. The selection of genes that are truly indicative of the tissue classification concerned is becoming one of the key steps in microarray studies. In the present paper, we combine a modified discrete particle swarm optimization (PSO) with support vector machines (SVM) for tumor classification. The modified discrete PSO is applied to select genes, while the SVM is used as the classifier or evaluator. The proposed approach was applied to microarray data from 22 normal and 40 colon tumor tissues and showed good prediction performance. The results demonstrate that the modified PSO is a useful tool for gene selection and for mining high-dimensional data.

14.
15.
Juan HF  Lin JY  Chang WH  Wu CY  Pan TL  Tseng MJ  Khoo KH  Chen ST 《Electrophoresis》2002,23(15):2490-2504
A biomic approach integrating three independent methods, DNA microarray, proteomics and bioinformatics, is used to study the differentiation of the human myeloid leukemia cell line HL-60 into macrophages when induced by 12-O-tetradecanoyl-phorbol-13-acetate (TPA). Analysis of gene expression changes at the RNA level using cDNA against an array of 6033 human genes showed that 5950 (98.6%) of the genes were expressed in the HL-60 cells. A total of 624 genes (10.5%) were found to be regulated during HL-60 cell differentiation. Most of these genes have not previously been associated with HL-60 cells and include genes encoding secreted proteins as well as genes involved in cell adhesion, signal transduction, and metabolism. Protein analysis using two-dimensional gel electrophoresis showed a total of 682 distinct protein spots; 136 spots (19.9%) exhibited quantitative changes between HL-60 control cells and macrophages. These differentially expressed proteins were identified by mass spectrometry. We developed a bioinformatics program, the Bulk Gene Search System (BGSS, http://www.sinica.edu.tw:8900/perl/genequery.pl), to search for the functions of genes and proteins identified by cDNA microarrays and proteomics. The identified regulated proteins and genes were classified into seven groups according to subcellular location and function. This powerful holistic biomic approach, using cDNA microarrays and proteomics coupled to bioinformatics, can provide in-depth information on the impact and importance of the regulated genes and proteins for HL-60 differentiation.

16.
Naturally inspired evolutionary algorithms have proved effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm, with the goal of integrating the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. Extensive experiments were conducted to test the accuracy of the proposed algorithm. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared it with the combinations of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This indicates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

17.
For cancer prediction using large-scale gene expression data, it often helps to incorporate gene interactions in the model. However, it is not straightforward to select important genes and model gene interactions simultaneously. Some heuristic approaches have been proposed in the literature. In this paper, we study a unified modeling approach based on ℓ1 penalized likelihood estimation that can simultaneously select important genes and model gene interactions. We illustrate its competitive performance through simulation studies and applications to public microarray data.
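
For orientation, an ℓ1 penalized likelihood estimator has the generic form below. This is the standard textbook form rather than the exact objective of the paper: the log-likelihood $\ell$, the split of $\beta$ into main-effect and interaction coefficients, and the tuning parameter $\lambda$ are notation assumed here for illustration.

```latex
\hat{\beta}
  = \arg\min_{\beta}
    \left\{ -\ell(\beta) + \lambda \sum_{j} \lvert \beta_j \rvert \right\},
\qquad
\beta = \left( \beta_{\mathrm{main}},\ \beta_{\mathrm{interaction}} \right),
\quad \lambda \ge 0 .
```

Because the ℓ1 penalty can shrink coefficients exactly to zero, the genes and gene-gene interaction terms with non-zero estimated coefficients are the ones selected, which is how a single fit can perform selection and interaction modeling together.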

18.
19.
Gene expression profiles based on high-throughput technologies contribute to molecular classifications of different cell lines and consequently to clinical diagnostic tests for cancer types and other diseases. Statistical techniques and dimension reduction methods have been devised for identifying minimal gene subsets with maximal discriminative power. For sets of in silico candidate genes, assuming a unique gene signature or performing a parsimonious signature evaluation seems too restrictive in the context of in vitro signature validation. This is mainly due to the high complexity of largely correlated expression measurements and the existence of various oncogenic pathways. Consequently, it might be more advantageous to identify and evaluate multiple gene signatures with similarly good predictive power, referred to as near-optimal signatures, and make them available for biological validation. For this purpose we propose the bead-chain-plot approach, which originates from swarm intelligence techniques, and we conduct a small-scale computational experiment to convey our vision. We simulate the acquisition of candidate genes by using a small pool of differentially expressed genes derived from microarray-based CNS tumour data. The application of the bead-chain-plot provides experimental evidence for improved classification when near-optimal signatures are used in validation procedures.

20.
Du W  Gu T  Tang LJ  Jiang JH  Wu HL  Shen GL  Yu RQ 《Talanta》2011,85(3):1689-1694
As a greedy search algorithm, the classification and regression tree (CART) is prone to overfitting when modeling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying the significant ones. Because some significant genes have multi-modal expression patterns, with systematic differences among within-class samples, and are difficult to identify with existing methods, we propose a strategy for CART modeling based on the unimodal transform of variables selected by interval segmentation purity (UTISP). First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied to provide unimodal feature variables for CART modeling via feature extraction. Because significant genes with complex expression patterns can be properly identified and unimodal features extracted in advance, the strategy potentially improves the performance of CART in combating overfitting or underfitting when modeling microarray data. The strategy is demonstrated using two microarray data sets. The results reveal that UTISP-based CART performs better than k-nearest neighbors or CART coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.
