Similar articles
1.
Many microarray experiments involve examining the time elapsed prior to the occurrence of a specific event. One purpose of these studies is to relate gene expression to survival times. The Cox proportional hazards model has been the major tool for analyzing such data. The transformation model provides a viable alternative to the classical Cox model. In this paper, we investigate the use of transformation models for microarray survival data. The transformation model, which can be viewed as a generalization of the proportional hazards model and the proportional odds model, is more robust than the proportional hazards model because it is not susceptible to erroneous results when the assumption of proportional hazards is violated. We analyze a gene expression dataset from Beer et al. [Beer, D.G., Kardia, S.L., Huang, C.C., Giordano, T.J., Levin, A.M., Misek, D.E., Lin, L., Chen, G., Gharib, T.G., Thomas, D.G., Lizyness, M.L., Kuick, R., Hayasaka, S., Taylor, J.M., Iannettoni, M.D., Orringer, M.B., Hanash, S., 2002. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 8 (8), 816-824] and show that the transformation model provides higher prediction precision than the proportional hazards model.
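
For reference, the relationship between the three models named in this abstract can be written out as below. This is the standard linear transformation model from the survival-analysis literature, not a formula quoted from the paper; the symbols $H$, $\beta$, $Z$ and $\varepsilon$ are assumptions introduced here for illustration.

```latex
% T: survival time, Z: vector of gene-expression covariates,
% H: an unspecified strictly increasing function, \varepsilon: error term
\[
  H(T) = -\beta^{\top} Z + \varepsilon
\]
% Special cases, determined by the distribution F of \varepsilon:
%   F(s) = 1 - \exp(-e^{s})   (extreme value)     => proportional hazards model
%   F(s) = e^{s} / (1 + e^{s}) (standard logistic) => proportional odds model
```

Under this reading, choosing a different error distribution is what lets the transformation model avoid relying on the proportional hazards assumption.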

2.
The development and diverse application of microarray and next-generation sequencing technologies have made meta-analysis widely used in expression data analysis. Although it is commonly accepted that pathway, network and systems-level approaches are more reproducible than reductionist analyses, the meta-analysis of prostate cancer associated molecular signatures at the pathway level remains unexplored. In this article, we performed a meta-analysis of 10 prostate cancer microarray expression datasets to identify the common signatures at both the gene and pathway levels. According to enrichment analysis with the GeneGo and KEGG databases, 97.8% and 66.7% of the signatures, respectively, show higher similarity at the pathway level than at the gene level. Analysis using the gene set enrichment analysis (GSEA) method also supported the hypothesis. Further analysis of PubMed citations verified that 207 out of 490 (42%) pathways from GeneGo and 48 out of 74 (65%) pathways from KEGG were related to prostate cancer. An overlap of 15 enriched pathways was observed in at least eight datasets. Eight of these pathways are described for the first time as being associated with prostate cancer. In particular, endothelin-1/EDNRA transactivation of the EGFR pathway was found in nine datasets. The putative novel prostate cancer related pathways identified in this paper were indirectly supported by PubMed citations and would provide essential information for further development of network biomarkers and individualized therapy strategies for prostate cancer.

3.
4.
Accurately and reliably identifying the actual number of clusters present in a dataset of gene expression profiles, when no additional information on cluster structure is available, is a problem addressed by few algorithms. GeneMCL transforms microarray analysis data into a graph consisting of nodes connected by edges, where the nodes represent genes and the edges represent the similarity in expression of those genes, as given by a proximity measurement. This measurement is taken to be the Pearson correlation coefficient combined with a local non-linear rescaling step. The resulting graph is input to the Markov Cluster (MCL) algorithm, an elegant, deterministic, non-specific and scalable method that models stochastic flow through the graph. The algorithm is inherently affected by any cluster structure present, and rapidly decomposes a graph into cohesive clusters. The potential of the GeneMCL algorithm is demonstrated with a 5,730 gene subset (IGS) of the Van't Veer breast cancer database, for which the clusterings are shown to reflect underlying biological mechanisms.
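
The flow simulation that MCL performs can be illustrated with a minimal sketch. This is not the GeneMCL implementation: it assumes a precomputed, non-negative similarity matrix (for example, thresholded Pearson correlations), omits the local non-linear rescaling step, and the names `mcl`, `normalize_columns`, `inflation` and `iterations` are hypothetical.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=30):
    """Basic Markov clustering: alternate expansion and inflation.

    similarity: square list of lists with non-negative edge weights
    (assumed input; how it is derived from expression data is not shown).
    """
    n = len(similarity)
    # Add self-loops so that every column has a positive sum.
    m = [[1.0 if i == j else similarity[i][j] for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: one more step of the random walk (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize,
        # which strengthens strong flows and weakens weak ones.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    # Each row that still carries flow lists the members of one cluster.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6)
                for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Two groups of three genes with no similarity between the groups.
sim = [[0.9 if (i < 3) == (j < 3) else 0.0 for j in range(6)]
       for i in range(6)]
print(mcl(sim))  # [[0, 1, 2], [3, 4, 5]]
```

The number of clusters is not supplied by the caller; it emerges from the flow pattern, which is the property the abstract emphasizes.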

5.
Normalization is an essential step in microarray data mining and analysis. For cDNA microarray data, the primary purpose of normalization is to remove the intensity-dependent bias across different slides within an experimental group or between multiple groups. The locally weighted regression (lowess) procedure has been widely used for this purpose but can be comparatively time-consuming when the dataset becomes large. In this study, we applied wavelet regression, a new smoothing method for recovering a regression function from data that is expected to outperform other methods, such as spline or local polynomial fitting, in many cases, to normalize two cDNA microarray datasets. Relative to the lowess procedure, we found that wavelet regression not only produced reliable normalization results but also ran much faster. Computing speed is one of its most important advantages over other algorithms, especially when one is interested in analyzing a large microarray experiment involving hundreds of slides.

6.
Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics and prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data. A consensus approach that combines multiple classifiers has attributes that mitigate this difficulty compared to a single classifier. A new classification method named consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN) was proposed for the analysis of hyper-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each of which was built from distinct, non-repeated genes that were selected for effectiveness in class differentiation. Thus, CAMCUN utilized the most biologically relevant genes in the final classifier. The CAMCUN algorithm was demonstrated to give consistently more accurate predictions for two well-known datasets for prostate cancer and leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence of the predictions for unknown samples.

7.
8.
Motivation: Microarrays have allowed the expression levels of thousands of genes or proteins to be measured simultaneously. Data sets generated by these arrays consist of a small number of observations (e.g., 20-100 samples) on a very large number of variables (e.g., 10,000 genes or proteins). The observations in these data sets often have other attributes associated with them, such as a class label denoting the pathology of the subject. Finding the genes or proteins that are correlated with these attributes is often a difficult task, since most of the variables do not contain information about the pathology and as such can mask the identity of the relevant features. We describe a genetic algorithm (GA) that employs both supervised and unsupervised learning to mine gene expression and proteomic data. The pattern recognition GA selects features that increase clustering, while simultaneously searching for features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the GA contain information primarily about differences between classes in the data set. The principal component analysis routine embedded in the fitness function of the GA acts as an information filter, significantly reducing the size of the search space, since it restricts the search to feature sets whose principal component plots show clustering on the basis of class. The algorithm integrates aspects of artificial intelligence and evolutionary computation to yield a smart one-pass procedure for feature selection, clustering, classification, and prediction.

9.
Microfluidic DNA microarray analysis: a review   Total citations: 1, self-citations: 0, citations by others: 1
Microarray DNA hybridization techniques have been used widely from basic to applied molecular biology research. Generally, in a DNA microarray, different probe DNA molecules are immobilized on a solid support in groups and form an array of microspots. Then, hybridization to the microarray can be performed by applying sample DNA solutions in either the bulk or the microfluidic manner. Because the immobilized probe DNA binds and retains its complementary target DNA, detection is achieved through the read-out of the tagged markers on the sample target molecules. The recent microfluidic hybridization method shows the advantages of lower sample usage and reduced incubation time. Here, sample solutions are confined in microfabricated channels and flow through the probe microarray area. The high surface-to-volume ratio in microchannels of nanolitre volume greatly enhances sensitivity compared with that obtained with the bulk solution method. To generate nanolitre flows, different techniques have been developed, including electrokinetic control, vacuum suction and syringe pumping. The latter two are pressure-driven methods, which are more flexible because the physicochemical properties of the solutions need not be considered. Recently, centrifugal force has been employed to drive liquid movement in microchannels. This method utilizes the body force from the liquid itself, and there are no additional solution interface contacts such as those from electrodes or syringes and tubing. Centrifugal force driven flow also makes parallel hybridizations easy. In this review, we summarize the recent advances in microfluidic microarray hybridization and compare the applications of various flow methods.

10.
Microarrays are becoming a ubiquitous research tool in the life sciences. However, the working principles of microarray-based methodologies are often misunderstood or apparently ignored by the researchers who actually perform and interpret experiments. This in turn seems to lead to a common over-expectation regarding the explanatory and/or knowledge-generating power of microarray analyses. In this note we explain the basic principles of five major groups of analytical techniques used in studies of microarray data and their interpretation: principal component analysis (PCA), independent component analysis (ICA), the t-test, analysis of variance (ANOVA), and self-organizing maps (SOM). We discuss answers to selected practical questions related to the analysis of microarray data. We also take a closer look at the experimental setup and the rules that have to be observed in order to exploit microarrays efficiently. Finally, we discuss in detail the scope and limitations of microarray-based methods. We emphasize the fact that no amount of statistical analysis can compensate for (or replace) a well-thought-out experimental setup. We conclude that microarrays are indeed useful tools in the life sciences but by no means should they be expected to generate complete answers to complex biological questions. We argue that even well-posed questions, formulated within a microarray-specific terminology, cannot be completely answered with the use of microarray analyses alone.

11.
Disposable screen-printed electrodes (SPCE) were modified using a cosmetic product to partially block the electrode surface in order to obtain a microelectrode array. The microarrays formed were electropolymerized with aniline. Scanning electron microscopy was used to evaluate the modified and polymerized electrode surface. Electrochemical characteristics of the constructed sensor for cadmium analysis were evaluated by cyclic and square-wave voltammetry. An optimized stripping procedure, in which the preconcentration of cadmium was achieved by deposition at −1.20 V (vs. Ag/AgCl), resulted in a well-defined anodic peak at approximately −0.7 V at pH 4.6. The achieved limit of detection was 4 × 10⁻⁹ mol dm⁻³. Spray-modified and polymerized microarray electrodes were successfully applied to quantify cadmium in fish sample digests.

12.
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contain a small number of samples, each with a large number of gene expression levels as features. Selecting the relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naive Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.

13.
Wu Z, Luo J, Ge Q, Zhang D, Wang Y, Jia C, Lu Z. Analytica Chimica Acta, 2007, 603(2): 199-204
Aberrant DNA methylation of CpG sites in the gene promoter region has been confirmed to be closely associated with carcinogenesis. In the present study, a new method based on allele-specific extension on a microarray was developed for detecting changes in DNA methylation in cancer. The target gene regions were amplified from bisulfite-treated genomic DNA (gDNA) with modified primers and treated with exonuclease to generate single-strand targets. Allele-specific extension of the immobilized primers took place along a stretch of target sequence in the presence of DNA polymerase and Cy5-labeled dGTP. To control false positive signals, the hybridization conditions, DNA polymerase, extension time and primer design were optimized. Two breast tumor-related genes (P16 and E-cadherin) were analyzed successfully with this method, and all the results were consistent with those of traditional methylation-specific PCR. The experimental results demonstrated that this DNA microarray-based method could be applied as a high-throughput tool for methylation status analysis of cancer-related genes, which could be widely used in cancer diagnosis or the detection of recurrence.

14.
With the proliferation of related microarray studies by independent groups, a natural approach to analysis would be to combine the results across studies. In this article, we address a meta-analysis of the gene expression data on imatinib resistance in chronic myelogenous leukemia. First, an analysis of the overlap among 6 published studies revealed that only 3 genes coincided between 2 studies. A later reprocessing of 4 publicly available datasets using different methods revealed that 2 additional genes overlapped between two sets. Both poor overlaps may be due to large differences in the sample source and the microarray platforms used, and to a small difference in gene expression between imatinib non-responder and responder patients. A search for common genes in the 4 public datasets yielded 404 well-defined genes. Nevertheless, this necessary condition for meta-analysis caused the loss of many genes of possible interest. The expression signals of the common genes in the four datasets were reanalyzed using three summary statistical methods for combining quantitative information: Fisher, Stouffer and effect-size. Taking the three methods together and using an FDR < 0.10 threshold, a gene list with 33 differentially expressed genes was found. Considering all the reanalysis approaches used in this work, a final gene list with 38 differentially expressed genes is reported. Despite the important limitations of this microarray meta-analysis, the presented procedures and integrated gene list may have some potential value as regards imatinib resistance in CML patients, since this is the first attempt to integrate evidence about gene lists in this area.
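
Two of the combination methods named in this abstract, Fisher and Stouffer, can be sketched in a few lines. This is a generic illustration using only the Python standard library, not the authors' pipeline: it assumes independent, one-sided p-values strictly between 0 and 1 for the same gene across studies, uses unweighted Stouffer, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for an even number (2k) of
    # degrees of freedom: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combined(pvalues):
    """Stouffer's method: average the z-scores, then convert back to p."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


# Hypothetical p-values for one gene in four datasets.
gene_p = [0.04, 0.10, 0.03, 0.20]
print(fisher_combined(gene_p), stouffer_combined(gene_p))
```

In a workflow like the one described, the combined p-values for all common genes would then be adjusted for multiple testing before applying the FDR threshold; that step is not shown here.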

15.
Gene expression data sets hold the promise of providing cancer diagnosis at the molecular level. However, using all the gene profiles for diagnosis may be suboptimal. Detection of molecular signatures not only reduces the number of genes needed for discrimination purposes, but may also elucidate the roles they play in biological processes. Therefore, a central part of diagnosis is to detect a small set of tumor biomarkers which can be used for accurate multiclass cancer classification. This task calls for effective multiclass classifiers with a built-in biomarker selection mechanism. We propose the sparse optimal scoring (SOS) method for multiclass cancer characterization. SOS is a simple prototype classifier based on linear discriminant analysis, in which predictive biomarkers can be automatically determined together with accurate classification. Thus, SOS differentiates itself from many other commonly used classifiers, where gene preselection must be applied before classification. We obtain satisfactory performance when applying SOS to several public data sets.

16.
Microarrays are used to determine the expression of thousands of genes simultaneously. An important application of microarrays is the classification of samples into classes of interest (e.g. either healthy cells or tumour cells). Discriminant partial least squares (DPLS) has often been used for this purpose. In this paper, we describe an improvement to DPLS that uses kernel-based probability density functions and the Bayes rule to classify samples whilst keeping the option of not classifying a sample if this cannot be done with sufficient confidence. With this approach, samples outside the boundaries of the known classes or from the ambiguity region between classes are rejected, and only samples with a high probability of being correctly classified are classified. The optimal model is found by simultaneously minimizing the misclassification and rejection costs. The method (p-DPLS with reject option) was tested with two datasets. For the human cancers dataset, the accuracy (obtained by leave-one-out cross-validation) improved from 97% to 99% compared to p-DPLS without reject option. For the breast cancer dataset, p-DPLS with reject option was able to reject 100% of the test samples that did not belong to any of the modelled classes. These samples would have been misclassified if the reject option had not been considered.
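
The reject-option idea can be sketched on a one-dimensional score, such as a single discriminant score per sample. This is an illustration under stated assumptions rather than p-DPLS itself: the PLS step is omitted, a Gaussian kernel is assumed, and the thresholds `min_posterior` and `min_density` are fixed by hand here, whereas the abstract describes choosing the model by minimizing misclassification and rejection costs. All names are hypothetical.

```python
import math


def kde(x, samples, bandwidth):
    """Gaussian kernel density estimate of a class at score x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)


def classify_with_reject(x, class_scores, bandwidth=0.3,
                         min_posterior=0.9, min_density=1e-3):
    """Return a class label, or None when the sample is rejected.

    class_scores maps each class label to its training scores.
    """
    n_total = sum(len(s) for s in class_scores.values())
    # Bayes rule numerator: prior (class frequency) times class density.
    joint = {c: (len(s) / n_total) * kde(x, s, bandwidth)
             for c, s in class_scores.items()}
    evidence = sum(joint.values())
    if evidence < min_density:
        return None  # far from every modelled class (outlier)
    best = max(joint, key=joint.get)
    if joint[best] / evidence < min_posterior:
        return None  # in the ambiguity region between classes
    return best


train = {"healthy": [-1.1, -1.0, -0.9], "tumour": [0.9, 1.0, 1.1]}
print(classify_with_reject(-1.0, train))  # healthy
print(classify_with_reject(0.0, train))   # None (ambiguous)
print(classify_with_reject(10.0, train))  # None (outside known classes)
```

The two rejection branches correspond to the two kinds of rejected samples the abstract distinguishes: those outside the known classes and those between classes.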

17.
Applied Biochemistry and Biotechnology - Reaction rate determination from experimental data is generally an essential part of evaluating enzyme or microorganism growth kinetics and the effects on...

18.
DNA microarray data have been widely used in cancer research because they help to distinguish successfully between tumor classes. However, typical gene expression data are usually both high-dimensional and imbalanced, which poses a severe challenge for traditional machine learning methods in constructing a robust classifier that performs well on both the minority and majority classes. As one of the most successful feature weighting techniques, Relief is considered particularly suited to handling high-dimensional problems. Unfortunately, almost none of the Relief-based methods take the imbalanced class distribution into account. This study finds that existing Relief-based algorithms may underestimate features that discriminate minority classes, and ignore the distribution characteristics of minority class samples. As a result, an additional bias towards classification into the majority classes can be introduced. To this end, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief can correct the bias towards the majority classes and considers the scattered distribution of minority class samples when estimating feature weights. In this way, imRelief is able to reward features that perform well at separating the minority classes from other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection applications.
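
For orientation, the baseline that this abstract builds on can be sketched as follows. This is the classic two-class Relief weighting, not imRelief: it contains no correction for class imbalance, it assumes every class has at least two samples, and the function name and range-scaled Manhattan distance are choices made here for illustration.

```python
def relief(X, y):
    """Classic Relief feature weights for labelled samples.

    X: list of samples, each a list of numeric features.
    y: list of class labels, one per sample.
    """
    n, d = len(X), len(X[0])
    # Feature ranges, used to put all features on a comparable scale.
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
             for j in range(d)]
    weights = [0.0] * d
    for i in range(n):
        def dist(k):
            return sum(abs(X[i][j] - X[k][j]) / spans[j] for j in range(d))

        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=dist)     # nearest sample of the same class
        miss = min(misses, key=dist)  # nearest sample of another class
        for j in range(d):
            # Reward features that differ on the nearest miss and
            # penalize features that differ on the nearest hit.
            weights[j] += (abs(X[i][j] - X[miss][j])
                           - abs(X[i][j] - X[hit][j])) / spans[j] / n
    return weights


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
print(relief(X, y))
```

Because every sample contributes equally to the sum, a majority class dominates the weights when classes are imbalanced, which is the bias the abstract says imRelief is designed to correct.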

19.
The classification of cancer is a major research topic in bioinformatics. The high dimensionality and small sample size of gene expression data, however, make the classification quite challenging. Although principal component analysis (PCA) is of particular interest for high-dimensional data, it may overemphasize some aspects and ignore other important information contained in the richly complex data, because it displays only the difference in the first two- or three-dimensional PC subsp...

20.
Dimension reduction is a crucial technique in machine learning and data mining, and is widely used in medicine, bioinformatics and genetics. In this paper, we propose a two-stage local dimension reduction approach for classification on microarray data. In the first stage, a new L1-regularized feature selection method is defined to remove irrelevant and redundant features and to select the important features (biomarkers). In the next stage, PLS-based feature extraction is applied to the selected features to extract synthesized features that best reflect discriminating characteristics for classification. The suitability of the proposal is demonstrated in an empirical study with ten widely used microarray datasets, and the results show its effectiveness and competitiveness compared with four state-of-the-art methods. The experimental results on the St Jude dataset show that our method can be effectively applied to microarray data analysis for subtype prediction and the discovery of gene coexpression.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration no. 京ICP备09084417号