共查询到14条相似文献,搜索用时 0 毫秒
1.
Gene expression data are characterized by thousands even tens of thousands of measured genes on only a few tissue samples. This can lead either to possible overfitting and dimensional curse or even to a complete failure in analysis of microarray data. Gene selection is an important component for gene expression-based tumor classification systems. In this paper, we develop a hybrid particle swarm optimization (PSO) and tabu search (HPSOTS) approach for gene selection for tumor classification. The incorporation of tabu search (TS) as a local improvement procedure enables the algorithm HPSOTS to overleap local optima and show satisfactory performance. The proposed approach is applied to three different microarray data sets. Moreover, we compare the performance of HPSOTS on these datasets to that of stepwise selection, the pure TS and PSO algorithm. It has been demonstrated that the HPSOTS is a useful tool for gene selection and mining high dimension data. 相似文献
2.
Gene regulatory networks inference is currently a topic under heavy research in the systems biology field. In this paper, gene regulatory networks are inferred via evolutionary model based on time-series microarray data. A non-linear differential equation model is adopted. Gene expression programming (GEP) is applied to identify the structure of the model and least mean square (LMS) is used to optimize the parameters in ordinary differential equations (ODEs). The proposed work has been first verified by synthetic data with noise-free and noisy time-series data, respectively, and then its effectiveness is confirmed by three real time-series expression datasets. Finally, a gene regulatory network was constructed with 12 Yeast genes. Experimental results demonstrate that our model can improve the prediction accuracy of microarray time-series data effectively. 相似文献
3.
Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. Compared to the number of genes involved, available training data sets generally have a fairly small sample size in cancer type classification. These training data limitations constitute a challenge to certain classification methodologies. A reliable selection method for genes relevant for sample classification is needed in order to speed up the processing rate, decrease the predictive error rate, and to avoid incomprehensibility due to the large number of genes investigated. Improved binary particle swarm optimization (IBPSO) is used in this study to implement feature selection, and the K-nearest neighbor (K-NN) method serves as an evaluator of the IBPSO for gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of features needed. The classification accuracy obtained by the proposed method has the highest classification accuracy in nine of the 11 gene expression data test problems, and is comparative to the classification accuracy of the two other test problems, as compared to the best results previously published. 相似文献
4.
Breast cancer therapy with classical chemotherapy is unable to eradicate breast cancer stem cells (BCSCs). Loss of p53 function causes growth and differentiation in cancer stem cells (CSCs); therefore, p53-targeted compounds can be developed for BCSCs-targeted drugs. Previously, hesperidin (HES), a citrus flavonoid, showed anticancer activities and increased efficacy of chemotherapy in several types of cancer in vitro and in vivo. This study was aimed to explore the key protein and molecular mechanism of hesperidin in the inhibition of BCSCs using bioinformatics and in vitro study. Bioinformatics analysis revealed about 75 potential therapeutic target proteins of HES in BCSCs (TH), in which TP53 was the only direct target protein (DTP) with a high degree score. Furthermore, the results of GO enrichment analysis showed that TH was taken part in the biological process of regulation of apoptosis and cell cycle. The KEGG pathway enrichment analysis also showed that TH is involved in several pathways, including cell cycle, p53 signaling pathway. In vitro experiment results showed that HES inhibited cell proliferation, mammosphere, and a colony formation, and migration in on MCF-7 3D cells (mammospheres). HES induced G0/G1 cell cycle arrest and apoptosis in MCF-7 cells 3D. In addition, HES treatment reduced the mRNA level of p21 but increased the mRNA level of cyclin D1 and p53 in the mammosphere. HES inhibits BCSCs in mammospheres. More importantly, this study highlighted p53 as a key protein in inhibition of BCSCs by HES. Future studies on the molecular mechanism are needed to validate the results of this study. 相似文献
5.
Clustering of gene expression data collected across time is receiving growing attention in the biological literature since time-course experiments allow one to understand dynamic biological processes and identify genes governed by the same processes. It is believed that genes demonstrating similar expression profiles over time might give an informative insight into how underlying biological mechanisms work. In this paper, we propose a method based on functional data analysis (FNDA) to cluster time-dependent gene expression profiles. Consideration of clustering problems using the FNDA setting provides ways to take time dependency into account by using basis function expansion to describe the partially observed curves. We also discuss how to choose the number of bases in the basis function expansion in FNDA. A synthetic cycle data and a real data are used to demonstrate the proposed method and some comparisons between the proposed and existing approaches using the adjusted Rand indices are made. 相似文献
6.
Rival penalized competitive learning (RPCL): a topology-determining algorithm for analyzing gene expression data 总被引:4,自引:0,他引:4
Nair TM Zheng CL Fink JL Stuart RO Gribskov M 《Computational Biology and Chemistry》2003,27(6):565-574
DNA arrays have become the immediate choice in the analysis of large-scale expression measurements. Understanding the expression pattern of genes provide functional information on newly identified genes by computational approaches. Gene expression pattern is an indicator of the state of the cell, and abnormal cellular states can be inferred by comparing expression profiles. Since co-regulated genes, and genes involved in a particular pathway, tend to show similar expression patterns, clustering expression patterns has become the natural method of choice to differentiate groups. However, most methods based on cluster analysis suffer from the usual problems (i) dead units, and (ii) the problem of determining the correct number of clusters (k) needed to classify the data. Selecting the k has been an open problem of pattern recognition and statistics for decades. Since clustering reveals similar patterns present in the data, fixing this number strongly influences the quality of the result. While there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the correct number of clusters problem to the tractable problem of clustering based on the degree of similarity. This is biologically significant since our implementation clusters functionally co-regulated genes and genes that present similar patterns of expression. This new approach reveals potential genes that are co-involved in a biological process. This implementation of the RPCL algorithm is useful in differentiating groups involved in concerted functional regulation and helps to progressively home into patterns, which are closely similar. 相似文献
7.
8.
Antonov AV Tetko IV Kosykh D Surmeli D Mewes HW 《Computational Biology and Chemistry》2005,29(4):9072-293
Most studies concerning expression data analyses usually exploit information on the variability of gene intensity across samples. This information is sensitive to initial data processing, which affects the final conclusions. However expression data contains scale-free information, which is directly comparable between different samples. We propose to use the pairwise ratio of gene expression values rather than their absolute intensities for a classification of expression data. This information is stable to data processing and thus more attractive for classification analyses. In proposed schema of data analyses only information on relative gene expression levels in each sample is exploited. Testing on publicly available datasets leads to superior classification results. 相似文献
9.
DNA microarray data has been widely used in cancer research due to the significant advantage helped to successfully distinguish between tumor classes. However, typical gene expression data usually presents a high-dimensional imbalanced characteristic, which poses severe challenge for traditional machine learning methods to construct a robust classifier performing well on both the minority and majority classes. As one of the most successful feature weighting techniques, Relief is considered to particularly suit to handle high-dimensional problems. Unfortunately, almost all relief-based methods have not taken the class imbalance distribution into account. This study identifies that existing Relief-based algorithms may underestimate the features with the discernibility ability of minority classes, and ignore the distribution characteristic of minority class samples. As a result, an additional bias towards being classified into the majority classes can be introduced. To this end, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief can correct the bias towards to the majority classes, and consider the scattered distributional characteristic of minority class samples in the process of estimating feature weights. This way, imRelief has the ability to reward the features which perform well at separating the minority classes from other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection applications. 相似文献
10.
11.
A gene regulatory network (GRN) is a large and complex network consisting of interacting elements that, over time, affect each other’s state. The dynamics of complex gene regulatory processes are difficult to understand using intuitive approaches alone. To overcome this problem, we propose an algorithm for inferring the regulatory interactions from knock-out data using a Gaussian model combines with Pearson Correlation Coefficient (PCC). There are several problems relating to GRN construction that have been outlined in this paper. We demonstrated the ability of our proposed method to (1) predict the presence of regulatory interactions between genes, (2) their directionality and (3) their states (activation or suppression). The algorithm was applied to network sizes of 10 and 50 genes from DREAM3 datasets and network sizes of 10 from DREAM4 datasets. The predicted networks were evaluated based on AUROC and AUPR. We discovered that high false positive values were generated by our GRN prediction methods because the indirect regulations have been wrongly predicted as true relationships. We achieved satisfactory results as the majority of sub-networks achieved AUROC values above 0.5. 相似文献
12.
LIU JingJing CAI WenSheng & SHAO XueGuang Research Center for Analytical Sciences College of Chemistry Nankai University Tianjin China 《中国科学B辑(英文版)》2011,(5)
The classification of cancer is a major research topic in bioinformatics. The nature of high dimensionality and small size associated with gene expression data,however,makes the classification quite challenging. Although principal component analysis (PCA) is of particular interest for the high-dimensional data,it may overemphasize some aspects and ignore some other important information contained in the richly complex data,because it displays only the difference in the first twoor three-dimensional PC subsp... 相似文献
13.
14.