Similar literature
 Found 20 similar articles; search took 31 ms
1.
Genome-wide expression studies are a powerful genomic technology for quantifying the expression dynamics of genes in a genome. In gene expression studies, gene set analysis has become the first choice for gaining insight into the underlying biology of diseases or stresses in plants. It also reduces the complexity of statistical analysis and enhances the explanatory power of the results obtained from the primary downstream differential expression analysis. Gene set analysis approaches are well developed for microarray and RNA-seq gene expression data. These approaches mainly focus on analyzing gene sets with gene ontology or pathway annotation data. However, in plant biology, such methods may not establish any formal relationship between genotypes and phenotypes, as most traits are quantitative and controlled by polygenes. The existing Quantitative Trait Loci (QTL)-based gene set analysis approaches focus only on over-representation analysis of the selected genes while ignoring their associated gene scores. Therefore, we developed an innovative statistical approach, GSQSeq, to analyze gene sets with trait-enriched QTL data. This approach considers the differential expression scores of genes while analyzing the gene sets. The performance of the method was tested on five datasets obtained from real crop gene expression studies. Our results indicated that trait-specific analysis of gene sets was more robust and successful with the proposed approach than with existing techniques. Further, the method provides a valuable platform for integrating gene expression data with QTL data.
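For readers unfamiliar with the over-representation analysis that the abstract contrasts GSQSeq with, the following is a minimal sketch of a hypergeometric over-representation test. It is not the GSQSeq method; the function name and the example counts are illustrative assumptions. Note that only counts enter the test, which is exactly the limitation the abstract points out: per-gene differential expression scores are ignored.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are selected without replacement from N genes,
    of which K lie in the gene set of interest (here: genes inside trait QTLs).

    All names are hypothetical; this is a generic over-representation test,
    not the GSQSeq statistic described in the abstract.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and k >= 0")
    total = comb(N, n)
    upper = min(K, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., min(K, n) overlaps.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Illustrative counts (assumed): 10 genes in total, 5 inside QTLs,
# 3 selected as differentially expressed, all 3 inside QTLs.
p_value = hypergeom_upper_tail(k=3, K=5, n=3, N=10)
print(p_value)
```

A score-aware approach would replace the overlap count with a statistic built from the gene scores themselves; the abstract does not give that formula, so it is not sketched here.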

2.
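Chen Jing (陈婧), Zhang Zhenxing (张振兴). Acta Photonica Sinica (《光子学报》), 2021, 50(2): 103-114
Because existing unsupervised band selection methods for hyperspectral image classification cannot compute the similarity between bands, and because of the high dimensionality encountered during selection, this paper proposes a greedy unsupervised hyperspectral band selection method based on variable precision rough sets. First, a new dependency measure is defined using variable precision rough sets; the measure is insensitive to the misclassification parameter of the variable precision rough set and therefore makes full use of the similarity between bands. Second, a new discriminant criterion is proposed to find ...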

3.
The availability of massive gene expression data poses challenges in how to curate, process, and extract useful information. Here, we describe the use of entropic measures as discriminating criteria in cancer using the whole data set of gene expression levels. These methods were applied to classify samples as tumor or normal for 13 types of tumors with a high success ratio. Using gene expression, ordered by pathways, results in complexity–entropy diagrams. The map allows the clustering of tumor and normal samples, with a high success rate for nine of the thirteen cancer types studied. Further analysis using information distance also shows good discriminating behavior and, more importantly, allows for discriminating between cancer types. Together, our results allow the classification of tissues without the need to identify relevant genes or impose a particular cancer model. The procedure used can be extended to classification problems beyond the reported results.
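As an illustration of what an entropic measure over a whole expression profile can look like, here is a minimal sketch of the normalized Shannon entropy of one sample. The abstract does not specify the exact measures used, so the normalization (expression levels rescaled to sum to one, entropy divided by its maximum) and the function name are assumptions.

```python
import math


def normalized_shannon_entropy(levels):
    """Normalized Shannon entropy of a non-negative expression vector.

    The vector is rescaled to a probability distribution; the result lies in
    [0, 1], with 1 for a perfectly flat profile. Hypothetical helper, not the
    specific measure used in the cited work.
    """
    if any(x < 0 for x in levels):
        raise ValueError("expression levels must be non-negative")
    total = sum(levels)
    if total <= 0:
        raise ValueError("at least one expression level must be positive")
    # Zero entries contribute nothing to the entropy (0 * log 0 is taken as 0).
    probs = [x / total for x in levels if x > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    n = len(levels)
    return entropy / math.log(n) if n > 1 else 0.0


flat_profile = [1.0, 1.0, 1.0, 1.0]      # every gene equally expressed
peaked_profile = [5.0, 0.0, 0.0, 0.0]    # a single gene dominates
print(normalized_shannon_entropy(flat_profile))
print(normalized_shannon_entropy(peaked_profile))
```

Plotting such an entropy against a complexity measure for each sample would give one point per sample in a complexity–entropy plane; the complexity measure itself is not given in the abstract and is left out.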

4.
Multi-label learning is dedicated to learning functions that assign each sample its true label set. As the amount of data grows, feature dimensionality increases as well. However, high-dimensional data may contain noise, which makes multi-label learning difficult. Feature selection is a technique that can effectively reduce the data dimension. In the study of feature selection, multi-objective optimization algorithms have shown excellent global optimization performance, and the Pareto relationship handles contradictory objectives in multi-objective problems well. Therefore, a Shapley value-fused feature selection algorithm for multi-label learning (SHAPFS-ML) is proposed. The method takes multi-label criteria as the optimization objectives, and the proposed crossover and mutation operators based on the Shapley value help identify relevant, redundant and irrelevant features. A comparison of experimental results on real-world datasets shows that SHAPFS-ML is an effective feature selection method for multi-label classification, which can reduce the computational complexity of the classification algorithm and improve classification accuracy.
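To make the Shapley value concrete, the sketch below computes exact Shapley values for a small set of features by averaging each feature's marginal contribution over all orderings. This is the textbook definition only, not the SHAPFS-ML operators; the value function, the feature names and the weights are hypothetical, and exact enumeration is feasible only for a handful of features.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    `value` maps a frozenset of players to a number (for features, this could
    be a classifier score on that subset). Cost grows factorially, so this is
    only a teaching sketch.
    """
    totals = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous  # marginal contribution of p
            previous = current
        n_orders += 1
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical additive value function: each feature adds a fixed amount.
weights = {"f1": 0.5, "f2": 0.3, "f3": 0.0}
scores = shapley_values(list(weights), lambda s: sum(weights[f] for f in s))
print(scores)
```

With an additive value function each feature's Shapley value equals its own weight, so a feature that contributes nothing (here `f3`) scores zero, which is the sense in which Shapley values can flag irrelevant features.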

5.
Feature selection (FS) is a vital step in data mining and machine learning, especially for analyzing data in a high-dimensional feature space. Gene expression data usually consist of a few samples characterized by a high-dimensional feature space. As a result, they are not well suited to simple methods, such as filter-based methods. In this study, we propose a novel feature selection algorithm based on the Explosion Gravitation Field Algorithm, called EGFAFS. To reduce the feature space to an acceptable number of dimensions, we constructed a recommended feature pool from a series of Random Forests based on the Gini index. Furthermore, by paying more attention to the features in the recommended feature pool, we can find the best subset more efficiently. To verify the performance of EGFAFS for FS, we tested it on eight gene expression datasets and compared it with four heuristic-based FS methods (GA, PSO, SA, and DE) and four other FS methods (Boruta, HSICLasso, DNN-FS, and EGSG). The results show that EGFAFS performs better for FS on gene expression data than the other eight FS algorithms in terms of the evaluation metrics. The genes selected by EGFAFS play an essential role in the differential co-expression network and in some biological functions, which further demonstrates the success of EGFAFS in solving FS problems on gene expression data.

6.
In this paper, a novel feature selection algorithm for inference from high-dimensional data (FASTENER) is presented. With its multi-objective approach, the algorithm tries to maximize the accuracy of a machine learning algorithm with as few features as possible. The algorithm exploits entropy-based measures, such as mutual information, in the crossover phase of the iterative genetic approach. FASTENER converges to a (near) optimal subset of features faster than other multi-objective wrapper methods, such as POSS, DT-forward and FS-SDS, and achieves better classification accuracy than similarity and information theory-based methods currently utilized in Earth observation scenarios. The approach was primarily evaluated using the Earth observation data set for land-cover classification from ESA’s Sentinel-2 mission, the digital elevation model and the ground truth data of the Land Parcel Identification System from Slovenia. For land-cover classification, the algorithm gives state-of-the-art results. Additionally, FASTENER was tested on open feature selection data sets and compared to state-of-the-art methods. With fewer model evaluations, the algorithm yields results comparable to DT-forward and is superior to FS-SDS. FASTENER can be used in any supervised machine learning scenario.
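Mutual information, the entropy-based measure named in this abstract, can be estimated for discrete variables directly from co-occurrence counts. The sketch below shows that plug-in estimate only; how FASTENER uses it inside crossover is not described in the abstract, and the function name and toy data are assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


labels = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]    # determines the label exactly
uninformative_feature = [0, 1, 0, 1]  # independent of the label
print(mutual_information(informative_feature, labels))
print(mutual_information(uninformative_feature, labels))
```

A feature that determines a balanced binary label carries one bit of information about it, while an independent feature carries none; continuous features would need binning or a different estimator first.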

7.
8.
By using functional integral methods we determine new evolution equations satisfied by the joint response-excitation probability density function (PDF) associated with the stochastic solution to first-order nonlinear partial differential equations (PDEs). The theory is presented for both fully nonlinear and quasilinear scalar PDEs subject to random boundary conditions, random initial conditions or random forcing terms. Particular applications are discussed for the classical linear and nonlinear advection equations and for the advection–reaction equation. By using a Fourier–Galerkin spectral method we obtain numerical solutions of the proposed response-excitation PDF equations. These numerical solutions are compared against those obtained by using more conventional statistical approaches such as probabilistic collocation and multi-element probabilistic collocation methods. It is found that the response-excitation approach yields accurate predictions of the statistical properties of the system. In addition, it allows the tails of probability distributions to be ascertained directly, thus facilitating the assessment of rare events and associated risks. The computational cost of the response-excitation method is orders of magnitude smaller than that of more conventional statistical approaches if the PDE is subject to high-dimensional random boundary or initial conditions. The question of high dimensionality for evolution equations involving multidimensional joint response-excitation PDFs is also addressed.

9.
To increase classification accuracy, this paper presents a new feature selection method, RFFIM-PCA, based on the random forest feature importance measure (RFFIM) and principal component analysis (PCA), for analyzing the near-infrared (NIR) spectra of tobacco. We applied the method to the qualitative evaluation of cigarettes and compared it with other methods. The results showed that RFFIM-PCA discriminates high-dimensional data effectively and can be used to identify cigarette quality. The feature selection step filters out noise, while PCA eliminates redundant features and reduces the dimensionality. The experimental results showed that RFFIM-PCA successfully eliminated noise and redundant features in high-dimensional data, leading to a promising improvement in feature selection and classification accuracy.

10.
Radiotherapy has drawbacks because a curative dose of ionizing radiation (IR) severely damages normal tissue. Tumor gene therapy, a newly emerging approach, likewise has shortcomings, such as a lack of specificity for tumor tissue, limited expression of the therapeutic gene, and potential biological risk. To some extent, these problems can be addressed by radiation-targeted gene therapy, in which ionizing radiation induces suicide genes or the p53 gene and its target genes. This strategy not only compensates for the shortcomings of radiotherapy or gene therapy alone, but can also improve efficacy while lowering the dose of each. Several suitable expression vectors have already entered clinical trials. This review focuses on research progress in radiation-targeted cancer gene therapy mediated by ionizing radiation through suicide genes or p53 and its target genes.

11.
Filter feature selection algorithms are commonly used to reduce the computational cost of data analysis by selecting only a subset of the original features for the study. Mutual information (MI) is a popular measure for quantifying the dependence among features. MI-based greedy forward methods (MIGFMs) have been widely applied to avoid the computational complexity of exhaustively searching high-dimensional data. However, most MIGFMs are parametric methods that require suitable preset parameters and stopping criteria. Improper parameters may cause better results to be overlooked. This paper proposes a novel nonparametric feature selection method based on mutual information and mixed-integer linear programming (MILP). By forming a mutual information network, we transform the feature selection problem into a maximum flow problem, which can be solved with the Gurobi solver in a reasonable time. The proposed method aims to avoid missing a superior feature subset while keeping the computational cost in an affordable range. A comparison of the proposed method with six feature selection methods shows significantly better classification accuracy than MIGFMs.

12.
Cluster Analysis of Gene Expression Data   Cited by: 1 (self-citations: 0, citations by others: 1)
The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample, such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, which come in the form of a table of several thousand rows (one for each gene) and 50–100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is and how it is measured by DNA chips. Next I explain what is meant by clustering and how we analyze the massive amounts of data from such experiments, and present results obtained from the analysis of data from colon cancer, brain tumors and breast cancer.

13.
Imbalance ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. The literature on ensemble techniques for imbalance learning has grown in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews state-of-the-art ensemble classification algorithms for imbalanced datasets and offers a comprehensive analysis of incorporating the dynamic selection of base classifiers into classification. By running 14 existing ensemble algorithms with dynamic selection on 56 datasets, the experimental results reveal that a classical algorithm with a dynamic selection strategy delivers a practical way to improve classification performance for both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which uses the misclassified samples to train patch classifiers and thereby increases the diversity of base classifiers. The experimental results indicate that the designed method has some potential for multi-class imbalanced classification.

14.
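Research on methods for recognizing neonatal pain facial expressions   Cited by: 3 (self-citations: 0, citations by others: 3)
Lu Guanming (卢官明), Li Xiaonan (李晓南), Li Haibo (李海波). Acta Optica Sinica (《光学学报》), 2008, 28(11): 2109-2114
To distinguish pain from non-pain facial expressions in neonates, a classification method combining the Gabor transform with a support vector machine (SVM) is proposed. A two-dimensional Gabor wavelet transform is applied to normalized neonatal face images of 112 pixel × 92 pixel, yielding 412160-dimensional Gabor features. Because the Gabor feature vector is high-dimensional and highly redundant, the AdaBoost algorithm is used as a feature selection tool to remove redundant Gabor features, selecting 900 Gabor features from the 412160. The selected Gabor features are then classified with an SVM to recognize pain expressions. The method combines the strong representational power of Gabor features for facial expressions, the feature selection ability of AdaBoost, and the advantages of SVMs for small-sample, high-dimensional problems. Tests on 510 neonatal expression images show classification rates of 85.29% for pain versus non-pain, 94.24% for pain versus calm, and 78.24% for pain versus crying.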
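The feature count in this abstract can be checked with simple arithmetic: 112 × 92 = 10304 pixels, and 412160 / 10304 = 40, so the numbers are consistent with one filter response per pixel for 40 Gabor filters. The split into 5 scales × 8 orientations used below is a common choice but is an assumption; the abstract only gives the image size and the totals.

```python
def gabor_feature_dim(height, width, n_scales, n_orientations):
    """Number of Gabor features when every filter response at every pixel is kept."""
    return height * width * n_scales * n_orientations


IMAGE_HEIGHT, IMAGE_WIDTH = 112, 92   # from the abstract
N_SCALES, N_ORIENTATIONS = 5, 8       # assumed filter bank (40 filters in total)

full_dim = gabor_feature_dim(IMAGE_HEIGHT, IMAGE_WIDTH, N_SCALES, N_ORIENTATIONS)
selected_dim = 900                    # features kept after AdaBoost, from the abstract

print(full_dim)                 # 412160
print(selected_dim / full_dim)  # fraction of features retained
```

The retained fraction is roughly 0.2% of the original features, which illustrates why a selection step precedes the SVM.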

15.
16.
Kim HS, Ahn CH, Park TS, Park HD, Koh KS, Ryoo ZY, Park SC, Lee S. Cryo letters, 2012, 33(1): 1-11
To identify genes that are modulated under cold-stress conditions in the earthworm Eisenia andrei, we performed a genome-wide analysis of gene expression in cold-shocked earthworms by using Serial Analysis of Gene Expression (SAGE). We identified 5,977 and 5,407 unique SAGE tags under normal and cold-stressed conditions, respectively. The majority of the SAGE tags did not match any known expressed sequences, due to a paucity of expression data in earthworms. We converted the statistically significant SAGE tags for the cold-stressed condition into expressed sequence tags (ESTs), and the results showed that particular genes associated with energy homeostasis, cellular defense mechanisms, and ion balance were up-regulated or down-regulated. We constructed a regulatory network of some of these genes and identified rps-6 as a core gene in the cold-response regulatory-gene network. Our data provide a baseline for gene expression studies of cold shock in the Lumbricidae.

17.
We study convex empirical risk minimization for high-dimensional inference in binary linear classification under both discriminative binary linear models and generative Gaussian-mixture models. Our first result sharply predicts the statistical performance of such estimators in the proportional asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit to prove bounds on the best achievable performance. Notably, we show that the proposed bounds are tight for popular binary models (such as signed and logistic) and for the Gaussian-mixture model by constructing appropriate loss functions that achieve them. Our numerical simulations suggest that the theory is accurate even for relatively small problem dimensions and that it enjoys a certain universality property.

18.
19.
Artificial intelligence in healthcare can potentially identify the probability of contracting a particular disease more accurately. There are five common molecular subtypes of breast cancer: luminal A, luminal B, basal, ERBB2, and normal-like. Previous investigations showed that pathway-based microarray analysis could help in the identification of prognostic markers from gene expression. For example, directed random walk (DRW) can infer a greater reproducibility power of the pathway activity between two classes of samples with a higher classification accuracy. However, most of the existing methods (including DRW) ignore the characteristics of different cancer subtypes and assume that all pathways contribute equally to the analysis. Therefore, an enhanced DRW (eDRW+) is proposed to identify breast cancer prognostic markers from multiclass expression data. eDRW+ introduces an improved weighting strategy using one-way ANOVA (F-test) and pathway selection based on the greatest reproducibility power. The experimental results show that eDRW+ exceeds other methods in terms of AUC. In addition, eDRW+ identifies 294 gene markers and 45 pathway markers from the breast cancer datasets with better AUC. The prognostic markers (pathway markers and gene markers) can therefore help identify drug targets and cancer subtypes with clinically distinct outcomes.
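The one-way ANOVA F statistic mentioned in this abstract compares between-group and within-group variability across more than two classes. The sketch below computes the standard statistic for one gene or pathway across subtype groups; how eDRW+ turns the statistic into weights is not stated in the abstract, and the group names and values are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric observations."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical expression values of one gene in three subtypes.
luminal_a = [1.0, 2.0, 3.0]
luminal_b = [2.0, 3.0, 4.0]
basal = [3.0, 4.0, 5.0]
f_stat = one_way_anova_f([luminal_a, luminal_b, basal])
print(f_stat)  # 3.0
```

A larger F indicates that the gene separates the subtypes more strongly relative to its within-subtype spread, which is why it is a natural basis for a multiclass weight.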

20.
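Research on ground-object recognition and classification using hyperspectral data   Cited by: 5 (self-citations: 4, citations by others: 1)
The drawbacks of traditional statistical classification methods for ground-object classification in hyperspectral imagery are analyzed, and an endmember-based supervised classification technique is proposed and discussed in detail. LASIS hyperspectral imagery is classified with the endmember-based supervised technique and also with the IsoData unsupervised technique (automatic iterative clustering). A comparison of the two results shows that endmember-based supervised classification better meets the needs of ground-object recognition and classification.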
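One simple way to classify pixels against reference endmember spectra is the spectral angle: each pixel is assigned to the endmember whose spectrum points in the most similar direction. The abstract does not say which matching rule the authors used, so the spectral angle, the function names and the three-band spectra below are all assumptions for illustration.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def classify_pixel(pixel, endmembers):
    """Return the name of the endmember with the smallest spectral angle to the pixel."""
    return min(endmembers, key=lambda name: spectral_angle(pixel, endmembers[name]))


# Hypothetical three-band endmember spectra.
endmembers = {
    "vegetation": [0.1, 0.2, 0.8],
    "water": [0.3, 0.2, 0.05],
}
bright_vegetation_pixel = [0.2, 0.4, 1.6]  # same shape as vegetation, twice as bright
print(classify_pixel(bright_vegetation_pixel, endmembers))
```

Because the angle ignores overall brightness, a pixel with the same spectral shape as an endmember matches it regardless of illumination, which is one reason angle-based matching is popular for hyperspectral data.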
