Similar literature
Found 20 similar articles (search time: 125 ms)
1.
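How to isolate a small number of specific genes that distinguish different tissue types is a central problem in DNA microarray data analysis; in particular, it is important to construct an appropriate statistical model that characterizes the DNA expression patterns of these tissue types. To this end, based on the characteristics of DNA microarray data, we assume that the log-transformed microarray data follow a mixture of normal distributions. We use hierarchical Bayesian priors to capture the correlation among genes, build the model with a hierarchical Bayesian approach, propose a criterion that characterizes differences in gene expression between tissues, and compute this criterion by MCMC iteration. Simulations show that our model has good discriminating ability.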

2.
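Advances in discriminant analysis methods for medical applications (Total citations: 1; self-citations: 0; citations by others: 1)
This paper reviews recent advances in discriminant analysis methods in medicine. It introduces the basic ideas, algorithms, and applicable conditions of the following methods: for discriminant analysis of microarray gene expression data, dimension reduction by partial least squares, dimension reduction by discrete wavelet transform, the LogitBoost algorithm, random forests, and fuzzy kernel discriminant analysis; ordered discriminant analysis for multivariate time series data; and discriminant analysis with varying-coefficient logistic regression models for data that follow their own pattern of change.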

3.
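This paper uses statistical tests to study feature selection and redundancy removal for gene expression data, and proposes a corresponding model and algorithm. Compared with models and algorithms in the existing literature, the proposed method is intuitive and easy to understand, and the algorithm is simple to construct and efficient to run. Numerical experiments on three two-class gene expression data sets show that the method performs well in both feature selection and redundancy removal. On this basis, the selected feature genes were classified using the class-center distance method; the results further show that the proposed method achieves high classification accuracy on two-class gene expression data.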
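The class-center distance classification mentioned in this abstract can be illustrated with a minimal nearest-centroid sketch. The abstract does not give the paper's actual procedure, so the Euclidean distance, the function names `class_centroids` and `predict`, and the toy data are all assumptions made for illustration.

```python
import math

def class_centroids(samples, labels):
    """Return {label: centroid}, where each centroid is the per-feature mean."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance, an assumption)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical two-class data: rows are samples, columns are selected genes.
train = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.2]]
labels = ["A", "A", "B", "B"]
centroids = class_centroids(train, labels)
```

With these toy values, a new sample is assigned to whichever class mean it lies closer to; in practice the features would be the genes retained after selection and redundancy removal.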

4.
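A new method is proposed for screening differentially expressed genes from gene expression profile data. The paper introduces a common screening method, the false discovery rate (FDR) method, analyzes the properties of p-values in multiple hypothesis testing, and, based on these properties, proposes a new screening method, the unit γ measure (Unit Measure-γ, UM-γ). A computer simulation model of gene expression profile data is built; the false negative rate, false positive rate, sensitivity, specificity, and overall error rate are adopted as evaluation criteria; and the simulated data are used for computation and comparison. The UM-γ method estimates the number of non-differentially expressed genes with high stability and accuracy. It controls false positives, false negatives, and overall errors simultaneously, and to some extent improves the sensitivity and variability of the screening results. The new method screens differentially expressed genes in the simulated data effectively, accurately, and stably.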
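For context on the FDR baseline named in this abstract, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is not the UM-γ method, whose details the abstract does not give; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Standard step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k / m * alpha, then reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # [True, False, False, False]
```

Because the rule is step-up, a small p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.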

5.
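Gene expression data contain a wealth of biological information. In research on gene information, screening for differentially expressed genes whose expression levels change significantly is key to understanding disease mechanisms and supporting research on targeted drugs. Based on gene expression data for acute myeloid leukemia (AML), we construct the sequence of gene mean differences, build a Bayesian hierarchical mixture model, and assign the model parameters priors that carry biological characteristics of the genes. A Markov chain Monte Carlo (MCMC) algorithm is used to estimate the model parameters and to screen for differentially expressed AML genes. In the real-data analysis, an AML data set was obtained from the high-throughput gene expression database of the U.S. National Center for Biotechnology Information (NCBI). From 14,688 AML genes that had passed nonspecific filtering, 711 differentially expressed genes were selected, only 4.84% of the total, a result consistent with the biological principles of differential gene expression.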

6.
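With the widespread use of next-generation sequencing, single-cell RNA data have increasingly become a mainstream object of study. However, obtaining single-cell RNA data directly from organisms is often costly, so acquiring such data simply and quickly is an important problem. To support comparative experiments, a simulation method for single-cell RNA data usually needs not only to produce simulated data whose statistics are close to those of the original data, but also to retain the genes and cell samples of the original data. Here we introduce a data-based simulation method that retains the genes and cell samples of the original data, simulates single-cell RNA data at low cost, and keeps the simulated results similar to the original data in most features. Extensive numerical experiments show that the method outperforms other simulation methods in the dispersion of gene expression, the proportion of zero expression, expression outliers, and other respects, and is closer to real data.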

7.
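Gene expression is central to intracellular processes, and its modeling and analysis have long been a focus of attention. Two representative classes of gene expression models at the transcriptional level (the constitutive expression model and the bursty expression model) have been studied extensively; in particular, the mRNA distribution has been derived analytically. For the full gene expression model that accounts for both transcription and translation, however, how to derive analytically the distributions of mRNA and protein and their joint distribution has remained unsolved. This paper solves this mathematical problem using the binomial moment method developed by the authors, and in particular gives analytical expressions for the distributions of the gene products and related statistics. The paper also discusses the tunability of gene expression noise.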

8.
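Multiple testing methods are used for statistical inference on high-dimensional data. First, the least absolute deviation estimation algorithm is applied to multiple testing to construct a new method for estimating the number of true null hypotheses. Second, simulations compare the accuracy of the least absolute deviation and least squares methods in estimating the number of true null hypotheses; the results show that the former is more accurate than the latter. Finally, the estimation method is applied to breast cancer microarray data to find differentially expressed genes. The tests identified 118 differentially expressed genes, of which 85 are biologically valid, which shows that the method has some practical value.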

9.
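The Compulsory Education Mathematics Curriculum Standards (2022 Edition) moves percentages from the "Number and Algebra" domain to the "Statistics and Probability" domain, to be taught as a statistic, which further highlights their statistical meaning. Teaching should take a statistical perspective and implement the "three abilities" goals of being able to observe, think, and express, helping students understand the essential meaning of percentages, appreciate that percentages describe random phenomena, grasp the statistical value of percentages for random phenomena and for expressing data, and develop data awareness.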

10.
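Medical research shows that about 30% of dilated cardiomyopathy cases are related to genetic factors, so the search for its causes and pathogenesis at the gene level is attracting growing attention from researchers in China and abroad. The sequential model averaging (SMA) method for ultrahigh-dimensional data is used to build a regression model on microarray data from transgenic mice with dilated cardiomyopathy, in order to determine which genes affect the overexpression of a G protein-coupled receptor in mice and thereby lead to cardiomyopathy. The results identify four genes, Msa.2877.0, Msa.741.0, Msa.768.0, and Msa.2604.0, as the main genes affecting dilated cardiomyopathy in mice, and SMA clearly outperforms the commonly used variable selection methods SIS, L2boost, and Lasso in both fitting and prediction on these data. The findings offer a useful reference for further understanding the pathogenesis of human heart disease.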

11.
Microarray technology is a current approach for detecting alterations in the expression of thousands of genes simultaneously between two different biological conditions. Genes of interest are selected on the basis of an obtained p-value, and thus the list of candidates may vary depending on the data processing steps taken and the statistical tests applied. Using standard approaches to the statistical analysis of microarray data from individuals with Autism Spectrum Disorder (ASD), several genes have been proposed as candidates. However, the lists of genes detected as differentially regulated in published mRNA expression analyses of autism often do not overlap, owing at least in part to (i) the multifactorial nature of ASD, (ii) the high inter-individual variability of gene expression in ASD cases, and (iii) differences in the statistical analysis approaches applied. Game theory has recently been proposed as a new method to detect the relevance of gene expression in different conditions. In this work, we test the ability of game theory, specifically the Shapley value, to detect candidate ASD genes using a microarray experiment in which only a few genes can be detected as dysregulated using conventional statistical approaches. Our results showed that coalitional games significantly increased the power to identify candidates. A further functional analysis demonstrated that groups of these genes were associated with biological functions and disorders previously shown to be related to ASD.
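To make the Shapley value concrete, here is a minimal sketch that computes it exactly for a small coalitional game by averaging each player's marginal contribution over all orderings. The three-gene toy game and the names `shapley_values` and `toy_value` are hypothetical; the abstract does not describe how the microarray game itself is defined, and exact enumeration is only feasible for a handful of players.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_value(coalition):
    """Hypothetical game: a coalition 'wins' (value 1) if it contains g1
    together with at least one of g2 or g3."""
    return 1.0 if "g1" in coalition and ({"g2", "g3"} & coalition) else 0.0

phi = shapley_values(["g1", "g2", "g3"], toy_value)
```

In this toy game g1 receives 2/3 and g2 and g3 receive 1/6 each, so the values sum to the worth of the full coalition. This is the efficiency property that makes the Shapley value usable as a relevance score.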

12.
Recently developed SAGE technology enables us to simultaneously quantify the expression levels of thousands of genes in a population of cells. SAGE data are helpful in classifying different types of cancer. However, one main challenge in this task is the small number of samples available compared with the huge number of genes, many of which are irrelevant for classification. Another main challenge is the lack of appropriate statistical methods that consider the specific properties of SAGE data. We propose an efficient solution that selects relevant genes by information gain and builds a multinomial event model for SAGE data. Promising results, in terms of accuracy, were obtained for the proposed model.
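Gene selection by information gain can be sketched as follows. This assumes each gene's expression has already been discretized into categories such as "high" and "low"; the abstract does not say how SAGE counts are handled, so the discretization, the function names, and the toy data are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical discretized expression of two genes across four samples.
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = ["high", "high", "low", "low"]   # separates the classes perfectly
gene_b = ["high", "low", "high", "low"]   # carries no class information
```

Ranking genes by this score and keeping the top ones is one simple way to discard genes that are irrelevant for classification before fitting a model.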

13.
Statistical modeling is an important area of biomarker research on important genes for new drug targets, drug candidate validation, disease diagnosis, personalized treatment, and prediction of the clinical outcome of a treatment. A widely adopted technology is the use of microarray data, which are typically very high dimensional. After screening chromosomes for relevant genes using methods such as quantitative trait locus mapping, there may still be a few thousand genes related to the clinical outcome of interest. On the other hand, the sample size (the number of subjects) in a clinical study is typically much smaller. Under the assumption that only a few important genes are actually related to the clinical outcome, we propose a variable screening procedure to eliminate genes having negligible effects on the clinical outcome. Once the dimension of the microarray data is reduced to a manageable number relative to the sample size, one can select a final set of genes via a well-known variable selection method such as cross-validation. We establish the asymptotic consistency of the proposed variable screening procedure. Some simulation results are also presented.

14.
Cluster analysis has been widely used to explore thousands of gene expressions from microarray analysis and identify a small number of similar genes (objects) for further detailed biological investigation. However, most clustering algorithms tend to identify loose clusters with too many genes. In this paper, we propose a Bayesian tight clustering method for time course gene expression data, which selects a small number of closely-related genes and constructs tight clusters only with these closely-related genes.

15.
This is a summary of the author’s Ph.D. thesis supervised by Fioravante Patrone and Stefano Bonassi and defended on 25 May 2006 at the Università degli Studi di Genova. The thesis is written in English and a copy is available from the author upon request. This work deals with the discussion and the application of a methodology based on Game Theory for the analysis of gene expression data. Nowadays, microarray technology is available for taking “pictures” of gene expressions. Within a single experiment of this sophisticated technology, the level of expression of thousands of genes can be estimated in a sample of cells under given conditions. Roughly speaking, the starting point is the observation of a “picture” of gene expressions in a sample of cells under a biological condition of interest, for example a tumor. Then, Game Theory plays a primary role in quantitatively evaluating the relevance of each gene in regulating or provoking the condition of interest, taking into account the observed relationships in all subgroups of genes.

16.
Clustering and classification are important tasks for the analysis of microarray gene expression data. Classification of tissue samples can be a valuable diagnostic tool for diseases such as cancer. Clustering samples or experiments may lead to the discovery of subclasses of diseases. Clustering genes can help identify groups of genes that respond similarly to a set of experimental conditions. We also need validation tools for clustering and classification. Here, we focus on the identification of outliers: units that may have been misallocated, or mislabeled, or are not representative of the classes or clusters. We present two new methods, DDclust and DDclass, for clustering and classification. These non-parametric methods are based on the intuitively simple concept of data depth. We apply the methods to several gene expression and simulated data sets. We also discuss a convenient visualization and validation tool, the relative data depth plot.

17.
Finding predictive gene groups from microarray data (Total citations: 1; self-citations: 0; citations by others: 1)
Microarray experiments generate large datasets with expression values for thousands of genes, but not more than a few dozen samples. A challenging task with these data is to reveal groups of genes which act together and whose collective expression is strongly associated with an outcome variable of interest. To find these groups, we suggest the use of supervised algorithms: these are procedures which use external information about the response variable for grouping the genes. We present Pelora, an algorithm based on penalized logistic regression analysis, that combines gene selection, gene grouping and sample classification in a supervised, simultaneous way. With an empirical study on six different microarray datasets, we show that Pelora identifies gene groups whose expression centroids have very good predictive potential and yield results that can keep up with state-of-the-art classification methods based on single genes. Thus, our gene groups can be beneficial in medical diagnostics and prognostics, but they may also provide more biological insights into gene function and regulation.

18.
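In gene hybridization experiments, the traditional approach hybridizes every probe in a large probe set with thousands of genes and uses the resulting hybridization signals to distinguish all the information; this is time-consuming and not cost-effective. A mathematical model that maximizes information gain is established. With this model, as few probes as possible can be selected from a large probe set while still distinguishing all the information, which saves both time and cost in hybridization experiments. Example calculations show that the approach is effective.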

19.
Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a certain diagnostic test. In this regard, a new feature selection algorithm is presented based on rough set theory. It selects a set of genes from microarray data by maximizing the relevance and significance of the selected genes. A theoretical analysis is presented to justify the use of both relevance and significance criteria for selecting a reduced gene set with high predictive accuracy. The importance of rough set theory for computing both the relevance and the significance of the genes is also established. The performance of the proposed algorithm, along with a comparison with other related methods, is studied using the predictive accuracy of the K-nearest neighbor rule and the support vector machine on five cancer and two arthritis microarray data sets. Among the seven data sets, the proposed algorithm attains 100% predictive accuracy for three cancer and two arthritis data sets, while the two existing rough-set-based algorithms attain this accuracy for only one cancer data set.

20.
Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study to investigate the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases in which overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed features can achieve promising classification performance.
