Similar Documents
19 similar documents found.
1.
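Statistical testing and machine learning methods are used to study the association between SNPs (or genes) and diseases (measurable traits). First, a suitable numerical encoding is chosen for the SNPs and a corresponding statistical testing procedure is designed; loci associated with the disease or trait are then preliminarily screened by their P-values. For the screened loci, machine learning methods such as random forests and XGBoost are then used to judge the strength of association between SNPs and the disease or trait from the perspective of out-of-sample prediction. The results show that this analysis framework can effectively screen SNPs (genes) associated with a disease or trait. Because the framework considers multiple classification models, it offers high robustness, relatively low computational cost, and the ability to cross-check results. In the future, the framework could also be applied in areas such as finance and social networks.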
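The first stage of this framework (numerical encoding followed by P-value screening) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an additive encoding (number of minor-allele copies, 0/1/2) and a Pearson chi-square test on a 2×3 case/control-by-genotype table. The function names and the threshold `alpha` are hypothetical, and the random forest / XGBoost stage is omitted.

```python
import math


def encode_genotype(genotype: str, minor: str) -> int:
    """Additive encoding: number of copies of the minor allele (0, 1 or 2)."""
    return sum(1 for base in genotype if base == minor)


def chi2_genotype_test(codes, labels):
    """Pearson chi-square test on the 2x3 table (control/case x genotype code).

    With 2 degrees of freedom the chi-square survival function is exactly
    exp(-x / 2). This assumes all three genotype classes are observed;
    otherwise the degrees of freedom would be smaller.
    """
    table = [[0, 0, 0], [0, 0, 0]]  # row 0: controls (label 0), row 1: cases
    for code, label in zip(codes, labels):
        table[label][code] += 1
    n = len(codes)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(3)]
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


def screen_snps(genotypes, labels, minor_alleles, alpha=1e-3):
    """Keep SNPs whose chi-square P-value is below alpha, sorted by P-value."""
    kept = []
    for name, calls in genotypes.items():
        codes = [encode_genotype(g, minor_alleles[name]) for g in calls]
        _, p_value = chi2_genotype_test(codes, labels)
        if p_value < alpha:
            kept.append((name, p_value))
    return sorted(kept, key=lambda item: item[1])
```

The surviving SNPs would then be passed, as features, to the out-of-sample classifiers in the second stage.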

2.
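To facilitate data analysis, the locus information in the data is first converted from the original letter encoding into a numerical encoding. Based on the encoded loci and the disease status, logistic regression is used to identify the one or several loci most likely to cause a given disease, and significance tests are applied to further check the fitted model, confirming that the results are reasonable. In addition, principal component analysis is used to extract 225 of the original 300 principal components so as to retain as much of the information in the original gene variables as possible; principal-component logistic regression is then used to identify the one or several genes most likely associated with the disease. Finally, canonical correlation analysis is used to identify the gene loci associated with the related traits.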
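A minimal sketch of the single-locus logistic regression and its significance test, assuming one numerically encoded SNP `x` (0/1/2) and a binary disease status `y`. The name `logistic_wald` is hypothetical, the fit uses plain Newton-Raphson, and the Wald z-test stands in for whichever significance test the authors used; the PCA and canonical correlation steps are not shown.

```python
import math


def logistic_wald(x, y, iters=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, z, p) where z is the Wald statistic for b1 and p its
    two-sided normal P-value. Assumes x is not constant and the classes are
    not perfectly separated (otherwise the MLE does not exist).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X^T W X)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    se_b1 = math.sqrt(h00 / det)  # sqrt of the [1, 1] entry of the inverse
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b0, b1, z, p_value
```

Running this per locus and ranking by `p_value` gives a simple version of "the one or several most likely loci".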

3.
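Changes in DNA sequence caused by single nucleotide polymorphisms (SNPs) account for the diversity of chromosomal genomes across the living world, and in-depth study of SNPs is important for identifying associations between human phenotypes and diseases. Selecting a tag SNP set is a key problem in bioinformatics: a small number of tag SNPs can represent the genetic information of a gene, greatly reducing the cost of genotyping and of genome-wide association studies. This paper describes SNP-related theory and tag SNP selection methods in detail, and briefly discusses the applications of tag SNPs and future research directions.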
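One widely used family of tag SNP selection methods picks tags greedily so that every SNP has r² at or above a threshold with at least one tag. The sketch below illustrates that idea only; it assumes one genotype-code vector per SNP and uses the squared Pearson correlation of the codes as the LD measure. The names and the 0.8 threshold are hypothetical and are not taken from the paper.

```python
def r_squared(a, b):
    """Squared Pearson correlation of two genotype-code vectors (an LD proxy)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def greedy_tag_snps(snps, threshold=0.8):
    """Repeatedly choose the SNP that covers the most still-untagged SNPs.

    SNP s covers SNP t if s == t or r^2(s, t) >= threshold, so every SNP can
    at least tag itself and the loop always terminates.
    """
    def covers(s, t):
        return s == t or r_squared(snps[s], snps[t]) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Ties are broken by the insertion order of `snps`.
        best = max(snps, key=lambda s: sum(1 for t in untagged if covers(s, t)))
        tags.append(best)
        untagged -= {t for t in untagged if covers(best, t)}
    return tags
```

Greedy covering does not guarantee a minimum tag set, but it is cheap and commonly used as a baseline.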

4.
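This paper builds 23 spatio-temporal models that jointly account for the temporal effect, the spatial effect and the spatio-temporal interaction effect in rice yield per unit area. Using county-level rice yield data for Hubei Province from 1991-2007, the models are estimated by Gibbs sampling in WinBUGS, the optimal model is selected by the DIC criterion, and rice yields for each county and city are predicted for 1992-2009. Based on the predicted 2008 yield and its posterior distribution, the pure premium rate for county-level rice yield insurance is determined. The main conclusions are: the predicted values for 1992-2007 are close to the actual values, and the standard deviations of the predicted yields are all fairly small, indicating that the selected model has good short-term predictive ability; the pure premium rates of neighboring areas are relatively close; and the calculated pure premium rates are positively correlated with the Monte Carlo (MC) error, with a Pearson correlation coefficient of 0.6793, the MC error capturing uncertainty arising from time, space and their interaction.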
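The rate-making step can be illustrated under simple assumptions: the guaranteed yield is a coverage level times the mean of the posterior predictive yield draws, the indemnity for each draw is the shortfall below that guarantee, and the pure premium rate is the expected indemnity divided by the guarantee. The coverage level of 0.8 and the function name are hypothetical; in practice the draws would come from the Gibbs sampler.

```python
def pure_premium_rate(yield_samples, coverage=0.8):
    """Pure premium rate from posterior predictive yield draws (a sketch).

    guarantee = coverage * mean yield
    indemnity = max(guarantee - y, 0) for each draw y
    rate      = mean indemnity / guarantee
    """
    mean_yield = sum(yield_samples) / len(yield_samples)
    guarantee = coverage * mean_yield
    expected_loss = sum(max(guarantee - y, 0.0) for y in yield_samples) / len(yield_samples)
    return expected_loss / guarantee
```

For draws `[100, 100, 40, 160]` the guarantee is 80, only the draw of 40 triggers an indemnity (40), the expected loss is 10, and the rate is 10 / 80 = 0.125.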

5.
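To describe genotype differences, this paper starts from the chi-square test to compute the degree of correlation, and uses multi-locus linkage disequilibrium to build a comparison function that distinguishes individual differences. Twenty-five SNP loci were collected from the 1000 Genomes Project database, and a backpropagation (BP) neural network improved by a genetic algorithm was built for population classification, so that, given a sample's SNP genotypes as input, the model outputs the sample's geographic origin.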
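The linkage disequilibrium building block can be sketched for two biallelic loci from phased haplotypes (alleles coded 0/1). This computes the standard pairwise measures D, a signed D′ (D divided by its attainable maximum) and r². How the authors combine loci into a multi-locus measure and a comparison function is not specified here, so the function name and input format are assumptions.

```python
def pairwise_ld(haplotypes):
    """haplotypes: list of (allele_at_locus1, allele_at_locus2), alleles 0/1.

    Returns (D, D_prime, r2) with D = p_AB - p_A * p_B, where A and B are
    allele 1 at the first and second locus respectively.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies, by sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```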

6.
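Accurately predicting the trend of total population is important for the stable development of Chinese society. By analyzing the theory behind the construction of the background value in the GM(1,1) model, the background value is optimized using the Newton interpolation formula and a piecewise linear function, yielding a new GM(1,1) model. This is combined with a BP neural network model, a genetic algorithm is used to optimize the weight coefficients of the GM(1,1)-BP combined model, and the combined model is applied to population forecasting for Xinjiang. Finally, the different models and the improved GM(1,1)-BP combined model are each used for computation and compared by mean relative error; the results show that the improved GM(1,1)-BP combined model effectively improves forecasting accuracy.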
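For reference, a classic GM(1,1) fit can be sketched as below. It uses the conventional mean background value z(k) = 0.5·(x¹(k) + x¹(k−1)), not the Newton-interpolation / piecewise-linear background value proposed in the paper, and it omits the BP network and the genetic-algorithm weighting. The function name `gm11` is hypothetical.

```python
import math


def gm11(x0, horizon=0):
    """Classic GM(1,1) with the mean background value.

    Returns (a, b, values) where `values` holds len(x0) fitted values followed
    by `horizon` forecasts. Needs at least 3 observations.
    """
    n = len(x0)
    x1, total = [], 0.0
    for v in x0:               # 1-AGO: accumulated generating sequence
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0(k) = a * (-z(k)) + b via the 2x2 normal equations.
    u = [-zk for zk in z]
    m = len(u)
    suu = sum(v * v for v in u)
    su = sum(u)
    suy = sum(v * w for v, w in zip(u, y))
    sy = sum(y)
    det = suu * m - su * su
    a = (m * suy - su * sy) / det
    b = (suu * sy - su * suy) / det

    def x1_hat(k):             # time response function, k = 0, 1, 2, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    values = [x0[0]]
    for k in range(1, n + horizon):
        values.append(x1_hat(k) - x1_hat(k - 1))  # inverse AGO
    return a, b, values
```

The mean relative error used for model comparison is then the average of |fitted − actual| / actual over the observed points.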

7.
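A new GM(1,2) model is constructed based on fractional-order reverse accumulation generation. So that the model better reflects the true relationship between the two accumulated sequences, a GOM((p,q))(1,2) model is proposed in which different sequences use different accumulation orders. First, a grey relational model is used to identify and select the related-factor sequence with the highest relational degree to the characteristic sequence; grey models with different accumulation orders are then built, and the optimal orders p and q are found by a particle swarm optimization algorithm with a constriction factor; finally, a BP neural network is used to correct the values of the GOM((p,q))(1,2) model, forming a GOM((p,q))(1,2)-BP neural network combined model. Applied to forecasting the air quality index of Wuhan, the combined model shows better performance and modeling accuracy than the single models.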
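The first step, ranking candidate factor sequences by grey relational degree, can be sketched with Deng's grey relational coefficient (distinguishing coefficient ρ = 0.5, initial-value normalization). This is a common textbook form assumed for illustration; the fractional-order accumulation, the PSO search and the BP correction are not shown, and the names are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng's grey relational grade of each candidate w.r.t. the reference.

    Each series is divided by its first value (initial-value normalization).
    Delta_min and Delta_max are taken over all candidates and time points.
    """
    ref = [v / reference[0] for v in reference]
    deltas = {}
    for name, seq in candidates.items():
        norm = [v / seq[0] for v in seq]
        deltas[name] = [abs(r - c) for r, c in zip(ref, norm)]
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    grades = {}
    for name, ds in deltas.items():
        if d_max == 0:
            grades[name] = 1.0
        else:
            coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
            grades[name] = sum(coeffs) / len(coeffs)
    return grades
```

The factor with the largest grade would be the one retained as the second input sequence of the GM(1,2)-type model.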

8.
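The cognitive distance between the vertically related industries of producer services and manufacturing differs. Based on this basic assumption, the diversification index is decomposed by entropy, and the level of industrial diversification is redefined in terms of intra-industry diversification and inter-industry diversification of the two major industries. On this basis, taking the 11 provinces and municipalities of the Yangtze River Economic Belt as the research object, a panel threshold model is built with knowledge spillover and investment portfolio effects as threshold variables. The results show that intra-industry diversification clearly promotes economic growth and economic stability, whereas inter-industry diversification significantly hinders both. When the threshold variables are taken into account: before the knowledge spillover effect crosses its second threshold, inter-industry diversification is significantly negatively correlated with economic growth, and after it crosses the third threshold, intra-industry diversification is significantly positively correlated with economic growth; after the portfolio effect crosses its single threshold value, the hindering effect of inter-industry diversification on economic stability dominates, and the promoting effect of intra-industry diversification on economic stability gradually disappears. Finally, six types of development drivers in the Yangtze River Economic Belt are summarized; Shanghai and Zhejiang have the strongest development momentum, being driven by both knowledge spillover and portfolio effects.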
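The entropy decomposition behind splitting diversification into an intra-industry and an inter-industry part can be sketched as follows: total entropy over all sub-industries equals the between-group entropy of the group shares plus the share-weighted within-group entropies. Treating the two major industries (producer services, manufacturing) as the groups, and using employment as the size measure, are assumptions for illustration; the names are hypothetical.

```python
import math


def entropy(shares):
    """Shannon entropy (natural log) of a list of shares summing to 1."""
    return -sum(s * math.log(s) for s in shares if s > 0)


def decompose_diversification(employment):
    """employment: {group: {sub_industry: size}}, every group non-empty.

    Returns (total, between, within) with total == between + within.
    """
    total_size = sum(v for inds in employment.values() for v in inds.values())
    all_shares = [v / total_size for inds in employment.values() for v in inds.values()]
    group_sizes = {g: sum(inds.values()) for g, inds in employment.items()}
    between = entropy([s / total_size for s in group_sizes.values()])
    within = 0.0
    for g, inds in employment.items():
        group_share = group_sizes[g] / total_size
        within += group_share * entropy([v / group_sizes[g] for v in inds.values()])
    return entropy(all_shares), between, within
```

In this reading, `between` would measure inter-industry diversification and `within` intra-industry diversification.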

9.
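An Objective Combined Evaluation Model Based on Dimensionality Reduction   (total citations: 3; self-citations: 0; citations by others: 3)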
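Combined evaluation methods arose to resolve the inconsistency among the conclusions of multiple evaluation methods. Making maximal use of the information in multiple evaluation conclusions is the key to combined evaluation. To this end, an objective combined evaluation model based on the idea of dimensionality reduction is proposed; it retains as much as possible of the evaluation information contained in the multiple conclusions and fully integrates the information they share. Because solving the model is a complex nonlinear optimization problem that conventional methods cannot handle directly, an improved particle swarm algorithm is developed for global optimization. Finally, an example illustrates the model's operability and practicality.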
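The global search step can be illustrated with a standard global-best particle swarm optimizer (inertia weight `w`, acceleration coefficients `c1`, `c2`). This is the basic algorithm, not the authors' improved variant, and the objective passed in is a placeholder; all parameter values and names are assumptions.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; positions are clipped to the box `bounds`."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            value = f(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val
```

For a combined-evaluation model, `f` would be the (negated) information-retention criterion evaluated at a candidate weight vector.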

10.
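A multi-objective combination optimization model is built on the basis of the mean and the stability of forecasting models. Taking 35 years of soybean yield data from the Jiusan area of Heilongjiang as an example, the multi-objective combination optimization model is constructed from a Logistic model of soybean yield per unit area and a stepwise regression model relating soybean yield to meteorological factors, and its optimal solution is computed. The results show that the combined model has no single optimal point but has non-inferior solutions. The method is of great significance for improving model accuracy and guiding soybean production.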
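Why a combined model can have non-inferior solutions but no single optimum can be seen in a small sketch. It assumes two forecast-error series and a convex combination weight `w`, and treats the absolute mean of the combined error and its standard deviation as the two objectives (one plausible reading of "mean and stability"); the weights not dominated on both objectives form the non-inferior set. The names and objectives are hypothetical.

```python
import statistics


def combined_objectives(errors1, errors2, w):
    """|mean| and population std of the combined errors w*e1 + (1-w)*e2."""
    combined = [w * a + (1 - w) * b for a, b in zip(errors1, errors2)]
    return abs(statistics.mean(combined)), statistics.pstdev(combined)


def non_inferior_weights(errors1, errors2, steps=100):
    """Grid over w in [0, 1]; keep weights not dominated on both objectives."""
    points = [(k / steps, *combined_objectives(errors1, errors2, k / steps))
              for k in range(steps + 1)]
    front = []
    for w, m, s in points:
        dominated = any(m2 <= m and s2 <= s and (m2 < m or s2 < s)
                        for _, m2, s2 in points)
        if not dominated:
            front.append((w, m, s))
    return front
```

With one biased but stable model (errors all 1) and one unbiased but noisy model (errors alternating −1, 1), every weight trades bias against stability, so every grid point is non-inferior and no single point is optimal on both objectives.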

11.
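Association tests for candidate genes are carried out by using multiple SNP markers within the candidate gene for the trait of interest and testing SNP haplotypes. It is well known that multi-marker haplotype methods often convey more information than single-marker methods; however, the number of haplotypes tends to grow sharply as the number of SNP markers increases, which greatly increases the degrees of freedom of the test statistic. Principal component analysis is used to reduce the dimension of the haplotype space, and an association test is constructed to test the association between a quantitative trait and multiple haplotypes. Simulation results show that this test is reasonable.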
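The dimension-reduction step can be sketched as extracting the leading principal component of an individuals × haplotype-count matrix by power iteration on the sample covariance. The input layout, the single-component choice and the function name are assumptions; the association test itself (for example, regressing the trait on the leading component scores) is not shown.

```python
import math


def first_principal_component(rows, iters=500):
    """Leading principal component of `rows` (individuals x variables).

    Power iteration on the sample covariance matrix, started from the
    all-ones vector (which must not be orthogonal to the leading
    eigenvector). Returns (loading_vector, scores).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores
```

Replacing many haplotype columns by a few component scores is what keeps the degrees of freedom of the subsequent test small.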

12.
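With the continuing development of genotyping technology, geneticists can obtain genotype and haplotype data for large numbers of genetic markers, which provides unprecedented opportunities for identifying the genes underlying complex human diseases. When haplotype data cannot be obtained directly, statistical methods based on genotype data can be used for association analysis, and such genotype-based methods for disease-gene association analysis can be extended to the mapping of quantitative trait loci (QTL). This paper extends the principal component statistic (PCTt) and the entropy statistic (TGE) for disease-gene association analysis to quantitative traits, and uses selective genotyping to perform association analysis of QTL. Computer simulations examine the type I error rates of the two statistics, and simulations based on the frequencies of 10 hereditary haemochromatosis haplotypes examine their statistical power. The results show that the two statistics, PCTt and TGE, can effectively perform association analysis of QTL.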
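Selective genotyping, in which only individuals from the two tails of the trait distribution are genotyped, can be sketched as below, together with the allele frequency in each tail (the simplest quantity a QTL association statistic contrasts). The tail fraction, the 0/1/2 genotype coding and the names are assumptions; the PCTt and TGE statistics themselves are not reproduced.

```python
def selective_genotyping(trait, fraction=0.2):
    """Indices of individuals in the lower and upper `fraction` tails of the
    trait distribution (at least one per tail; tails overlap if fraction > 0.5)."""
    order = sorted(range(len(trait)), key=lambda i: trait[i])
    k = max(1, int(len(trait) * fraction))
    return order[:k], order[-k:]


def tail_allele_frequencies(codes, low, high):
    """Allele frequency in each tail, given diploid 0/1/2 genotype codes."""
    freq_low = sum(codes[i] for i in low) / (2 * len(low))
    freq_high = sum(codes[i] for i in high) / (2 * len(high))
    return freq_low, freq_high
```

A large frequency difference between the tails suggests that the marker is linked to a locus affecting the trait.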

13.
Single nucleotide polymorphisms (SNPs) are useful markers for locating genes, since they occur throughout the human genome and thousands can be scored at once using DNA microarrays. Here, we use branching processes and coalescent theory to show that if one uses Kruglyak's (Nature Gen. 12 (1999) 139–144) model of the growth of the human population and assumes an average mutation rate of 1×10⁻⁸ per nucleotide per generation, then there are about 5.7 million SNPs in the human genome, or one every 526 base pairs. We also obtain results for the number of SNPs that will be found in samples of size n ≥ 2, to gain insight into the number that will be found by various experimental procedures.
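The dependence of the SNP count on sample size can be illustrated with Watterson's formula, E[S_n] = θ·L·(1 + 1/2 + ... + 1/(n−1)), for n sampled sequences under a constant-size neutral coalescent. This is a deliberate simplification (the paper uses a population-growth model), and the per-site θ in any example is arbitrary. The last function checks the spacing arithmetic, assuming a genome length of about 3×10⁹ bp.

```python
def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, m + 1))


def expected_segregating_sites(theta_per_site, num_sites, n):
    """Watterson: E[S_n] = theta * L * H_{n-1} for a sample of n sequences
    (constant-size neutral coalescent, infinite-sites mutation)."""
    return theta_per_site * num_sites * harmonic(n - 1)


def mean_spacing(num_sites, num_snps):
    """Average distance in base pairs between consecutive SNPs."""
    return num_sites / num_snps
```

Under this simplification, going from n = 2 to n = 4 sequences multiplies the expected number of SNPs found by H₃ = 11/6, and 3.0×10⁹ / 5.7×10⁶ ≈ 526 bp per SNP.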

14.
In genetic studies of complex diseases, particularly mental illnesses and behavior disorders, two distinct characteristics have emerged in some data sets. First, genetic data sets are collected with a large number of phenotypes that are potentially related to the complex disease under study. Second, each phenotype is collected from the same subject repeatedly over time. In this study, we present a nonparametric regression approach that studies multivariate and time-repeated phenotypes together using multivariate adaptive regression splines for the analysis of longitudinal data (MASAL), which makes it possible to identify genes as well as gene-gene and gene-environment (including time) interactions associated with the phenotypes of interest. Furthermore, we propose a permutation test to assess the associations between the phenotypes and selected markers. Through simulation, we demonstrate that the proposed approach has advantages over existing methods that either examine each longitudinal phenotype separately or analyze summarized phenotype values by compressing them into single-time-point phenotypes. Application of the proposed method to the Framingham Heart Study shows that using multivariate longitudinal phenotypes enhanced the significance of the association test.
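A generic permutation test of association between one marker and one phenotype can be sketched as follows, using the absolute Pearson correlation as the test statistic and shuffling phenotypes to build the null distribution. This is not the MASAL-based statistic of the paper; the statistic, the number of permutations and the seed are assumptions.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (0.0 if either vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_test(marker, phenotype, n_perm=999, seed=1):
    """Permutation P-value (count + 1) / (n_perm + 1) for |corr(marker, phenotype)|."""
    rng = random.Random(seed)
    observed = abs_corr(marker, phenotype)
    shuffled = list(phenotype)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-phenotype link
        # Small tolerance so floating-point ties with the observed value count.
        if abs_corr(marker, shuffled) >= observed - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Because the shuffles preserve the phenotype distribution exactly, the test needs no distributional assumption, which is why permutation tests pair naturally with nonparametric fits such as MASAL.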

15.
Genome-wide association studies (GWAS) aim to assess relationships between single nucleotide polymorphisms (SNPs) and diseases. They are among the most popular problems in genetics and have some peculiarities, given the large number of SNPs compared to the number of subjects in the study. Individuals might not be independent, especially in animal breeding studies or in studies of genetic diseases in isolated populations with highly inbred individuals. We propose a family-based GWAS model in a two-stage approach comprising dimension reduction followed by model selection. The first stage, which takes the genetic relatedness between subjects into account, selects the promising SNPs. The second stage uses Bayes factors to compare all candidate models and a random search strategy to explore the space of all regression models in a fully Bayesian approach. A simulation study shows that our approach is superior to the Bayesian lasso for model selection in this setting. We also illustrate its performance in a study of Beta-thalassemia in an isolated population from Sardinia. Supplementary Material describing the implementation of the proposed method is available online.
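As a rough illustration of Bayes-factor model comparison, the sketch compares an intercept-only linear model with a single-SNP model using the BIC approximation BF₁₀ ≈ exp((BIC₀ − BIC₁)/2). This is a swapped-in approximation, not the fully Bayesian, family-based procedure of the paper, and it ignores relatedness; the names are hypothetical.

```python
import math


def rss_intercept_only(y):
    """Residual sum of squares of the model y ~ b0."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_single_snp(x, y):
    """Residual sum of squares of the least-squares fit y ~ b0 + b1 * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))


def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n).
    Requires rss > 0."""
    return n * math.log(rss / n) + k * math.log(n)


def approx_bayes_factor(x, y):
    """BF(SNP model vs intercept-only) ~= exp((BIC0 - BIC1) / 2)."""
    n = len(y)
    bic0 = bic(rss_intercept_only(y), n, 1)
    bic1 = bic(rss_single_snp(x, y), n, 2)
    return math.exp((bic0 - bic1) / 2)
```

When the SNP explains nothing (slope exactly zero), the two fits have the same RSS and the factor reduces to n^(−1/2), i.e. a mild penalty for the extra parameter.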

16.
The study of the genetic properties of a disease requires collecting information about the subjects in a set of pedigrees. The main focus of this study was the detection of susceptibility genes. However, even with large pedigrees, the heterogeneity of phenotypes in complex diseases such as schizophrenia, bipolar disorder and autism makes susceptibility genes difficult to detect. This is mainly due to genetic heterogeneity: many genes are involved in the disease. To reduce this heterogeneity, our idea is to sub-type the disease and partition the population into more homogeneous sub-groups. We developed a probabilistic model based on Latent Class Analysis (LCA) that takes into account the familial dependence within a pedigree, even for large pedigrees. It also accommodates individuals with missing or partially missing measurements. Model parameters are estimated by an EM algorithm, and the E-step computations within a pedigree are carried out using a pedigree peeling algorithm. When more than one model is fitted, we use model selection strategies such as cross-validation and/or BIC to choose a suitable model among a set of candidates. Moreover, we present a simulation based on a genetic disease class model and show that our model leads to better individual classification than a model that assumes independence among subjects. We also apply our model to a schizophrenia-bipolar pedigree data set from Eastern Quebec.
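The core of LCA estimation is an EM loop; a minimal version for binary items is sketched below. It assumes independent subjects and complete data, which is exactly the simplification the paper improves on (no pedigree peeling, no missing values), and it uses a deterministic start and clipping of item probabilities. All names are hypothetical.

```python
def lca_em(data, n_classes=2, iters=200):
    """EM for a latent class model with binary items (0/1).

    Simplifications: subjects are independent and the data are complete.
    Returns (class weights, item probabilities per class, hard labels).
    """
    n, m = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Deterministic, asymmetric start so the classes can separate.
    probs = [[0.25 + 0.5 * c / max(1, n_classes - 1)] * m for c in range(n_classes)]
    resp = []
    for _ in range(iters):
        # E step: posterior class membership of every subject.
        resp = []
        for row in data:
            like = []
            for c in range(n_classes):
                value = weights[c]
                for j, x in enumerate(row):
                    value *= probs[c][j] if x else 1.0 - probs[c][j]
                like.append(value)
            total = sum(like)
            resp.append([v / total for v in like])
        # M step: update class weights and item probabilities.
        for c in range(n_classes):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            for j in range(m):
                p = sum(r[c] * row[j] for r, row in zip(resp, data)) / nc
                probs[c][j] = min(max(p, 1e-6), 1 - 1e-6)  # avoid 0/1 probabilities
    labels = [max(range(n_classes), key=lambda c: r[c]) for r in resp]
    return weights, probs, labels
```

The pedigree version replaces the per-subject E step with a peeling computation over each family, so that relatives' class memberships are no longer treated as independent.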

17.
We address the problem of finding a suitable model to predict the survival of patients with glial tumours as a function of several covariates. Estimation is based on a retrospective study of 192 patients. The data were collected at the Hospital of Bordeaux and were analysed by Commenges and Dartigues¹ using a Cox model. In the present paper we use dynamic Bayesian models, which allow the effects of the covariates to change over time through a stochastic structure. The one-year survival function is also calculated as a function of the covariates with the highest prognostic value, and two factors (linear combinations of the covariates) are identified that synthesize information related, respectively, to the general state of the patient (age, first symptom, etc.) and to the characteristics of the tumour (diameter, localization, etc.). One-year survival is then calculated as a function of the two factors. Results are reported in tabular and graphical form.
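As a simple nonparametric point of comparison for one-year survival, the Kaplan-Meier estimator can be sketched as follows. This is not the dynamic Bayesian model of the paper; measuring follow-up in months and using a 12-month horizon are assumptions.

```python
def kaplan_meier_at(times, events, t):
    """Kaplan-Meier estimate of S(t).

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    survival = 1.0
    death_times = sorted({u for u, e in zip(times, events) if e == 1 and u <= t})
    for u in death_times:
        at_risk = sum(1 for v in times if v >= u)
        deaths = sum(1 for v, e in zip(times, events) if v == u and e == 1)
        survival *= 1.0 - deaths / at_risk
    return survival
```

For follow-up times 3, 5, 5, 8, 12, 15, 20 months with the second 5 and the 12 censored, S(12) = 6/7 · 5/6 · 3/4 = 15/28 ≈ 0.536.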

18.

Partially linear models (PLMs) have been widely used in statistical modeling, but they often require prior knowledge of which variables have linear and which have nonlinear effects. In this paper, we propose a model-free structure selection method for PLMs, which aims to discover the model structure by automatically identifying the variables that have linear or nonlinear effects on the response. The proposed method is formulated in a gradient learning framework equipped with a flexible reproducing kernel Hilbert space. The resulting optimization task is solved by an efficient proximal gradient descent algorithm. More importantly, the asymptotic estimation and selection consistency of the proposed method are established without specifying any explicit model assumption, which ensures that the true model structure of the PLM can be correctly identified with high probability. The effectiveness of the proposed method is also supported by a variety of simulated and real-life examples.
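The proximal gradient idea can be illustrated on the simplest sparse problem, L1-penalized least squares (ISTA): a gradient step on the smooth loss followed by soft-thresholding, which sets small coefficients exactly to zero. This is a generic sketch, not the paper's RKHS gradient-learning objective, and the step size uses the Frobenius-norm upper bound on the Lipschitz constant; names and parameters are assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista_lasso(X, y, lam, iters=2000):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n bounds the largest eigenvalue of X^T X / n from above.
    lipschitz = sum(v * v for row in X for v in row) / n
    step = 1.0 / lipschitz
    b = [0.0] * p
    for _ in range(iters):
        resid = [sum(row[j] * b[j] for j in range(p)) - yi for row, yi in zip(X, y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b
```

With orthogonal columns `X = [[1, 0], [0, 1], [1, 0], [0, 1]]`, `y = [2, 0.1, 2, 0.1]` and `lam = 0.2`, the iteration converges to `[1.6, 0.0]`: the weak coefficient is removed entirely, which is the selection behaviour structure-selection methods rely on.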


19.
In this contribution we investigate the mechanical behaviour of polyurethane over a range of different but constant temperatures, from the glassy to the viscoelastic state. To this end, uniaxial tension tests are performed on dogbone specimens under different isothermal conditions, providing an experimental data set. As a theoretical basis we present the well-known thermomechanically coupled one-dimensional linear viscoelastic material model, which is able to reproduce the experimentally observed material behaviour; for this we adopt temperature-dependent relaxation times. The model parameters are identified with a standard parameter identification tool. Finally, the experimental results are compared with simulations using the identified model parameters. (© 2009 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)
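A one-dimensional linear viscoelastic model with a temperature-dependent relaxation time can be sketched as a standard linear solid: an equilibrium spring E∞ in parallel with one Maxwell branch (E₁, τ). Under a constant strain rate ε̇ from a stress-free state, the stress is σ(t) = E∞·ε̇·t + E₁·ε̇·τ·(1 − e^(−t/τ)). The Arrhenius form of τ(T) and all parameter names are assumptions for illustration; the paper's model may use a different temperature dependence and more Maxwell branches.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)


def relaxation_time(temperature, tau_ref, t_ref, activation_energy):
    """Assumed Arrhenius law: tau(T) = tau_ref * exp(Ea/R * (1/T - 1/T_ref))."""
    return tau_ref * math.exp(
        activation_energy / GAS_CONSTANT * (1.0 / temperature - 1.0 / t_ref)
    )


def stress_constant_rate(t, strain_rate, e_inf, e1, tau):
    """Standard linear solid under constant strain rate, starting unstressed:
    sigma(t) = E_inf*rate*t + E1*rate*tau*(1 - exp(-t/tau))."""
    return e_inf * strain_rate * t + e1 * strain_rate * tau * (1.0 - math.exp(-t / tau))
```

Because τ shrinks as T rises (for Ea > 0), the Maxwell branch relaxes sooner at higher temperature, so the initial slope (E∞ + E₁)·ε̇ gives way to the equilibrium slope E∞·ε̇ earlier in the test. Parameter identification would fit E∞, E₁ and the τ(T) parameters to the measured stress-strain curves at each temperature.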
