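A Genome-Wide Association Study Framework Based on Statistical Testing and Machine Learning
Cite this article: QIAN Yi-xin, MAO Ning, WANG Kuo. 基于统计检验与机器学习的全基因组关联研究框架 [A genome-wide association study framework based on statistical testing and machine learning][J]. 数学的实践与认识 [Mathematics in Practice and Theory], 2017, 0(14): 99-106
Authors: QIAN Yi-xin (钱亦欣), MAO Ning (毛宁), WANG Kuo (王阔)
Affiliations: 1. School of Economics, Shanghai University, Shanghai 200444; 2. College of Sciences, Shanghai University, Shanghai 200444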
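Abstract:
This paper uses statistical testing and machine learning to study the association between SNPs (or genes) and a disease (or measurable trait). We first choose a suitable numerical encoding for the SNPs and design a corresponding statistical testing procedure, then use P-values to make a preliminary selection of the loci associated with the disease or trait. Building on this, we apply random forest, XGBoost and other machine learning methods to the selected loci and assess the degree of association between each SNP and the disease or trait from the perspective of out-of-sample prediction. The results show that the framework selects SNPs (genes) associated with a disease or trait well. Because it considers several classification models, the framework also offers high robustness, low computational cost, and the ability to cross-check results across models. In future, the framework may also be applied in fields such as finance and social networks.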

Keywords: genome-wide association analysis (GWAS)  statistical test  machine learning
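The abstract does not say which numerical encoding or which statistical test the authors use. As one possible reading of the first stage (encode, test, screen by P-value), the minimal sketch below assumes an additive encoding (each genotype coded as its count of minor alleles, 0/1/2), a Pearson chi-square test of independence on the case/control × genotype table, and a Bonferroni-corrected threshold. All function names, the SNP IDs `snp_1`/`snp_2` and the toy data are hypothetical. P-values use the closed-form upper tail of the chi-square distribution for 1 or 2 degrees of freedom, so only the standard library is needed.

```python
import math
from collections import Counter


def encode_additive(genotypes):
    """Code two-letter genotypes (e.g. "AG") as minor-allele counts 0/1/2.

    Assumptions (hypothetical, not from the paper): biallelic SNP, no missing
    calls; the minor allele is the less frequent one, ties broken alphabetically.
    """
    allele_counts = Counter(allele for g in genotypes for allele in g)
    minor = min(allele_counts, key=lambda a: (allele_counts[a], a))
    return [g.count(minor) for g in genotypes]


def chi2_genotype_test(codes, labels):
    """Pearson chi-square test of independence on the label x genotype-code table.

    Returns (statistic, p_value). Only df = 1 or 2 is supported, which covers
    binary case/control labels with a 0/1/2 coding.
    """
    cols, rows = sorted(set(codes)), sorted(set(labels))
    if len(cols) < 2 or len(rows) < 2:
        return 0.0, 1.0  # no variation, nothing to test
    observed = Counter(zip(labels, codes))  # keys are (label, code) cells
    n = len(codes)
    row_tot, col_tot = Counter(labels), Counter(codes)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    elif df == 2:
        p = math.exp(-stat / 2)  # upper tail of chi-square(2)
    else:
        raise ValueError("closed-form p-value implemented only for df = 1 or 2")
    return stat, p


def screen_snps(snp_table, labels, alpha=0.05):
    """Keep SNPs whose p-value is below the Bonferroni threshold alpha / m."""
    threshold = alpha / len(snp_table)
    kept = {}
    for snp_id, genotypes in snp_table.items():
        _, p = chi2_genotype_test(encode_additive(genotypes), labels)
        if p < threshold:
            kept[snp_id] = p
    return kept


if __name__ == "__main__":
    labels = [1] * 6 + [0] * 6  # 1 = case, 0 = control (toy data)
    snps = {
        "snp_1": ["GG"] * 6 + ["AA"] * 6,  # genotype tracks case status
        "snp_2": ["AA", "AG", "GG"] * 4,   # same distribution in both groups
    }
    print(sorted(screen_snps(snps, labels)))  # ['snp_1']
```

Bonferroni is used here only as the simplest family-wise correction; the sketch ignores missing calls and covariates.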

A GWAS Framework based on Statistical Test and Machine Learning
QIAN Yi-xin,MAO Ning,WANG Kuo. A GWAS Framework based on Statistical Test and Machine Learning[J]. Mathematics in Practice and Theory, 2017, 0(14): 99-106
Authors:QIAN Yi-xin  MAO Ning  WANG Kuo
Abstract:
This paper studied the association between SNPs (genes) and diseases using statistical tests and machine learning, after encoding the SNPs numerically. First, a corresponding statistical testing procedure was designed, and SNPs associated with a disease or trait were selected preliminarily by their P-values. Second, random forest, XGBoost and other machine learning methods were applied to the selected SNPs to evaluate their association with the disease from out-of-sample prediction accuracy. The results showed that the framework selects SNPs associated with diseases or traits well. Because it considers several classifiers, the framework is also robust, computationally inexpensive, and allows results to be cross-checked across models, so it may also prove useful in other fields such as finance and social networks.
Keywords:GWAS  statistical test  machine learning
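Random forest and XGBoost come from third-party packages (for example scikit-learn's `RandomForestClassifier` and the `xgboost` package's `XGBClassifier`). To stay within the Python standard library, the minimal sketch below shows only the out-of-sample evaluation step and the cross-comparison of several classifiers on identical folds. The learners plugged in are stand-ins (a majority-class baseline and a single decision stump), not the models used in the paper; round-robin k-fold assignment, binary 0/1 labels, the function names and the toy data are all assumptions for illustration.

```python
from collections import Counter


def fit_majority(X, y):
    """Baseline learner: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def fit_stump(X, y):
    """Stand-in learner: the single rule "feature j > t" with the fewest training
    errors. Features are 0/1/2 SNP codes; labels are assumed to be 0/1."""
    best = None
    for j in range(len(X[0])):
        for t in (0.5, 1.5):
            for high in (0, 1):  # label predicted when row[j] > t
                errors = sum((high if row[j] > t else 1 - high) != yi
                             for row, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, high)
    _, j, t, high = best
    return lambda row: high if row[j] > t else 1 - high


def cross_val_accuracy(fit, X, y, k=3):
    """Mean held-out accuracy over k folds; each fold is predicted by a model
    fitted only on the other folds (round-robin fold assignment, no shuffling)."""
    scores = []
    for i in range(k):
        test = [m for m in range(len(y)) if m % k == i]
        train = [m for m in range(len(y)) if m % k != i]
        model = fit([X[m] for m in train], [y[m] for m in train])
        correct = sum(model(X[m]) == y[m] for m in test)
        scores.append(correct / len(test))
    return sum(scores) / k


def compare_models(models, X, y, k=3):
    """Score several learners on identical folds so results can be cross-checked."""
    return {name: cross_val_accuracy(fit, X, y, k) for name, fit in models.items()}


if __name__ == "__main__":
    # Toy matrix of two screened SNPs (0/1/2 codes); column 0 tracks case status.
    X = [[0, 0], [0, 1], [0, 2], [0, 0], [0, 1], [0, 2],
         [2, 0], [2, 1], [2, 2], [2, 0], [2, 1], [2, 2]]
    y = [1] * 6 + [0] * 6
    models = {"majority baseline": fit_majority, "decision stump": fit_stump}
    print(compare_models(models, X, y))
    # {'majority baseline': 0.5, 'decision stump': 1.0}
```

Cross-validated accuracy is one way to read "out-of-sample prediction accuracy"; the abstract does not give the paper's exact metric or data split.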
This article is indexed in CNKI, Wanfang Data and other databases.