Variable selection in model-based clustering using multilocus genotype data
Authors: Wilson Toussile, Elisabeth Gassiat
Affiliation:1. UR016, Institut de Recherche pour le Développement (IRD), Laboratoire de Mathématique d’Orsay (LMO), Ecole Nationale Supérieure Polytechnique de Yaoundé, Bat 425, 91405, Orsay Cedex, France
2. Laboratoire de Mathématique d’Orsay, Bat 425, 91405, Orsay Cedex, France
Abstract: We propose a variable selection procedure for model-based clustering of multilocus genotype data. Some loci may not be relevant for clustering individuals into statistically different populations. Inferring the number K of clusters and the subset S of loci relevant for clustering is treated as a model selection problem. The competing models are compared using penalized maximum likelihood criteria. Under weak assumptions on the penalty function, we prove the consistency of the resulting estimator $(\widehat{K}_n, \widehat{S}_n)$. An associated algorithm named Mixture Model for Genotype Data (MixMoGenD) has been implemented in C++ and is available at http://www.math.u-psud.fr/~toussile. To avoid an exhaustive search for the optimum model, we propose a modified Backward-Stepwise algorithm, which enables a better search for the optimum model among all possible cardinalities of S. We present numerical experiments on simulated and real datasets that illustrate the benefit of our loci selection procedure.
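As a reading aid, a penalized maximum likelihood criterion of the kind the abstract refers to can be sketched in generic form. The notation here is an assumption and is not taken from the paper: $\ell_n$ denotes the log-likelihood of the n observed genotypes, $\widehat{\theta}_{K,S}$ the maximum likelihood estimate in the model with K clusters and clustering loci S, and $\mathrm{pen}_n$ the penalty function.

$$(\widehat{K}_n, \widehat{S}_n) \in \arg\min_{(K,S)} \left\{ -\ell_n\big(\widehat{\theta}_{K,S}\big) + \mathrm{pen}_n(K,S) \right\}$$

A familiar example of such a penalty is the BIC-type choice $\mathrm{pen}_n(K,S) = \tfrac{1}{2} D_{K,S} \log n$, where $D_{K,S}$ is the number of free parameters of the model. The abstract does not say which penalties the authors use, so this is only an illustration of the general form.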
This article is indexed in SpringerLink and other databases.