Variable selection in model-based clustering using multilocus genotype data |
| |
Authors: | Wilson Toussile, Elisabeth Gassiat |
| |
Affiliation: | 1. UR016, Institut de Recherche pour le Développement (IRD), Laboratoire de Mathématique d’Orsay (LMO), Ecole Nationale Supérieure Polytechnique de Yaoundé, Bat 425, 91405 Orsay Cedex, France; 2. Laboratoire de Mathématique d’Orsay, Bat 425, 91405 Orsay Cedex, France |
| |
Abstract: | We propose a variable selection procedure for model-based clustering using multilocus genotype data. In practice, some loci may not be relevant for clustering individuals into statistically different populations. Inferring the number $K$ of clusters and the relevant subset $S$ of clustering loci is treated as a model selection problem, and the competing models are compared using penalized maximum likelihood criteria. Under weak assumptions on the penalty function, we prove the consistency of the resulting estimator $(\widehat{K}_n, \widehat{S}_n)$. An associated algorithm named Mixture Model for Genotype Data (MixMoGenD) has been implemented in C++ and is available at http://www.math.u-psud.fr/~toussile. To avoid an exhaustive search for the optimum model, we propose a modified backward-stepwise algorithm, which enables a better search for the optimum model among all possible cardinalities of $S$. We present numerical experiments on simulated and real datasets that highlight the benefits of our locus selection procedure. |
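The selection step can be illustrated with a minimal Python sketch. It assumes a BIC-type penalty $\mathrm{pen}(K,S) = \frac{D_{(K,S)}}{2}\log n$, where $D_{(K,S)}$ counts free parameters under the assumption that loci in $S$ have cluster-specific allele frequencies while the other loci share one frequency vector across clusters. Under that assumption, the selected model minimizes $-\ell_n(K,S) + \mathrm{pen}(K,S)$. The search shown is a plain backward elimination, not the authors' modified backward-stepwise algorithm. The names `fit_loglik`, `model_dimension`, `bic_criterion`, `backward_stepwise` and `select_model` are hypothetical and are not part of MixMoGenD.

```python
import math
from typing import Callable, FrozenSet, Optional, Sequence, Tuple

# Hypothetical signature (not the MixMoGenD API): fit_loglik(K, S) returns the
# maximized log-likelihood of the K-cluster model whose clustering loci are S,
# e.g. obtained by running EM on the genotype data.
LogLikFn = Callable[[int, FrozenSet[int]], float]


def model_dimension(K: int, S: FrozenSet[int], n_alleles: Sequence[int]) -> int:
    """Number of free parameters, ASSUMING loci in S have cluster-specific
    allele frequencies while loci outside S share one frequency vector."""
    dim = K - 1  # mixing proportions
    for locus, a in enumerate(n_alleles):
        dim += (K if locus in S else 1) * (a - 1)
    return dim


def bic_criterion(loglik: float, dim: int, n: int) -> float:
    """BIC-type penalized criterion (smaller is better); one illustrative
    penalty choice, not necessarily the one used in the paper."""
    return -loglik + 0.5 * dim * math.log(n)


def backward_stepwise(K: int, n_alleles: Sequence[int], n: int,
                      fit_loglik: LogLikFn) -> Tuple[FrozenSet[int], float]:
    """Plain backward elimination over S for a fixed K: start from all loci and
    drop, one at a time, the locus whose removal lowers the criterion most."""
    def crit(S: FrozenSet[int]) -> float:
        return bic_criterion(fit_loglik(K, S), model_dimension(K, S, n_alleles), n)

    S = frozenset(range(len(n_alleles)))
    best = crit(S)
    while S:
        # Score every subset obtained by removing a single locus from S.
        score, T = min(((crit(S - {l}), S - {l}) for l in S), key=lambda p: p[0])
        if score >= best:  # no single removal improves the criterion: stop
            break
        best, S = score, T
    return S, best


def select_model(K_max: int, n_alleles: Sequence[int], n: int,
                 fit_loglik: LogLikFn) -> Tuple[int, FrozenSet[int], float]:
    """Return (K, S, criterion) minimizing the criterion over K = 1..K_max."""
    best: Optional[Tuple[int, FrozenSet[int], float]] = None
    for K in range(1, K_max + 1):
        S, score = backward_stepwise(K, n_alleles, n, fit_loglik)
        if best is None or score < best[2]:
            best = (K, S, score)
    assert best is not None, "K_max must be at least 1"
    return best
```

Passing `fit_loglik` as a callable keeps the fitting of each candidate model separate from the search. For a fixed $K$, backward elimination evaluates $O(L^2)$ subsets of the $L$ loci instead of all $2^L$.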
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |