Binary clustering with missing data
Authors: M. Nadif, G. Govaert
Affiliation: 1. L.R.I.M., Université Metz, Ile du Saulcy, 57045 Metz, France; 2. UTC-URA CNRS 817, Compiègne, France
Abstract: A clustering method is presented for analysing multivariate binary data with missing values. When not all values are observed, Govaert [3] has studied the relations between clustering methods and statistical models. The author has shown that identifying a mixture of Bernoulli distributions with the same parameter for all clusters and all variables corresponds to a clustering criterion based on the L1 distance, which characterizes the MNDBIN method (Marchetti [8]). He first generalized this model by letting the parameters depend on the variables, and then by letting them depend on both the variables and the clusters. We use these models to derive a clustering method adapted to missing data. The method optimizes a criterion with a standard iterative partitioning algorithm, which removes the need either to ignore objects or to substitute values for the missing data. We study several versions of this algorithm and give a brief account of the method's application to some simulated data.
Keywords: Mixture; Missing data; Classification maximum likelihood; Binary clustering
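
The iterative partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the simplest model (one Bernoulli parameter shared by all clusters and variables), so that the criterion reduces to an L1 distance to binary cluster centers, and it assumes that missing values are handled by summing only over observed entries. The names `observed_l1`, `update_center`, and `cluster_binary` are hypothetical, as is the convention of encoding a missing value as `None`.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[int]]  # each entry is 0, 1, or None (missing)


def observed_l1(row: Row, center: Sequence[int]) -> int:
    """L1 distance to a binary center, summed over observed entries only."""
    return sum(abs(x - c) for x, c in zip(row, center) if x is not None)


def update_center(rows: List[Row], old: Sequence[int]) -> List[int]:
    """Majority value per variable among observed entries.

    Assumption: on a tie, or when no entry is observed, keep the old value.
    """
    center = []
    for j, prev in enumerate(old):
        vals = [r[j] for r in rows if r[j] is not None]
        ones = sum(vals)
        zeros = len(vals) - ones
        if ones > zeros:
            center.append(1)
        elif zeros > ones:
            center.append(0)
        else:
            center.append(prev)
    return center


def cluster_binary(
    data: List[Row], init_centers: List[Sequence[int]], max_iter: int = 100
) -> Tuple[List[int], List[List[int]]]:
    """Alternate assignment and center update until the partition is stable."""
    centers = [list(c) for c in init_centers]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: nearest center in observed-entry L1 distance.
        new_labels = [
            min(range(len(centers)), key=lambda k: observed_l1(row, centers[k]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each center from its current members.
        centers = [
            update_center([r for r, l in zip(data, labels) if l == k], centers[k])
            for k in range(len(centers))
        ]
    return labels, centers


# Toy data: two groups of objects, some with missing entries.
data = [
    [1, 1, 0, 0],
    [1, None, 0, 0],
    [1, 1, None, 0],
    [0, 0, 1, 1],
    [0, 0, 1, None],
    [None, 0, 1, 1],
]
labels, centers = cluster_binary(data, [[1, 1, 0, 0], [0, 0, 1, 1]])
```

No object is discarded and no missing value is filled in: an object with missing entries simply contributes fewer terms to the distance and to the center update. The more general models mentioned in the abstract, with parameters depending on variables and clusters, would call for weighted terms that this sketch does not attempt.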