Binary clustering with missing data

Authors:	M. Nadif, G. Govaert

Affiliation:	1. L.R.I.M., Université Metz, Ile du Saulcy, 57045 Metz, France; 2. UTC-URA CNRS 817, Compiègne, France

Abstract:	A clustering method is presented for analysing multivariate binary data with missing values. For the case in which all values are observed, Govaert³ has studied the relations between clustering methods and statistical models. He showed that identifying a mixture of Bernoulli distributions with the same parameter for all clusters and for all variables corresponds to a clustering criterion based on the L₁ distance, which characterizes the MNDBIN method (Marchetti⁸). He then generalized this model, first by letting the parameters depend on the variables and then by letting them depend on both the variables and the clusters. We use these models to derive a clustering method adapted to missing data. The method optimizes a criterion with a standard iterative partitioning algorithm, which removes the need either to discard objects with missing values or to substitute values for the missing data. We study several versions of this algorithm and give a brief account of the method's application to some simulated data.
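The iterative partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes only the simplest model (one parameter shared by all clusters and variables), where the criterion reduces to an L₁ distance between each object and a binary cluster center, and it assumes that missing entries are simply skipped in both the assignment and the update steps. The function name `cluster_binary_missing` and its parameters are hypothetical.

```python
import numpy as np


def cluster_binary_missing(x, k, n_iter=20, init_labels=None, seed=0):
    """Illustrative L1 partitioning of 0/1 data; np.nan marks a missing value.

    Assumption (not from the paper): missing entries contribute nothing to
    the distances or to the cluster centers.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    observed = ~np.isnan(x)
    x0 = np.where(observed, x, 0.0)  # missing entries become 0 and are masked later

    if init_labels is None:
        labels = np.random.default_rng(seed).integers(k, size=n)
    else:
        labels = np.asarray(init_labels)

    centers = np.zeros((k, p))
    for _ in range(n_iter):
        # Update step: each center takes the majority value among the
        # observed entries of its members (ties and empty clusters give 0).
        for c in range(k):
            members = (labels == c)[:, None]
            ones = (x0 * members).sum(axis=0)
            counts = (observed & members).sum(axis=0)
            centers[c] = (2 * ones > counts).astype(float)

        # Assignment step: L1 distance to each center over observed entries only.
        diff = np.abs(x0[:, None, :] - centers[None, :, :]) * observed[:, None, :]
        dist = diff.sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    criterion = dist[np.arange(n), labels].sum()
    return labels, centers, criterion


# Toy data: two groups of three objects, four entries missing.
nan = np.nan
data = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, nan, 0, 0, 0],
    [1, nan, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, nan, 1],
    [nan, 0, 0, 1, 1, 1],
])
labels, centers, criterion = cluster_binary_missing(
    data, k=2, init_labels=[0, 0, 1, 1, 1, 0]
)
```

No object is dropped and no value is imputed: an object with missing entries is compared with each center only on the variables where it was observed. The more general models in the abstract, with parameters depending on variables or on both variables and clusters, would weight the per-variable terms differently and are not covered by this sketch.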

Keywords:	Mixture; Missing data; Classification maximum likelihood; Binary clustering