Binary clustering with missing data
Authors: | M. Nadif, G. Govaert |
Affiliation: | 1. L.R.I.M., Université Metz, Ile du Saulcy, 57045 Metz, France; 2. UTC-URA CNRS 817, Compiègne, France |
Abstract: | A clustering method is presented for analysing multivariate binary data with missing values. For the case in which all values are observed, Govaert [3] has studied the relations between clustering methods and statistical models. He showed that identifying a mixture of Bernoulli distributions with the same parameter for all clusters and all variables corresponds to a clustering criterion based on the L1 distance, which characterizes the MNDBIN method (Marchetti [8]). He then generalized this model, first by letting the parameters depend on the variables and then by letting them depend on both the variables and the clusters. We use these models to derive a clustering method adapted to missing data. The method optimizes a criterion by a standard iterative partitioning algorithm, which removes the need either to discard objects with missing values or to substitute the missing data. We study several versions of this algorithm and briefly report the application of the method to some simulated data. |
Keywords: | Mixture; Missing data; Classification maximum likelihood; Binary clustering |
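The iterative partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the simplest setting (binary cluster centres, an L1 criterion summed over observed entries only) and encodes missing values as `None`. The names `l1_observed` and `cluster_binary_missing` are hypothetical and chosen for illustration.

```python
import random


def l1_observed(row, center):
    """L1 distance computed over the observed (non-None) entries only."""
    return sum(abs(v - c) for v, c in zip(row, center) if v is not None)


def cluster_binary_missing(data, k, n_iter=50, seed=0):
    """Alternate assignment and centre-update steps on binary data.

    Assumption: missing entries are None and are simply skipped, so no
    object is discarded and no value is imputed.
    """
    rng = random.Random(seed)
    d = len(data[0])
    # Random binary starting centres (an arbitrary initialisation choice).
    centers = [[rng.randint(0, 1) for _ in range(d)] for _ in range(k)]
    labels = [None] * len(data)
    for _ in range(n_iter):
        # Assignment step: nearest centre in L1 distance over observed entries.
        new_labels = [
            min(range(k), key=lambda c: l1_observed(row, centers[c]))
            for row in data
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Update step: majority value among the observed entries of each cluster.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            for j in range(d):
                seen = [row[j] for row in members if row[j] is not None]
                if seen:  # keep the old value if nothing is observed
                    centers[c][j] = 1 if 2 * sum(seen) > len(seen) else 0
    criterion = sum(l1_observed(row, centers[lab]) for row, lab in zip(data, labels))
    return labels, centers, criterion


if __name__ == "__main__":
    sample = [[1, 1, None], [1, None, 1], [0, 0, 0], [None, 0, 0]]
    print(cluster_binary_missing(sample, k=2))
```

Skipping the missing entries in both the distance and the majority vote is what lets every object take part in the partition without any substitution. The variants in which parameters depend on variables or on clusters would need weighted distances, which this sketch does not attempt.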