Finding the Number of Normal Groups in Model-Based Clustering via Constrained Likelihoods
Authors:Andrea Cerioli  Luis Angel García-Escudero  Agustín Mayo-Iscar  Marco Riani
Institution:1. Dipart. di Scienze Economiche e Aziendali, Università di Parma, Parma, Italy;2. Dpto. de Estadística e I.O. and IMUVA, Universidad de Valladolid, Valladolid, Spain
Abstract:Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For this purpose, complexity-penalized likelihood approaches have been introduced in model-based clustering, such as the well-known Bayesian information criterion and integrated complete likelihood criteria. However, the classification/mixture likelihoods considered in these approaches are unbounded without any constraint on the cluster scatter matrices. Constraints also prevent traditional EM and CEM algorithms from being trapped in (spurious) local maxima. Controlling the maximal ratio between the eigenvalues of the scatter matrices to be smaller than a fixed constant c ≥ 1 is a sensible idea for setting such constraints. A new penalized likelihood criterion which takes into account the higher model complexity that a higher value of c entails is proposed. Based on this criterion, a novel and fully automated procedure, leading to a small ranked list of optimal (k, c) couples is provided. A new plot called “car-bike,” which provides a concise summary of the solutions, is introduced. The performance of the procedure is assessed both in empirical examples and through a simulation study as a function of cluster overlap. Supplementary materials for the article are available online.
Keywords:BIC  CEM algorithm  Clustering  EM algorithm  ICL  Mixtures
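
Illustration: the eigenvalue-ratio constraint mentioned in the abstract can be sketched in a few lines of Python. This is only a minimal sketch and not the authors' implementation. It assumes the ratio is taken between the largest and the smallest eigenvalue pooled over all cluster scatter matrices, and the function names `eigenvalue_ratio` and `satisfies_constraint` are hypothetical, chosen here for illustration.

```python
import numpy as np


def eigenvalue_ratio(scatter_matrices):
    """Ratio of the largest to the smallest eigenvalue, pooled over all clusters.

    Assumption: the constraint compares eigenvalues across every cluster
    scatter matrix at once, not within each matrix separately.
    """
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues.
    eigvals = np.concatenate([np.linalg.eigvalsh(S) for S in scatter_matrices])
    return eigvals.max() / eigvals.min()


def satisfies_constraint(scatter_matrices, c):
    """True when the pooled eigenvalue ratio does not exceed the constant c >= 1."""
    return bool(eigenvalue_ratio(scatter_matrices) <= c)


# Two identical spherical clusters: every eigenvalue equals 1.
spherical = [np.eye(2), np.eye(2)]

# Two clusters with different shapes and sizes: eigenvalues 4, 1, 1 and 0.5.
elongated = [np.diag([4.0, 1.0]), np.diag([1.0, 0.5])]
```

Under this reading, c = 1 forces all clusters to be spherical with equal scatter, while larger values of c allow increasingly different cluster shapes and sizes, which is the added model complexity that the proposed criterion penalizes.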