首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Estimating the Number of Clusters via the GUD Statistic
Authors:Jiyao Kou
Abstract:Estimating the number of clusters is one of the most difficult problems in cluster analysis. Most previous approaches require knowing the data matrix and may not work when only a Euclidean distance matrix is available. Other approaches also suffer from the curse of dimensionality and work poorly in high dimension. In this article, we develop a new statistic, called the GUD statistic, based on the idea of the Gap method, but use the determinant of the pooled within-group scatter matrix instead of the within-cluster sum of squared distances. Some theory is developed to show this statistic can work well when only the Euclidean distance matrix is known. More generally, this statistic can even work for any dissimilarity matrix that satisfies some properties. We also propose a modification for high-dimensional datasets, called the R-GUD statistic, which can give a robust estimation in high-dimensional settings. The simulation shows our method needs less information but is generally found to be more accurate and robust than other methods considered in the study, especially in many difficult settings.
Keywords:Euclidean distance matrix  Gap statistic  Hierarchy clustering  High-dimensional data  Uniform distribution
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号