首页 | 本学科首页   官方微博 | 高级检索  
     检索      


A local search approximation algorithm for k-means clustering
Authors:Tapas Kanungo  David M Mount  Nathan S Netanyahu  Christine D Piatko  Ruth Silverman  Angela Y Wu  
Institution:

a IBM Almaden Research Center, San Jose, CA 95120, USA

b Department of Computer Science, University of Maryland, College Park, MD, USA

c Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel

d Center for Automation Research, University of Maryland, College Park, MD, USA

e The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA

f Department of Computer Science, American University, Washington, DC, USA

Abstract:In k-means clustering we are given a set of n data points in d-dimensional space Image and an integer k, and the problem is to determine a set of k points in Image , called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance.

We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9+var epsilon)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9−var epsilon) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.

Keywords:Clustering  k-means  Approximation algorithms  Local search  Computational geometry
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号