k-均值算法的初始化方法综述 A survey on the initialization methods for the k-means algorithm期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

k-均值算法的初始化方法综述

引用本文：	徐大川,许宜诚,张冬梅. k-均值算法的初始化方法综述[J]. 运筹学学报, 2018, 22(2): 31-40. DOI: 10.15960/j.cnki.issn.1007-6093.2018.02.003

作者姓名：	徐大川许宜诚张冬梅

作者单位：	1. 北京工业大学应用数理学院, 北京 100124;2. 中国科学院深圳先进技术研究院, 广东深圳 518055;3. 山东建筑大学计算机科学与技术学院, 济南 250101

基金项目：	国家自然科学基金(No. 11531014), 山东省济南市科技发展计划项目(No. 201401211)

摘要：	k-均值问题自提出以来一直吸引组合优化和计算机科学领域的广泛关注, 是经典的NP-难问题之一. 给定N个d维实向量构成的观测集, 目标是把这N个观测点划分到k(leq N)个集合中, 使得所有集合中的点到对应的聚类中心距离的平方和最小, 一个集合的聚类中心指的是该集合中所有观测点的均值. k-均值算法作为解决k-均值问题的启发式算法，在实际应用中因其出色的收敛速度而倍受欢迎. k-均值算法可描述为: 给定问题的初始化分组, 交替进行指派(将观测点分配到离其最近的均值点)和更新(计算新的聚类的均值点)直到收敛到某一解. 该算法通常被认为几乎是线性收敛的. 但缺点也很明显, 无法保证得到的是全局最优解, 并且算法结果好坏过于依赖初始解的选取. 于是学者们纷纷提出不同的初始化方法来提高k-均值算法的质量. 现筛选和罗列了关于选取初始解的k-均值算法的初始化方法供读者参考.
关键词：	k-均值算法初始化方法
收稿时间：	2017-10-16
A survey on the initialization methods for the k-means algorithm

XU Dachuan,XU Yicheng,ZHANG Dongmei. A survey on the initialization methods for the k-means algorithm[J]. OR Transactions, 2018, 22(2): 31-40. DOI: 10.15960/j.cnki.issn.1007-6093.2018.02.003

Authors:	XU Dachuan XU Yicheng ZHANG Dongmei

Affiliation:	1. College of Applied Sciences, Beijing University of Technology, Beijing 100124, China; 2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China;3. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China

Abstract:	The k-means problem which is one of the classical NP-hard problems keeps attracting wide attention in combinatorial optimization and computer science area since been proposed. Given a set of observations consisting of N d-dimensional real vectors, the objective is to partition the N observations into k(leqslant N) sets, so as to minimize the within-cluster sum of squares, where the center of a cluster is defined as the mean of all the observations in it. As one of the heuristic algorithms for solving k-means problem, k-means algorithm is quite popular in practice because of its excellent perform of convergence. The k-means algorithm can be described as: given initial solution for k-means problem, repeat assignment (assign observations to its nearest center) and update (compute the new centers in the new clusters) step until convergence to a solution. Although the algorithm is usually considered as linearly convergency, its shortcomings are obvious. The k-means algorithm is unable to ensure the global optimality and the output is rely badly on the choice of initial solutions. Therefore, variants of initialization methods are proposed to improve the performance of the k-means algorithm. In this paper we mainly filter and list some of them for readers.

Keywords:	k-means algorithm initialization method
本文献已被 CNKI 等数据库收录！
	点击此处可从《运筹学学报》浏览原始摘要信息
	点击此处可从《运筹学学报》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏