Similar Literature
1.
To address two drawbacks of the traditional k-means clustering algorithm (the number of clusters must be known in advance, and the initial cluster centers are hard to determine), a bilevel programming model over the cluster centers and the number of clusters k is established. Particle swarm optimization is combined with the model to determine the cluster centers, and the optimal number of clusters k is searched for by repeatedly updating a criterion function during the iteration. Based on this model, an improved k-means clustering algorithm is proposed and applied to the analysis of ice-ridge surface morphology. The results show that the clustering not only has the merit of clear boundaries between adjacent clusters, but also reflects well the influence of geographic location and growth environment on ice-ridge formation.
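The bilevel model itself is paper-specific, but its inner step, placing k cluster centers with particle swarm optimization so as to minimize the within-cluster sum of squared distances, can be sketched in Python. The swarm parameters and the flat encoding of centers below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pso_cluster_centers(X, k, n_particles=30, iters=200,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k cluster centers minimizing the k-means objective
    (sum of squared distances to the nearest center) with a basic PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)

    def objective(centers):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).sum()

    # Each particle encodes k centers; velocities start at zero.
    pos = rng.uniform(lo, hi, size=(n_particles, k, X.shape[1]))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val
```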

2.
When K-means is used for data clustering, different preprocessing choices lead to different statistical distances and cluster centers and hence to different clustering results, a phenomenon that becomes more pronounced as the data dimension grows. This paper therefore proposes a multivariate statistical distance based on sample variance, and introduces an improved artificial bee colony algorithm together with an evaluation criterion function to determine the cluster centers and the optimal number of clusters, thereby optimizing the K-means algorithm. In theory, the method overcomes the original algorithm's tendency to get trapped in local optima and its fixed number of clusters. Finally, the performance of the optimized algorithm is verified through outlier detection tests on synthetic data sets and real UCI data sets.

3.
A new validity index for determining the number of clusters in fuzzy clustering is proposed. The data set is first fuzzily clustered with the FCM algorithm, and a weighted bipartite network is built from the membership matrix and the cluster centers. A new cluster validity index is then defined by modifying the modularity function of the weighted bipartite network. To test the index, it is compared with three common validity indices on fifteen data sets. The experimental results show that the proposed index performs well.

4.
The fuzzy c spherical shells (FCSS) algorithm and the spherical-shell clustering algorithm based on the improved possibilistic c-means (IPCM), i.e. IPCSS, are both gradient-based alternating optimization methods; when detecting circles or circular arcs they easily fall into local minima and produce wrong detections, and neither can identify the number of curves automatically. To address these two drawbacks, the radii and centers are computed by curve fitting on top of IPCM, which largely overcomes the local-minimum problem, and a characteristic-gap method is introduced to identify the number of curves automatically. Extensive numerical simulations and experiments on real data show that the proposed algorithm detects circles and circular arcs adaptively and reliably.

5.
An FCM image segmentation algorithm based on differential evolution
To raise the degree of automation of the fuzzy c-means (FCM) algorithm, an FCM image segmentation algorithm based on differential evolution (DEFCM) is proposed. The global search ability and robustness of differential evolution are used to determine the number of clusters and the initial cluster centers automatically; these are then used as the initial centers of fuzzy c-means clustering, remedying the shortcomings of FCM. Experiments show that the algorithm not only classifies images correctly but also achieves good segmentation quality.
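The differential-evolution wrapper is specific to DEFCM, but the FCM inner loop it initializes is standard. A minimal sketch, assuming the common defaults of fuzzifier m = 2 and a random initial membership matrix (not values from the paper):

```python
import numpy as np

def fcm(X, k, m=2.0, iters=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)            # rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))            # u_ij ∝ d_ij^{-2/(m-1)}
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```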

6.
The k-means problem has attracted wide attention in combinatorial optimization and computer science ever since it was posed, and it is one of the classic NP-hard problems. Given a set of N observations in d-dimensional real space, the goal is to partition the N points into k (≤ N) sets so that the sum of squared distances from every point to the center of its set is minimized, where the center of a set is the mean of the observations in it. As a heuristic for the k-means problem, the k-means algorithm is popular in practice for its excellent convergence speed. It can be described as follows: starting from an initial grouping, alternate an assignment step (assign each observation to its nearest mean) and an update step (recompute the mean of each new cluster) until the solution converges. The algorithm is usually considered to converge almost linearly. Its drawbacks are equally clear: it cannot guarantee a globally optimal solution, and the quality of its result depends heavily on the choice of the initial solution. Researchers have therefore proposed various initialization methods to improve the quality of the k-means algorithm; this paper surveys and lists initialization methods for choosing the initial solution of the k-means algorithm for the reader's reference.
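Among the initialization methods such surveys cover, k-means++ seeding is probably the most widely used; a compact sketch of the assignment/update iteration described above, seeded this way:

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each next center is drawn with probability
    proportional to the squared distance to the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    return centers, labels
```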

7.
Fuzzy clustering analysis based on genetic algorithms
To address the tendency of the fuzzy c-means algorithm to converge to local minima, a genetic algorithm is applied to its optimization. Since the clustering result is also strongly affected by the number of clusters and the initial cluster centers, a method based on average information entropy is proposed to determine the number of clusters, and a density function is used to obtain the initial centers. Experiments show that the genetic-algorithm-based fuzzy clustering method avoids local minima and largely removes the dependence of the clustering result on the initial values.

8.
An improved genetic k-means clustering algorithm
In the classic k-means clustering algorithm the number of clusters k must be given in advance, yet in practice k is hard to determine precisely. This paper proposes an improved genetic k-means clustering algorithm and constructs a fitness function that evaluates the quality of a partition: it rewards improving compactness (within-cluster distance) and separation (between-cluster distance) while keeping the number of clusters as small as possible. Finally, the k-means algorithm (KM), a genetic clustering algorithm (GA), the genetic k-means algorithm (GKM) and the improved genetic k-means algorithm (IGKM) are compared on two synthetic data sets and three UCI data sets, using between-cluster distance, within-cluster distance and classification accuracy as metrics. The study shows that the improved genetic k-means algorithm finds the best number of clusters k automatically while maintaining high accuracy.
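The paper's exact fitness function is not reproduced here; the following is a plausible sketch of one that rewards compactness and separation while penalizing large k. The particular weighting is an illustrative assumption.

```python
import numpy as np

def partition_fitness(X, labels, centers):
    """Toy fitness in the spirit of IGKM (assumes k >= 2): high separation
    (closest pair of centers), low compactness (mean within-cluster
    distance), and a mild penalty on k. Weighting is illustrative."""
    centers = np.asarray(centers)
    k = len(centers)
    within = np.mean([np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
                      for j in range(k) if (labels == j).any()])
    pair = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    between = pair[np.triu_indices(k, 1)].min()   # closest pair of centers
    return between / (within + 1e-12) / np.sqrt(k)  # larger is better
```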

9.
Given a set of moving points in ℝ^d, we show how to cluster them in advance, using a small number of clusters, so that at any time this static clustering is competitive with the optimal k-center clustering at that time. The advantage of this approach is that it avoids updating the clustering as time passes. We also show how to maintain this static clustering efficiently under insertions and deletions. To implement this static clustering efficiently, we describe a simple technique for speeding up clustering algorithms and apply it to achieve faster clustering algorithms for several problems. In particular, we present a linear time algorithm for computing a 2-approximation to the k-center clustering of a set of n points in ℝ^d. This slightly improves on the algorithm of Feder and Greene, which runs in Θ(n log k) time (optimal in the algebraic decision tree model).
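The linear-time algorithm itself is more involved, but the classical farthest-first traversal of Gonzalez already gives a 2-approximation to k-center in O(nk) time and makes a useful reference point:

```python
import numpy as np

def gonzalez_k_center(X, k, seed=0):
    """Farthest-first traversal: repeatedly add the point farthest from
    the current centers. Classical 2-approximation for k-center, O(nk)."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[centers[0]], axis=1)  # dist to nearest center
    for _ in range(k - 1):
        nxt = int(d.argmax())                      # farthest point so far
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[centers], d.max()                     # centers, covering radius
```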

10.
Discrete Mobile Centers
We propose a new randomized algorithm for maintaining a set of clusters among moving nodes in the plane. Given a specified cluster radius, our algorithm selects and maintains a variable subset of the nodes as cluster centers. This subset has the property that (1) balls of the given radius centered at the chosen nodes cover all the others and (2) the number of centers selected is a constant-factor approximation of the minimum possible. As the nodes move, an event-based kinetic data structure updates the clustering as necessary. This kinetic data structure is shown to be responsive, efficient, local, and compact. The produced cover is also smooth, in the sense that wholesale cluster re-arrangements are avoided. This clustering algorithm is distributed in nature and can enable numerous applications in ad hoc wireless networks, where mobile devices must be interconnected to perform various tasks collaboratively.

11.
There exist many data clustering algorithms, but they cannot adequately handle an unknown number of clusters or arbitrary cluster shapes, and their performance depends mainly on the choice of algorithm parameters. Our approach to data clustering requires no such parameter choice; it can be treated as a natural adaptation to the existing structure of distances between data points. The outlier factor introduced by the author specifies, for each data point, a degree of being an outlier. The notion is based on the difference between the frequency distribution of interpoint distances in a given dataset and the corresponding distribution for uniformly distributed points. Data clusters can then be determined by maximizing the outlier factor function, and the points of the dataset are divided into clusters according to the attraction regions of its local optima. An experimental evaluation shows that the proposed method can identify complex cluster shapes. Key advantages of the approach are good clustering for datasets with a comparatively large amount of noise (additional data points) and the absence of critical parameters whose choice determines the quality of the results.
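The author's precise outlier-factor definition is not reproduced here; the sketch below only illustrates the underlying idea of comparing the interpoint-distance distribution of the data with that of uniformly distributed points in the same bounding box. The bin count and the single uniform sample are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import pdist

def distance_histograms(X, bins=50, seed=0):
    """Compare the interpoint-distance distribution of X with that of
    uniform points in X's bounding box; clustered data concentrate mass
    at small distances relative to the uniform reference."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(X.min(0), X.max(0), size=X.shape)   # uniform reference
    d_data, d_unif = pdist(X), pdist(U)
    edges = np.linspace(0, max(d_data.max(), d_unif.max()), bins + 1)
    h_data, _ = np.histogram(d_data, bins=edges, density=True)
    h_unif, _ = np.histogram(d_unif, bins=edges, density=True)
    return edges, h_data - h_unif   # positive at short range suggests clusters
```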

12.
Sequential clustering aims at determining homogeneous and/or well-separated clusters within a given set of entities, one at a time, until no more such clusters can be found. We consider a bi-criterion sequential clustering problem in which the radius of a cluster (the maximum dissimilarity between an entity chosen as center and any other entity of the cluster) is chosen as the homogeneity criterion and the split of a cluster (the minimum dissimilarity between an entity in the cluster and one outside of it) as the separation criterion. An O(N^3) algorithm is proposed for determining the radii and splits of all efficient clusters, which leads to an O(N^4) algorithm for bi-criterion sequential clustering with radius and split as criteria. The algorithm is illustrated on the well-known Ruspini data set.
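The radius and split of a candidate cluster, as defined above, are direct to compute from a dissimilarity matrix; a short sketch:

```python
import numpy as np

def radius_and_split(D, cluster):
    """D: (N, N) dissimilarity matrix; cluster: indices of one cluster.
    radius = min over candidate centers in the cluster of the maximum
    dissimilarity to the other members; split = min dissimilarity
    between a member and a non-member."""
    cluster = np.asarray(cluster)
    inside = D[np.ix_(cluster, cluster)]
    radius = inside.max(axis=1).min()           # best center's worst distance
    outside = np.setdiff1d(np.arange(D.shape[0]), cluster)
    split = D[np.ix_(cluster, outside)].min() if outside.size else np.inf
    return radius, split
```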

13.
The field of cluster analysis is primarily concerned with sorting data points into different clusters so as to optimize a certain criterion. Rapid advances in technology have made it possible to address clustering problems via optimization theory. In this paper, we present a global optimization algorithm to solve the hard clustering problem, where each data point is to be assigned to exactly one cluster. The hard clustering problem is formulated as a nonlinear program, for which a tight linear programming relaxation is constructed via the Reformulation-Linearization Technique (RLT) in concert with additional valid inequalities that serve to defeat the inherent symmetry in the problem. This construct is embedded within a specialized branch-and-bound algorithm to solve the problem to global optimality. Pertinent implementation issues that can enhance the efficiency of the branch-and-bound algorithm are also discussed. Computational experience is reported using several standard data sets from the literature as well as synthetically generated larger problem instances. The results validate the robustness of the proposed algorithmic procedure and exhibit its dominance over the popular k-means clustering technique. Finally, a heuristic procedure for obtaining a good-quality solution with relatively little computational effort is also described.

14.
The field of cluster analysis is primarily concerned with partitioning data points into different clusters so as to optimize a certain criterion. Rapid advances in technology have made it possible to address clustering problems via optimization theory. In this paper, we present a global optimization algorithm to solve the fuzzy clustering problem, where each data point may be assigned to several clusters, with a membership grade for each cluster reflecting the likelihood of the point belonging to it. The fuzzy clustering problem is formulated as a nonlinear program, for which a tight linear programming relaxation is constructed via the Reformulation-Linearization Technique (RLT) in concert with additional valid inequalities. This construct is embedded within a specialized branch-and-bound (B&B) algorithm to solve the problem to global optimality. Computational experience is reported using several standard data sets from the literature as well as synthetically generated larger problem instances. The results validate the robustness of the proposed algorithmic procedure and exhibit its dominance over the popular fuzzy c-means algorithmic technique and the commercial global optimizer BARON.

15.
The present paper proposes a new strategy for probabilistic (often called model-based) clustering. It is well known that local maxima of mixture likelihoods can be used to partition an underlying data set. However, local maxima are rarely unique. Therefore, it remains to select the reasonable solutions, and in particular the desired one. Credible partitions are usually recognized by separation (and cohesion) of their clusters. We use here the p-values provided by the classical tests of Wilks, Hotelling, and Behrens–Fisher to single out those solutions that are well separated by location. It has been shown that reasonable solutions to a clustering problem are related to Pareto points in a plot of scale balance vs. model fit of all local maxima. We briefly review this theory and propose as solutions all well-fitting Pareto points in the set of local maxima separated by location in the above sense. We also design a new iterative, parameter-free cutting plane algorithm for the multivariate Behrens–Fisher problem.

16.
Data description, also known as one-class classification, characterizes the distribution of existing data in order to test whether new data fit that distribution. This paper first reviews the principle of kernel-based data description and points out that, with a suitable kernel function and corresponding parameters, data description can be applied to pattern clustering, and that this kind of clustering yields tight boundaries and removes noise easily. To solve the problems such a method has in determining the number of clusters and the cluster membership of individual samples, a search-based solution is proposed; theoretical analysis and worked examples both confirm its feasibility. Finally, the clustering algorithm is applied to the evaluation of enterprise relationships, with reasonably sound results.
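The search-based extension is specific to the paper, but the kernel data-description step it builds on is close to what scikit-learn's OneClassSVM provides; a minimal usage sketch (the gamma and nu values are illustrative, not the paper's settings):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),      # one dense mode
               rng.normal(3, 0.3, (100, 2)),      # a second mode
               rng.uniform(-2, 5, (10, 2))])      # sparse noise

# RBF data description: a tight boundary around the dense regions.
ocsvm = OneClassSVM(kernel="rbf", gamma=2.0, nu=0.1).fit(X)
inlier = ocsvm.predict(X) == 1                    # +1 inside, -1 outside
print(f"kept {inlier.sum()} of {len(X)} points inside the description")
```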

17.
Clustering is an important problem in data mining. It can be formulated as a nonsmooth, nonconvex optimization problem, which is challenging for most global optimization techniques even on medium-size data sets. In this paper, we propose an approach that allows local methods of smooth optimization to be applied to clustering problems. An incremental approach generates starting points for the cluster centers, which enables us to deal with the nonconvexity of the problem, and the hyperbolic smoothing technique handles its nonsmoothness, making smooth optimization algorithms applicable. Results of numerical experiments with eleven real-world data sets, and a comparison with state-of-the-art incremental clustering algorithms, demonstrate that smooth optimization algorithms combined with the incremental approach are a powerful alternative to existing clustering algorithms.
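One common form of hyperbolic smoothing replaces min(a, b) by (a + b - sqrt((a - b)^2 + τ^2))/2, which is smooth everywhere and tends to min(a, b) as τ → 0. The sketch below applies that smoothing (nested pairwise over the k centers, a simple stand-in for the paper's construction) to the clustering objective and hands it to a gradient-based optimizer; the paper's incremental seeding loop is omitted and the random start is illustrative.

```python
import numpy as np
from functools import reduce
from scipy.optimize import minimize

def smooth_min(a, b, tau):
    """Hyperbolic smoothing of min(a, b); exact as tau -> 0."""
    return 0.5 * (a + b - np.sqrt((a - b) ** 2 + tau ** 2))

def smoothed_objective(flat_centers, X, k, tau=1e-3):
    """Smooth surrogate of sum_i min_j ||x_i - c_j||^2."""
    C = flat_centers.reshape(k, -1)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    per_point = reduce(lambda a, b: smooth_min(a, b, tau), d2.T)
    return per_point.sum()

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.2, (50, 2)) for m in (0.0, 2.0, 4.0)])
k = 3
x0 = X[rng.choice(len(X), k, replace=False)].ravel()
res = minimize(smoothed_objective, x0, args=(X, k), method="L-BFGS-B")
print(res.x.reshape(k, 2))   # recovered centers
```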
