首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A modified approach had been developed in this study by combining two well-known algorithms of clustering, namely fuzzy c-means algorithm and entropy-based algorithm. Fuzzy c-means algorithm is one of the most popular algorithms for fuzzy clustering. It could yield compact clusters but might not be able to generate distinct clusters. On the other hand, entropy-based algorithm could obtain distinct clusters, which might not be compact. However, the clusters need to be both distinct as well as compact. The present paper proposes a modified approach of clustering by combining the above two algorithms. A genetic algorithm was utilized for tuning of all three clustering algorithms separately. The proposed approach was found to yield both distinct as well as compact clusters on two data sets.  相似文献   

2.
The field of cluster analysis is primarily concerned with the sorting of data points into different clusters so as to optimize a certain criterion. Rapid advances in technology have made it possible to address clustering problems via optimization theory. In this paper, we present a global optimization algorithm to solve the hard clustering problem, where each data point is to be assigned to exactly one cluster. The hard clustering problem is formulated as a nonlinear program, for which a tight linear programming relaxation is constructed via the Reformulation-Linearization Technique (RLT) in concert with additional valid inequalities that serve to defeat the inherent symmetry in the problem. This construct is embedded within a specialized branch-and-bound algorithm to solve the problem to global optimality. Pertinent implementation issues that can enhance the efficiency of the branch-and-bound algorithm are also discussed. Computational experience is reported using several standard data sets found in the literature as well as using synthetically generated larger problem instances. The results validate the robustness of the proposed algorithmic procedure and exhibit its dominance over the popular k-means clustering technique. Finally, a heuristic procedure to obtain a good quality solution at a relative ease of computational effort is also described.  相似文献   

3.
Automatic clustering using genetic algorithms   总被引:2,自引:0,他引:2  
In face of the clustering problem, many clustering methods usually require the designer to provide the number of clusters as input. Unfortunately, the designer has no idea, in general, about this information beforehand. In this article, we develop a genetic algorithm based clustering method called automatic genetic clustering for unknown K (AGCUK). In the AGCUK algorithm, noising selection and division-absorption mutation are designed to keep a balance between selection pressure and population diversity. In addition, the Davies-Bouldin index is employed to measure the validity of clusters. Experimental results on artificial and real-life data sets are given to illustrate the effectiveness of the AGCUK algorithm in automatically evolving the number of clusters and providing the clustering partition.  相似文献   

4.
In this paper, we investigate the problem of determining the number of clusters in the k-modes based categorical data clustering process. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by introducing a penalty term to the objective function to make more clusters compete for objects. In the new objective function, we employ a regularization parameter to control the number of clusters in a clustering process. Instead of finding k directly, we choose a suitable value of regularization parameter such that the corresponding clustering result is the most stable one among all the generated clustering results. Experimental results on synthetic data sets and the real data sets are used to demonstrate the effectiveness of the proposed algorithm.  相似文献   

5.
The field of cluster analysis is primarily concerned with the partitioning of data points into different clusters so as to optimize a certain criterion. Rapid advances in technology have made it possible to address clustering problems via optimization theory. In this paper, we present a global optimization algorithm to solve the fuzzy clustering problem, where each data point is to be assigned to (possibly) several clusters, with a membership grade assigned to each data point that reflects the likelihood of the data point belonging to that cluster. The fuzzy clustering problem is formulated as a nonlinear program, for which a tight linear programming relaxation is constructed via the Reformulation-Linearization Technique (RLT) in concert with additional valid inequalities. This construct is embedded within a specialized branch-and-bound (B&B) algorithm to solve the problem to global optimality. Computational experience is reported using several standard data sets from the literature as well as using synthetically generated larger problem instances. The results validate the robustness of the proposed algorithmic procedure and exhibit its dominance over the popular fuzzy c-means algorithmic technique and the commercial global optimizer BARON.  相似文献   

6.
Hierarchical hesitant fuzzy K-means clustering algorithm   总被引:1,自引:0,他引:1  
Due to the limitation and hesitation in one's knowledge, the membership degree of an element to a given set usually has a few different values, in which the conventional fuzzy sets are invalid. Hesitant fuzzy sets are a powerful tool to treat this case. The present paper focuses on investigating the clustering technique for hesitant fuzzy sets based on the K-means clustering algorithm which takes the results of hierarchical clustering as the initial clusters. Finally, two examples demonstrate the validity of our algorithm.  相似文献   

7.
Cluster analysis is an important tool for data exploration and it has been applied in a wide variety of fields like engineering, economics, computer sciences, life and medical sciences, earth sciences and social sciences. The typical cluster analysis consists of four steps (i.e. feature selection or extraction, clustering algorithm design or selection, cluster validation and results interpretation) with feedback pathway. These steps are closely related to each other and affect the derived clusters. In this paper, a new metaheuristic algorithm is proposed for cluster analysis. This algorithm uses an Ant Colony Optimization to feature selection step and a Greedy Randomized Adaptive Search Procedure to clustering algorithm design step. The proposed algorithm has been applied with very good results to many data sets.  相似文献   

8.
In this work we address a technique for effectively clustering points in specific convex sets, called homogeneous boxes, having sides aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions, not according to some distance measure, and, even if it was originally developed in the context of the logical analysis of data, it is now placed inside the framework of Supervised clustering. First, we introduce the basic concepts in box geometry; then, we consider a generalized clustering algorithm based on a class of graphs, called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets, and compare the overall performances to the accuracy levels of competing methods for a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy.  相似文献   

9.
The partitioning clustering is a technique to classify n objects into k disjoint clusters, and has been developed for years and widely used in many applications. In this paper, a new overlapping cluster algorithm is defined. It differs from traditional clustering algorithms in three respects. First, the new clustering is overlapping, because clusters are allowed to overlap with one another. Second, the clustering is non-exhaustive, because an object is permitted to belong to no cluster. Third, the goals considered in this research are the maximization of the average number of objects contained in a cluster and the maximization of the distances among cluster centers, while the goals in previous research are the maximization of the similarities of objects in the same clusters and the minimization of the similarities of objects in different clusters. Furthermore, the new clustering is also different from the traditional fuzzy clustering, because the object–cluster relationship in the new clustering is represented by a crisp value rather than that represented by using a fuzzy membership degree. Accordingly, a new overlapping partitioning cluster (OPC) algorithm is proposed to provide overlapping and non-exhaustive clustering of objects. Finally, several simulation and real world data sets are used to evaluate the effectiveness and the efficiency of the OPC algorithm, and the outcomes indicate that the algorithm can generate satisfactory clustering results.  相似文献   

10.
文本聚类是聚类技术的重要研究领域.该技术根据文本的相似特征或相似表达式对文本进行聚类,使得属于同类的文本具有最大的相似性,而属不同类文本具有最大的差异性.与其它文字相比,蒙古文的结构和书写方式具有许多特征.本文结合K-means与克隆免疫算法提出了一种称为ICKM的新型聚类技术.四种元素集上的仿真实验说明了我们提出的方法在蒙古文聚类的有效性.  相似文献   

11.
Factor clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of data and a clustering of the transformed data, optimizing a common criterion. Probabilistic distance (PD)-clustering is an iterative, distribution free, probabilistic clustering method. Factor PD-clustering (FPDC) is based on PD-clustering and involves a linear transformation of the original variables into a reduced number of orthogonal ones using a common criterion with PD-clustering. This paper demonstrates that Tucker3 decomposition can be used to accomplish this transformation. Factor PD-clustering alternatingly exploits Tucker3 decomposition and PD-clustering on transformed data until convergence is achieved. This method can significantly improve the PD-clustering algorithm performance; large data sets can thus be partitioned into clusters with increasing stability and robustness of the results. Real and simulated data sets are used to compare FPDC with its main competitors, where it performs equally well when clusters are elliptically shaped but outperforms its competitors with non-Gaussian shaped clusters or noisy data.  相似文献   

12.
We propose a new technique to perform unsupervised data classification (clustering) based on density induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data induced metric, defined with the help of Delaunay triangulation. We detail computation of the distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable to identify non-convex overlapped d-dimensional clusters.  相似文献   

13.
区间型符号数据是一种重要的符号数据类型,现有文献往往假设区间内的点数据服从均匀分布,导致其应用的局限性。本文基于一般分布的假设,给出了一般分布区间型符号数据的扩展的Hausdorff距离度量,基于此提出了一般分布的区间型符号数据的SOM聚类算法。随机模拟试验的结果表明,基于本文提出的基于扩展的Hausdorff距离度量的SOM聚类算法的有效性优于基于传统Hausdorff距离度量的SOM聚类算法和基于μσ距离度量的SOM聚类算法。最后将文中方法应用于气象数据的聚类分析,示例文中方法的应用步骤与可操作性,并进一步评价文中方法在解决实际问题中的有效性。  相似文献   

14.
A genetic k-medoids clustering algorithm   总被引:1,自引:0,他引:1  
We propose a hybrid genetic algorithm for k-medoids clustering. A novel heuristic operator is designed and integrated with the genetic algorithm to fine-tune the search. Further, variable length individuals that encode different number of medoids (clusters) are used for evolution with a modified Davies-Bouldin index as a measure of the fitness of the corresponding partitionings. As a result the proposed algorithm can efficiently evolve appropriate partitionings while making no a priori assumption about the number of clusters present in the datasets. In the experiments, we show the effectiveness of the proposed algorithm and compare it with other related clustering methods.  相似文献   

15.
Cluster analysis is an unsupervised learning technique for partitioning objects into several clusters. Assuming that noisy objects are included, we propose a soft clustering method which assigns objects that are significantly different from noise into one of the specified number of clusters by controlling decision errors through multiple testing. The parameters of the Gaussian mixture model are estimated from the EM algorithm. Using the estimated probability density function, we formulated a multiple hypothesis testing for the clustering problem, and the positive false discovery rate (pFDR) is calculated as our decision error. The proposed procedure classifies objects into significant data or noise simultaneously according to the specified target pFDR level. When applied to real and artificial data sets, it was able to control the target pFDR reasonably well, offering a satisfactory clustering performance.  相似文献   

16.
We propose minimum volume ellipsoids (MVE) clustering as an alternative clustering technique to k-means for data clusters with ellipsoidal shapes and explore its value and practicality. MVE clustering allocates data points into clusters in a way that minimizes the geometric mean of the volumes of each cluster’s covering ellipsoids. Motivations for this approach include its scale-invariance, its ability to handle asymmetric and unequal clusters, and our ability to formulate it as a mixed-integer semidefinite programming problem that can be solved to global optimality. We present some preliminary empirical results that illustrate MVE clustering as an appropriate method for clustering data from mixtures of “ellipsoidal” distributions and compare its performance with the k-means clustering algorithm as well as the MCLUST algorithm (which is based on a maximum likelihood EM algorithm) available in the statistical package R. Research of the first author was supported in part by a Discovery Grant from NSERC and a research grant from Faculty of Mathematics, University of Waterloo. Research of the second author was supported in part by a Discovery Grant from NSERC and a PREA from Ontario, Canada.  相似文献   

17.
There are many data clustering techniques available to extract meaningful information from real world data, but the obtained clustering results of the available techniques, running time for the performance of clustering techniques in clustering real world data are highly important. This work is strongly felt that fuzzy clustering technique is suitable one to find meaningful information and appropriate groups into real world datasets. In fuzzy clustering the objective function controls the groups or clusters and computation parts of clustering. Hence researchers in fuzzy clustering algorithm aim is to minimize the objective function that usually has number of computation parts, like calculation of cluster prototypes, degree of membership for objects, computation part for updating and stopping algorithms. This paper introduces some new effective fuzzy objective functions with effective fuzzy parameters that can help to minimize the running time and to obtain strong meaningful information or clusters into the real world datasets. Further this paper tries to introduce new way for predicting membership, centres by minimizing the proposed new fuzzy objective functions. And experimental results of proposed algorithms are given to illustrate the effectiveness of proposed methods.  相似文献   

18.
A characteristic feature of many relevant real life networks, like the WWW, Internet, transportation and communication networks, or even biological and social networks, is their clustering structure. We discuss in this paper a novel algorithm to identify cluster sets of densely interconnected nodes in a network. The algorithm is based on local information and therefore it is very fast with respect other proposed methods, while it keeps a similar performance in detecting the clusters.  相似文献   

19.
鉴于航空公司在客户聚类分群中对聚类效果进行评价并确定最佳k值的轮廓系数法存在时间复杂度过高O(n2)以及准确率较低问题,文章首先采用对象与同簇或不同簇中心间距离计算来替换同类或异类对象间的距离计算,并通过聚类效果变化率确定轮廓系数调节位置及调节权重,提出一种改进的轮廓系数法;其次,基于预处理且特征选择后真实航空公司客户数据构建聚类模型,借助改进轮廓系数法确定最优客户分群,并构建用户画像;最后,针对不同航空公司分群客户进行特征描述并提出相应个性化服务措施,辅助航空公司为客户提供个性化产品与服务。实证研究结果表明:不同样本量下改进轮廓系数法的精确率和运行效率均有所提升;基于改进轮廓系数法的航空公司客户分群结果符合客观实际,所提服务措施为航空公司最大化客户需求、提高客户满意度提供借鉴。  相似文献   

20.
The problem of bivariate clustering for the simultaneous grouping of rows and columns of matrices was addressed with a mixed-integer linear programming model. The model was solved using conventional methodology for very small problems but solving even small to moderate-sized problems was a formidable challenge. Because of the NP-complete nature of this class of problems, a genetic algorithm was developed to solve realistically sized problems of larger dimensions. A commonly encountered example is the simultaneous clustering of parts into part families and machines into machine cells in a cellular manufacturing context for group technology. The attractiveness of employing the optimization model (with objective function being a sum of dissimilarity measures) to provide simultaneous grouping of part types and machine types was compared to solutions found by employing the commonly used grouping efficacy measure. For cellular manufacturing problem instances from the literature, the intrinsic differences between the objective of the proposed model and grouping efficacy is highlighted. The solution to the general model found by employing a genetic algorithm solution technique and applying a simple heuristic was shown to perform as well as other algorithms to find the commonly accepted best known solutions for grouping efficacy. Further examples in industrial purchasing behavior and market segmentation help reveal the general applicability of the model for obtaining natural clusters.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号