Similar literature
20 similar documents found.
1.
Data clustering, also called unsupervised learning, is a fundamental task in data mining that partitions an unlabeled collection of data into separate groups based on similarity. Recent studies have shown that clustering techniques that optimize a single objective may not provide satisfactory results, because no single validity measure works well on different kinds of data sets. Moreover, the performance of clustering algorithms degrades as the overlap among clusters in a data set increases. These facts have motivated us to develop a fuzzy multi-objective particle swarm optimization framework for data clustering, termed FMOPSO, which is able to deliver more effective results than state-of-the-art clustering algorithms. The key challenge in designing the FMOPSO framework is how to resolve the cluster-assignment confusion caused by points in the data set that have significant membership in more than one cluster. The proposed framework addresses this problem by identifying points with significant membership in multiple classes, excluding them, and re-classifying them into single class assignments. To ascertain the superiority of the proposed algorithm, statistical tests have been performed on a variety of numerical and categorical real-life data sets. Our empirical study shows that the proposed framework significantly outperforms state-of-the-art data clustering algorithms in terms of both efficiency and effectiveness.

2.
Clustering algorithms divide a dataset into a set of classes/clusters, where similar data objects are assigned to the same cluster. When the boundary between clusters is ill defined, so that the same data object may belong to more than one class, the notion of fuzzy clustering becomes relevant. In this setting, each datum belongs to a given class with some membership grade between 0 and 1. The most prominent fuzzy clustering algorithm is the fuzzy c-means introduced by Bezdek (Pattern recognition with fuzzy objective function algorithms, 1981), a fuzzification of the k-means or ISODATA algorithm. On the other hand, several research issues have been raised regarding both the objective function to be minimized and the optimization constraints, which help to identify proper cluster shape (Jain et al., ACM Computing Surveys 31(3):264–323, 1999). This paper addresses the issue of clustering by evaluating the distance of fuzzy sets in a feature space. In particular, the fuzzy clustering optimization problem is reformulated for the case where the distance is given as a divergence distance, which builds a bridge to the notion of probabilistic distance. This leads to a modified fuzzy clustering, which implicitly involves the variance–covariance of input terms. The optimal solution of the underlying optimization problem is determined, and its existence and uniqueness are demonstrated. The performance of the algorithm is assessed through two numerical applications. The former involves clustering of Gaussian membership functions and the latter tackles the well-known Iris dataset. Comparisons with standard fuzzy c-means (FCM) are evaluated and discussed.
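
The membership grades mentioned in this abstract can be illustrated with one iteration of standard fuzzy c-means. This is a minimal sketch, assuming Euclidean distance and fuzzifier m = 2; it is not the divergence-based variant the paper proposes, and the function names `fcm_memberships` and `fcm_centers` are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[k][i], the membership of point k in cluster i (standard FCM)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share the
            # membership equally among those centers and give 0 to the rest.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1))
        row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
               for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Return the cluster centers as means weighted by u_ki^m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


# Illustrative data (an assumption, not taken from the paper).
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, [(0.0, 0.0), (4.0, 0.0)])
new_centers = fcm_centers(points, u)
```

Each row of `u` sums to 1, which is the constraint that distinguishes these fuzzy memberships from the typicality values used in possibilistic clustering.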

3.
We propose a functional extension of fuzzy clusterwise regression, which estimates fuzzy memberships of clusters and regression coefficient functions for each cluster simultaneously. The proposed method permits dependent and/or predictor variables to be functional, varying over time, space, and other continua. The fuzzy memberships and clusterwise regression coefficient functions are estimated by minimizing an objective function that adopts a basis function expansion approach to approximating functional data. An alternating least squares algorithm is developed to minimize the objective function. We conduct simulation studies to demonstrate the superior performance of the proposed method compared to its non-functional counterpart and to examine the performance of various cluster validity measures for selecting the optimal number of clusters. We apply the proposed method to real datasets to illustrate the empirical usefulness of the proposed method.

4.
Partitioning clustering is a technique that classifies n objects into k disjoint clusters; it has been developed over many years and is widely used in many applications. In this paper, a new overlapping clustering algorithm is defined. It differs from traditional clustering algorithms in three respects. First, the new clustering is overlapping, because clusters are allowed to overlap with one another. Second, the clustering is non-exhaustive, because an object is permitted to belong to no cluster. Third, the goals considered in this research are the maximization of the average number of objects contained in a cluster and the maximization of the distances among cluster centers, whereas the goals in previous research are the maximization of the similarities of objects in the same cluster and the minimization of the similarities of objects in different clusters. Furthermore, the new clustering also differs from traditional fuzzy clustering, because the object–cluster relationship in the new clustering is represented by a crisp value rather than by a fuzzy membership degree. Accordingly, a new overlapping partitioning cluster (OPC) algorithm is proposed to provide overlapping and non-exhaustive clustering of objects. Finally, several simulated and real-world data sets are used to evaluate the effectiveness and efficiency of the OPC algorithm, and the outcomes indicate that the algorithm can generate satisfactory clustering results.

5.
One of the most significant discussions in the field of machine learning today concerns the clustering ensemble. The clustering ensemble combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known for their high ability to solve optimization problems, especially the problem of the clustering ensemble. To date, despite major contributions to finding consensus cluster partitions with genetic algorithms, there has been little discussion of population initialization through generative mechanisms in genetic-based clustering ensemble algorithms, or of the production of cluster partitions with favorable fitness values in the first phase of clustering ensembles. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to solve the problem of diversity of clustering, one of the most common problems in clustering ensembles. Moreover, TFCM is able to increase the fitness of cluster partitions, such that it improves the performance of genetic-based clustering ensemble algorithms. The average fitness of cluster partitions generated by TFCM is evaluated by three different objective functions and compared against other clustering algorithms. In addition, a simple genetic-based clustering ensemble algorithm, named SGCE, is proposed, in which cluster partitions generated by TFCM and other clustering algorithms are used as the initial population. The performance of the SGCE is evaluated and compared based on the different initial populations used. The experimental results based on eleven real-world datasets demonstrate that TFCM improves the fitness of cluster partitions and that the performance of the SGCE is enhanced using initial populations generated by TFCM.

6.
A unified presentation of classical clustering algorithms is proposed both for the hard and fuzzy pattern classification problems. Based on two types of objective functions, a new method is presented and compared with the procedures of Dunn and Ruspini. In order to determine the best, or more natural number of fuzzy clusters, two coefficients that measure the “degree of non-fuzziness” of the partition are proposed. Numerous computational results are shown.

7.
Medoid-based fuzzy clustering generates clusters of objects based on relational data, which records pairwise similarities or dissimilarities among objects. Compared with single-medoid based approaches, our recently proposed fuzzy clustering with multiple-weighted medoids has shown superior clustering performance in experimental studies. In this paper, we present a new version of fuzzy relational clustering in this family called fuzzy clustering with multi-medoids (FMMdd). Based on the new objective function of FMMdd, update equations can be derived more conveniently. Moreover, a unified view of FMMdd and two existing fuzzy relational approaches, fuzzy c-medoids (FCMdd) and assignment-prototype (A-P), can be established, which allows us to conduct further analytical study to investigate the effectiveness and feasibility of the proposed approach as well as the limitations of existing ones. The robustness of FMMdd is also investigated. Our theoretical and numerical studies show that the proposed approach produces good-quality clusters with rich cluster-based information and that it is less sensitive to noise.

8.
In a data stream environment, most conventional clustering algorithms are not sufficiently efficient, since large volumes of data arrive in a stream and these data points unfold with time. The problems of clustering time-evolving metric data and time-evolving categorical data have each been well explored in recent years, but the problem of clustering mixed-type time-evolving data remains a challenging issue due to an awkward gap between the structure of metric and categorical attributes. In this paper, we devise a generalized framework, termed Equi-Clustream, to dynamically cluster mixed-type time-evolving data. It comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm that detects the drifting concept between the current sliding window and the previous sliding window, a Hybrid Data Labeling Algorithm that assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window, and a visualization algorithm that analyses the relationship between the clusters at different timestamps and also visualizes the evolving trends of the clusters. The efficacy of the proposed framework is shown by experiments on synthetic and real-world datasets.

9.
Functional data clustering: a survey   Total citations: 1, self-citations: 0, citations by others: 1
Clustering techniques for functional data are reviewed. Four groups of clustering algorithms for functional data are proposed. The first group consists of methods working directly on the evaluation points of the curves. The second group is defined by filtering methods, which first approximate the curves in a finite basis of functions and then perform clustering using the basis expansion coefficients. The third group is composed of methods that perform dimensionality reduction of the curves and clustering simultaneously, leading to a functional representation of the data that depends on the clusters. The last group consists of distance-based methods using clustering algorithms based on specific distances for functional data. A software review as well as an illustration of the application of these algorithms to real data are presented.

10.
Based on inter-cluster separation clustering (ICSC), fuzzy inter-cluster separation clustering (FICSC) deals with all the distances between the cluster centers, maximizes these distances, and obtains better clustering performance. However, FICSC is sensitive to noise, as is fuzzy c-means (FCM) clustering. A possibilistic type of FICSC is proposed to combine FICSC and possibilistic c-means (PCM) clustering. Mixed fuzzy inter-cluster separation clustering (MFICSC) is presented to extend the possibilistic type of FICSC, because the possibilistic type of FICSC is sensitive to initial cluster centers and always generates coincident clusters. MFICSC can produce both fuzzy membership values and typicality values simultaneously. MFICSC shows good performance in dealing with noisy data and overcoming the problem of coincident clusters. The experimental results on data sets show that our proposed MFICSC achieves better clustering accuracy, short clustering time, and exact cluster centers.
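
The typicality values mentioned in this abstract can be illustrated with the typicality formula of standard possibilistic c-means (Krishnapuram and Keller). This is a minimal sketch, not the MFICSC update itself; the bandwidths `etas` are hand-picked assumptions and the function names are hypothetical.

```python
def pcm_typicality(sq_dist, eta, m=2.0):
    """Typicality t = 1 / (1 + (d^2 / eta)^(1 / (m - 1))) of standard PCM."""
    if eta <= 0 or m <= 1:
        raise ValueError("eta must be positive and m must exceed 1")
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


def typicalities(point, centers, etas, m=2.0):
    """Typicality of one point with respect to every cluster center."""
    values = []
    for center, eta in zip(centers, etas):
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, center))
        values.append(pcm_typicality(sq_dist, eta, m))
    return values


# Illustrative data (an assumption): an outlier far from both centers.
centers = [(-1.0, 0.0), (1.0, 0.0)]
outlier = (0.0, 10.0)
outlier_typicality = typicalities(outlier, centers, etas=[1.0, 1.0])
```

Unlike fuzzy c-means memberships, these values are not forced to sum to 1, so the outlier receives a low typicality for both clusters instead of a membership of 0.5 in each. This is the usual argument for why possibilistic terms help with noisy data.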

11.
We propose a new technique to perform unsupervised data classification (clustering) based on a density-induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data-induced metric, defined with the help of Delaunay triangulation. We detail the computation of the distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes, we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable of identifying non-convex, overlapping d-dimensional clusters.

12.
Traditional c-means clustering partitions a group of objects into a number of non-overlapping sets. For a given dataset, rough sets provide a more flexible and objective representation than classical sets with hard partitions or fuzzy sets with subjective membership functions. Rough c-means clustering and its extensions have been introduced and successfully applied in many real-life applications in recent years. Each cluster is represented by a pair of lower and upper approximations. However, most available algorithms pay no attention to the influence of the imbalanced spatial distribution within a cluster. The limitation of the iterative mean calculation, which gives the same weight to all the data objects in a lower or upper approximation, is analyzed. A hybrid imbalanced measure of distance and density for rough c-means clustering is defined, and a modified rough c-means clustering algorithm is presented in this paper. To evaluate the proposed algorithm, it has been applied to several real-world data sets from UCI. The validity of this algorithm is demonstrated by the results of comparative experiments.

13.
A modified approach has been developed in this study by combining two well-known clustering algorithms, namely the fuzzy c-means algorithm and the entropy-based algorithm. The fuzzy c-means algorithm is one of the most popular algorithms for fuzzy clustering. It can yield compact clusters but might not be able to generate distinct clusters. On the other hand, the entropy-based algorithm can obtain distinct clusters, which might not be compact. However, the clusters need to be both distinct and compact. The present paper proposes a modified clustering approach that combines the above two algorithms. A genetic algorithm was used to tune all three clustering algorithms separately. The proposed approach was found to yield clusters that are both distinct and compact on two data sets.

14.
《Fuzzy Sets and Systems》2004,141(2):281-299
In this paper, we consider the issue of clustering when outliers exist. The outlier set is defined as the complement of the data set. Following this concept, a specially designed fuzzy membership weighted objective function is proposed and the corresponding optimal membership is derived. Unlike the membership of fuzzy c-means, the derived fuzzy membership does not decrease as the number of clusters increases. With a suitable redefinition of the distance metric, we demonstrate that the objective function can be used to extract c spherical shells. A hard clustering algorithm that alleviates the prototype under-utilization problem is also derived. Artificially generated data are used for comparisons.

15.
《Fuzzy Sets and Systems》2004,141(2):301-317
This paper presents fuzzy clustering algorithms for mixed features of symbolic and fuzzy data. El-Sonbaty and Ismail proposed fuzzy c-means (FCM) clustering for symbolic data and Hathaway et al. proposed FCM for fuzzy data. In this paper we give a modified dissimilarity measure for symbolic and fuzzy data and then give FCM clustering algorithms for these mixed data types. Numerical examples and comparisons are also given. Numerical examples illustrate that the modified dissimilarity gives better results. Finally, the proposed clustering algorithm is applied to real data with mixed feature variables of symbolic and fuzzy data.

16.

A framework is proposed to simultaneously cluster objects and detect anomalies in attributed graph data. Our objective function, along with the carefully constructed constraints, promotes interpretability of both the clustering and anomaly detection components, as well as scalability of our method. In addition, we developed an algorithm called Outlier detection and Robust Clustering for Attributed graphs (ORCA) within this framework. ORCA is fast and convergent under mild conditions, produces high-quality clustering results, and discovers anomalies that can be mapped back naturally to the features of the input data. The efficacy and efficiency of ORCA are demonstrated on real-world datasets against multiple state-of-the-art techniques.


17.
Hesitant fuzzy sets (HFSs), which allow the membership degree of an element in a set to be represented by several possible values, can be considered a powerful tool for expressing uncertain information in the process of group decision making. We derive some correlation coefficient formulas for HFSs and apply them to clustering analysis under hesitant fuzzy environments. Two real-world examples, i.e., software evaluation and classification, and the assessment of business failure risk, are employed to illustrate the actual need for a clustering algorithm based on HFSs, which can incorporate the differences in the evaluation information provided by different experts in clustering processes. In order to extend the application domain of the clustering algorithm in the framework of HFSs, we develop interval-valued HFSs and the corresponding correlation coefficient formulas, and then demonstrate their application in clustering with interval-valued hesitant fuzzy information through a specific numerical example.

18.
In this paper, two new algorithms are presented to solve multi-level multi-objective linear programming (ML-MOLP) problems through the fuzzy goal programming (FGP) approach. The membership functions for the defined fuzzy goals of all objective functions at all levels are developed in the model formulation of the problem, as are the membership functions for vectors of fuzzy goals of the decision variables controlled by decision makers at the top levels. The fuzzy goal programming approach is then used to achieve the highest degree of each of the membership goals by minimizing their deviational variables, thereby obtaining the most satisfactory solution for all decision makers.

19.
In this paper, we investigate the problem of determining the number of clusters in the k-modes based categorical data clustering process. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by introducing a penalty term into the objective function to make more clusters compete for objects. In the new objective function, we employ a regularization parameter to control the number of clusters in a clustering process. Instead of finding k directly, we choose a suitable value of the regularization parameter such that the corresponding clustering result is the most stable one among all the generated clustering results. Experimental results on synthetic and real data sets are used to demonstrate the effectiveness of the proposed algorithm.
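
For orientation, the plain k-modes procedure that this abstract extends can be sketched as follows. This is a minimal sketch under stated assumptions: it uses simple matching dissimilarity and a fixed k with user-supplied initial modes, and it omits the penalty term and regularization parameter that the paper introduces. All function names are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(records, modes):
    """Label each record with the index of its nearest mode."""
    return [
        min(range(len(modes)),
            key=lambda i: matching_dissimilarity(record, modes[i]))
        for record in records
    ]


def update_modes(records, labels, modes):
    """Recompute each mode as the most frequent value of every attribute."""
    new_modes = []
    for i, old_mode in enumerate(modes):
        members = [r for r, label in zip(records, labels) if label == i]
        if not members:
            new_modes.append(old_mode)  # keep the mode of an empty cluster
            continue
        new_modes.append(tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*members)
        ))
    return new_modes


def k_modes(records, initial_modes, max_iter=20):
    """Alternate assignment and mode updates until the labels stop changing."""
    modes = list(initial_modes)
    labels = assign(records, modes)
    for _ in range(max_iter):
        modes = update_modes(records, labels, modes)
        new_labels = assign(records, modes)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, modes


# Illustrative categorical records (an assumption, not from the paper).
records = [
    ("red", "small", "round"), ("red", "small", "oval"),
    ("red", "medium", "round"), ("blue", "large", "square"),
    ("blue", "large", "flat"), ("green", "large", "square"),
]
labels, modes = k_modes(records, [records[0], records[3]])
```

In this sketch the number of clusters is fixed by the length of `initial_modes`; the paper's contribution is precisely to avoid that choice by selecting a regularization parameter instead.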

20.
An appropriate distance is an essential ingredient in various real-world learning tasks. Distance metric learning aims to learn a metric that reflects the data configuration much better than commonly used distances. We offer an algorithm for simultaneously learning a Mahalanobis-like distance and performing K-means clustering, which incorporates data rescaling and clustering so that data separability grows iteratively in the rescaled space as it is sequentially clustered. At each step of the algorithm, a global optimization problem is solved in order to minimize the cluster distortions based on the current cluster configuration. The obtained weight matrix can also be used as a cluster validation characteristic. Namely, the closeness of such matrices learned during a sampling process can indicate cluster readiness, i.e., estimate the true number of clusters. Numerical experiments performed on synthetic and real datasets verify the high reliability of the proposed method.
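
A Mahalanobis-like distance of the kind this abstract refers to has the form d(x, y) = sqrt((x - y)^T W (x - y)) for a positive semi-definite weight matrix W. The sketch below only evaluates such a distance; the matrices are hand-picked assumptions, not learned by the paper's optimization step, and the function name `mahalanobis_like` is hypothetical.

```python
import math


def mahalanobis_like(x, y, weight):
    """Distance sqrt((x - y)^T W (x - y)); W is assumed positive semi-definite."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * weight[i][j] * diff[j]
               for i in range(n) for j in range(n))
    return math.sqrt(quad)


# With the identity matrix the distance reduces to the Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
euclidean = mahalanobis_like((0.0, 0.0), (3.0, 4.0), identity)

# A diagonal W rescales the axes; here the second coordinate is ignored.
first_axis_only = [[1.0, 0.0], [0.0, 0.0]]
rescaled = mahalanobis_like((0.0, 0.0), (3.0, 4.0), first_axis_only)
```

Running K-means with this distance is equivalent to running ordinary K-means on data linearly transformed by a square root of W, which is why the abstract speaks of clustering in a rescaled space.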
