首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 32 毫秒
1.
One-mode three-way overlapping cluster analysis   总被引:1,自引:0,他引:1  
The present paper introduces an overlapping cluster analysis model and an associated algorithm that can analyze one-mode three-way similarities. The present model is an extension of ADCLUS model, and the present algorithm is based on the MAPCLUS algorithm. In the present model, one-mode three-way similarities are represented by the sum of the numerical weights of clusters to which any triplet of objects belongs. The present model and algorithm were applied to joint purchase data, and compared the result with that of MAPCLUS to show that the present model is effective in representing one-mode three-way similarities.  相似文献   

2.
Digital circuits have grown exponentially in their sizes over the past decades. To be able to automate the design of these circuits, efficient algorithms are needed. One of the challenging stages of circuit design is the physical design where the physical locations of the components of a circuit are determined. Coarsening or clustering algorithms have become popular with physical designers due to their ability to reduce circuit sizes in the intermediate design steps such that the design can be performed faster and with higher quality. In this paper, a new clustering algorithm based on the algebraic multigrid (AMG) technique is presented. In the proposed algorithm, AMG is used to assign weights to connections between cells of a circuit and find cells that are best suited to become the initial cells for clusters, seed cells. The seed cells and the weights between them and the other cells are then used to cluster the cells of a circuit. The analysis of the proposed algorithm proves linear-time complexity, O(N), where N is the number of pins in a circuit. The numerical experiments demonstrate that AMG-based clustering can achieve high quality clusters and improve circuit placement designs with low computational cost.  相似文献   

3.
Maximal margin based frameworks have emerged as a powerful tool for supervised learning. The extension of these ideas to the unsupervised case, however, is problematic since the underlying optimization entails a discrete component. In this paper, we first study the computational complexity of maximal hard margin clustering and show that the hard margin clustering problem can be precisely solved in O(n d+2) time where n is the number of the data points and d is the dimensionality of the input data. However, since it is well known that many datasets commonly ‘express’ themselves primarily in far fewer dimensions, our interest is in evaluating if a careful use of dimensionality reduction can lead to practical and effective algorithms. We build upon these observations and propose a new algorithm that gradually increases the number of features used in the separation model in each iteration, and analyze the convergence properties of this scheme. We report on promising numerical experiments based on a ‘truncated’ version of this approach. Our experiments indicate that for a variety of datasets, good solutions equivalent to those from other existing techniques can be obtained in significantly less time.  相似文献   

4.
Clustering is an important problem in data mining. It can be formulated as a nonsmooth, nonconvex optimization problem. For the most global optimization techniques this problem is challenging even in medium size data sets. In this paper, we propose an approach that allows one to apply local methods of smooth optimization to solve the clustering problems. We apply an incremental approach to generate starting points for cluster centers which enables us to deal with nonconvexity of the problem. The hyperbolic smoothing technique is applied to handle nonsmoothness of the clustering problems and to make it possible application of smooth optimization algorithms to solve them. Results of numerical experiments with eleven real-world data sets and the comparison with state-of-the-art incremental clustering algorithms demonstrate that the smooth optimization algorithms in combination with the incremental approach are powerful alternative to existing clustering algorithms.  相似文献   

5.
One of the most significant discussions in the field of machine learning today is on the clustering ensemble. The clustering ensemble combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known for their high ability to solve optimization problems, especially the problem of the clustering ensemble. To date, despite the major contributions to find consensus cluster partitions with application of genetic algorithms, there has been little discussion on population initialization through generative mechanisms in genetic-based clustering ensemble algorithms as well as the production of cluster partitions with favorable fitness values in first phase clustering ensembles. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to solve the problem of diversity of clustering, one of the most common problems in clustering ensembles. Moreover, TFCM is able to increase the fitness of cluster partitions, such that it improves performance of genetic-based clustering ensemble algorithms. The fitness average of cluster partitions generated by TFCM are evaluated by three different objective functions and compared against other clustering algorithms. In this paper, a simple genetic-based clustering ensemble algorithm, named SGCE, is proposed, in which cluster partitions generated by the TFCM and other clustering algorithms are used as the initial population used by the SGCE. The performance of the SGCE is evaluated and compared based on the different initial populations used. The experimental results based on eleven real world datasets demonstrate that TFCM improves the fitness of cluster partitions and that the performance of the SGCE is enhanced using initial populations generated by the TFCM.  相似文献   

6.
The authors provide some 2-approximation algorithm for an intractable problem to which one can reduce the problem of partitioning a vector set in Euclidean space into the two subsets (clusters) having the minimum sum of distance squares.  相似文献   

7.
8.
A non metric clustering algorithm based on a fuzzy objective function which reflects proximity based on some global dissimilarity measure is proposed.The global optimal solution is shown to be difficult to obtain and an alternative iterative procedure is presented. This procedure is easily implemented and converges to a local optimum.Some validity functionals which measure the effectiveness with which cluster structure has been identified are compared in relation with the iterative procedures described in the paper.  相似文献   

9.
In this article, we present a randomized dynamic cluster algorithm for large data sets. It is based on the restricted random walk cluster algorithm by Schöll and Schöll-Paschinger that has given good results in past studies. We discuss different approaches for the clustering of dynamic data sets. In contrast to most of these methods, dynamic restricted random walk clustering is also efficient for a small percentage of changes in the data set and has the additional advantage that the updates asymptotically produce the same clusters as a reclustering with the static variant; there is thus no need for any reclustering ever. In addition, the method has a relatively low computational complexity which enables it to cluster large data sets.  相似文献   

10.
A genetic k-medoids clustering algorithm   总被引:1,自引:0,他引:1  
We propose a hybrid genetic algorithm for k-medoids clustering. A novel heuristic operator is designed and integrated with the genetic algorithm to fine-tune the search. Further, variable length individuals that encode different number of medoids (clusters) are used for evolution with a modified Davies-Bouldin index as a measure of the fitness of the corresponding partitionings. As a result the proposed algorithm can efficiently evolve appropriate partitionings while making no a priori assumption about the number of clusters present in the datasets. In the experiments, we show the effectiveness of the proposed algorithm and compare it with other related clustering methods.  相似文献   

11.
A graph clustering problem is under study (also known as the graph approximation problem) with a constraint on cluster sizes. Some new approximation algorithm is presented for this problem, and performance guarantee of the algorithm is obtained. It is shown that the problem belongs to the class APX for every fixed p, where p is the upper bound on the cluster sizes.  相似文献   

12.
Clustering objects into groups is usually done using a statistical heuristic or an optimisation. The method depends on the size of the problem and its purpose. There may exist a number of partitions which do not differ significantly but some of which may be preferable (or equally good) when aspects of the problem not formally contained in the model are considered in the interpretation of the result. To decide between a number of good partitions they must first be enumerated and this may be done by using a number of different heuristics. In this paper an alternative method is described which uses an integer linear programming model having the number and size distribution of groups as objectives and the criteria for group membership as constraints. The model is applied to three problems each having a different measure of dissimilarity between objects and so different membership criteria. In each case a number of optimal solutions are found and expressed in two parts: a core of groups, the membership of which does not change, and the remaining objects which augment the core. The core is found to contain over three quarters of the objects and so provides a stable base for cluster definition.  相似文献   

13.
Scalability of clustering algorithms is a critical issue facing the data mining community. One method to handle this issue is to use only a subset of all instances. This paper develops an optimization-based approach to the partitional clustering problem using an algorithm specifically designed for noisy performance, which is a problem that arises when using a subset of instances. Numerical results show that computation time can be dramatically reduced by using a partial set of instances without sacrificing solution quality. In addition, these results are more persuasive as the size of the problem is larger.  相似文献   

14.
In this paper, a novel memetic algorithm (MA) named GS-MPSO is proposed by combining a particle swarm optimization (PSO) with a Gaussian mutation operator and a Simulated Annealing (SA)-based local search operator. In GS-MPSO, the particles are organized as a ring lattice. The Gaussian mutation operator is applied to the stagnant particles to prevent GS-MPSO trapping into local optima. The SA-based local search strategy is developed to combine with the cognition-only PSO model and perform a fine-grained local search around the promising regions. The experimental results show that GS-MPSO is superior to some other variants of PSO with better performance on optimizing the benchmark functions when the computing resource is limited. Data clustering is studied as a real case study to further demonstrate its optimization ability and usability, too.  相似文献   

15.
The K-means algorithm has been a widely applied clustering technique, especially in the area of marketing research. In spite of its popularity and ability to deal with large volumes of data quickly and efficiently, K-means has its drawbacks, such as its inability to provide good solution quality and robustness. In this paper, an extended study of the K-means algorithm is carried out. We propose a new clustering algorithm that integrates the concepts of hierarchical approaches and the K-means algorithm to yield improved performance in terms of solution quality and robustness. This proposed algorithm and score function are introduced and thoroughly discussed. Comparison studies with the K-means algorithm and three popular K-means initialization methods using five well-known test data sets are also presented. Finally, a business application involving segmenting credit card users demonstrates the algorithm's capability.  相似文献   

16.
Based on inter-cluster separation clustering (ICSC) fuzzy inter-cluster separation clustering (FICSC) deals with all the distances between the cluster centers, maximizes these distances and obtains the better performances of clustering. However, FICSC is sensitive to noises the same as fuzzy c-means (FCM) clustering. Possibilistic type of FICSC is proposed to combine FICSC and possibilistic c-means (PCM) clustering. Mixed fuzzy inter-cluster separation clustering (MFICSC) is presented to extend possibilistic type of FICSC because possibilistic type of FICSC is sensitive to initial cluster centers and always generates coincident clusters. MFICSC can produce both fuzzy membership values and typicality values simultaneously. MFICSC shows good performances in dealing with noisy data and overcoming the problem of coincident clusters. The experimental results with data sets show that our proposed MFICSC holds better clustering accuracy, little clustering time and the exact cluster centers.  相似文献   

17.
We explore an approach to possibilistic fuzzy clustering that avoids a severe drawback of the conventional approach, namely that the objective function is truly minimized only if all cluster centers are identical. Our approach is based on the idea that this undesired property can be avoided if we introduce a mutual repulsion of the clusters, so that they are forced away from each other. We develop this approach for the possibilistic fuzzy c-means algorithm and the Gustafson–Kessel algorithm. In our experiments we found that in this way we can combine the partitioning property of the probabilistic fuzzy c-means algorithm with the advantages of a possibilistic approach w.r.t. the interpretation of the membership degrees.  相似文献   

18.
Transfer algorithms are usually used to optimize an objective function that is defined on the set of partitions of a finite set X. In this paper we define an equivalence relation ? on the set of fuzzy equivalence relations on X and establish a bijection from the set of hierarchies on X to the set of equivalence classes with respect to ?. Thus, hierarchies can be identified with fuzzy equivalence relations and the transfer algorithm can be modified in order to optimize an objective function that is defined on the set of hierarchies on X.  相似文献   

19.
In this paper, we investigate the problem of determining the number of clusters in the k-modes based categorical data clustering process. We propose a new categorical data clustering algorithm with automatic selection of k. The new algorithm extends the k-modes clustering algorithm by introducing a penalty term to the objective function to make more clusters compete for objects. In the new objective function, we employ a regularization parameter to control the number of clusters in a clustering process. Instead of finding k directly, we choose a suitable value of regularization parameter such that the corresponding clustering result is the most stable one among all the generated clustering results. Experimental results on synthetic data sets and the real data sets are used to demonstrate the effectiveness of the proposed algorithm.  相似文献   

20.
Hierarchical hesitant fuzzy K-means clustering algorithm   总被引:1,自引:0,他引:1  
Due to the limitation and hesitation in one's knowledge, the membership degree of an element to a given set usually has a few different values, in which the conventional fuzzy sets are invalid. Hesitant fuzzy sets are a powerful tool to treat this case. The present paper focuses on investigating the clustering technique for hesitant fuzzy sets based on the K-means clustering algorithm which takes the results of hierarchical clustering as the initial clusters. Finally, two examples demonstrate the validity of our algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号