Similar Documents
20 similar documents found (search time: 46 ms)
1.
This paper centres on clustering approaches that deal with multiple DNA microarray datasets. Four clustering algorithms for deriving a clustering solution from multiple gene expression matrices studying the same biological phenomenon are considered: two unsupervised clustering techniques based on information integration; a hybrid consensus clustering method combining Particle Swarm Optimization and k-means, which can be regarded as supervised clustering; and a supervised consensus clustering algorithm enhanced by Formal Concept Analysis (FCA), which first produces a list of clustering solutions, one per experiment, and then merges the cluster centres of these solutions into a single overlapping partition, which is further analyzed with FCA. The four algorithms are evaluated on gene expression time series from a study examining the global cell-cycle control of gene expression in the fission yeast Schizosaccharomyces pombe.

2.
One of the most active topics in machine learning today is the clustering ensemble, which combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known for their ability to solve optimization problems, including the clustering ensemble problem. To date, despite major contributions to finding consensus cluster partitions with genetic algorithms, there has been little discussion of population initialization through generative mechanisms in genetic-based clustering ensemble algorithms, or of producing cluster partitions with favorable fitness values in the first phase of a clustering ensemble. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to address the diversity of clusterings, one of the most common problems in clustering ensembles. Moreover, TFCM is able to increase the fitness of cluster partitions, thereby improving the performance of genetic-based clustering ensemble algorithms. The average fitness of cluster partitions generated by TFCM is evaluated by three different objective functions and compared against other clustering algorithms. A simple genetic-based clustering ensemble algorithm, named SGCE, is also proposed, in which cluster partitions generated by TFCM and other clustering algorithms serve as the initial population for SGCE. The performance of SGCE is evaluated and compared for the different initial populations used. Experimental results on eleven real-world datasets demonstrate that TFCM improves the fitness of cluster partitions and that the performance of SGCE is enhanced by initial populations generated by TFCM.
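The consensus step of a clustering ensemble can be illustrated with a simple co-association scheme (a generic sketch, not the paper's TFCM or SGCE): points that land in the same cluster in a large enough fraction of the base partitions are linked, and connected components of the resulting graph form the consensus partition.

```python
import numpy as np

def coassociation_consensus(partitions, threshold=0.5):
    """Combine several labelings of the same n points into one partition.
    partitions: list of 1-D integer label arrays.  Points co-clustered in
    at least `threshold` fraction of the base partitions are linked; the
    connected components of that link graph give the consensus clusters."""
    partitions = [np.asarray(p) for p in partitions]
    n = partitions[0].size
    co = np.zeros((n, n))
    for p in partitions:
        co += (p[:, None] == p[None, :]).astype(float)
    co /= len(partitions)                 # co-association frequencies
    adj = co >= threshold                 # link "frequently together" pairs
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):                    # label connected components by DFS
        if labels[i] >= 0:
            continue
        stack = [i]
        labels[i] = current
        while stack:
            j = stack.pop()
            for k in np.nonzero(adj[j])[0]:
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels
```

Note that base partitions may use different label names for the same groups; the co-association matrix sidesteps that label-matching problem entirely.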

3.
Fitting semiparametric clustering models to dissimilarity data   (cited 1 time: 0 self-citations, 1 external)
The cluster analysis problem of partitioning a set of objects from dissimilarity data is handled here with the statistical model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. Cluster analysis lacks widely used methodologies for directly partitioning a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied to the dissimilarities and a partition is then chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used: a Multidimensional Scaling (MDS) algorithm is applied first, followed by a partitioning algorithm such as k-means on the dimensions produced by the MDS. However, neither hierarchical clustering algorithms nor tandem analysis is specifically designed to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular the introduction and study of three new object-partitioning models for dissimilarity data, their estimation via least squares, and three new fast algorithms.
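To make the “closest classification matrix” idea concrete, here is a minimal sketch under an assumed two-level model (not one of the paper's three models): within-cluster and between-cluster dissimilarities are each fitted by their least-squares constant, i.e., their mean, and the residual sum of squares measures how well a candidate partition fits the observed dissimilarities.

```python
import numpy as np

def ls_fit(D, labels):
    """Least-squares fit of a two-level classification matrix to a
    symmetric dissimilarity matrix D: within-cluster pairs are modeled
    by their mean a, between-cluster pairs by their mean b.
    Returns (a, b, residual_sse); lower sse = better-fitting partition."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    n = len(D)
    iu = np.triu_indices(n, k=1)          # each unordered pair once
    same = labels[iu[0]] == labels[iu[1]]
    d = D[iu]
    a = d[same].mean() if same.any() else 0.0
    b = d[~same].mean() if (~same).any() else 0.0
    fitted = np.where(same, a, b)
    return a, b, ((d - fitted) ** 2).sum()
```

Scanning candidate partitions and keeping the one with the smallest residual is the brute-force version of what the paper's fast algorithms do efficiently.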

4.
Motivated by the method for solving the center-based least squares clustering problem (Kogan in Introduction to Clustering Large and High-Dimensional Data, Cambridge University Press, 2007; Teboulle in J Mach Learn Res 8:65–102, 2007), we construct a very efficient iterative process for solving the one-dimensional center-based l1-clustering problem, on the basis of which the optimal partition can be determined. We analyze the basic properties and convergence of our iterative process, which converges to a stationary point of the corresponding objective function for each choice of the initial approximation. A corresponding algorithm is also given, which reaches a stationary point and the corresponding partition in only a few steps. The method is illustrated and visualized on the example of finding an optimal partition with two clusters, where we check all stationary points of the corresponding minimizing functional. The method is also tested on large numbers of data points and clusters, and compared with the method for solving the center-based least squares clustering problem described in Kogan (2007) and Teboulle (2007).
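The paper's iterative process is not reproduced here, but the standard alternating scheme for the one-dimensional l1 objective conveys the idea: assign each point to its nearest centre, then move each centre to the median of its cluster, which is the l1-optimal centre. Each step does not increase the objective, so the iteration stops at a stationary point.

```python
import numpy as np

def l1_clustering_1d(points, k, centers, max_iter=100):
    """Alternating minimization for the 1-D center-based l1 objective
    F(c) = sum_i min_j |x_i - c_j|.  Assignment step: nearest center in
    absolute value.  Update step: cluster median (l1-optimal center)."""
    x = np.asarray(points, dtype=float)
    c = np.asarray(centers, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(max_iter):
        labels = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
        new_c = c.copy()
        for j in range(k):
            members = x[labels == j]
            if members.size:                 # keep empty clusters in place
                new_c[j] = np.median(members)
        if np.allclose(new_c, c):            # fixed point reached
            break
        c = new_c
    return c, labels
```

The median update is what distinguishes this from k-means: under least squares the optimal centre is the mean, while under the l1 criterion it is the median, which makes the method robust to outlying points.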

5.
Cluster analysis is an important task in data mining; it refers to grouping a set of objects such that similarities among objects within the same group are maximal while similarities among objects from different groups are minimal. The particle swarm optimization (PSO) algorithm is one of the best-known metaheuristic optimization algorithms and has been successfully applied to the clustering problem. However, it has two major shortcomings: it converges rapidly during the initial stages of the search but becomes very slow near the global optimum, and it may get trapped in a local optimum if the global best and local best values equal the particle's position over a certain number of iterations. In this paper we hybridize PSO with a heuristic search algorithm to overcome these shortcomings. In the proposed algorithm, called PSOHS, particle swarm optimization produces an initial solution to the clustering problem, and a heuristic search algorithm then improves the quality of this solution by searching around it. The superiority of the proposed PSOHS clustering method over other popular clustering methods is established on seven benchmark and real datasets: Iris, Wine, Crude Oil, Cancer, CMC, Glass and Vowel.
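The abstract describes the refinement phase only as “searching around” the PSO solution; the following hedged sketch implements one plausible such local search (not necessarily the paper's): perturb one centroid at a time at random and keep the move only when it reduces the clustering objective (SSE).

```python
import numpy as np

def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def local_refine(X, centers, step=0.5, iters=200, seed=0):
    """Improve a candidate set of centroids (e.g., a PSO output) by
    greedy random perturbation: only improving moves are accepted, so
    the objective value never increases."""
    rng = np.random.default_rng(seed)
    centers = np.array(centers, dtype=float)
    best = sse(X, centers)
    for _ in range(iters):
        trial = centers.copy()
        j = rng.integers(len(centers))              # pick one centroid
        trial[j] += rng.normal(scale=step, size=X.shape[1])
        f = sse(X, trial)
        if f < best:                                # accept only improvements
            best, centers = f, trial
    return centers, best
```

Because only improving moves are accepted, the refined objective is never worse than the starting one; this is exactly the kind of cheap post-processing a hybrid metaheuristic would attach to the PSO stage.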

6.
In this paper we investigate some algebraic and geometric properties of fuzzy partition spaces (convex hulls of hard or conventional partition spaces). In particular, we obtain their dimensions, and describe a number of algorithms for effecting convex decompositions. Two of these are easily programmable, and each affords a different insight about data structures suggested by the fuzzy partition decomposed. We also show how the sequence of partitions in any convex decomposition leads to a matrix for which the norm of the corresponding coefficient vector equals a scalar measure of partition fuzziness used with certain fuzzy clustering algorithms.
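The convex-decomposition idea can be sketched with a simple greedy scheme (an illustrative algorithm, not necessarily one of the paper's two programmable ones): repeatedly extract the hard partition given by the currently largest residual memberships, with the largest coefficient that keeps all residuals nonnegative.

```python
import numpy as np

def convex_decompose(U, tol=1e-9, max_terms=100):
    """Greedy convex decomposition of a fuzzy partition matrix U
    (rows = clusters, columns = points, each column sums to 1) into
    hard partitions: U = sum_k c_k H_k with c_k >= 0 and sum_k c_k = 1.
    Each step zeroes at least one residual entry, so it terminates."""
    R = np.array(U, dtype=float)
    terms = []
    for _ in range(max_terms):
        rows = R.argmax(axis=0)                      # hard assignment per column
        c = R[rows, np.arange(R.shape[1])].min()     # largest feasible weight
        if c <= tol:
            break
        H = np.zeros_like(R)
        H[rows, np.arange(R.shape[1])] = 1.0
        terms.append((c, H))
        R -= c * H                                   # remove this hard component
    return terms
```

The coefficient vector (c_1, c_2, ...) returned here is the one whose norm the abstract relates to a scalar measure of partition fuzziness.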

7.
In this paper we show how the notions of conductance and cutoff can be used to determine the length of the random walks in some clustering algorithms. We consider graphs which are globally sparse but locally dense. They present a community structure: there exists a partition of the set of vertices into subsets which display strong internal connections but few links between each other. Using a distance between nodes built on random walks, we consider a hierarchical clustering algorithm which provides the most appropriate partition. The length of these random walks must be chosen in advance, and an appropriate choice is essential. Finally, we introduce an extension of this clustering algorithm to dynamical sequences of graphs on the same set of vertices.
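As a rough illustration of a random-walk-based node distance (the specific distance used in the paper is not given here), one can compare rows of the t-step transition matrix: nodes in the same community see the rest of the graph similarly after a short walk, so their rows are close.

```python
import numpy as np

def walk_distance(A, t):
    """Pairwise node distance from t-step random walk probabilities:
    D[i, j] = || P^t[i, :] - P^t[j, :] ||_2, where P is the row-normalized
    transition matrix of the adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)      # transition probabilities
    Pt = np.linalg.matrix_power(P, t)         # t-step walk distribution
    n = len(A)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.linalg.norm(Pt[i] - Pt[j])
    return D
```

Choosing t is exactly the issue the abstract addresses: too short a walk has not mixed within a community yet, while too long a walk forgets the community structure entirely.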

8.
The K-means algorithm is a dynamic (iterative) clustering method in cluster analysis, but its results are strongly affected by the initial classification or initial points. This paper proposes a genetic algorithm (GA) to construct the initial classification, using an internal clustering criterion as the evaluation index. Experimental results show that this algorithm clearly outperforms the plain K-means algorithm.
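A toy version of the idea, GA-selected initial centres evaluated by an internal criterion (here, within-cluster squared error), might look like this; the operators and parameters are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sse_of(X, idx):
    """Within-cluster squared error induced by using X[idx] as centres."""
    centers = X[list(idx)]
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def ga_init_centers(X, k, pop_size=20, generations=30, seed=0):
    """Toy GA for choosing initial K-means centres.  Individuals are
    sets of k data-point indices; fitness is the (lower-is-better) SSE;
    evolution uses tournament selection of two plus point mutation."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = np.array([sse_of(X, ind) for ind in pop])
        new_pop = []
        for _ in range(pop_size):
            a, b = rng.integers(pop_size, size=2)      # tournament of two
            parent = pop[a] if fitness[a] < fitness[b] else pop[b]
            child = parent.copy()
            if rng.random() < 0.3:                     # point mutation
                child[rng.integers(k)] = rng.integers(n)
            new_pop.append(child)
        pop = new_pop
    best = min(pop, key=lambda ind: sse_of(X, ind))
    return X[list(best)]
```

The returned centres would then seed an ordinary K-means run, which is far less sensitive to initialization when started from a fitness-selected configuration.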

9.
石子烨  梁恒  白峰杉 《计算数学》2014,36(3):325-334
Data partitioning, which covers both classification and clustering, is one of the core problems in data mining and has wide practical applications. Research on directed network data in particular is at the frontier of the field, but the structural asymmetry of such problems makes building models and algorithms fundamentally difficult, so relatively few results are available. Borrowing ideas from molecular dynamics, this paper proposes a new class of semi-supervised classification models and algorithms for network data. The algorithm applies not only to undirected networks with symmetric relations but also to directed networks with asymmetric relations. Finally, numerical experiments on a journal citation network demonstrate the feasibility and effectiveness of the model and algorithm.

10.
Digital circuits have grown exponentially in size over the past decades. To automate the design of these circuits, efficient algorithms are needed. One of the challenging stages of circuit design is physical design, where the physical locations of the components of a circuit are determined. Coarsening or clustering algorithms have become popular with physical designers because they reduce circuit sizes in the intermediate design steps, so that the design can be performed faster and with higher quality. In this paper, a new clustering algorithm based on the algebraic multigrid (AMG) technique is presented. In the proposed algorithm, AMG is used to assign weights to connections between cells of a circuit and to find the cells best suited to become the initial cells for clusters (seed cells). The seed cells, together with the weights between them and the other cells, are then used to cluster the cells of a circuit. Analysis of the proposed algorithm proves linear-time complexity, O(N), where N is the number of pins in a circuit. Numerical experiments demonstrate that AMG-based clustering can achieve high-quality clusters and improve circuit placement designs at low computational cost.

11.
This article introduces several network clustering algorithms and their basic principles, and briefly describes their applications in bioinformatics. It is not intended as a comprehensive survey of network clustering algorithms; it only presents the basic ideas behind them and the mathematical modelling thinking they embody.

12.
In this paper, we propose a new kernel-based fuzzy clustering algorithm that tries to find the best clustering results using optimal parameters of each kernel in each cluster. It is known that data with nonlinear relationships can be separated using one of the kernel-based fuzzy clustering methods. Two common approaches are clustering with a single kernel and clustering with multiple kernels. While clustering with a single kernel does not work well with “multiple-density” clusters, multiple-kernel fuzzy clustering tries to find an optimal linear weighted combination of kernels with initial fixed (not necessarily the best) parameters. Our algorithm extends the single-kernel fuzzy c-means and multiple-kernel fuzzy clustering algorithms: there is no need to supply “good” parameters for each kernel, nor an initial “good” number of kernels. Every cluster is characterized by a Gaussian kernel with optimal parameters. To demonstrate its clustering performance, we compare it to similar clustering algorithms on different databases using different clustering validity measures.
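A single fixed-kernel baseline that this line of work generalizes, Gaussian-kernel fuzzy c-means with prototypes kept in input space, can be sketched as follows; the initialization and parameter values here are illustrative choices, not the paper's method of optimizing per-cluster kernel parameters.

```python
import numpy as np

def kernel_fcm(X, c, m=2.0, sigma=1.0, iters=50):
    """Fuzzy c-means with one fixed Gaussian kernel
    K(x, v) = exp(-||x - v||^2 / sigma^2); the kernel-induced distance
    is d^2(x, v) = 2 * (1 - K(x, v)).  Prototypes V stay in input space.
    Simple deterministic initialization: the first c data points."""
    X = np.asarray(X, dtype=float)
    V = X[:c].copy()
    U = np.ones((len(X), c)) / c
    for _ in range(iters):
        Kxv = np.exp(-((X[:, None, :] - V[None, :, :]) ** 2).sum(2) / sigma ** 2)
        d2 = np.maximum(2.0 * (1.0 - Kxv), 1e-12)   # floor avoids divide-by-zero
        U = (1.0 / d2) ** (1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1
        W = (U ** m) * Kxv                          # kernel-weighted memberships
        V = (W.T @ X) / W.sum(axis=0)[:, None]      # prototype update
    return U, V
```

The “multiple-density” failure mode the abstract mentions comes from the single global sigma: clusters of very different spreads want different kernel widths, which is exactly what per-cluster optimal parameters address.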

13.
Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Although many effective and efficient clustering algorithms have been developed and deployed, most of them still lack an automatic or online way to decide the optimal number of clusters. In this paper, we define clustering gain as a measure of clustering optimality, based on the squared error sum as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. According to our experimental results, the measure produces intuitively reasonable clustering configurations in Euclidean space. Furthermore, it can also be used to estimate the desired number of clusters for partitional clustering methods. The clustering gain measure therefore provides a promising technique for achieving higher-quality results across a wide range of clustering methods.
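The paper's exact definition of clustering gain is not reproduced in this abstract; as a hedged stand-in, the following sketch computes the squared-error curve over a sequence of candidate partitions (as a hierarchical algorithm would produce) and picks the number of clusters at the curve's knee, i.e., where the error reduction slows down most.

```python
import numpy as np

def sse_curve(X, partitions):
    """Within-cluster squared error for each candidate partition.
    partitions[i] is a label array for k = i + 1 clusters."""
    curve = []
    for labels in partitions:
        err = 0.0
        for lab in np.unique(labels):
            pts = X[labels == lab]
            err += ((pts - pts.mean(axis=0)) ** 2).sum()
        curve.append(err)
    return np.array(curve)

def knee_k(curve):
    """Pick k at the 'knee' of a decreasing SSE curve: the interior
    point with the largest second difference (sharpest flattening)."""
    d2 = curve[:-2] - 2 * curve[1:-1] + curve[2:]
    return int(d2.argmax()) + 2      # d2 index i corresponds to k = i + 2
```

This knee heuristic is a common surrogate, not the paper's measure; its virtue is that it needs nothing beyond the squared error sums the clustering algorithm already computes as it proceeds.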

14.
This paper presents two effective algorithms for clustering n entities into p mutually exclusive and exhaustive groups where the ‘size’ of each group is restricted. As its objective, the clustering model minimizes the sum of distances between each entity and a designated group median. Empirical results using both a primal heuristic and a hybrid heuristic-subgradient method for problems with n ≤ 100 (i.e., 10,100 binary variables) show that the algorithms locate near-optimal solutions without resorting to tree enumeration. The capacitated clustering model is applied to the problem of sales force territory design.

15.
《Fuzzy Sets and Systems》2004,141(2):301-317
This paper presents fuzzy clustering algorithms for mixed features of symbolic and fuzzy data. El-Sonbaty and Ismail proposed fuzzy c-means (FCM) clustering for symbolic data, and Hathaway et al. proposed FCM for fuzzy data. In this paper we give a modified dissimilarity measure for symbolic and fuzzy data and then give FCM clustering algorithms for these mixed data types. Numerical examples and comparisons are given, illustrating that the modified dissimilarity measure yields better results. Finally, the proposed clustering algorithm is applied to real data with mixed symbolic and fuzzy feature variables.

16.
The min-edge clique partition problem asks for a partition of the vertices of a graph into cliques with the fewest edges between cliques. This is a known NP-complete problem that has been studied extensively from the perspective of fixed-parameter tractability (FPT), where it is commonly known as the Cluster Deletion problem. Many recently developed FPT algorithms rely on being able to solve Cluster Deletion in polynomial time on restricted graph structures.

17.
In this paper we present a comparison among several nonhierarchical and hierarchical clustering algorithms, including the SOM (Self-Organizing Map) neural network and fuzzy c-means. Data were simulated with correlated and uncorrelated variables, nonoverlapping and overlapping clusters, and with and without outliers; a total of 2530 datasets were generated. The results showed that fuzzy c-means performed very well in all cases, remaining stable even in the presence of outliers and overlapping clusters. All other clustering algorithms were strongly affected by the amount of overlap and by outliers. The SOM neural network performed poorly in almost all cases, being very sensitive to the number of variables and clusters. Traditional hierarchical clustering and K-means presented similar performance.

18.
Many of the datasets encountered in statistics are two-dimensional in nature and can be represented by a matrix. Classical clustering procedures seek to construct separately an optimal partition of rows or, sometimes, of columns. In contrast, co-clustering methods cluster the rows and the columns simultaneously and organize the data into homogeneous blocks (after suitable permutations). Methods of this kind have practical importance in a wide variety of applications such as document clustering, where data are typically organized in two-way contingency tables. Our goal is to offer coherent frameworks for understanding some existing criteria and algorithms for co-clustering contingency tables, and to propose new ones. We look at two different frameworks for the problem of co-clustering. The first involves minimizing an objective function based on measures of association and in particular on phi-squared and mutual information. The second uses a model-based co-clustering approach, and we consider two models: the block model and the latent block model. We establish connections between different approaches, criteria and algorithms, and we highlight a number of implicit assumptions in some commonly used algorithms. Our contribution is illustrated by numerical experiments on simulated and real-case datasets that show the relevance of the presented methods in the document clustering field.
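As a small illustration of the first framework, the phi-squared association of a contingency table, and its preservation when rows and columns are aggregated along a perfect block co-clustering, can be computed directly (a generic sketch, not the paper's algorithms):

```python
import numpy as np

def phi_squared(T):
    """Phi-squared association of a contingency table T:
    sum_ij (p_ij - p_i. * p_.j)^2 / (p_i. * p_.j)."""
    P = np.asarray(T, dtype=float)
    P = P / P.sum()                          # joint probabilities
    r = P.sum(axis=1, keepdims=True)         # row margins p_i.
    c = P.sum(axis=0, keepdims=True)         # column margins p_.j
    E = r @ c                                # expected under independence
    return ((P - E) ** 2 / E).sum()

def collapse(T, row_labels, col_labels):
    """Aggregate a table by summing rows/columns within the same block."""
    T = np.asarray(T, dtype=float)
    R = np.zeros((max(row_labels) + 1, max(col_labels) + 1))
    for i, ri in enumerate(row_labels):
        for j, cj in enumerate(col_labels):
            R[ri, cj] += T[i, j]
    return R
```

Collapsing a table can only lose association in general; a co-clustering that retains all of the phi-squared (as in the block-diagonal test below) is, under this criterion, a lossless summary of the table.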

19.
A unified presentation of classical clustering algorithms is proposed both for the hard and fuzzy pattern classification problems. Based on two types of objective functions, a new method is presented and compared with the procedures of Dunn and Ruspini. In order to determine the best, or more natural number of fuzzy clusters, two coefficients that measure the “degree of non-fuzziness” of the partition are proposed. Numerous computational results are shown.
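The abstract's two coefficients are not spelled out here, but the partition coefficient is one classical measure of the “degree of non-fuzziness” of a fuzzy partition, shown below as an illustrative stand-in:

```python
import numpy as np

def partition_coefficient(U):
    """Partition coefficient F(U) = (1/n) * sum_ij u_ij^2 for a fuzzy
    partition matrix U (rows = clusters, columns = the n data points,
    columns summing to 1).  F = 1 for a hard partition; F = 1/c when
    every membership equals 1/c for c clusters (maximally fuzzy)."""
    U = np.asarray(U, dtype=float)
    n = U.shape[1]
    return (U ** 2).sum() / n
```

Scanning F(U) over candidate numbers of clusters and preferring less fuzzy (higher-F) solutions is the standard way such a coefficient is used to pick the most natural cluster count.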

20.
We propose in this paper two new competitive unsupervised clustering algorithms. The first deals with ultrametric data and has a computational cost of O(n). The second is fast and flexible with respect to both the type of data processed and the required precision; its computational cost is O(n²) in the worst case and O(n) on average. These complexities follow from exploiting the properties of the ultrametric distance. In the first method, we use the order induced by an ultrametric on a given space to show how data proximity can be explored quickly. In the second method, we build an ultrametric space from a sample of the data, chosen uniformly at random, in order to obtain a global view of proximities in the dataset according to the similarity criterion. We then use this proximity profile to cluster the full set. We present an example of our algorithms and compare their results with those of a classic clustering method.
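The ultrametric property these algorithms exploit, and the fact that thresholding an ultrametric yields a clean partition (because “d(i, j) ≤ t” is then an equivalence relation), can be checked directly; this is an illustrative sketch, not the paper's O(n) algorithms.

```python
import numpy as np

def is_ultrametric(D, tol=1e-9):
    """Check the strong triangle inequality
    d(i, k) <= max(d(i, j), d(j, k)) for all triples (O(n^3) brute force)."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if D[i, k] > max(D[i, j], D[j, k]) + tol:
                    return False
    return True

def threshold_clusters(D, t):
    """Cut an ultrametric at level t: each unlabeled point pulls in
    everything within distance t, which is a full equivalence class."""
    n = len(D)
    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        for j in range(n):
            if D[i][j] <= t:
                labels[j] = current
        current += 1
    return labels
```

For a general (non-ultrametric) distance, thresholding like this would not be well defined; the equivalence-class structure at every level is precisely what makes fast exploration of proximities possible in ultrametric spaces.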


Copyright©北京勤云科技发展有限公司  京ICP备09084417号