首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
This paper is concerned with a special case of the generalized minimum spanning tree problem. The problem is defined on an undirected graph, where the vertex set is partitioned into clusters, and non-negative costs are associated with the edges. The problem is to find a tree of minimum cost containing at least one vertex in each cluster. We consider a geometric case of the problem where the graph is complete, all vertices are situated in the plane, and Euclidean distance defines the edge cost. We prove that the problem is strongly -hard even in the case of a special structure of the clustering called grid clustering. We construct an exact exponential time dynamic programming algorithm and, based on this dynamic programming algorithm, we develop a polynomial time approximation scheme for the problem with grid clustering.  相似文献   

2.
Clustering is a popular data analysis and data mining technique. Since clustering problem have NP-complete nature, the larger the size of the problem, the harder to find the optimal solution and furthermore, the longer to reach a reasonable results. A popular technique for clustering is based on K-means such that the data is partitioned into K clusters. In this method, the number of clusters is predefined and the technique is highly dependent on the initial identification of elements that represent the clusters well. A large area of research in clustering has focused on improving the clustering process such that the clusters are not dependent on the initial identification of cluster representation. Another problem about clustering is local minimum problem. Although studies like K-Harmonic means clustering solves the initialization problem trapping to the local minima is still a problem of clustering. In this paper we develop a new algorithm for solving this problem based on a tabu search technique—Tabu K-Harmonic means (TabuKHM). The experiment results on the Iris and the other well known data, illustrate the robustness of the TabuKHM clustering algorithm.  相似文献   

3.
The minimum sum-of-squares clustering problem is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the former problem based on nonsmooth optimization techniques is developed. The issue of applying this algorithm to large data sets is discussed. Results of numerical experiments have been presented which demonstrate the effectiveness of the proposed algorithm.  相似文献   

4.
We propose minimum volume ellipsoids (MVE) clustering as an alternative clustering technique to k-means for data clusters with ellipsoidal shapes and explore its value and practicality. MVE clustering allocates data points into clusters in a way that minimizes the geometric mean of the volumes of each cluster’s covering ellipsoids. Motivations for this approach include its scale-invariance, its ability to handle asymmetric and unequal clusters, and our ability to formulate it as a mixed-integer semidefinite programming problem that can be solved to global optimality. We present some preliminary empirical results that illustrate MVE clustering as an appropriate method for clustering data from mixtures of “ellipsoidal” distributions and compare its performance with the k-means clustering algorithm as well as the MCLUST algorithm (which is based on a maximum likelihood EM algorithm) available in the statistical package R. Research of the first author was supported in part by a Discovery Grant from NSERC and a research grant from Faculty of Mathematics, University of Waterloo. Research of the second author was supported in part by a Discovery Grant from NSERC and a PREA from Ontario, Canada.  相似文献   

5.
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies is a useful tool for identifying biologically relevant gene groupings (DeRisi et al. 1997; Weiler et al. 1997). It is hence important to apply a rigorous yet intuitive clustering algorithm to uncover these genomic relationships. In this study, we describe a novel clustering algorithm framework based on a variant of the Generalized Benders Decomposition, denoted as the Global Optimum Search (Floudas et al. 1989; Floudas 1995), which includes a procedure to determine the optimal number of clusters to be used. The approach involves a pre-clustering of data points to define an initial number of clusters and the iterative solution of a Linear Programming problem (the primal problem) and a Mixed-Integer Linear Programming problem (the master problem), that are derived from a Mixed Integer Nonlinear Programming problem formulation. Badly placed data points are removed to form new clusters, thus ensuring tight groupings amongst the data points and incrementing the number of clusters until the optimum number is reached. We apply the proposed clustering algorithm to experimental DNA microarray data centered on the Ras signaling pathway in the yeast Saccharomyces cerevisiae and compare the results to that obtained with some commonly used clustering algorithms. Our algorithm compares favorably against these algorithms in the aspects of intra-cluster similarity and inter-cluster dissimilarity, often considered two key tenets of clustering. Furthermore, our algorithm can predict the optimal number of clusters, and the biological coherence of the predicted clusters is analyzed through gene ontology.  相似文献   

6.
Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work, we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient. This article has supplementary materials available online.  相似文献   

7.
A divide-and-conquer approach for the feedback arc set is presented. The divide step is performed by solving a minimum bisection problem. Two strategies are used to solve minimum bisection problem: A heuristic based on the stochastic evolution methodology, and a heuristic based on dynamic clustering. Empirical results are presented to compare our method with other approaches. An algorithm to construct test cases for the feedback arc set problem with known optimal number of feedback arcs, is also presented.  相似文献   

8.
Given a graph and costs of assigning to each vertex one of k different colors, we want to find a minimum cost assignment such that no color q induces a subgraph with more than a given number (γq) of connected components. This problem arose in the context of contiguity-constrained clustering, but also has a number of other possible applications. We show the problem to be NP-hard. Nevertheless, we derive a dynamic programming algorithm that proves the case where the underlying graph is a tree to be solvable in polynomial time. Next, we propose mixed-integer programming formulations for this problem that lead to branch-and-cut and branch-and-price algorithms. Finally, we introduce a new class of valid inequalities to obtain an enhanced branch-and-cut. Extensive computational experiments are reported.  相似文献   

9.
Automatic clustering using genetic algorithms   总被引:2,自引:0,他引:2  
In face of the clustering problem, many clustering methods usually require the designer to provide the number of clusters as input. Unfortunately, the designer has no idea, in general, about this information beforehand. In this article, we develop a genetic algorithm based clustering method called automatic genetic clustering for unknown K (AGCUK). In the AGCUK algorithm, noising selection and division-absorption mutation are designed to keep a balance between selection pressure and population diversity. In addition, the Davies-Bouldin index is employed to measure the validity of clusters. Experimental results on artificial and real-life data sets are given to illustrate the effectiveness of the AGCUK algorithm in automatically evolving the number of clusters and providing the clustering partition.  相似文献   

10.
在给定的度量空间中, 单位聚类问题就是寻找最少的单位球来覆盖给定的所有点。这是一个众所周知的组合优化问题, 其在线版本为: 给定一个度量空间, 其中的n个点会一个接一个的到达任何可能的位置, 在点到达的时候必须给该点分配一个单位聚类, 而此时未来点的相关信息都是未知的, 问题的目标是最后使用的单位聚类数目最少。本文考虑的是带如下假设的一类一维在线单位聚类问题: 在相应离线问题的最优解中任意两个相邻聚类之间的距离都大于0.5。本文首先给出了两个在线算法和一些引理, 接着通过0.5的概率分别运行两个在线算法得到一个组合随机算法, 最后证明了这个组合随机算法的期望竞争比不超过1.5。  相似文献   

11.
在给定的度量空间中, 单位聚类问题就是寻找最少的单位球来覆盖给定的所有点。这是一个众所周知的组合优化问题, 其在线版本为: 给定一个度量空间, 其中的n个点会一个接一个的到达任何可能的位置, 在点到达的时候必须给该点分配一个单位聚类, 而此时未来点的相关信息都是未知的, 问题的目标是最后使用的单位聚类数目最少。本文考虑的是带如下假设的一类一维在线单位聚类问题: 在相应离线问题的最优解中任意两个相邻聚类之间的距离都大于0.5。本文首先给出了两个在线算法和一些引理, 接着通过0.5的概率分别运行两个在线算法得到一个组合随机算法, 最后证明了这个组合随机算法的期望竞争比不超过1.5。  相似文献   

12.
Sequential clustering aims at determining homogeneous and/or well-separated clusters within a given set of entities, one at a time, until no more such clusters can be found. We consider a bi-criterion sequential clustering problem in which the radius of a cluster (or maximum dissimilarity between an entity chosen as center and any other entity of the cluster) is chosen as a homogeneity criterion and the split of a cluster (or minimum dissimilarity between an entity in the cluster and one outside of it) is chosen as a separation criterion. An O(N 3) algorithm is proposed for determining radii and splits of all efficient clusters, which leads to an O(N 4) algorithm for bi-criterion sequential clustering with radius and split as criteria. This algorithm is illustrated on the well known Ruspini data set.  相似文献   

13.
First, we present two classes of sequential algorithms for minimum flow problem: decreasing path algorithms and preflow algorithms. Then we describe another approach of the minimum flow problem, that consists of applying any maximum flow algorithm in a modified network. In section 5 we present several parallel preflow algorithms that solve the minimum flow problem. Finally, we present an application of the minimum flow problem.  相似文献   

14.
Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well‐studied clustering method involving both top‐down and bottom‐up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top‐down and bottom‐up hierarchical clustering. The top‐down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)‐time, n2‐processor Crew Pram algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom‐up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC‐complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now for the first time a problem from data mining as well. By proving that HCP is CC‐complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top‐down sequential approach, and the result surprisingly shows that the parallel complexities of the top‐down and bottom‐up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC‐complete problems. © 2008 Wiley Periodicals, Inc. Complexity, 2008  相似文献   

15.
For a transient process in a quasilinear system, we consider an optimization problem of finding a (multi-dimensional) control with minimum intensity. We suggest an algorithm for constructing asymptotic approximations to the solution of this problem. The main advantage of the algorithm is that an optimal control problem for a linear system is solved instead of the original essentially nonlinear problem.  相似文献   

16.
$k$-均值问题是聚类中的经典问题,亦是NP-难问题。如果允许数据点不聚类,而是支付惩罚费用,则引出带惩罚的$k$-均值问题。本文将带惩罚的$k$-均值问题从欧氏距离推广到更一般的$\mu$-相似Bregman散度,研究了带惩罚$\mu$-相似Bregman散度$k$-均值问题的初始化算法。本文给出的初始化算法,近似比与$\mu$和数据点惩罚最大值与最小值的比例$r$相关。  相似文献   

17.
In this paper we propose a hybrid heuristic for the Maximum Dispersion Problem of finding a balanced partition of a set of objects such that the shortest intra-part distance is maximized. In contrast to clustering problems, dispersion problems aim for a large spread of objects in the same group. They arise in many practical applications such as waste collection and the formation of study groups. The heuristic alternates between finding a balanced solution, and increasing the dispersion. Balancing is achieved by a combination of a minimum cost flow algorithm to find promising pairs of parts and a branch-and-bound algorithm that searches for an optimal balance, and the dispersion is increased by a local search followed by an ejection chain method for escaping local minima. We also propose new upper bounds for the problem. In computational experiments we show that the heuristic is able to find solutions significantly faster than previous approaches. Solutions are close to optimal and in many cases provably optimal.  相似文献   

18.
In this paper, we introduce the problem of computing a minimum edge ranking spanning tree (MERST); i.e., find a spanning tree of a given graph G whose edge ranking is minimum. Although the minimum edge ranking of a given tree can be computed in polynomial time, we show that problem MERST is NP-hard. Furthermore, we present an approximation algorithm for MERST, which realizes its worst case performance ratio where n is the number of vertices in G and Δ* is the maximum degree of a spanning tree whose maximum degree is minimum. Although the approximation algorithm is a combination of two existing algorithms for the restricted spanning tree problem and for the minimum edge ranking problem of trees, the analysis is based on novel properties of the edge ranking of trees.  相似文献   

19.
Each clustering algorithm usually optimizes a qualification metric during its progress. The qualification metric in conventional clustering algorithms considers all the features equally important; in other words each feature participates in the clustering process equivalently. It is obvious that some features have more information than others in a dataset. So it is highly likely that some features should have lower importance degrees during a clustering or a classification algorithm; due to their lower information or their higher variances and etc. So it is always a desire for all artificial intelligence communities to enforce the weighting mechanism in any task that identically uses a number of features to make a decision. But there is always a certain problem of how the features can be participated in the clustering process (in any algorithm, but especially in clustering algorithm) in a weighted manner. Recently, this problem is dealt with by locally adaptive clustering (LAC). However, like its traditional competitors the LAC suffers from inefficiency in data with imbalanced clusters. This paper solves the problem by proposing a weighted locally adaptive clustering (WLAC) algorithm that is based on the LAC algorithm. However, WLAC algorithm suffers from sensitivity to its two parameters that should be tuned manually. The performance of WLAC algorithm is affected by well-tuning of its parameters. Paper proposes two solutions. The first is based on a simple clustering ensemble framework to examine the sensitivity of the WLAC algorithm to its manual well-tuning. The second is based on cluster selection method.  相似文献   

20.
$k$-均值问题是聚类中的经典问题,亦是NP-难问题。如果允许数据点不聚类,而是支付惩罚费用,则引出带惩罚的$k$-均值问题。本文将带惩罚的$k$-均值问题从欧氏距离推广到更一般的$\mu$-相似Bregman散度,研究了带惩罚$\mu$-相似Bregman散度$k$-均值问题的初始化算法。本文给出的初始化算法,近似比与$\mu$和数据点惩罚最大值与最小值的比例$r$相关。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号