首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.  相似文献   

2.
Some problems of partitioning a finite set of points of Euclidean space into two clusters are considered. In these problems, the following criteria are minimized: (1) the sum over both clusters of the sums of squared pairwise distances between the elements of the cluster and (2) the sum of the (multiplied by the cardinalities of the clusters) sums of squared distances from the elements of the cluster to its geometric center, where the geometric center (or centroid) of a cluster is defined as the mean value of the elements in that cluster. Additionally, another problem close to (2) is considered, where the desired center of one of the clusters is given as input, while the center of the other cluster is unknown (is the variable to be optimized) as in problem (2). Two variants of the problems are analyzed, in which the cardinalities of the clusters are (1) parts of the input or (2) optimization variables. It is proved that all the considered problems are strongly NP-hard and that, in general, there is no fully polynomial-time approximation scheme for them (unless P = NP).  相似文献   

3.
We consider a strongly NP-hard problem of partitioning a finite sequence of points in Euclidean space into the two clustersminimizing the sum over both clusters of intra-cluster sums of squared distances from the clusters elements to their centers. The sizes of the clusters are fixed. The centroid of the first cluster is defined as the mean value of all vectors in the cluster, and the center of the second cluster is given in advance and equals 0. Additionally, the partition must satisfy the restriction that for all vectors in the first cluster the difference between the indices of two consequent points from this cluster is bounded from below and above by some given constants.We present a fully polynomial-time approximation scheme for the case of fixed space dimension.  相似文献   

4.
The strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters of given sizes (cardinalities) minimizing the sum (over both clusters) of the intracluster sums of squared distances from the elements of the clusters to their centers is considered. It is assumed that the center of one of the sought clusters is specified at the desired (arbitrary) point of space (without loss of generality, at the origin), while the center of the other one is unknown and determined as the mean value over all elements of this cluster. It is shown that unless P = NP, there is no fully polynomial-time approximation scheme for this problem, and such a scheme is substantiated in the case of a fixed space dimension.  相似文献   

5.
NP-completeness of two clustering (partition) problems is proved for a finite sequence of Euclidean vectors. In the optimization versions of both problems it is required to partition the elements of the sequence into a fixed number of clusters minimizing the sum of squares of the distances from the cluster elements to their centers. In the first problem the sizes of clusters are the part of input, while in the second they are unknown (they are the variables for optimization). Except for the center of one (special) cluster, the center of each cluster is the mean value of all vectors contained in it. The center of the special cluster is zero. Also, the partition must satisfy the following condition: The difference between the indices of two consecutive vectors in every nonspecial cluster is bounded below and above by two given constants.  相似文献   

6.
We consider the problem of partitioning a finite sequence of Euclidean points into a given number of clusters (subsequences) using the criterion of the minimal sum (over all clusters) of intercluster sums of squared distances from the elements of the clusters to their centers. It is assumed that the center of one of the desired clusters is at the origin, while the center of each of the other clusters is unknown and determined as the mean value over all elements in this cluster. Additionally, the partition obeys two structural constraints on the indices of sequence elements contained in the clusters with unknown centers: (1) the concatenation of the indices of elements in these clusters is an increasing sequence, and (2) the difference between an index and the preceding one is bounded above and below by prescribed constants. It is shown that this problem is strongly NP-hard. A 2-approximation algorithm is constructed that is polynomial-time for a fixed number of clusters.  相似文献   

7.
Problems of partitioning a finite set of Euclidean points (vectors) into clusters are considered. The criterion is to minimize the sum, over all clusters, of (1) squared norms of the sums of cluster elements normalized by the cardinality, (2) squared norms of the sums of cluster elements, and (3) norms of the sum of cluster elements. It is proved that all these problems are strongly NP-hard if the number of clusters is a part of the input and are NP-hard in the ordinary sense if the number of clusters is not a part of the input (is fixed). Moreover, the problems are NP-hard even in the case of dimension 1 (on a line).  相似文献   

8.
Problems of partitioning a finite set of Euclidean points (vectors) into clusters are considered. The criterion is to minimize the sum, over all clusters, of (1) squared norms of the sums of cluster elements normalized by the cardinality, (2) squared norms of the sums of cluster elements, and (3) norms of the sum of cluster elements. It is proved that all these problems are strongly NP-hard if the number of clusters is a part of the input and are NP-hard in the ordinary sense if the number of clusters is not a part of the input (is fixed). Moreover, the problems are NP-hard even in the case of dimension 1 (on a line).  相似文献   

9.
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of minimum sum-of-squares of distances from the elements of clusters to their centers. We assume that the cardinalities of the clusters are fixed. The center of one cluster has to be optimized and is defined as the average value over all vectors in this cluster. The center of the other cluster lies at the origin. The partition satisfies the condition: the difference of the indices of the next and previous vectors in the first cluster is bounded above and below by two given constants. We propose a 2-approximation polynomial algorithm to solve this problem.  相似文献   

10.
Summary DCT Given a finite set of points in an Euclidean space the \emph{spanning tree} is a tree of minimal length having the given points as vertices. The length of the tree is the sum of the distances of all connected point pairs of the tree. The clustering tree with a given length of a given finite set of points is the spanning tree of an appropriately chosen other set of points approximating the given set of points with minimal sum of square distances among all spanning trees with the given length. DCM A matrix of real numbers is said to be column monotone orderable if there exists an ordering of columns of the matrix such that all rows of the matrix become monotone after ordering. The {\emph{monotone sum of squares of a matrix}} is the minimum of sum of squares of differences of the elements of the matrix and a column monotone orderable matrix where the minimum is taken on the set of all column monotone orderable matrices. Decomposition clusters of monotone orderings of a matrix is a clustering ofthe rows of the matrix into given number of clusters such that thesum of monotone sum of squares of the matrices formed by the rowsof the same cluster is minimal.DCP A matrix of real numbers is said to be column partitionable if there exists a partition of the columns such that the elements belonging to the same subset of the partition are equal in each row. Given a partition of the columns of a matrix the partition sum of squares of the matrix is the minimum of the sum of square of differences of the elements of the matrix and a column partitionable matrix where the minimum is taken on the set of all column partitionable matrices. Decomposition of the rows of a matrix into clusters of partitions is the minimization of the corresponding partition sum of squares given the number of clusters and the sizes of the subsets of the partitions.  相似文献   

11.
Medical studies increasingly involve a large sample of independent clusters, where the cluster sizes are also large. Our motivating example from the 2010 Nationwide Inpatient Sample (NIS) has 8,001,068 patients and 1049 clusters, with average cluster size of 7627. Consistent parameter estimates can be obtained naively assuming independence, which are inefficient when the intra-cluster correlation (ICC) is high. Efficient generalized estimating equations (GEE) incorporate the ICC and sum all pairs of observations within a cluster when estimating the ICC. For the 2010 NIS, there are 92.6 billion pairs of observations, making summation of pairs computationally prohibitive. We propose a one-step GEE estimator that (1) matches the asymptotic efficiency of the fully iterated GEE; (2) uses a simpler formula to estimate the ICC that avoids summing over all pairs; and (3) completely avoids matrix multiplications and inversions. These three features make the proposed estimator much less computationally intensive, especially with large cluster sizes. A unique contribution of this article is that it expresses the GEE estimating equations incorporating the ICC as a simple sum of vectors and scalars.  相似文献   

12.
This paper proposes a scatter search-based heuristic approach to the capacitated clustering problem. In this problem, a given set of customers with known demands must be partitioned into p distinct clusters. Each cluster is specified by a customer acting as a cluster center for this cluster. The objective is to minimize the sum of distances from all cluster centers to all other customers in their cluster, such that a given capacity limit of the cluster is not exceeded and that every customer is assigned to exactly one cluster. Computational results on a set of instances from the literature indicate that the heuristic is among the best heuristics developed for this problem.  相似文献   

13.
Given a set of entities associated with points in Euclidean space, minimum sum-of-squares clustering (MSSC) consists in partitioning this set into clusters such that the sum of squared distances from each point to the centroid of its cluster is minimized. A column generation algorithm for MSSC was given by du Merle et?al. in SIAM Journal Scientific Computing 21:1485–1505. The bottleneck of that algorithm is the resolution of the auxiliary problem of finding a column with negative reduced cost. We propose a new way to solve this auxiliary problem based on geometric arguments. This greatly improves the efficiency of the whole algorithm and leads to exact solution of instances with over 2,300 entities, i.e., more than 10 times as much as previously done.  相似文献   

14.
The problem tackled in this paper is as follows: consider a set ofn interacting points in a two-dimensional space. The levels of interactions between the observations are given exogenously. It is required to cluster then observations intop groups, so that the sum of squared deviations from the cluster means is as small as possible. Further, assume that the cluster means are adjusted to reflect the interaction between the entities. (It is this latter consideration which makes the problem interesting.) A useful property of the problem is that the use of a squared distance term yields a linear system of equations for the coordinates of the cluster centroids. These equations are derived and solved repeatedly for a given set of cluster allocations. A sequential reallocation of the observations between the clusters is then performed. One possible application of this problem is to the planar hub location problem, where the interacting observations are a system of cities and the interaction effects represent the levels of flow or movement between the entities. The planar hub location problem has been limited so far to problems with fewer than 100 nodes. The use of the squared distance formulation, and a powerful supercomputer (Cray Y-MP) has enabled quick solution of large systems with 250 points and four groups. The paper includes both small illustrative examples and computational results using systems with up to 500 observations and 9 clusters.  相似文献   

15.
We study clusters of threshold exceedances caused by dependence in time series. The clusters are defined as conglomerates containing consecutive threshold exceedances of the series separated by return intervals with consecutive non-exceedances. We derive asymptotic distributions of the cluster and inter-cluster sizes for processes with the extremal index equal to zero, the asymptotic expectation of the inter-cluster size and an exponential rate of convergence of the distribution tail of the return interval between clusters to the stable distribution tail. Distributions of the cluster and inter-cluster sizes of ARMAX, MM and AR(1) processes are obtained.  相似文献   

16.
In this paper, a strongly NP-hard problem of finding a family of disjoint subsets with given cardinalities in a finite set of points from a Euclidean space is considered. Minimization of the sum over all required subsets of the sum of the squared distances from the elements of these subsets to their geometric centers is used as the search criterion. It is proved that if the coordinates of the input points are integer and the space dimension and the number of required subsets are fixed (i.e., bounded by some constants), the problem is a pseudopolynomial time solvable one.  相似文献   

17.
In the Capacitated Clustering Problem (CCP), a given set of n weighted points is to be partitioned into p clusters such that, the total weight of the points in each cluster does not exceed a given cluster capacity. The objective is to find a set of p centers that minimises total scatter of points allocated to them. In this paper a new constructive method, a general framework to improve the performance of greedy constructive heuristics, and a problem space search procedure for the CCP are proposed. The constructive heuristic finds patterns of natural subgrouping in the input data using concept of density of points. Elements of adaptive computation and periodic construction–deconstruction concepts are implemented within the constructive heuristic to develop a general framework for building efficient heuristics. The problem-space search procedure is based on perturbations of input data for which a controlled perturbation strategy, intensification and diversification strategies are developed. The implemented algorithms are compared with existing methods on a standard set of bench-marks and on new sets of large-sized instances. The results illustrate the strengths of our algorithms in terms of solution quality and computational efficiency.  相似文献   

18.
19.
Doklady Mathematics - We consider the problem of partitioning a finite set of points in Euclidean space into clusters so as to minimize the sum, over all clusters, of the intracluster sums of the...  相似文献   

20.
Doklady Mathematics - We consider the problem of clustering a finite set of N points in d-dimensional Euclidean space into two clusters minimizing the sum (over both clusters) of the intracluster...  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号