Similar Articles
20 similar articles were retrieved.
1.
To address the poor clustering performance of the traditional DBSCAN algorithm on high-dimensional data sets and its sensitivity to parameter selection, a new improved DBSCAN algorithm based on a similarity measure is proposed. The algorithm constructs a similarity matrix between data points from geodesic distances and shared nearest neighbors, overcoming the limitations of the Euclidean distance for high-dimensional data and better capturing the true structure of the data set. The parameters Eps and MinPts are determined adaptively by analyzing the distribution characteristics of the data. Experimental results show that the proposed GS-DBSCAN algorithm can effectively cluster data with complex distributions and achieves higher clustering accuracy on high-dimensional data than the compared algorithms, verifying the accuracy and feasibility of the algorithm.
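To make the idea above concrete, the following is a minimal Python sketch in which a shared-nearest-neighbor similarity is turned into a distance and fed to scikit-learn's DBSCAN with a precomputed metric. It is an illustration only, not the authors' GS-DBSCAN: the geodesic-distance construction and the adaptive selection of Eps and MinPts are omitted, and the neighborhood size k and the eps/min_samples values are made-up choices.

# Sketch only: a shared-nearest-neighbor (SNN) distance fed to DBSCAN.
# Not the authors' GS-DBSCAN; geodesic distances and adaptive Eps/MinPts are omitted.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

k = 10  # neighborhood size (hypothetical choice)
nn = NearestNeighbors(n_neighbors=k).fit(X)
_, idx = nn.kneighbors(X)                      # k nearest neighbors of each point
neighbor_sets = [set(row) for row in idx]

n = len(X)
snn_dist = np.empty((n, n))
for i in range(n):
    for j in range(n):
        shared = len(neighbor_sets[i] & neighbor_sets[j])
        snn_dist[i, j] = 1.0 - shared / k      # fewer shared neighbors -> larger distance

labels = DBSCAN(eps=0.3, min_samples=5, metric="precomputed").fit_predict(snn_dist)
print("clusters found:", len(set(labels) - {-1}))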

2.
Two types of problems to note in hierarchical cluster analysis (Cited by 2; self-citations: 0; by others: 2)
An interesting example is given in which nine different similarity measures, each combined with single-linkage (shortest-distance) clustering, lead to mutually different results. For this example, after the distance matrix is computed with the Euclidean distance, only single-linkage clustering yields a unique result; clustering with complete linkage (longest distance), the centroid method, average linkage, and Ward's method (sum of squared deviations) all yield non-unique results.
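The kind of comparison described above is easy to reproduce in outline with SciPy: the same Euclidean distance matrix is clustered under several linkage rules, which can yield different partitions. The toy data below are random; the paper's example and its nine similarity measures are not reproduced.

# Illustration of the comparison described above: the same Euclidean distances,
# clustered with different linkage rules, can yield different partitions.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
d = pdist(X, metric="euclidean")          # condensed Euclidean distance matrix

for method in ["single", "complete", "centroid", "average", "ward"]:
    Z = linkage(d, method=method)         # 'single' = shortest distance, etc.
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(f"{method:9s}", labels)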

3.
The nonnegative rank of a nonnegative matrix is the minimum number of nonnegative rank-one factors needed to reconstruct it exactly. The problem of determining this rank and computing the corresponding nonnegative factors is difficult; however, it has many potential applications, e.g., in data mining and graph theory. In particular, it can be used to characterize the minimal size of any extended reformulation of a given polytope. In this paper, we introduce and study a related quantity, called the restricted nonnegative rank. We show that computing this quantity is equivalent to a problem in computational geometry, and fully characterize its computational complexity. This in turn sheds new light on the nonnegative rank problem, and in particular allows us to provide new improved lower bounds based on its geometric interpretation. We apply these results to slack matrices and linear Euclidean distance matrices and obtain counter-examples to two conjectures of Beasley and Laffey; namely, we show that the nonnegative rank of linear Euclidean distance matrices is not necessarily equal to their dimension, and that the rank of a matrix is not always greater than the nonnegative rank of its square.
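For reference, the quantity the abstract starts from is the standard nonnegative rank (this formula is the textbook definition, stated here for orientation rather than quoted from the paper):

\[
\operatorname{rank}_{+}(A) \;=\; \min\Bigl\{\, r : A = \sum_{i=1}^{r} u_i v_i^{\mathsf T},\ u_i \in \mathbb{R}_{\ge 0}^{m},\ v_i \in \mathbb{R}_{\ge 0}^{n} \Bigr\},
\qquad A \in \mathbb{R}_{\ge 0}^{m \times n},
\]

so that \(\operatorname{rank}(A) \le \operatorname{rank}_{+}(A) \le \min(m, n)\).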

4.
An improvement of covariance matrix estimation in Mahalanobis distance cluster analysis (Cited by 1; self-citations: 0; by others: 1)
Taking the influence of variable weights and sample classes into account, this paper establishes an iterative method for estimating the covariance matrix in Mahalanobis distance clustering. Using Fisher's iris data as the sample, ordinary Euclidean distance clustering, principal component clustering, and Mahalanobis distance clustering before and after the improvement are applied for empirical analysis and comparison; the results show that the proposed method improves the accuracy by at least 6.63%. Finally, the method is applied to cluster indicator data for 35 countries, determining each country's level of health care.
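A minimal sketch of Mahalanobis-distance clustering on the iris data mentioned above is shown below; it uses a single global covariance estimate, whereas the paper's contribution is an iterative, weight- and class-aware covariance estimate that is not reproduced here.

# Sketch: clustering the iris data with Mahalanobis distances.
# A single global covariance estimate is used, not the paper's iterative estimator.
import numpy as np
from sklearn.datasets import load_iris
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

X = load_iris().data
VI = np.linalg.inv(np.cov(X, rowvar=False))          # inverse covariance for Mahalanobis
d = pdist(X, metric="mahalanobis", VI=VI)
labels = fcluster(linkage(d, method="average"), t=3, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])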

5.
A TOPSIS decision-making method based on the correlation coefficient matrix (Cited by 1; self-citations: 0; by others: 1)
In multi-attribute decision analysis, the traditional TOPSIS method uses the Euclidean distance to compute the distance from each alternative to the positive and negative ideal points, but the Euclidean distance does not account for correlations among the attributes. From this perspective, the correlation coefficient matrix is combined with the Euclidean distance to remedy this shortcoming, and a numerical example is analyzed.
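One plausible reading of the combination described above is to replace the Euclidean distance to the ideal points with a Mahalanobis-type distance built from the inverse correlation matrix of the weighted normalized decision matrix. The sketch below implements that reading on a made-up decision matrix; it is not necessarily the paper's exact formulation.

# Sketch of TOPSIS with a correlation-aware distance (one reading of the idea above).
# Toy benefit-type decision matrix; not the paper's example.
import numpy as np

A = np.array([[7., 9., 9.], [8., 7., 8.], [9., 6., 8.], [6., 7., 8.]])  # alternatives x criteria
w = np.array([0.4, 0.3, 0.3])                                           # criterion weights

R = A / np.linalg.norm(A, axis=0)          # vector-normalized decision matrix
V = R * w                                  # weighted normalized matrix
pos, neg = V.max(axis=0), V.min(axis=0)    # positive / negative ideal solutions

C_inv = np.linalg.pinv(np.corrcoef(V, rowvar=False))   # inverse correlation of criteria

def corr_dist(x, y):
    diff = x - y
    return np.sqrt(diff @ C_inv @ diff)    # Mahalanobis-type distance

d_pos = np.array([corr_dist(v, pos) for v in V])
d_neg = np.array([corr_dist(v, neg) for v in V])
closeness = d_neg / (d_pos + d_neg)
print("ranking (best first):", np.argsort(-closeness))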

6.
The identification of reduced-order models from high-dimensional data is a challenging task, and even more so if the identified system should not only be suitable for a certain data set, but generally approximate the input-output behavior of the data source. In this work, we consider the input-output dynamic mode decomposition method for system identification. We compare excitation approaches for the data-driven identification process and describe an optimization-based stabilization strategy for the identified systems.
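For orientation, a minimal exact dynamic mode decomposition on snapshot data looks as follows; the input-output extension and the optimization-based stabilization described in the abstract are omitted, and the toy system, noise level, and truncation rank r are arbitrary choices.

# Minimal exact DMD on snapshot data, illustrating the kind of model referred to above.
# The input-output extension and the stabilization step are omitted.
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 200
A_true = np.diag(np.exp(np.linspace(-0.1, -1.0, n)))     # a stable "true" system (toy)
X = np.empty((n, m))
X[:, 0] = rng.normal(size=n)
for k in range(1, m):
    X[:, k] = A_true @ X[:, k - 1] + 0.01 * rng.normal(size=n)   # small process noise

X0, X1 = X[:, :-1], X[:, 1:]                              # snapshot pairs
U, s, Vh = np.linalg.svd(X0, full_matrices=False)
r = 10                                                    # truncation rank (hypothetical)
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vh[:r].T
A_tilde = Ur.T @ X1 @ Vr @ np.linalg.inv(Sr)              # reduced-order operator
eigvals = np.linalg.eigvals(A_tilde)
print("max |eigenvalue| of identified model:", np.abs(eigvals).max())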

7.
The interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the importance of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through \(\ell_1\)-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the interest of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented.

8.
Euclidean distance embedding appears in many high-profile applications, including wireless sensor network localization, where not all pairwise distances among sensors are known or accurate. Classical Multi-Dimensional Scaling (cMDS) generally works well when the partial or contaminated Euclidean Distance Matrix (EDM) is close to the true EDM, but otherwise performs poorly. A natural step preceding cMDS would be to calculate the nearest EDM to the known matrix. A crucial condition on the desired nearest EDM is that it have a low embedding dimension, and this makes the problem nonconvex. There exists a large body of publications that deal with this problem; some try to solve the problem directly, while others are convex relaxations of it. In this paper, we propose a numerical method that aims to solve this problem directly. Our method is strongly motivated by the majorized penalty method of Gao and Sun for low-rank positive semi-definite matrix optimization problems. The basic geometric object in our study is the set of EDMs having a low embedding dimension. We establish a zero duality gap result between the problem and its Lagrangian dual problem, which also motivates the majorization approach adopted. Numerical results show that the method works well for the Euclidean embedding of network coordinate systems and for a class of problems in large-scale sensor network localization and molecular conformation.
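The classical MDS (cMDS) step referred to above is short enough to write down; the nearest-EDM computation that the paper proposes as a preprocessing step is not reproduced, and the points and noise level below are made up.

# The classical MDS (cMDS) step mentioned above: recover coordinates from a
# (squared) Euclidean distance matrix via double centering and eigendecomposition.
import numpy as np

def classical_mds(D2, dim):
    """D2: matrix of squared pairwise distances; dim: embedding dimension."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D2 @ J                      # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]            # keep the largest eigenvalues
    L = np.sqrt(np.clip(w[idx], 0.0, None))
    return V[:, idx] * L                       # n x dim configuration

rng = np.random.default_rng(0)
P = rng.normal(size=(30, 2))                                   # ground-truth points
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)            # squared EDM
D2_noisy = D2 + 0.01 * rng.normal(size=D2.shape)
D2_noisy = (D2_noisy + D2_noisy.T) / 2                          # keep it symmetric
Q = classical_mds(D2_noisy, dim=2)
print("recovered configuration shape:", Q.shape)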

9.
We present a proof of the theorem which states that a matrix of Euclidean distances on a set of specially distributed random points in the n-dimensional Euclidean space \(\mathbb{R}^n\) converges in probability to an ultrametric matrix as n → ∞. The values of the elements of the ultrametric distance matrix are completely determined by the variances of the coordinates of the random points. We also present a probabilistic algorithm for the generation of finite ultrametric structures of any topology in high-dimensional Euclidean space. The validity of the algorithm is demonstrated by explicit calculations of distance matrices and ultrametricity indexes for various dimensions n.

10.
Existing one-class classification algorithms usually describe the similarity between samples with the classical Euclidean metric; however, the Euclidean metric cannot capture the intrinsic distribution structure of some data sets, which limits the descriptive power of these methods. A distance metric learning algorithm for one-class data in high-dimensional spaces is proposed to improve the descriptive performance of one-class classifiers. Compared with existing distance metric learning algorithms, it only requires data from the target class; by introducing a regularization term based on the prior sample distribution and an L1-norm sparsity constraint on the metric, it effectively handles metric learning for one-class data from few samples in high-dimensional spaces, and the resulting optimization problem is solved efficiently by a block coordinate descent algorithm. The learned metric can easily be embedded into one-class classifiers. Simulation results show that the learned metric effectively improves the descriptive performance of one-class classifiers, in particular that of SVDD, and thus yields one-class classifiers with stronger generalization ability.

11.
Nearest neighbour classification requires a good distance metric. Previous approaches try to learn a quadratic distance metric so that observations of different classes are well separated. For high-dimensional problems, where many uninformative variables are present, it is attractive to select a sparse distance metric, both to increase predictive accuracy and to aid interpretation of the result. We investigate the \(\ell_1\)-regularized metric learning problem, making a connection with the Lasso algorithm in the linear least-squares setting. We show that the fitted transformation matrix is close to the desired transformation matrix in \(\ell_1\)-norm by assuming a version of the compatibility condition.

12.
Ding, Chao; Qi, Hou-Duo. Mathematical Programming, 2017, 164(1-2): 341-381.

Classical multidimensional scaling only works well when the noisy distances observed in a high-dimensional space can be faithfully represented by Euclidean distances in a low-dimensional space. Advanced models such as Maximum Variance Unfolding (MVU) and Minimum Volume Embedding (MVE) use Semi-Definite Programming (SDP) to reconstruct such faithful representations. While those SDP models are capable of producing high-quality configurations numerically, they suffer from two major drawbacks. One is that there are no theoretically guaranteed bounds on the quality of the configuration. The other is that they are slow to compute when the number of data points grows beyond a moderate size. In this paper, we propose a convex optimization model of Euclidean distance matrices. We establish a non-asymptotic error bound for the random graph model with sub-Gaussian noise, and prove that our model produces a matrix estimator of high accuracy when the order of the uniform sample size is roughly the degrees of freedom of a low-rank matrix up to a logarithmic factor. Our results partially explain why MVU and MVE often work well. Moreover, the convex optimization model can be efficiently solved by a recently proposed 3-block alternating direction method of multipliers. Numerical experiments show that the model can produce high-quality configurations on numbers of data points that the SDP approach would struggle to cope with.

13.
The Euclidean distance matrix (EDM) completion problem and the positive semidefinite (PSD) matrix completion problem are considered in this paper. Approaches to determine the location of a point in a linear manifold are studied, which are based on a referential coordinate set and a distance vector whose components indicate the distances from the point to other points in the set. For a given referential coordinate set and a corresponding distance vector, necessary and sufficient conditions are presented for the existence of a point such that the distance vector can be realized. The location of the point (if it exists) given by the approaches in a linear manifold is independent of the coordinate system, and is only related to the referential coordinate set and the corresponding distance vector. An interesting phenomenon concerning the complexity of the EDM completion problem is described. Some properties of the uniqueness and rigidity of the conformation for solutions to the EDM and PSD completion problems are presented.

14.
Pairwise comparison matrices are commonly used for setting priorities among competing objects. In a leading decision-making method, the analytic hierarchy process, the components of the principal right eigenvector represent the weights of the alternatives. The direct least-squares method extracts the weight vector by first finding a rank-one matrix which minimizes the Euclidean distance from the original ratio matrix. We develop a recursive least-squares algorithm and reveal a striking correspondence between these two approaches for these matrices. The recursion also applies to matrices that are merely positive. We prove that a convergent iteration leads to matrices from which the Perron eigenvectors and the Perron approximation of the original matrix may be produced. We show that certain useful properties of the recursion advance the development of reliable measures of perturbations of transitive matrices. Numerical analysis is included for a macroeconomic problem taken from the literature.
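For reference, the principal (Perron) eigenvector weights that the abstract compares against can be computed directly; the paper's recursive least-squares algorithm is not reproduced, and the comparison matrix below is a made-up toy example.

# Extracting AHP priority weights from a pairwise comparison matrix via the principal
# (Perron) eigenvector, the baseline the abstract refers to.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3., 1.0, 2.0],
              [1/5., 1/2., 1.0]])               # toy reciprocal comparison matrix

w_vals, w_vecs = np.linalg.eig(A)
k = np.argmax(w_vals.real)                      # Perron eigenvalue is real and dominant
weights = np.abs(w_vecs[:, k].real)
weights /= weights.sum()
print("priority weights:", np.round(weights, 3))

# The direct least-squares idea mentioned above would instead minimize
# ||A - w (1/w)^T||_F over positive weight vectors w (e.g. with scipy.optimize.minimize).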

15.
In this paper, distributed estimation of a high-dimensional sparse precision matrix is proposed for transelliptical graphical models, based on the debiased D-trace loss penalized lasso and the hard threshold method, when samples are distributed across different machines. At a certain level of sparsity, this method not only correctly selects the non-zero elements of the sparse precision matrix, but also achieves an error rate comparable to that of the estimator in a non-distributed setting. The numerical results further show that the proposed distributed method is more effective than the usual averaging method.

16.
A proof for the positive definiteness of the Jaccard index matrix (Cited by 1; self-citations: 0; by others: 1)
In this paper we provide a proof of the positive definiteness of the Jaccard index matrix used as a weighting matrix in the Euclidean distance between belief functions defined in Jousselme et al. [13]. The proof relies on the decomposition of the matrix into an infinite sum of positive semidefinite matrices. The proof is valid for any size of the frame of discernment, but we provide an illustration for a frame of three elements. The Jaccard index matrix being positive definite guarantees that the associated Euclidean distance is a full metric, and thus that a null distance between two belief functions implies that these belief functions are strictly identical.
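A small sketch of the Jaccard-weighted Euclidean distance between belief functions (the distance of Jousselme et al. referred to above) for a frame of three elements is given below; the mass assignments are made up for illustration.

# Sketch of the Jaccard-weighted Euclidean distance between belief functions
# on a frame of three elements. The mass assignments are invented for illustration.
import numpy as np
from itertools import combinations

frame = ("a", "b", "c")
subsets = [frozenset(c) for r in range(1, len(frame) + 1)
           for c in combinations(frame, r)]            # non-empty subsets of the frame

def jaccard(A, B):
    return len(A & B) / len(A | B)

D = np.array([[jaccard(A, B) for B in subsets] for A in subsets])   # Jaccard index matrix

m1 = np.array([0.6, 0.1, 0.0, 0.1, 0.0, 0.0, 0.2])     # mass on {a},{b},{c},{a,b},{a,c},{b,c},{a,b,c}
m2 = np.array([0.2, 0.5, 0.0, 0.1, 0.0, 0.0, 0.2])

diff = m1 - m2
dist = np.sqrt(0.5 * diff @ D @ diff)                   # a full metric because D is positive definite
print("eigenvalues of D all > 0:", bool(np.all(np.linalg.eigvalsh(D) > 0)))
print("Jousselme-type distance:", round(float(dist), 4))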

17.
Emil Horobeţ, Communications in Algebra, 2017, 45(3): 1177-1186.
The generic number of critical points of the Euclidean distance function from a data point to a variety is called the Euclidean distance degree (or ED degree). The two special loci of data points where the number of critical points is smaller than the ED degree are called the Euclidean distance data singular locus and the Euclidean distance data isotropic locus. In this article, we present connections between these two special loci of an affine cone and its dual cone.
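For orientation, the ED degree counts critical points of the distance from a generic data point \(u\) to the variety (this is the standard definition, stated here for reference rather than quoted from the article):

\[
\deg_{\mathrm{ED}}(X) \;=\; \#\bigl\{\, x \in X_{\mathrm{sm}} : u - x \perp T_{x}X \,\bigr\}
\quad \text{for generic } u,
\]

and the two loci described above consist of the special data points \(u\) for which this count drops.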

18.
Given a partial symmetric matrix A with only certain elements specified, the Euclidean distance matrix completion problem (EDMCP) is to find the unspecified elements of A that make A a Euclidean distance matrix (EDM). In this paper, we follow the successful approach in [20] and solve the EDMCP by generalizing the completion problem to allow for approximate completions. In particular, we introduce a primal-dual interior-point algorithm that solves an equivalent (quadratic objective function) semidefinite programming problem (SDP). Numerical results are included which illustrate the efficiency and robustness of our approach. Our randomly generated problems consistently resulted in low-dimensional solutions when no completion existed.

19.
The Weiszfeld algorithm for continuous location problems can be considered an iteratively reweighted least squares method. It generally exhibits linear convergence. In this paper, a Newton algorithm with similar simplicity is proposed to solve a continuous multifacility location problem with the Euclidean distance measure. As in the Weiszfeld algorithm, the main computation at each iteration is solving a weighted least squares problem. A Cholesky factorization of a symmetric positive definite band matrix, typically with a small band width (e.g., a band width of two for a Euclidean location problem on a plane), is performed. This new algorithm can be regarded as a Newton acceleration of the Weiszfeld algorithm with fast global and local convergence. The simplicity and efficiency of the proposed algorithm make it particularly suitable for large-scale Euclidean location problems and parallel implementation. Computational experience suggests that the proposed algorithm often performs well in the absence of the linear independence or strict complementarity assumption. In addition, the proposed algorithm is proven to be globally convergent under assumptions similar to those for the Weiszfeld algorithm. Although the local convergence analysis is still under investigation, computational results suggest that it is typically superlinearly convergent.
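For context, the single-facility Weiszfeld iteration mentioned above is the following iteratively reweighted least-squares scheme; the paper's Newton acceleration and its multifacility extension are not reproduced, and the four demo points are arbitrary.

# The single-facility Weiszfeld iteration (IRLS for the Euclidean median) referred to above.
import numpy as np

def weiszfeld(points, tol=1e-8, max_iter=1000):
    x = points.mean(axis=0)                       # start at the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(points - x, axis=1)
        if np.any(d < 1e-12):                     # iterate landed on a data point
            return x
        w = 1.0 / d                               # IRLS weights
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
print("Euclidean median:", weiszfeld(pts))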

20.
The Euclidean distance degree of a real variety is an important invariant arising in distance minimization problems. We show that the Euclidean distance degree of an orthogonally invariant matrix variety equals the Euclidean distance degree of its restriction to diagonal matrices. We illustrate how this result can greatly simplify calculations in concrete circumstances.
