Similar Literature
 Found 20 similar documents (search time: 339 ms)
1.
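Two types of problems to note in hierarchical cluster analysis   Total citations: 2 (self-citations: 0; citations by others: 2)
This paper gives an interesting example in which nine different similarity measures, each used with single-linkage (shortest-distance) clustering, yield nine mutually different results. For the same example, once the distance matrix is computed from the Euclidean distance, only the single-linkage method gives a unique clustering; the complete-linkage (longest-distance), centroid, group-average, and sum-of-squared-deviations (Ward) methods all give non-unique results.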
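The non-uniqueness described in this abstract can arise from ties in the distance matrix. The following is a minimal sketch, not the paper's example: it assumes one-dimensional data, a fixed target number of clusters `k`, and a hypothetical helper `agglomerate` that breaks ties by input order. Reordering the same three values changes the complete-linkage result.

```python
from itertools import combinations

def agglomerate(values, k, linkage):
    """Merge clusters bottom-up until k remain.

    `linkage` is `max` for complete linkage or `min` for single linkage.
    Ties between equally close pairs are broken by input order (an
    assumption of this sketch), which is where non-uniqueness enters.
    """
    clusters = [[v] for v in values]

    def dist(a, b):
        return linkage(abs(x - y) for x in a for y in b)

    while len(clusters) > k:
        # min() returns the first of several equally close pairs
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)

# d(0, 1) == d(1, 2) == 1, so the first merge is ambiguous
print(agglomerate([0, 1, 2], 2, max))  # [[0, 1], [2]]
print(agglomerate([2, 1, 0], 2, max))  # [[0], [1, 2]]
```

Both outputs are valid complete-linkage partitions of the same data; which one a program reports depends only on how it resolves the tie.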

2.
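Comparing the similarity of clustering results is an important branch of cluster analysis whose theory is still far from complete. Building on the similarity index B_k for clustering results given in reference [1], this paper proposes an improved index B_(k1) and shows, both theoretically and through Monte Carlo simulation, that the improved index retains the similarity interpretation of B_k while remedying its shortcomings.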

3.
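This paper studies the theory of Sobolev function classes in Euclidean space when the exponent is infinite, and the theory of Sobolev-class Banach-space-valued functions on metric spaces when the exponent is a finite constant. Using methods from Banach space theory and potential theory, various characterizations of Sobolev-class Banach-space-valued functions on metric measure spaces with infinite exponent are obtained, and this Sobolev class is then compared with the corresponding Lipschitz class and Hajlasz-Sobolev class. The results generalize the corresponding conclusions for Sobolev function classes in Euclidean spaces and in metric measure spaces.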

4.
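Clustering algorithms built on the classical partitioning idea represent each class by a single point. To overcome this limitation, a clustering algorithm for categorical data based on generalized centers is proposed. The algorithm represents a class by a generalized center that contains multiple points and can therefore reflect the data distribution of the class. New methods for measuring the distance between generalized centers and the distance between classes are proposed, together with a procedure for determining the generalized centers and a clustering strategy that assigns objects to classes based on them; in general a single partitioning iteration is enough to obtain the final clustering. The generalized-center algorithm is applied to four benchmark datasets and compared with the well-known partitional clustering algorithm K-modes and two of its improved variants. The results show that the generalized-center algorithm achieves higher clustering accuracy with fewer iterations and is effective and feasible.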
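For context, the single-point representation that this abstract criticizes can be sketched with the two basic ingredients of K-modes: the simple matching dissimilarity and the per-attribute mode. This is only the baseline, not the generalized-center algorithm, whose definitions are not given here; the function names and the toy data are assumptions made for illustration.

```python
from collections import Counter

def matching_distance(x, y):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, y))

def mode_of(cluster):
    """Single-point representative: the most frequent value per attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))

# Hypothetical categorical records: (color, size)
cluster = [("red", "S"), ("red", "M"), ("blue", "M")]
center = mode_of(cluster)
print(center)                                  # ('red', 'M')
print(matching_distance(("blue", "S"), center))  # 2
```

Note that the mode `('red', 'M')` summarizes the cluster with one point and discards how the other values are distributed, which is the limitation the abstract points to.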

5.
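Traditional spectral clustering algorithms are not suited to multi-scale problems. To address this, a new similarity measure is introduced, the density-sensitive similarity measure, which enlarges the distance between data points in different high-density regions and shortens the distance between data points in the same high-density region, so that it effectively describes the actual cluster distribution of the data. The paper also introduces the concept of the eigengap and gives a method for automatically determining the number of clusters. Numerical experiments verify the feasibility and effectiveness of the proposed algorithm.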
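The abstract does not spell out its eigengap rule. A minimal sketch of the commonly used heuristic, assuming the eigenvalues of a graph Laplacian are already available and that the number of clusters is taken at the largest gap between consecutive sorted eigenvalues, might look like this (the function name and the sample spectrum are hypothetical):

```python
def eigengap_k(eigenvalues):
    """Pick k at the largest gap between consecutive sorted eigenvalues.

    Assumes `eigenvalues` come from a graph Laplacian, where k
    well-separated clusters give k eigenvalues close to zero.
    """
    lam = sorted(eigenvalues)
    gaps = [lam[i + 1] - lam[i] for i in range(len(lam) - 1)]
    return gaps.index(max(gaps)) + 1

# Three near-zero eigenvalues followed by a jump suggest three clusters
print(eigengap_k([0.0, 0.01, 0.02, 0.9, 1.1]))  # 3
```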

6.
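Cluster analysis is a multivariate statistical method for the comprehensive classification of samples or indicators. Clustering results are often presented as a dendrogram. How to determine the number of clusters reasonably has long been a difficult problem with no fully satisfactory solution; in particular, when the sample size is large the dendrogram has many levels and the number of clusters is hard to read off by eye. This paper introduces a clustering method based on Bayesian theory that determines the optimal number of clusters and the optimal partition by maximizing the posterior likelihood, which avoids subjectivity in choosing the number of clusters. A real dataset with known classification verifies the effectiveness of the method.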

7.
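《数理统计与管理》 (Journal of Applied Statistics and Management), 2019, (3): 450-459
Clustering of time series data groups panel data or multivariate time series according to the similarity between series. Time series in the same group have similar model parameters, and, especially when the series are short, clustering yields more accurate parameter estimates. Most distance measures in existing time series clustering methods rely on an assumption of linearity, whereas real-world time series are usually nonlinear. This paper proposes a clustering method for nonlinear time series data based on a Copula distance measure, which uses the Copula function to capture the nonlinear dependence structure of the time series. As a nonparametric distance measure, the Copula-based distance can identify similarity in dynamic dependence structure. Extensive simulation experiments and empirical studies verify the effectiveness of the proposed method.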

8.
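Interval-valued symbolic data are an important type of symbolic data. The existing literature usually assumes that the point data within an interval follow a uniform distribution, which limits applicability. Under the assumption of a general distribution, this paper gives an extended Hausdorff distance measure for general-distribution interval-valued symbolic data and, on that basis, proposes an SOM clustering algorithm for such data. Random simulation experiments show that the SOM clustering algorithm based on the proposed extended Hausdorff distance is more effective than SOM clustering algorithms based on the traditional Hausdorff distance and on the μσ distance. Finally, the method is applied to the cluster analysis of meteorological data to illustrate its application steps and operability and to further evaluate its effectiveness on practical problems.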
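As a reference point for the traditional Hausdorff distance mentioned above: for two closed real intervals it reduces to the larger of the two endpoint differences. The sketch below shows only that classical form; the extended measure proposed in the paper is not reproduced here, and the function name and sample intervals are assumptions.

```python
def hausdorff_interval(u, v):
    """Hausdorff distance between closed intervals u = (a1, b1), v = (a2, b2).

    For closed intervals on the real line this equals
    max(|a1 - a2|, |b1 - b2|); it depends only on the endpoints and
    ignores how points are distributed inside each interval.
    """
    (a1, b1), (a2, b2) = u, v
    return max(abs(a1 - a2), abs(b1 - b2))

# Hypothetical daily temperature ranges (low, high)
print(hausdorff_interval((1, 4), (2, 8)))  # 4
```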

9.
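Metric equations of double basic figures and their applications. You Xiuying, Yang Chi (Guangdong Institute of Machinery) (Shanghai Putuo District Spare-Time University). 1. Cosine of the angle between hyperplanes. Let e be an m-dimensional hyperplane in P, and let g = (…) be a set of mutually perpendicular unit vectors that are perpendicular to e; then g is called a normal vector set of e. Specify a class of normal vector sets (in a normal vector set, any...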

10.
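The maximum-matrix-element principle of fuzzy clustering is combined with a theoretical model of fuzzy water-quality evaluation based on data iteration to form a fuzzy clustering iteration method. The method is used to classify and evaluate groundwater quality in Jinchang City, Gansu Province, with satisfactory results.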

11.
This article proposes a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the “true” class labels. The resulting “prediction strength” measure assesses how many groups can be predicted from the data, and how well. In the process, we develop novel notions of bias and variance for unlabeled data. Prediction strength performs well in simulation studies, and we apply it to clusters of breast cancer samples from a DNA microarray study. Finally, some consistency properties of the method are established.

12.
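To address the shortcomings of existing intuitionistic fuzzy similarity measures, this paper proposes an intuitionistic fuzzy similarity measure formula that takes a tendency-aware similarity degree into account. By constructing the similarity matrix, the equivalent similarity matrix, and the cut matrix, a novel clustering method for intuitionistic fuzzy sets is given. An application to a logistics center location problem verifies the rationality and effectiveness of the clustering method.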

13.
Similarity measures of type-2 fuzzy sets are used to indicate the similarity degree between type-2 fuzzy sets. Inclusion measures for type-2 fuzzy sets are the degrees to which a type-2 fuzzy set is a subset of another type-2 fuzzy set. The entropy of type-2 fuzzy sets is the measure of fuzziness between type-2 fuzzy sets. Although several similarity, inclusion and entropy measures for type-2 fuzzy sets have been proposed in the literature, no one has considered the use of the Sugeno integral to define those for type-2 fuzzy sets. In this paper, new similarity, inclusion and entropy measure formulas between type-2 fuzzy sets based on the Sugeno integral are proposed. Several examples are used to present the calculation and to compare these proposed measures with several existing methods for type-2 fuzzy sets. Numerical results show that the proposed measures are more reasonable than existing measures. On the other hand, measuring the similarity between type-2 fuzzy sets is important in clustering for type-2 fuzzy data. We finally use the proposed similarity measure with a robust clustering method for clustering the patterns of type-2 fuzzy sets.

14.
The proportion exponent is introduced as a measure of the validity of the clustering obtained for a data set using a fuzzy clustering algorithm. It is assumed that the output of an algorithm includes a fuzzy membership function for each data point. We show how to compute the proportion of possible memberships whose maximum entry exceeds the maximum entry of a given membership function, and use these proportions to define the proportion exponent. Its use as a validity functional is illustrated with four numerical examples and its effectiveness compared to other validity functionals, namely, classification entropy and partition coefficient.
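The two comparison functionals named at the end of this abstract have standard closed forms. A minimal sketch, assuming a membership matrix `U` with one row per data point and one column per cluster (each row summing to 1), and using the natural logarithm; the proportion exponent itself is not reproduced here:

```python
import math

def partition_coefficient(U):
    """Mean of squared memberships: 1 for a crisp partition, 1/c when uniform."""
    return sum(u * u for row in U for u in row) / len(U)

def classification_entropy(U):
    """Mean membership entropy: 0 for a crisp partition, log(c) when uniform."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
print(partition_coefficient(crisp), partition_coefficient(fuzzy))
print(classification_entropy(crisp), classification_entropy(fuzzy))
```

A higher partition coefficient and a lower classification entropy both indicate memberships that are closer to a crisp assignment.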

15.
The goal of clustering is to detect the presence of distinct groups in a dataset and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. The goal then is to find the modes and assign each observation to the domain of attraction of a mode. The modal structure of a density is summarized by its cluster tree; modes of the density correspond to leaves of the cluster tree. Estimating the cluster tree is the primary goal of nonparametric cluster analysis. We adopt a plug-in approach to cluster tree estimation: estimate the cluster tree of the feature density by the cluster tree of a density estimate. For some density estimates the cluster tree can be computed exactly; for others we have to be content with an approximation. We present a graph-based method that can approximate the cluster tree of any density estimate. Density estimates tend to have spurious modes caused by sampling variability, leading to spurious branches in the graph cluster tree. We propose excess mass as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent. Excess mass can be used as a guide for pruning the graph cluster tree. We point out mathematical and algorithmic connections to single linkage clustering and illustrate our approach on several examples. Supplemental materials for the article, including an R package implementing generalized single linkage clustering, all datasets used in the examples, and R code producing the figures and numerical results, are available online.

16.
Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Though many effective and efficient clustering algorithms have been developed and deployed, most of them still lack an automatic or online decision for the optimal number of clusters. In this paper, we define clustering gain as a measure of clustering optimality, which is based on the squared error sum as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. According to the experimental results, our clustering measure performs well, producing intuitively reasonable clustering configurations in Euclidean space. Furthermore, the measure can be utilized to estimate the desired number of clusters for partitional clustering methods as well. Therefore, the clustering gain measure provides a promising technique for achieving a higher level of quality for a wide range of clustering methods.

17.
Cluster analysis is used in various scientific and applied fields and is a topical subject of research. In contrast to the existing methods, the algorithms offered in this paper are intended for clustering objects described by feature vectors in a space in which the symmetry axiom is not satisfied. In this case, the clustering problem is solved using an asymmetric proximity measure. The essence of the first of the proposed clustering algorithms consists in sequential generation of clusters with simultaneous transfer of the objects clustered from previously created clusters into a current cluster if this reduces the quality criterion. In comparison with the existing algorithms of non-hierarchical clustering, such an approach to cluster generation makes it possible to reduce the computational costs. The second algorithm is a modified version of the first one and makes it possible to reassign the main objects of clusters to further decrease the value of the proposed quality criterion.

18.
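Topic discovery is an important research problem for predicting hot topics on online social platforms. Most existing topic discovery algorithms measure the similarity between text data with the traditional cosine similarity, which cannot detect differences between data objects whose values change proportionally in every dimension; the resulting text similarity is inaccurate and lowers the accuracy of topic discovery. To address this, a topic discovery algorithm based on a bidirectionally improved cosine similarity (TABOC) is proposed. First, the cosine similarity is improved from two perspectives, direction and magnitude, giving the bidirectionally improved cosine similarity, which can distinguish data objects whose values change proportionally in every dimension while retaining the traditional cosine similarity's advantage in judging direction, and so measures text similarity more accurately. Next, the bidirectionally improved cosine feature vector of a set, the addition of such feature vectors, and related definitions and theorems are introduced; irrelevant information is discarded and the feature vector of a newly merged set is computed directly, which reduces the time and space cost of topic discovery. An incremental clustering framework is also incorporated to process newly added data efficiently. Experiments on Baidu Tieba data show that TABOC is effective and feasible for topic discovery, and that its accuracy and time efficiency are on the whole better than those of the other algorithms compared.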
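The limitation of the traditional cosine similarity that motivates this work is easy to reproduce. The sketch below shows only that limitation, not the TABOC formula, which the abstract does not give; the term-count vectors are hypothetical.

```python
import math

def cosine_similarity(x, y):
    """Traditional cosine similarity: compares direction only."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

short_post = [1, 2, 3]    # hypothetical term counts
long_post = [10, 20, 30]  # same proportions, ten times the magnitude

# The two vectors are clearly different, yet the similarity is (up to
# floating-point rounding) exactly 1
print(cosine_similarity(short_post, long_post))
```

Any measure meant to separate such vectors has to bring magnitude back in, which is the second of the two perspectives the abstract describes.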

19.
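Convergence of spectral clustering associated with general similarity functions   Total citations: 1 (self-citations: 0; citations by others: 1)
Spectral clustering algorithms are generated by the eigenfunctions of the graph Laplace operator associated with a similarity function. This paper proves the convergence of spectral clustering algorithms associated with general similarity functions and gives a quantitative estimate of the convergence using the covering-number method. When the similarity function is a Lipschitz s > 0 function on a subset of Euclidean space, a convergence rate of the form O(√(log(n + 1)/n)) is established. We also point out that the growth of the covering number of a corresponding function set can be arbitrarily bad.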

20.
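Similarity measures are an effective tool for solving fuzzy multi-attribute decision-making problems. To address shortcomings of existing similarity measures, a new similarity measure between two vectors is constructed and proved to satisfy the properties of a similarity measure, and it is applied to intuitionistic trapezoidal fuzzy preference multi-attribute decision-making problems. The method represents the information on decision alternatives by intuitionistic trapezoidal fuzzy numbers of linguistic values, computes the relative similarity between the expected vector of each alternative and the expected vectors of the positive ideal and negative ideal alternatives, and ranks the alternatives by relative similarity. Finally, a case study examines the feasibility of the method; the numerical results show that it is computationally simple and practical.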
