Found 16 similar documents (search time: 46 ms)
1.
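Interval-valued symbolic data are an important type of symbolic data. The existing literature usually assumes that the points within an interval are uniformly distributed, which limits applicability. Assuming a general distribution instead, this paper gives an extended Hausdorff distance measure for interval-valued symbolic data with a general distribution and, on that basis, proposes a SOM clustering algorithm for such data. Random simulation experiments show that the SOM clustering algorithm based on the extended Hausdorff distance is more effective than SOM clustering based on the traditional Hausdorff distance and SOM clustering based on the μσ distance. Finally, the method is applied to a cluster analysis of meteorological data to illustrate its application steps and operability and to further evaluate its effectiveness on a practical problem.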
2.
《数理统计与管理》2014,(4):634-641
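Because the Hausdorff distance defines the distance between two compact sets, an interval number is treated as a compact set and a distance between interval numbers is defined. The distance between interval vectors is then studied, which gives the distance between two samples in cluster analysis. A Hausdorff distance between two clusters is further defined. To remove the influence of measurement units on the clustering results, the standardization of interval data is studied. On this basis, a hierarchical clustering algorithm for interval data is given. Random simulation is used to evaluate the method; the results show that the Hausdorff distance method clusters more effectively than the traditional Euclidean distance method under all designed experimental conditions. Finally, interval data are constructed following the idea of symbolic data analysis, and a worked example clusters several animal groups by physiological characteristics such as height and weight.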
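As a rough illustration of the distance described in this abstract, the sketch below uses the standard closed form of the Hausdorff distance between two closed intervals, max(|a_lo - b_lo|, |a_hi - b_hi|). The function names are hypothetical, and the Euclidean-style aggregation over coordinates in `interval_vector_distance` is an assumption; the abstract does not state how the paper combines coordinates.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def interval_hausdorff(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two closed intervals.

    For closed intervals on the real line this reduces to the larger
    of the two endpoint differences.
    """
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def interval_vector_distance(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """Distance between two interval vectors (two samples).

    Assumption: per-coordinate Hausdorff distances are combined in a
    Euclidean fashion. Other aggregations (sum, maximum) are equally
    plausible readings of the abstract.
    """
    if len(x) != len(y):
        raise ValueError("interval vectors must have the same length")
    return sum(interval_hausdorff(a, b) ** 2 for a, b in zip(x, y)) ** 0.5
```

For example, `interval_hausdorff((0, 1), (2, 5))` compares the lower bounds (difference 2) and the upper bounds (difference 4) and returns the larger value.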
3.
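Feature selection on interval-valued symbolic data reduces the dimensionality of the data and extracts its key features. This paper proposes a new feature selection method for interval-valued symbolic data. First, the method measures the similarity of interval numbers using the interval Hausdorff distance and the interval Euclidean distance, and estimates feature weights by building an optimization model that maximizes the similarity between sample points and their class centers. Second, a classifier is built from the feature weights to assess the quality of the estimated weights. Finally, to verify the method, numerical experiments are run on synthetic and real datasets; the results show that it effectively removes irrelevant features and identifies the features related to the class labels.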
4.
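Fuzzy techniques are now applied in many intelligent systems, for example through fuzzy relations and fuzzy clustering. Clustering is an important data mining task: it divides data objects into clusters such that the attribute features of objects within the same cluster are highly similar, and it has considerable research and application value. Drawing on database mining techniques, this paper proposes an interval fuzzy ISODATA dynamic clustering method, based on interval-number membership degrees, for multi-attribute decision problems whose attribute values are interval numbers.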
5.
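An FCM clustering algorithm based on interval-number multi-index information (total citations: 2; self-citations: 0; citations by others: 2)
For cluster analysis problems with uncertain interval-number multi-index information, a new clustering algorithm is proposed, following the approach of the traditional FCM clustering algorithm for numerical information. The paper first describes the cluster analysis problem with interval-number multi-index information; it then gives two theorems on determining the optimal partition and the optimal cluster centers under such information; it then gives the computational steps of the FCM clustering algorithm based on interval-number multi-index information. A feature of the algorithm is that the cluster centers are expressed as exact numerical values, and the two theorems establish the convergence of the algorithm. Finally, a numerical example illustrates the proposed clustering algorithm.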
6.
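Constrained hierarchical clustering of two-dimensional ordered samples (total citations: 4; self-citations: 0; citations by others: 4)
Clustering two-dimensional ordered samples must satisfy two requirements: (1) similarity of the units within a cluster and difference between clusters; (2) ordering of the units by position and connectivity within each cluster. Accordingly, the distance matrix between the observed indicators of the units is used as the indicator matrix for clustering, and the matrix of locational links between units is used as the constraint matrix. Under the constraints given by the constraint matrix, and taking the maximum distance between the unit indicators of two clusters as the between-cluster similarity index, all units are merged step by step in the indicator matrix, which yields a classification of the samples that satisfies both requirements.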
7.
8.
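Symbolic data analysis is an emerging data mining technique, and interval numbers are the most commonly used kind of symbolic data. This paper studies the use of PCA for interval-valued symbolic data to evaluate the overall market performance of stocks. It first introduces the basic theory of symbolic data analysis. It then studies the computation of empirical descriptive statistics for interval data samples and, based on the empirical correlation matrix, gives an algorithm for interval principal component analysis, which yields principal component values expressed as interval numbers. Finally, an empirical study uses one week of trading data for 20 stocks on the Shanghai Stock Exchange; based on a rectangle-plot representation of the interval principal component scores, the 20 stocks are divided into four classes by overall market performance.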
9.
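Generalization of the recurrence formulas for hierarchical clustering (total citations: 1; self-citations: 0; citations by others: 1)
This paper generalizes the recurrence formulas of the hierarchical clustering methods (the shortest distance method, longest distance method, average distance method, intermediate distance method, centroid method, group average method, sum of squared deviations method, flexible method, and flexible group average method) and gives a unified computational formula.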
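For orientation, the classical unified recurrence for hierarchical clustering is the Lance-Williams update, sketched below. This is the textbook formula and not the generalization proposed in the paper, which the abstract does not spell out. The function name and the parameter dictionaries are hypothetical; only the single-linkage (shortest distance) and complete-linkage (longest distance) coefficients are shown.

```python
def lance_williams(d_ki: float, d_kj: float, d_ij: float,
                   alpha_i: float, alpha_j: float,
                   beta: float, gamma: float) -> float:
    """Distance from cluster k to the merged cluster (i union j).

    Classical Lance-Williams recurrence:
        d(k, ij) = alpha_i * d(k, i) + alpha_j * d(k, j)
                   + beta * d(i, j) + gamma * |d(k, i) - d(k, j)|
    """
    return (alpha_i * d_ki + alpha_j * d_kj
            + beta * d_ij + gamma * abs(d_ki - d_kj))


# Coefficients for two of the methods named in the abstract.
SINGLE_LINKAGE = dict(alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=-0.5)   # shortest distance
COMPLETE_LINKAGE = dict(alpha_i=0.5, alpha_j=0.5, beta=0.0, gamma=0.5)  # longest distance
```

With these coefficients the update reproduces min(d(k, i), d(k, j)) for single linkage and max(d(k, i), d(k, j)) for complete linkage, which is why one formula can cover several methods by changing four parameters.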
10.
11.
Chandra Sekhar Pedamallu Linet Özdamar Tibor Csendes 《Journal of Global Optimization》2007,37(2):177-194
In bound constrained global optimization problems, partitioning methods utilizing Interval Arithmetic are powerful techniques
that produce reliable results. Subdivision direction selection is a major component of partitioning algorithms and it plays
an important role in convergence speed. Here, we propose a new subdivision direction selection scheme that uses symbolic computing
in interpreting interval arithmetic operations. We call this approach symbolic interval inference approach (SIIA). SIIA targets
the reduction of interval bounds of pending boxes directly by identifying the major impact variables and re-partitioning them
in the next iteration. This approach speeds up the interval partitioning algorithm (IPA) because it targets the pending status
of sibling boxes produced. The proposed SIIA enables multi-section of two major impact variables at a time. The efficiency
of SIIA is illustrated on well-known bound constrained test functions and compared with established subdivision direction
selection methods from the literature.
12.
《Journal of computational and graphical statistics》2013,22(3):464-486
In recent years, hierarchical model-based clustering has provided promising results in a variety of applications. However, its use with large datasets has been hindered by a time and memory complexity that are at least quadratic in the number of observations. To overcome this difficulty, this article proposes to start the hierarchical agglomeration from an efficient classification of the data in many classes rather than from the usual set of singleton clusters. This initial partition is derived from a subgraph of the minimum spanning tree associated with the data. To this end, we develop graphical tools that assess the presence of clusters in the data and uncover observations difficult to classify. We use this approach to analyze two large, real datasets: a multiband MRI image of the human brain and data on global precipitation climatology. We use the real datasets to discuss ways of integrating the spatial information in the clustering analysis. We focus on two-stage methods, in which a second stage of processing using established methods is applied to the output from the algorithm presented in this article, viewed as a first stage.
13.
Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms (total citations: 1; self-citations: 0; citations by others: 1)
A. P. Reynolds G. Richards B. de la Iglesia V. J. Rayward-Smith 《Journal of Mathematical Modelling and Algorithms》2006,5(4):475-504
Previous research has resulted in a number of different algorithms for rule discovery. Two approaches discussed here, the ‘all-rules’ algorithm and multi-objective metaheuristics, both result in the production of a large number of partial classification rules, or ‘nuggets’, for describing different subsets of the records in the class of interest. This paper describes the application of a number of different clustering algorithms to these rules, in order to identify similar rules and to better understand the data.
14.
林建伟 《数学的实践与认识》2014,(21)
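Under a jump-diffusion model in which the evolution of the firm's asset value has a general jump-size distribution, a structural approach is used to study the pricing of corporate bonds with infinite maturity. Using differential-equation methods and the no-arbitrage principle, pricing expressions are obtained for the corporate bond, the shareholders' equity, and the total firm value, along with an expression for the optimal default boundary.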
15.
Vladik Kreinovich Eric J. Pauwels Scott A. Ferson Lev Ginzburg 《Numerical Algorithms》2004,37(1-4):225-232
Often, we need to divide n objects into clusters based on the value of a certain quantity x. For example, we can classify insects in the cotton field into groups based on their size and other geometric characteristics. Within each cluster, we usually have a unimodal distribution of x, with a probability density ρ(x) that increases until a certain value x_0 and then decreases. It is therefore natural, based on ρ(x), to determine a cluster as the interval between two local minima, i.e., as a union of adjacent increasing and decreasing segments. In this paper, we describe a feasible algorithm for solving this problem.
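The idea of taking each cluster as the interval between two local minima of ρ(x) can be sketched as follows. This is a minimal illustration that assumes the density has already been evaluated exactly on a sorted grid; it is not the algorithm of the paper, whose details the abstract does not give. The function name `clusters_between_minima` is hypothetical.

```python
from typing import List, Sequence, Tuple


def clusters_between_minima(xs: Sequence[float],
                            rho: Sequence[float]) -> List[Tuple[float, float]]:
    """Split the range of xs into intervals bounded by local minima of rho.

    xs  : sorted grid points
    rho : density values at those grid points (assumed exact)

    Only strict interior local minima are used as cut points; plateaus
    are deliberately not handled in this sketch.
    """
    if len(xs) != len(rho) or len(xs) < 2:
        raise ValueError("xs and rho must have equal length of at least 2")
    cuts = [xs[i] for i in range(1, len(xs) - 1)
            if rho[i] < rho[i - 1] and rho[i] < rho[i + 1]]
    bounds = [xs[0], *cuts, xs[-1]]
    # Each consecutive pair of bounds delimits one increasing-then-decreasing segment.
    return list(zip(bounds[:-1], bounds[1:]))
```

Each returned pair is one candidate cluster: the density rises to a single peak inside the interval and falls again before the next cut point.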
16.
《Journal of computational and graphical statistics》2013,22(4):788-806
Many graphical methods for displaying multivariate data consist of arrangements of multiple displays of one or two variables; scatterplot matrices and parallel coordinates plots are two such methods. In principle these methods generalize to arbitrary numbers of variables but become difficult to interpret for even moderate numbers of variables. This article demonstrates that the impact of high dimensions is much less severe when the component displays are clustered together according to some index of merit. Effectively, this clustering reduces the dimensionality and makes interpretation easier. For scatterplot matrices and parallel coordinates plots clustering of component displays is achieved by finding suitable permutations of the variables. I discuss algorithms based on cluster analysis for finding permutations, and present examples using various indices of merit.