Similar Documents
1.
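Clustering algorithms built on the classical partitioning idea represent each cluster by a single point. To address this limitation, a clustering algorithm for categorical data based on generalized centers is proposed. The algorithm represents each cluster by a generalized center that contains multiple points, so the center can reflect how the cluster's data are distributed. It also introduces new measures of generalized-center distance and between-cluster distance. It then gives a method for determining the generalized centers and a clustering strategy that assigns objects to clusters according to those centers. In general, only one partitioning iteration is needed to reach the final clustering. The generalized-center algorithm was applied to four benchmark datasets and compared with the well-known partitional clustering algorithm K-modes and two of its improved variants. The results show that the generalized-center algorithm achieves higher clustering accuracy with fewer iterations, which indicates that it is effective and feasible.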
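The generalized-center algorithm is not described here in enough detail to reproduce. As a point of reference, the following is a minimal Python sketch of the K-modes baseline it is compared against: one mode record per cluster, simple matching dissimilarity, and attribute-wise mode updates. The function names and the random initialization are illustrative assumptions and are not taken from the article.

```python
import random
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which records a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Attribute-wise mode of categorical records (ties go to the value seen first)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def k_modes(data, k, max_iter=100, seed=0):
    """Plain K-modes: each cluster is represented by a single mode record."""
    rng = random.Random(seed)
    modes = rng.sample(data, k)  # assumed initialization: k records drawn at random
    labels = None
    for _ in range(max_iter):
        # assign every record to its nearest mode
        new_labels = [min(range(k), key=lambda j: matching_dissimilarity(x, modes[j]))
                      for x in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # recompute each mode from its current members
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = mode_of(members)
    return labels, modes
```

Because each cluster is summarized by one record, K-modes cannot express the spread of a cluster. That single-point representation is the limitation the generalized center is meant to remove.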

2.
Estimating the number of clusters is one of the most difficult problems in cluster analysis. Most previous approaches require the data matrix and may not work when only a Euclidean distance matrix is available. Other approaches suffer from the curse of dimensionality and work poorly in high dimensions. In this article, we develop a new statistic, called the GUD statistic. It is based on the idea of the Gap method but uses the determinant of the pooled within-group scatter matrix instead of the within-cluster sum of squared distances. We develop theory showing that this statistic can work well when only the Euclidean distance matrix is known. More generally, it can work for any dissimilarity matrix that satisfies certain properties. We also propose a modification for high-dimensional datasets, called the R-GUD statistic, which gives a robust estimate in high-dimensional settings. Simulations show that our method requires less information yet is generally more accurate and robust than the other methods considered in the study, especially in many difficult settings.
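A minimal sketch of the core idea follows. It replaces the Gap method's within-cluster sum of squares with the log-determinant of the pooled within-group scatter matrix and compares the observed value against uniform reference data. The sketch assumes the data matrix is available. The article's point is that the statistic can also be computed from a Euclidean distance matrix alone, and that part is not shown. The k-means step, the bounding-box reference distribution and all function names are assumptions for illustration. This is not the authors' GUD implementation, and it omits the R-GUD modification.

```python
import numpy as np

def log_det_within_scatter(X, labels):
    """log|W| with W = sum_g sum_{i in g} (x_i - m_g)(x_i - m_g)^T (pooled within-group scatter)."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(labels):
        D = X[labels == g] - X[labels == g].mean(axis=0)
        W += D.T @ D
    sign, logdet = np.linalg.slogdet(W)
    return logdet if sign > 0 else -np.inf

def kmeans_labels(X, k, rng, n_iter=50):
    """Minimal Lloyd's k-means, used here only as the clustering step (an assumption)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

def gap_logdet(X, k, n_ref=20, seed=0):
    """Gap-style statistic: mean reference log|W_k| minus observed log|W_k|.

    Reference data are drawn uniformly over the bounding box of X.
    """
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    observed = log_det_within_scatter(X, kmeans_labels(X, k, rng))
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = []
    for _ in range(n_ref):
        R = rng.uniform(lo, hi, size=X.shape)
        ref.append(log_det_within_scatter(R, kmeans_labels(R, k, rng)))
    return float(np.mean(ref)) - observed
```

Following the Gap method, one would evaluate this over a range of k and pick the smallest k whose gap is not clearly exceeded by the next one. The standard-error rule is omitted for brevity.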

3.
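This paper addresses two problems in multi-indicator panel data: classifying samples and dividing historical periods. From the standpoint of multivariate statistical theory, it proposes a fused cluster analysis method for such data. The method improves factor analysis and hierarchical clustering for multi-indicator panel data. Following Fisher's ordered clustering theory, it constructs a deviation sum-of-squares function in Frobenius-norm form, which yields an ordered clustering method for multi-indicator panel data. Empirical results show that the method:
- meets the requirement of a unified systematic analysis while keeping the indicators uncorrelated;
- avoids the bias introduced by averaging over the time dimension, with little loss of information;
- solves the problem of ordered clustering of panel data;
- compensates for the one-sidedness and limitations of any single analysis.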
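Fisher's ordered clustering uses dynamic programming to find the optimal split of an ordered sequence into contiguous segments. The sketch below assumes each period is an n×p matrix (samples × indicators). The segment cost is the sum of squared Frobenius norms of each period's deviation from the segment's mean matrix. This is one plausible reading of the "Frobenius-norm deviation sum of squares" in the abstract. The article's exact function, and its factor-analysis preprocessing, may differ. Function names are illustrative.

```python
import numpy as np

def segment_cost(mats, i, j):
    """Frobenius-norm deviation sum of squares of periods i..j (inclusive) around their mean matrix."""
    block = np.stack(mats[i:j + 1])
    return float(((block - block.mean(axis=0)) ** 2).sum())

def fisher_ordered_partition(mats, k):
    """Optimal split of an ordered sequence of matrices into k contiguous segments.

    Returns (segment start indices, minimal total cost).
    """
    T = len(mats)
    cost = [[segment_cost(mats, i, j) if j >= i else 0.0 for j in range(T)] for i in range(T)]
    INF = float("inf")
    # best[m][j]: minimal total cost of splitting periods 0..j into m + 1 segments
    best = [[INF] * T for _ in range(k)]
    cut = [[0] * T for _ in range(k)]
    for j in range(T):
        best[0][j] = cost[0][j]
    for m in range(1, k):
        for j in range(m, T):
            for s in range(m, j + 1):  # s = first period of the last segment
                c = best[m - 1][s - 1] + cost[s][j]
                if c < best[m][j]:
                    best[m][j], cut[m][j] = c, s
    # backtrack the segment start points
    starts, j = [], T - 1
    for m in range(k - 1, 0, -1):
        s = cut[m][j]
        starts.append(s)
        j = s - 1
    return sorted([0] + starts), best[k - 1][T - 1]
```

Because segments must be contiguous in time, the search covers O(kT²) candidate splits instead of all partitions of the periods.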

4.
Abstract

We present an efficient algorithm for generating exact permutational distributions for linear rank statistics defined on stratified 2 × c contingency tables. The algorithm can compute exact p values and confidence intervals for a rich class of nonparametric problems. These include exact p values for stratified two-population Wilcoxon, Logrank, and Van der Waerden tests, exact p values for stratified tests of trend across several binomial populations, exact p values for stratified permutation tests with arbitrary scores, and exact confidence intervals for odds ratios embedded in stratified 2 × c tables. The algorithm uses network-based recursions to generate stratum-specific distributions and then combines them into an overall permutation distribution by convolution. Where only the tail area of a permutation distribution is desired, additional efficiency gains come from backward induction and branch-and-bound processing of the network. The algorithm is especially efficient for highly imbalanced categorical data, a situation where the asymptotic theory is unreliable. The backward induction component of the algorithm can also be used to evaluate the conditional maximum likelihood, and its higher-order derivatives, for the logistic regression model with grouped data. We illustrate the techniques with an analysis of two data sets: the leukemia data on survivors of the Hiroshima atomic bomb, and data from an animal toxicology experiment provided by the U.S. Food and Drug Administration.
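The convolution step can be illustrated with a plain subset-sum dynamic program. Within each stratum, the statistic is the sum of the integer scores of the row-1 subjects, and its null distribution under random allocation is computed exactly. The stratum distributions are then convolved. This is a simplified stand-in for the network algorithm described above and has none of its backward induction or branch-and-bound pruning. It assumes integer scores so that sums can serve as dictionary keys. Function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

def stratum_distribution(scores, m):
    """Exact null distribution of the score sum of m subjects drawn without replacement.

    scores: one integer score per subject in the stratum. Returns {t: probability}.
    """
    ways = [defaultdict(int) for _ in range(m + 1)]  # ways[c][t]: subsets of size c with sum t
    ways[0][0] = 1
    for sc in scores:
        for c in range(m, 0, -1):  # descending c so each subject is used at most once
            for t, w in list(ways[c - 1].items()):
                ways[c][t + sc] += w
    total = comb(len(scores), m)
    return {t: Fraction(w, total) for t, w in ways[m].items()}

def convolve(d1, d2):
    """Distribution of the sum of two independent statistics."""
    out = defaultdict(Fraction)
    for t1, p1 in d1.items():
        for t2, p2 in d2.items():
            out[t1 + t2] += p1 * p2
    return dict(out)

def exact_upper_p(strata, observed):
    """strata: list of (scores, m) pairs. Returns P(T >= observed) under the permutation null."""
    dist = {0: Fraction(1)}
    for scores, m in strata:
        dist = convolve(dist, stratum_distribution(scores, m))
    return sum(p for t, p in dist.items() if t >= observed)
```

Exact rational arithmetic (`Fraction`) keeps tail probabilities free of rounding error. A production implementation would use floating point together with the pruning described in the abstract.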

5.
Abstract

This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data, this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.
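Each panel of a mosaic matrix is built from a two-way table. The sketch below computes the tiles for one two-way table. Tile widths are proportional to the marginal proportions of the first variable, and heights to the conditional proportions of the second, so each tile's area is proportional to its cell count. It also extracts the bivariate marginal table for a pair of variables from a multiway array, which is what the simplest mosaic-matrix panel displays. It is a simplification that leaves out the gaps between tiles and the residual shading of real mosaic displays. Function names are illustrative.

```python
import numpy as np

def mosaic_tiles(table):
    """Tiles {(i, j): (x, y, width, height)} of a two-way mosaic in the unit square."""
    table = np.asarray(table, float)
    n = table.sum()
    tiles = {}
    x = 0.0
    for i, row in enumerate(table):
        w = row.sum() / n  # width: marginal proportion of level i of the first variable
        y = 0.0
        for j, count in enumerate(row):
            h = count / row.sum() if row.sum() > 0 else 0.0  # height: conditional proportion
            tiles[(i, j)] = (x, y, w, h)
            y += h
        x += w
    return tiles

def bivariate_margin(table, a, b):
    """Marginal table of axes a and b of a multiway count array (other axes summed out)."""
    table = np.asarray(table)
    others = tuple(ax for ax in range(table.ndim) if ax not in (a, b))
    m = table.sum(axis=others)
    return m if a < b else m.T  # summing keeps the remaining axes in ascending order
```

A mosaic matrix would call `mosaic_tiles(bivariate_margin(table, a, b))` for every pair of variables (a, b) and draw the results in a grid.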

6.
Cluster Analysis of Multi-Indicator Panel Data and Its Application   Citations: 8 total (self-citations: 0; by others: 8)
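Multivariate statistical analysis of multi-indicator panel data has so far received no attention in domestic (Chinese) research. This paper analyzes the data format and numerical characteristics of panel data. Based on the principles of cluster analysis, it reconstructs the distance function and the deviation sum-of-squares function for multi-indicator panel data, and on this basis describes the cluster analysis procedure for such data. Finally, an empirical cluster analysis of the production efficiency of industrial enterprises across regions of China shows good results.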
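The abstract does not give the reconstructed distance function. A minimal sketch follows, under two assumptions: each unit (for example, a region) is represented by its T×p matrix of indicators over time, and the deviation sum of squares of a cluster is measured with the Frobenius norm. Under those assumptions, Ward-style agglomerative clustering merges, at each step, the pair of clusters whose union increases the total deviation sum of squares the least. The function names are hypothetical. In practice the indicators would usually be standardized first.

```python
import numpy as np

def deviation_ss(panel, members):
    """Frobenius-norm deviation sum of squares of a cluster of T x p matrices."""
    block = panel[members]
    return float(((block - block.mean(axis=0)) ** 2).sum())

def ward_panel_clusters(panel, n_clusters):
    """Agglomerative clustering of units, each given as a T x p indicator matrix.

    panel: array of shape (n_units, T, p). Returns an integer label per unit.
    """
    panel = np.asarray(panel, float)
    clusters = [[i] for i in range(len(panel))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = clusters[a] + clusters[b]
                increase = (deviation_ss(panel, merged)
                            - deviation_ss(panel, clusters[a])
                            - deviation_ss(panel, clusters[b]))
                if best is None or increase < best[0]:
                    best = (increase, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(panel), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

The brute-force pair search is O(n³) merges-times-pairs and is only meant for small numbers of units such as provinces or regions.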

7.
Point clouds are one of the most primitive and fundamental manifold representations. Popular sources of point clouds are three-dimensional shape acquisition devices such as laser range scanners. Point clouds also arise in the representation of high-dimensional manifolds by samples. With the increasing popularity and very broad applications of this source of data, it is natural and important to work directly with this representation, without going through the intermediate, and sometimes impossible or distorting, step of surface reconstruction. This paper presents a geometric framework for comparing manifolds given by point clouds. The underlying theory is based on Gromov-Hausdorff distances, leading to isometry-invariant and completely geometric comparisons. This theory is embedded in a probabilistic setting derived from random sampling of manifolds. It is then combined with results on matrices of pairwise geodesic distances to yield a computational implementation of the framework. The theoretical and computational results are complemented with experiments on real three-dimensional shapes.
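The computational side rests on matrices of pairwise geodesic distances. A common way to approximate these from a point cloud is to take shortest paths in a k-nearest-neighbour graph; this is an assumption here and not necessarily the article's construction. For two clouds of equal size, any bijection between them gives an upper bound on the Gromov-Hausdorff distance, namely half the largest distortion between the two distance matrices. The brute-force search below is only feasible for a handful of points. Function names are illustrative.

```python
from itertools import permutations
import numpy as np

def knn_geodesic_distances(points, k):
    """Approximate geodesic distances: shortest paths in a symmetric k-nearest-neighbour graph.

    Assumes distinct points, so column 0 of each sorted row is the point itself.
    """
    X = np.asarray(points, float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]  # skip column 0, the point itself
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]  # keep the graph undirected
    for m in range(n):  # Floyd-Warshall relaxation through intermediate vertex m
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def gh_permutation_bound(DX, DY):
    """Upper bound on d_GH for equal-size clouds: half the smallest max-distortion over bijections.

    Brute force over all n! permutations, so only usable for very small n.
    """
    n = len(DX)
    best = min(np.abs(DX - DY[np.ix_(p, p)]).max() for p in permutations(range(n)))
    return 0.5 * float(best)
```

Geodesic rather than Euclidean distances make the comparison invariant to bending of the shape, which is the isometry invariance the abstract refers to.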

8.
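This paper explores the application of cluster analysis, an important data mining method, to comprehensive evaluation. Fuzzy clustering is combined with comprehensive evaluation to solve the ranking problem when there are many alternatives to evaluate. The paper also improves the method for constructing the fuzzy similarity matrix.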
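The classic fuzzy-equivalence clustering procedure has three steps: build a fuzzy similarity matrix, take its max-min transitive closure, and cut the closure at a level λ. The similarity construction used in the sketch below, one minus the normalized Euclidean distance, is a common textbook choice and an assumption. It is not the improved construction proposed in the article, and the ranking step is not shown.

```python
import numpy as np

def fuzzy_similarity(X):
    """Fuzzy similarity r_ij = 1 - d_ij / max(d) from Euclidean distances (assumed construction)."""
    X = np.asarray(X, float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return 1.0 - D / D.max() if D.max() > 0 else np.ones_like(D)

def max_min_compose(A, B):
    """Max-min composition: (A o B)_ij = max_k min(A_ik, B_kj)."""
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

def transitive_closure(R):
    """Repeated max-min squaring until R o R == R, giving a fuzzy equivalence matrix."""
    while True:
        R2 = max_min_compose(R, R)
        if np.array_equal(R2, R):
            return R
        R = R2

def lambda_cut_labels(Rt, lam):
    """Classes of the crisp equivalence relation {(i, j): Rt_ij >= lam}."""
    labels = -np.ones(len(Rt), dtype=int)
    k = 0
    for i in range(len(Rt)):
        if labels[i] < 0:
            labels[(Rt[i] >= lam) & (labels < 0)] = k
            k += 1
    return labels
```

Raising λ from 0 to 1 produces a nested sequence of partitions. An evaluator can pick the level that gives a manageable number of groups before ranking the alternatives within and across groups.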

9.
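Moving from cluster analysis of two-way data to cluster analysis of three-way data is essentially a move from planar to three-dimensional analysis. A core question in research on three-way clustering is how to extend traditional clustering techniques for two-way cross-sectional data to three-way data. Following the idea of the Tucker model, this paper proposes a three-way clustering method that first decomposes the three-way data and then performs cluster analysis. The core matrix of the decomposition shows how strongly the three modes of the data are associated. The method can also cluster all three modes simultaneously within a single decomposition framework. Empirical results show that the proposed method is flexible and easy to understand, and has good discriminating power and practical value.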
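One standard, non-iterative way to fit a Tucker model is the truncated higher-order SVD (HOSVD). It takes the leading left singular vectors of each mode's unfolding as the factor matrices and projects the data onto them to obtain the core. The article's decomposition algorithm may differ; an alternating least-squares Tucker3 fit is one alternative. Once the model is fitted, the rows of each factor matrix can be clustered with any two-way method. Function names are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD fit of a Tucker model T ~ G x1 A x2 B x3 C; returns (core G, [A, B, C])."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    G = T
    for m, U in enumerate(factors):
        # mode-m product with U^T: contract axis m of G with the rows of U
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, m)), 0, m)
    return G, factors

def tucker_reconstruct(G, factors):
    """Multiply the core by each factor matrix along its mode."""
    T = G
    for m, U in enumerate(factors):
        T = np.moveaxis(np.tensordot(U, T, axes=(1, m)), 0, m)
    return T
```

Large entries of the core G indicate which components of the objects, variables and occasions modes interact strongly. This corresponds to what the abstract says the core matrix reveals.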

10.
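Interval-valued symbolic data are an important type of symbolic data. Existing work usually assumes that the points within an interval are uniformly distributed, which limits its applicability. Under the assumption of a general distribution, this paper gives an extended Hausdorff distance for interval-valued symbolic data with general distributions and, on that basis, proposes a SOM clustering algorithm for such data. Random simulation experiments show that the SOM clustering algorithm based on the proposed extended Hausdorff distance outperforms SOM clustering based on the traditional Hausdorff distance and SOM clustering based on the μσ distance. Finally, the method is applied to cluster analysis of meteorological data. This application illustrates the method's steps and operability and further assesses its effectiveness on practical problems.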
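The extended Hausdorff distance for general distributions is not given in the abstract. The sketch below therefore uses the classic interval Hausdorff distance, which is the traditional baseline the article compares against. Per variable, that distance is max(|a_lo - b_lo|, |a_hi - b_hi|), and the sketch sums it over variables. The sketch pairs this distance with a minimal 1-D SOM that moves both interval endpoints of each prototype toward every sample. The summation over variables, the learning-rate and neighbourhood schedules, and the function names are assumptions.

```python
import numpy as np

def interval_hausdorff(a, b):
    """Classic Hausdorff distance between interval vectors.

    a, b: arrays of shape (p, 2) holding [lower, upper] for each of p variables.
    Per-variable distances max(|a_lo - b_lo|, |a_hi - b_hi|) are summed (an assumption).
    """
    diff = np.abs(np.asarray(a, float) - np.asarray(b, float))
    return float(diff.max(axis=1).sum())

def train_interval_som(data, n_units, n_epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Minimal 1-D SOM whose prototypes are interval vectors of shape (p, 2).

    Requires n_units <= number of samples (prototypes start as distinct samples).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)  # shape (n, p, 2)
    W = data[rng.choice(len(data), n_units, replace=False)].copy()
    sigma0 = sigma0 or max(n_units / 2, 1.0)
    grid = np.arange(n_units)
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1 - frac)  # linearly decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3  # shrinking neighbourhood width
        for x in data[rng.permutation(len(data))]:
            bmu = int(np.argmin([interval_hausdorff(x, w) for w in W]))
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            # convex update of both endpoints keeps lower <= upper
            W += lr * h[:, None, None] * (x - W)
    return W
```

Replacing `interval_hausdorff` with a distribution-aware distance is the point at which the article's extended measure would enter.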

11.
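To improve learning efficiency for samples with prior knowledge, this paper introduces a semi-supervised learning strategy into the affinity propagation clustering model. It also accounts for dynamic changes in sample information and incorporates multi-indicator panel data, yielding a multi-indicator panel data clustering model with intelligent information processing. The model is applied to financial data from 30 listed real-estate companies for 2009-2013 for clustering and performance evaluation. The results show that the model distinguishes the characteristics of sample categories more effectively. It can therefore provide a more effective method and tool for evaluating the performance of listed companies and for financial management and decision-making.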
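For reference, the unsupervised affinity propagation updates (Frey and Dueck's responsibility and availability messages) can be sketched in NumPy as below. The semi-supervised strategy and the panel-data fusion described in the abstract are not shown. Several choices are assumptions of this sketch: the default preference (the median of the off-diagonal similarities), the damping factor, and the fixed iteration count without a convergence check.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, max_iter=200, preference=None):
    """Unsupervised affinity propagation on a similarity matrix S. Returns (exemplars, labels)."""
    S = np.array(S, float)
    n = len(S)
    if preference is None:
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)  # self-similarity controls how many exemplars emerge
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(max_iter):
        # responsibility r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        first = AS.argmax(axis=1)
        max1 = AS[rows, first]
        AS[rows, first] = -np.inf
        max2 = AS.max(axis=1)
        Rnew = S - max1[:, None]
        Rnew[rows, first] = S[rows, first] - max2
        R = damping * R + (1 - damping) * Rnew
        # availability a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # self-availability a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        Anew = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(Anew).copy()
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, self_avail)
        A = damping * A + (1 - damping) * Anew
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if len(exemplars) == 0:
        return exemplars, -np.ones(n, dtype=int)
    labels = S[:, exemplars].argmax(axis=1)  # each point joins its most similar exemplar
    labels[exemplars] = np.arange(len(exemplars))
    return exemplars, labels
```

A semi-supervised variant would typically inject the prior labels into `S`, for example by raising similarities between points known to belong together. How the article does this is not stated in the abstract.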

12.
To extract information from high-dimensional data efficiently, visualization tools based on data projection methods have been developed and shown to be useful. However, a single two-dimensional visualization is often insufficient for capturing all or most interesting structures in complex high-dimensional datasets. For this reason, Tipping and Bishop developed mixture probabilistic principal component analysis (MPPCA), which separates data into multiple groups and enables a unique projection per group, that is, one probabilistic principal component analysis (PPCA) data visualization per group. Because the group labels are assigned to observations based on their high-dimensional coordinates, MPPCA works well to reveal homoscedastic structures in data that differ spatially. In the presence of heteroscedasticity, however, MPPCA may still mask noteworthy data structures. We propose a new method called covariance-guided MPPCA (C-MPPCA) that groups subsets of observations based on covariance, not locality, and, like MPPCA, displays them using PPCA. PPCA projects data onto the dimensions with the highest variances. Grouping by covariance therefore makes sense and makes visible some data structures that MPPCA originally masked. We demonstrate the performance of C-MPPCA in an extensive simulation study. We also apply C-MPPCA to a real-world dataset. Supplementary materials for this article are available online.
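C-MPPCA, like MPPCA, displays each group with PPCA. The following is a minimal sketch of closed-form maximum-likelihood PPCA (Tipping and Bishop) for a single group. σ² is the average of the discarded eigenvalues of the sample covariance, and W scales the leading eigenvectors by sqrt(λ - σ²), taking the arbitrary rotation to be the identity. The covariance-guided grouping that distinguishes C-MPPCA is not shown, and the function names are illustrative.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form ML PPCA with q < d latent dimensions. Returns (mean, W of shape d x q, sigma2)."""
    X = np.asarray(X, float)
    mu = X.mean(axis=0)
    C = np.cov(X - mu, rowvar=False, bias=True)  # ML (1/N) sample covariance
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]  # eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()  # average variance of the discarded directions
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def ppca_project(X, mu, W, sigma2):
    """Posterior mean of the latent coordinates: M^{-1} W^T (x - mu), with M = W^T W + sigma2 I."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (np.asarray(X, float) - mu).T).T
```

With q = 2, `ppca_project` gives the two-dimensional coordinates plotted for one group. A mixture method fits one such model per group.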


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417