Found 12 similar articles (search time: 0 ms)
1.
2.
Jiyao Kou 《Journal of computational and graphical statistics》2013,22(2):403-417
Estimating the number of clusters is one of the most difficult problems in cluster analysis. Most previous approaches require knowing the data matrix and may not work when only a Euclidean distance matrix is available. Other approaches also suffer from the curse of dimensionality and work poorly in high dimension. In this article, we develop a new statistic, called the GUD statistic, based on the idea of the Gap method, but use the determinant of the pooled within-group scatter matrix instead of the within-cluster sum of squared distances. Some theory is developed to show this statistic can work well when only the Euclidean distance matrix is known. More generally, this statistic can even work for any dissimilarity matrix that satisfies some properties. We also propose a modification for high-dimensional datasets, called the R-GUD statistic, which can give a robust estimation in high-dimensional settings. The simulation shows our method needs less information but is generally found to be more accurate and robust than other methods considered in the study, especially in many difficult settings.
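The quantity named in this abstract, the determinant of the pooled within-group scatter matrix, can be illustrated with a small sketch. This is not the GUD statistic itself: the abstract does not give the reference distribution or the distance-matrix formulation, so the code below only computes the determinant from a data matrix for a given labelling. The function names `pooled_within_scatter` and `determinant` and the toy points are assumptions made for illustration.

```python
def pooled_within_scatter(points, labels):
    """Sum, over groups, of the scatter of each group around its own mean."""
    p = len(points[0])
    groups = {}
    for x, g in zip(points, labels):
        groups.setdefault(g, []).append(x)
    w = [[0.0] * p for _ in range(p)]
    for members in groups.values():
        mean = [sum(col) / len(members) for col in zip(*members)]
        for x in members:
            d = [xi - mi for xi, mi in zip(x, mean)]
            for i in range(p):
                for j in range(p):
                    w[i][j] += d[i] * d[j]
    return w


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


# Toy data (hypothetical): two well-separated squares in the plane.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
det_one_group = determinant(pooled_within_scatter(points, [0] * 8))
det_two_groups = determinant(pooled_within_scatter(points, [0, 0, 0, 0, 1, 1, 1, 1]))
```

On this toy data the determinant is much smaller for the two-group labelling than for a single group. A Gap-style method compares that kind of drop against what would be expected under a reference distribution with no cluster structure.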
3.
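To address sample classification and historical-period division for multi-indicator panel data, this paper proposes, from the perspective of multivariate statistical analysis theory, a fused clustering method for multi-indicator panel data. The method improves on factor analysis and hierarchical clustering for multi-indicator panel data and, following Fisher's ordered clustering theory, constructs a sum-of-squared-deviations function in Frobenius-norm form, yielding an ordered clustering method for multi-indicator panel data. Empirical results show that the method meets the unity requirement of system analysis and keeps the indicators uncorrelated; overcomes the bias caused by averaging over the time dimension, with little information loss; solves the ordered clustering problem for panel data; and compensates for the one-sidedness and limitations of any single analysis.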
4.
Cyrus R. Mehta Nitin Patel Pralay Senchaudhuri 《Journal of computational and graphical statistics》2013,22(1):21-40
We present an efficient algorithm for generating exact permutational distributions for linear rank statistics defined on stratified 2 × c contingency tables. The algorithm can compute exact p values and confidence intervals for a rich class of nonparametric problems. These include exact p values for stratified two-population Wilcoxon, Logrank, and Van der Waerden tests, exact p values for stratified tests of trend across several binomial populations, exact p values for stratified permutation tests with arbitrary scores, and exact confidence intervals for odds ratios embedded in stratified 2 × c tables. The algorithm uses network-based recursions to generate stratum-specific distributions and then combines them into an overall permutation distribution by convolution. Where only the tail area of a permutation distribution is desired, additional efficiency gains are achieved by backward induction and branch-and-bound processing of the network. The algorithm is especially efficient for highly imbalanced categorical data, a situation where the asymptotic theory is unreliable. The backward induction component of the algorithm can also be used to evaluate the conditional maximum likelihood, and its higher order derivatives, for the logistic regression model with grouped data. We illustrate the techniques with an analysis of two data sets: The leukemia data on survivors of the Hiroshima atomic bomb and data from an animal toxicology experiment provided by the U.S. Food and Drug Administration.
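The step of combining stratum-specific distributions by convolution can be sketched on a toy problem. This is a minimal illustration, not the authors' algorithm: brute-force enumeration stands in for the network-based recursion, and the backward-induction and branch-and-bound refinements are omitted. The function names, scores, and observed value are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def stratum_distribution(scores, n_treated):
    """Exact permutation distribution of the treated-group score sum in one
    stratum, by enumerating every choice of n_treated units (toy sizes only)."""
    counts = defaultdict(int)
    for subset in combinations(scores, n_treated):
        counts[sum(subset)] += 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete statistics."""
    out = defaultdict(Fraction)
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] += pa * pb
    return dict(out)


def upper_tail(dist, observed):
    """Exact one-sided p value: P(statistic >= observed)."""
    return sum(p for s, p in dist.items() if s >= observed)


# Hypothetical strata: (rank scores of all units, number of treated units).
strata = [([1, 2, 3, 4], 2), ([1, 2, 3], 1)]
overall = {0: Fraction(1)}
for scores, n_treated in strata:
    overall = convolve(overall, stratum_distribution(scores, n_treated))

p_value = upper_tail(overall, 9)
```

Using `Fraction` keeps the probabilities exact, which matches the intent of an exact test. Enumeration grows combinatorially with stratum size, which is the cost the recursions described in the abstract are designed to avoid.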
5.
Michael Friendly 《Journal of computational and graphical statistics》2013,22(3):373-395
This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.
6.
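Cluster Analysis of Multi-Indicator Panel Data and Its Application   Total citations: 8, self-citations: 0, citations by others: 8
Multivariate statistical analysis of multi-indicator panel data is still a gap in domestic (Chinese) research. This paper analyzes the data format and numerical characteristics of panel data and, following the principles of cluster analysis, reconstructs the distance function and the sum-of-squared-deviations function for multi-indicator panel data; on that basis it describes the clustering procedure for such data. Finally, an empirical cluster analysis of the production efficiency of industrial enterprises across China's regions shows good results.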
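One plausible form of a distance between two individuals in multi-indicator panel data treats each individual as a T × p matrix (T periods, p indicators) and takes the Frobenius norm of the difference. The abstract does not state the paper's actual definition, so the sketch below is an assumption, and the name `panel_distance` and the sample values are hypothetical.

```python
import math


def panel_distance(x, y):
    """Frobenius-type distance between two individuals, each given as a list
    of T rows (periods) holding p indicator values."""
    return math.sqrt(
        sum((a - b) ** 2 for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y))
    )


# Two hypothetical regions observed over T = 2 periods on p = 2 indicators.
region_a = [[1.0, 2.0], [3.0, 4.0]]
region_b = [[1.0, 2.0], [0.0, 0.0]]
d_ab = panel_distance(region_a, region_b)
```

A distance of this kind uses every period's values instead of a time average. In practice the indicators would usually be standardized first so that no single indicator dominates the distance.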
7.
A Theoretical and Computational Framework for Isometry Invariant Recognition of Point Cloud Data   Total citations: 1, self-citations: 0, citations by others: 1
Point clouds are one of the most primitive and fundamental manifold
representations. Popular sources of point clouds are three-dimensional
shape acquisition devices such as laser range scanners. Another
important field where point clouds are found is in the representation
of high-dimensional manifolds by samples. With the increasing
popularity and very broad applications of this source
of data, it is natural and important to work directly with this
representation, without having to go through the intermediate and
sometimes impossible and distorting steps of surface reconstruction.
A geometric framework for comparing manifolds given by point clouds
is presented in this paper. The underlying theory is based on
Gromov-Hausdorff distances, leading to isometry invariant and
completely geometric comparisons. This theory is embedded in a
probabilistic setting as derived from random sampling of manifolds,
and then combined with results on matrices of pairwise geodesic distances
to lead to a computational implementation of the framework. The theoretical and
computational results presented here are complemented with
experiments for real three-dimensional shapes.
8.
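Liu Wei (刘伟) 《数学的实践与认识》 (Mathematics in Practice and Theory) 2006,36(11):88-92
This paper explores the application of cluster analysis, an important data mining method, to comprehensive evaluation. It combines fuzzy clustering with comprehensive evaluation to solve ranking problems in which many alternatives must be evaluated, and it also improves the method for constructing the fuzzy similarity matrix.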
9.
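Moving from two-way to three-way data clustering is essentially moving from planar to three-dimensional analysis. A core question in research on three-way data clustering is how to extend traditional two-way cross-sectional clustering techniques to three-way data. Based on the idea of the Tucker model, this paper proposes a three-way clustering method that first applies a matrix decomposition to the three-way data and then performs cluster analysis. The method not only reflects, through the core matrix, the strength of the relationships among the three modes of the data, but also allows all three modes to be clustered simultaneously within a single decomposition framework. Empirical results show that the proposed method is flexible and easy to understand, with good discriminative power and practicality.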
10.
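Interval-valued symbolic data are an important type of symbolic data. The existing literature usually assumes that the points within an interval are uniformly distributed, which limits applicability. Under a general-distribution assumption, this paper gives an extended Hausdorff distance measure for interval-valued symbolic data with general distributions and, based on it, proposes a SOM clustering algorithm for such data. Random simulation experiments show that the SOM clustering algorithm based on the proposed extended Hausdorff distance is more effective than SOM clustering based on the traditional Hausdorff distance and SOM clustering based on the μσ distance. Finally, the method is applied to a cluster analysis of meteorological data to illustrate its application steps and operability and to further evaluate its effectiveness on real problems.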
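For reference, the traditional Hausdorff distance between two closed intervals [a1, b1] and [a2, b2], the baseline in this comparison, reduces to max(|a1 - a2|, |b1 - b2|). The sketch below shows only that baseline: the extended measure for general distributions is not specified in the abstract and is not reproduced. Summing over variables is one common convention, assumed here, and the function names and sample values are hypothetical.

```python
def interval_hausdorff(u, v):
    """Traditional Hausdorff distance between closed intervals u=(a1, b1), v=(a2, b2)."""
    (a1, b1), (a2, b2) = u, v
    return max(abs(a1 - a2), abs(b1 - b2))


def interval_vector_distance(x, y):
    """Distance between two observations described by several interval
    variables, taken as the sum of per-variable Hausdorff distances."""
    return sum(interval_hausdorff(u, v) for u, v in zip(x, y))


# Hypothetical weather records: (temperature range, humidity range).
day_a = [(10.0, 18.0), (40.0, 60.0)]
day_b = [(12.0, 25.0), (35.0, 60.0)]
d_days = interval_vector_distance(day_a, day_b)
```

Because this distance depends only on the interval endpoints, it ignores how the underlying points are spread inside each interval, which is the limitation the extended measure in the abstract is meant to address.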
11.
12.
Chao Han Scotland Leman Leanna House 《Journal of computational and graphical statistics》2013,22(1):66-83
To extract information from high-dimensional data efficiently, visualization tools based on data projection methods have been developed and shown useful. However, a single two-dimensional visualization is often insufficient for capturing all or most interesting structures in complex high-dimensional datasets. For this reason, Tipping and Bishop developed mixture probabilistic principal component analysis (MPPCA) that separates data into multiple groups and enables a unique projection per group; that is, one probabilistic principal component analysis (PPCA) data visualization per group. Because the group labels are assigned to observations based on their high-dimensional coordinates, MPPCA works well to reveal homoscedastic structures in data that differ spatially. In the presence of heteroscedasticity, however, MPPCA may still mask noteworthy data structures. We propose a new method called covariance-guided MPPCA (C-MPPCA) that groups subsets of observations based on covariance, not locality, and, similar to MPPCA, displays them using PPCA. PPCA projects data in the dimensions with the highest variances, thus grouping by covariance makes sense and enables some data structures to be visible that were masked originally by MPPCA. We demonstrate the performance of C-MPPCA in an extensive simulation study. We also apply C-MPPCA to a real world dataset. Supplementary materials for this article are available online.