Similar Documents
20 similar documents found (search time: 15 ms)
1.
Classical biplot methods allow for the simultaneous representation of individuals (rows) and variables (columns) of a data matrix. For binary data, logistic biplots have recently been developed. When data are nominal, neither classical nor binary logistic biplots are adequate, and techniques such as multiple correspondence analysis (MCA), latent trait analysis (LTA) or item response theory (IRT) for nominal items should be used instead. In this paper we extend the binary logistic biplot to nominal data. The resulting method is termed the "nominal logistic biplot" (NLB), although the variables are represented as convex prediction regions rather than vectors. Using methods from computational geometry, the set of prediction regions is converted to a set of points in such a way that the prediction for each individual is established by its closest "category point". Interpretation is then based on distances rather than on projections. We study the geometry of such a representation and construct computational algorithms for the estimation of parameters and the calculation of prediction regions. Nominal logistic biplots extend both MCA and LTA in the sense that they give a graphical representation for LTA similar to the one obtained in MCA.
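The closest-category-point rule is easy to make concrete. Below is a minimal sketch (not the paper's algorithm; the coordinates, function name, and labels are all hypothetical) of how the category points of one nominal variable classify the individuals plotted in the same plane:

```python
import numpy as np

def predict_by_category_points(individuals, category_points, labels):
    """Assign each individual the label of its nearest category point.

    individuals     : (n, 2) array of row coordinates in the biplot plane
    category_points : (k, 2) array, one point per category of a nominal variable
    labels          : length-k list of category names
    """
    # Squared Euclidean distances from every individual to every category point
    d2 = ((individuals[:, None, :] - category_points[None, :, :]) ** 2).sum(axis=2)
    return [labels[j] for j in d2.argmin(axis=1)]

# Toy illustration with invented coordinates
rows = np.array([[0.2, 1.1], [-1.5, 0.3], [0.9, -0.8]])
cats = np.array([[0.0, 1.0], [-1.0, 0.0], [1.0, -1.0]])
print(predict_by_category_points(rows, cats, ["A", "B", "C"]))
```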

2.
Recently two articles studied scalings in biplot models and concluded that these have little impact on the interpretation. In this article, scalings are studied again for generalized biadditive models and correspondence analysis, that is, special cases of the general biplot family, but from a different perspective. The generalized biadditive models, and also correspondence analysis, are often used for Gaussian ordination. In Gaussian ordination one takes a distance perspective for the interpretation of the relationship between a row and a column category. It is shown that scalings, and also nonsingular transformations, have a major impact on this interpretation. Thus, depending on whether one takes the inner-product or the distance perspective, scalings and transformations either do not (inner product) or do (distance) affect the interpretation. If one is willing to go along with the author's assumption that diagrams are in practice often interpreted by a distance rule, the findings in this article affect all biplot models.
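The distinction between the two perspectives can be checked numerically. The following sketch (illustrative only; the matrices are invented) applies a nonsingular transformation to a biplot's row and column coordinates so that all inner products, and hence the fitted model, stay unchanged while the row-to-column distances used in Gaussian ordination change:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))   # row (individual) coordinates
B = rng.normal(size=(4, 2))   # column (variable/category) coordinates

T = np.array([[2.0, 0.3],     # an arbitrary nonsingular transformation
              [0.0, 0.5]])
A2 = A @ T                    # transformed row coordinates
B2 = B @ np.linalg.inv(T).T   # compensating transformation of the columns

# The inner products (the biplot model A @ B.T) are unchanged ...
print(np.allclose(A @ B.T, A2 @ B2.T))   # True

# ... but row-to-column distances, the basis of a distance interpretation, are not
print(np.linalg.norm(A[0] - B[0]), np.linalg.norm(A2[0] - B2[0]))
```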

3.
The principal components biplot is a useful visualization tool for the exploration of a samples-by-variables data matrix. In several data analysis situations, the data values are interval-censored, so that only the interval of a data value is available, but not the value itself. For such data, we propose the interval-censored biplot (IC-Biplot), a new exploratory and graphical method that is an extension of the principal component analysis biplot. It provides not only a two-dimensional graphic representation of respondents and their attributes, but also point estimates for the data values that are constrained to lie in their intervals. Two applications of the IC-Biplot are discussed. The first application considers data on emergence times of permanent teeth, focusing on the pattern of emergence. The IC-Biplot confirms rank orders suggested earlier in the literature, and goodness-of-fit measures show that the present model fits these data very well. The second application discusses a regular sample-by-attribute matrix from the literature on characteristics of several types of oils. This article has online supplementary materials.
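The paper's estimation procedure is not reproduced here, but its central constraint, a low-rank fit whose fitted values must stay inside the observed intervals, can be sketched with a crude alternating scheme (the function name and all details are hypothetical):

```python
import numpy as np

def ic_biplot_sketch(low, high, rank=2, n_iter=200, seed=0):
    """Crude alternating scheme: rank-`rank` fit with values kept in [low, high].

    low, high : (n, p) arrays giving the censoring interval of every cell.
    Returns point estimates inside the intervals and the low-rank fit.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high)                     # start anywhere inside the intervals
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Z = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        X = np.clip(Z, low, high)                  # point estimates respect the intervals
    return X, Z

# Tiny usage example with an invented rank-1 "truth" and symmetric intervals
rng = np.random.default_rng(1)
true = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 6))
X_hat, Z = ic_biplot_sketch(true - 0.5, true + 0.5)
```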

4.
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well known that regular (i.e., crisp) multiple correspondence analysis seriously underestimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
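As a point of reference for fuzzy coding, the sketch below uses a common triangular-membership construction (not necessarily the exact coding of the article) that spreads each continuous value over a few fuzzy categories with nonnegative memberships summing to one:

```python
import numpy as np

def fuzzy_code(x, knots):
    """Triangular ('hat'-function) fuzzy coding of a continuous variable:
    each value is spread over len(knots) fuzzy categories, with nonnegative
    memberships that sum to one in every row."""
    x = np.asarray(x, dtype=float)
    F = np.zeros((x.size, len(knots)))
    for j in range(len(knots) - 1):
        lo, hi = knots[j], knots[j + 1]
        inside = (x >= lo) & (x < hi)
        w = (x[inside] - lo) / (hi - lo)   # linear interpolation between knots
        F[inside, j] = 1 - w
        F[inside, j + 1] = w
    F[x < knots[0], 0] = 1.0               # clamp values outside the knot range
    F[x >= knots[-1], -1] = 1.0
    return F

print(fuzzy_code([0.1, 2.5, 4.9, 7.0], knots=[0.0, 2.5, 5.0]))  # rows sum to 1
```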

5.
For the Benzécri correspondence analysis method popular abroad, this paper uses a variable-type data matrix to show that the method changes the characteristics of the data matrix to a large extent, fails to achieve the purpose of correspondence analysis, and therefore cannot solve the problem. Instead, the factor biplot (factor dual-information plot) is used. A comparison shows that the factor biplot gives an excellent graphical display of the relationships in the data matrix, among variables, among samples, and between samples and variables, achieving the goal of correspondence analysis in a direct and simple way; the factor biplot is well suited to correspondence analysis of variable-type data matrices.

6.
Multiple taxicab correspondence analysis
We compare the statistical analysis of multidimensional contingency tables by multiple correspondence analysis (MCA) and multiple taxicab correspondence analysis (MTCA). We show in this paper that, first, MTCA and MCA can produce different results; second, taxicab correspondence analysis of a Burt table is equivalent to centroid correspondence analysis of the indicator matrix; third, along the first principal axis, the projected response patterns in MTCA are clustered and the number of cluster points is at most one plus the number of variables; fourth, visual maps produced by MTCA seem to be clearer and more readable in the presence of rarely occurring categories of the variables than the graphical displays produced by MCA. Two well-known data sets are analyzed.

7.
This article presents a method for visualization of multivariate functions. The method is based on a tree structure—called the level set tree—built from separated parts of level sets of a function. The method is applied to the visualization of estimates of multivariate density functions. With different graphical representations of level set trees we may visualize the number and location of modes, excess masses associated with the modes, and certain shape characteristics of the estimate. Simulation examples are presented where projecting data to two dimensions does not help to reveal the modes of the density, but with the help of level set trees one may detect the modes. I argue that level set trees provide a useful method for exploratory data analysis.
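A minimal version of the underlying computation, counting connected components of the level sets of a density estimate on a grid, can be sketched as follows (this only counts components per level; a full level set tree additionally records how components split and merge as the level varies, which is what reveals modes and their excess masses):

```python
import numpy as np
from scipy import ndimage
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Two well-separated bumps in the plane
data = np.vstack([rng.normal([-2, 0], 0.5, (200, 2)),
                  rng.normal([2, 0], 0.5, (200, 2))]).T

kde = gaussian_kde(data)
g = np.linspace(-4, 4, 80)
xx, yy = np.meshgrid(g, g)
f = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# For each level lam, count connected components of the level set {f >= lam}:
# at low levels the bumps may merge into one component, at higher levels the
# two modes appear as two separate components.
for lam in np.linspace(0.01, 0.9 * f.max(), 5):
    n_comp = ndimage.label(f >= lam)[1]
    print(f"level {lam:.3f}: {n_comp} component(s)")
```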

8.
Robust techniques for multivariate statistical methods—such as principal component analysis, canonical correlation analysis, and factor analysis—have recently been constructed. In contrast to the classical approach, these robust techniques are able to resist the effect of outliers. However, no graphical tool yet exists to identify in a comprehensive way the data points that do not obey the model assumptions. Our goal is to construct such graphics based on empirical influence functions. These graphics not only detect the influential points but also classify the observations according to their robust distances. In this way the observations are divided into four classes: regular points, nonoutlying influential points, influential outliers, and noninfluential outliers. We thus gain additional insight into the data by detecting different types of deviating observations. Some real data examples are given to show how these plots can be used in practice.
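The article's diagnostic is built on empirical influence functions, which are not reproduced here. As a rough stand-in for the distance side of such plots, the sketch below (hypothetical function name; not the authors' four-class rule) pairs robust Mahalanobis distances from a minimum covariance determinant fit with classical ones and flags observations against a chi-squared cutoff:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import EmpiricalCovariance, MinCovDet

def classify_points(X, alpha=0.975):
    """Label observations by robust vs. classical Mahalanobis distance."""
    p = X.shape[1]
    cut = np.sqrt(chi2.ppf(alpha, df=p))
    rd = np.sqrt(MinCovDet(random_state=0).fit(X).mahalanobis(X))   # robust
    cd = np.sqrt(EmpiricalCovariance().fit(X).mahalanobis(X))       # classical
    return np.where((rd > cut) & (cd > cut), "flagged by both",
           np.where(rd > cut, "masked classically",
           np.where(cd > cut, "classical-only flag", "regular")))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:5] += 6                                    # plant a few outliers
print(np.unique(classify_points(X), return_counts=True))
```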

9.
Kiers (Psychometrika 56:197–212, 1991) considered orthogonal rotation in PCAMIX, a principal component method for a mixture of qualitative and quantitative variables. PCAMIX includes ordinary principal component analysis and multiple correspondence analysis (MCA) as special cases. In this paper, we give a new presentation of PCAMIX in which the principal components and the squared loadings are obtained from a singular value decomposition. The loadings of the quantitative variables and the principal coordinates of the categories of the qualitative variables are also obtained directly. In this context, we propose a computationally efficient procedure for varimax rotation in PCAMIX and a direct solution for the optimal angle of rotation. A simulation study shows the good computational behavior of the proposed algorithm. An application to a real data set illustrates the benefits of using rotation in MCA. All source code is available in the R package "PCAmixdata".

10.
We investigate the structure of a large precision matrix in Gaussian graphical models by decomposing it into a low-rank component and a remainder with a sparse precision matrix. Based on the decomposition, we propose to estimate the large precision matrix by inverting a principal orthogonal decomposition (IPOD). The IPOD approach has appealing practical interpretations in conditional graphical models given the low-rank component, and it connects to Gaussian graphical models with latent variables. Specifically, we show that the low-rank component in the decomposition of the large precision matrix can be viewed as the contribution from the latent variables in a Gaussian graphical model. Compared with existing approaches for latent variable graphical models, IPOD is conveniently feasible in practice, since only a low-dimensional matrix needs to be inverted. To identify the number of latent variables, an objective of interest in its own right, we investigate and justify an approach that examines the ratios of adjacent eigenvalues of the sample covariance matrix. Theoretical properties, numerical examples, and a real data application demonstrate the merits of the IPOD approach in its convenience, performance, and interpretability.
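The ratio-of-adjacent-eigenvalues idea for choosing the number of latent variables is simple to sketch (the paper's safeguards and theory are omitted; the function name and toy check are our own):

```python
import numpy as np

def n_latent_by_eigen_ratio(X, k_max=None):
    """Pick the number of latent variables as the argmax of adjacent
    eigenvalue ratios of the sample covariance matrix."""
    S = np.cov(X, rowvar=False)
    w = np.sort(np.linalg.eigvalsh(S))[::-1]      # eigenvalues, descending
    k_max = k_max or len(w) // 2
    ratios = w[:k_max] / w[1:k_max + 1]           # lambda_k / lambda_{k+1}
    return int(np.argmax(ratios)) + 1             # 1-based count

# Toy check: a 2-factor structure plus noise should give K = 2
rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))
L = rng.normal(size=(2, 20))
X = F @ L + 0.3 * rng.normal(size=(500, 20))
print(n_latent_by_eigen_ratio(X))
```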

11.
Correspondence analysis of regional malignant tumor mortality
Objective: To describe the regional and tumor-type distribution of malignant tumors in a county of Shandong Province from 2000 to 2002. Methods: Grouped correspondence analysis was applied to the county's malignant tumor mortality data. Results: Common factors and their loading coefficients were obtained for each region and each tumor type, and a factor loading plane was drawn from the first and second factor loadings, clearly showing the clustering of malignant tumor mortality and the distribution of high- and low-incidence areas. Conclusion: Correspondence analysis, which brings variables and samples together, is a useful complement to factor analysis: it can analyze the relationship between the row and column factors of a two-way data matrix and thus serve the aims of the study.

12.
A variable selection method using global score estimation is proposed, which is applicable as a selection criterion in any multivariate method without external variables, such as principal component analysis, factor analysis, and correspondence analysis. This method selects a subset of variables by which the original global scores are approximated as closely as possible in the least squares sense, where the global scores (e.g., principal component scores, factor scores, and individual scores) are computed based on the selected variables. Global scores are usually orthogonal, so the estimated global scores should be restricted to being mutually orthogonal. Depending on how that restriction is satisfied, we propose three computational steps to estimate the scores. Example data are analyzed to demonstrate the performance and usefulness of the proposed method; the proposed algorithm is evaluated and the results obtained using four cost-saving selection procedures are compared. This example shows that combining these steps and procedures yields more accurate results quickly.

13.
A comparison is made between a number of techniques for the exploratory analysis of qualitative variables. The paper mainly focuses on a comparison between multiple correspondence analysis (MCA) and Gower's principal co-ordinates analysis (PCO) applied to qualitative variables. The main difference between these methods lies in how they deal with infrequent categories. It is demonstrated that MCA solutions can be dominated by infrequent categories, and that, especially in such cases, PCO is a useful alternative to MCA because it tends to downweight the influence of infrequent categories. Besides the difference between MCA and PCO, other alternatives for the analysis of qualitative variables are discussed and compared with MCA and PCO.
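One concrete route from qualitative data to PCO coordinates, via the simple matching similarity and classical multidimensional scaling, can be sketched as follows (a generic construction, not necessarily the exact variant compared in the paper):

```python
import numpy as np

def pco_qualitative(codes):
    """Principal co-ordinates (classical MDS) from the simple matching
    similarity between rows of a categorical data matrix."""
    codes = np.asarray(codes)
    n = codes.shape[0]
    # Similarity = proportion of variables on which two rows agree
    S = (codes[:, None, :] == codes[None, :, :]).mean(axis=2)
    D2 = 2 * (1 - S)                          # a squared-distance analogue
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                     # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1]
    w, V = w[order], V[:, order]
    return V[:, :2] * np.sqrt(np.maximum(w[:2], 0))   # 2-D coordinates

codes = np.array([[0, 1, 2], [0, 1, 2], [1, 0, 2], [2, 2, 0]])
print(pco_qualitative(codes))
```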

14.
This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.
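A single two-way mosaic, the basic building block of the mosaic matrices and conditioned arrays described above, can be drawn with statsmodels (the counts below are invented):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

# Invented 2x3 two-way table, given as {(row category, column category): count}
table = {('male', 'blond'): 30, ('male', 'brown'): 50, ('male', 'black'): 20,
         ('female', 'blond'): 45, ('female', 'brown'): 40, ('female', 'black'): 15}

fig, ax = plt.subplots(figsize=(6, 4))
mosaic(table, ax=ax, title='Hair colour by sex (invented counts)')
plt.show()
```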

15.
The construction of Branin trajectories for locating the stationary points of a scalar function of many variables involves, in the general case, the numerical solution of a set of simultaneous ordinary differential equations, or some equivalent numerical procedure. For a function of only two variables which is separable in either the multiplicative or additive sense, it is shown that Branin trajectories may be obtained by a graphical method due to Volterra. The authors are indebted to Dr. L. C. W. Dixon for his helpful comments on the original draft of this paper.
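The general-case construction the abstract refers to can be sketched as an ordinary differential equation. One common form of Branin's flow is dx/dt = ±H(x)^{-1}∇f(x), along which the gradient grows or decays exponentially, so the decaying branch runs into a stationary point (a continuous Newton flow). The separable example function below is our own, not the paper's:

```python
import numpy as np
from scipy.integrate import solve_ivp

# f(x, y) = x**2 + y**4 - y**2 : additively separable, three stationary points
def grad(p):
    x, y = p
    return np.array([2 * x, 4 * y**3 - 2 * y])

def hess(p):
    x, y = p
    return np.array([[2.0, 0.0], [0.0, 12 * y**2 - 2]])

def branin_rhs(t, p, sign=-1.0):
    # dx/dt = sign * H(x)^{-1} grad f(x); along a trajectory the gradient
    # satisfies d(grad f)/dt = sign * grad f, so the sign = -1 branch decays
    # toward a stationary point.
    return sign * np.linalg.solve(hess(p), grad(p))

sol = solve_ivp(branin_rhs, (0.0, 10.0), y0=[1.0, 0.9], rtol=1e-8)
print(sol.y[:, -1])        # approximately the stationary point (0, 1/sqrt(2))
```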

16.
The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large data set simulated from a distribution whose parameters are estimated by maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm, and an example of determining sizes of gas masks illustrates the parametric k-means algorithm.
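The parametric k-means recipe (fit by maximum likelihood, simulate a large sample, run k-means) is easy to sketch. The paper treats general parametric families; this sketch covers only the univariate normal special case, and the function name is our own:

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

def parametric_k_means(sample, k, n_sim=100_000, seed=0):
    """Estimate k principal points of a normal distribution by running
    k-means on a large sample simulated from the ML-fitted distribution."""
    mu, sigma = np.mean(sample), np.std(sample)          # ML estimates
    big = stats.norm.rvs(mu, sigma, size=n_sim, random_state=seed)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    km.fit(big.reshape(-1, 1))
    return np.sort(km.cluster_centers_.ravel())

rng = np.random.default_rng(0)
print(parametric_k_means(rng.normal(0, 1, 200), k=2))
# For N(0,1) the two principal points are +-sqrt(2/pi), roughly +-0.798
```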

17.
The selection of copulas is an important aspect of dependence modeling. In many practical applications, only a limited number of copulas are tested, and the modeling applications are usually restricted to the bivariate case. One explanation is the fact that no graphical copula tool exists that allows us to assess the goodness-of-fit of a large set of (possibly higher-dimensional) copula functions at once. This article seeks to overcome this problem by developing a new graphical tool for copula selection, based on a statistical analysis technique called "principal coordinate analysis." The advantage is threefold. First, when projecting the empirical copula of a modeling application onto a two-dimensional (2D) copula space, it allows us to visualize the fit of a whole collection of multivariate copulas at once. Second, the visual tool allows us to identify "search" directions for potential fit improvements (e.g., through the use of copula transforms). Finally, the tool also makes it possible to give a 2D visual overview of a large number of known copula families, leading to a better understanding and a more efficient use of the different copula families. The robustness of the new graphical tool is investigated by means of a small simulation study, and the practical use of the tool is demonstrated for two 2D and two three-dimensional (3D) fitting examples. MATLAB code for the examples is available online in the supplementary materials.
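The projection idea, embedding copulas as points in a 2D space via principal coordinate analysis of pairwise distances, can be sketched as follows. The grid, the L2 distance, and the small copula set are our own choices, not the article's construction:

```python
import numpy as np

u = np.linspace(0.01, 0.99, 25)
U, V = np.meshgrid(u, u)

def clayton(theta):  # Clayton copula CDF on the grid
    return (U**-theta + V**-theta - 1) ** (-1 / theta)

def gumbel(theta):   # Gumbel copula CDF on the grid
    return np.exp(-(((-np.log(U))**theta + (-np.log(V))**theta) ** (1 / theta)))

copulas = {'independence': U * V, 'comonotone': np.minimum(U, V),
           'clayton(2)': clayton(2.0), 'clayton(5)': clayton(5.0),
           'gumbel(2)': gumbel(2.0)}

names = list(copulas)
C = np.stack([copulas[k].ravel() for k in names])
D2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # squared L2 distances

n = len(names)
J = np.eye(n) - np.ones((n, n)) / n
w, Vec = np.linalg.eigh(-0.5 * J @ D2 @ J)                # principal co-ordinates
idx = np.argsort(w)[::-1][:2]
coords = Vec[:, idx] * np.sqrt(np.maximum(w[idx], 0))
for name, xy in zip(names, coords):
    print(f"{name:>14}: {xy}")
```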

18.
Principal component analysis is a multivariate statistical method that reduces many indicators to a few uncorrelated composite indicators (the principal components). By applying principal component analysis to raw data on the main industrial and agricultural indicators of Taiwan for 1989-1996, this paper shows that principal component analysis is indeed a practical and widely applicable statistical method.

19.
Using the object-oriented robust factor analysis R package Robustfa, a factor analysis was carried out on eight indicators of urban household cash consumption expenditure in 2011 for the 31 provinces, municipalities, and autonomous regions of China (excluding Hong Kong, Macao, and Taiwan). By minimizing the sum of squared elements of the residual matrix, the combination of the principal factor method with the robust MVE estimator was identified. Ten outliers were detected as observations whose Mahalanobis distances, computed from the robust MVE estimator, exceed the critical value. The sample correlation matrix, the rotated factor loading matrix, the contributions of the factors to the original variables, the contribution rates and cumulative contribution rates, the scree plot of the eigenvalues of the sample correlation matrix, the scatterplot of the first two factor scores, the factor scores themselves, and the rankings by factor score all differ considerably between the classical estimator and the robust MVE estimator. Finally, the principal factor method combined with the robust MVE estimator reduces the eight indicators to two factors, a basic consumption factor and a consumption propensity factor; each province's household cash consumption expenditure is evaluated comprehensively from its two factor scores, and recommendations are given based on the results of the robust factor analysis.

20.
Correspondence analysis, a data analytic technique used to study two-way cross-classifications, is applied to social relational data. Such data are frequently termed "sociometric" or "network" data. The method allows one to model forms of relational data and types of empirical relationships not easily analyzed using either standard social network methods or common scaling or clustering techniques. In particular, correspondence analysis allows one to model:

—two-mode networks (rows and columns of a sociomatrix refer to different objects)

—valued relations (e.g., counts, ratings, or frequencies).

In general, the technique provides scale values for row and column units, visual presentation of relationships among rows and columns, and criteria for assessing "dimensionality" or graphical complexity of the data and goodness-of-fit to particular models. Correspondence analysis has recently been the subject of research by Goodman, Haberman, and Gilula, who have termed their approach to the problem "canonical analysis" to reflect its similarity to canonical correlation analysis of continuous multivariate data. This generalization links the technique to more standard categorical data analysis models, and provides a much-needed statistical justification.

We review both correspondence and canonical analysis, and present these ideas by analyzing relational data on the 1980 monetary donations from corporations to nonprofit organizations in the Minneapolis-St. Paul metropolitan area. We also show how these techniques are related to dyadic independence models, first introduced by Holland, Leinhardt, Fienberg, and Wasserman in the early 1980s. The highlight of this paper is the relationship between correspondence and canonical analysis and these dyadic independence models, which are designed specifically for relational data. The paper concludes with a discussion of this relationship, and some data analyses that illustrate the fact that correspondence analysis models can be used as approximate dyadic independence models.
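For a valued two-mode table such as corporation-by-nonprofit donation counts, classical correspondence analysis reduces to an SVD of standardized residuals. A minimal sketch (the toy table is invented):

```python
import numpy as np

def correspondence_analysis(N, dim=2):
    """Classical CA of a two-way table N via the SVD of standardized
    residuals; applicable to valued two-mode network data."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)                  # row/column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U[:, :dim] * s[:dim]) / np.sqrt(r)[:, None]     # row principal coords
    G = (Vt[:dim].T * s[:dim]) / np.sqrt(c)[:, None]     # column principal coords
    inertia = s**2 / (s**2).sum()                        # share of inertia per axis
    return F, G, inertia[:dim]

N = np.array([[10, 2, 0], [3, 8, 1], [0, 4, 9], [1, 1, 6]], dtype=float)
F, G, share = correspondence_analysis(N)
print(F.round(2)); print(G.round(2)); print(share.round(2))
```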
