Similar literature
1.
Multivariate analysis of variance (MANOVA) extends the ideas and methods of univariate ANOVA in simple and straightforward ways. But the familiar graphical methods typically used for univariate ANOVA are inadequate for showing how measures in a multivariate response vary with each other, and how their means vary with explanatory factors. Similarly, the graphical methods commonly used in multiple regression are not widely available or used in multivariate multiple regression (MMRA). We describe a variety of graphical methods for multiple-response (MANOVA and MMRA) data aimed at understanding what is being tested in a multivariate test, and how factor/predictor effects are expressed across multiple response measures.

In particular, we describe and illustrate: (a) data ellipses and biplots for multivariate data; (b) HE plots, showing the hypothesis and error covariance matrices for a given pair of responses and a given effect; (c) HE plot matrices, showing all pairwise HE plots; and (d) reduced-rank analogs of HE plots, showing all observations, group means, and their relations to the response variables. All of these methods are implemented in a collection of easy-to-use SAS macro programs.

2.
This article introduces graphical tools for visualizing multivariate functions, specializing to the case of visualizing multivariate density estimates. We visualize a density estimate by visualizing a series of its level sets. From each connected part of a level set a shape tree is formed. A shape tree is a tree whose nodes are associated with regions of the level set. With the help of a shape tree we define a transformation of a multivariate set to a univariate function. The shape trees are visualized with the shape plots and the location plot. By studying these plots one may identify the regions of the Euclidean space where the probability mass is concentrated. An application of shape trees to visualize the distribution of stock index returns is presented.

3.
This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.

4.
Generating multivariate Poisson random variables is essential in many applications, such as multi-echelon supply chain systems, multi-item/multi-period pricing models, and accident monitoring systems. Current simulation methods suffer from limitations ranging from computational complexity to restrictions on the structure of the correlation matrix, and are therefore rarely used in management science. Instead, multivariate Poisson data are commonly approximated by either univariate Poisson or multivariate Normal data. However, these approximations are often inadequate in practice. In this paper, we propose a conceptually appealing correction for NORTA (NORmal To Anything) for generating multivariate Poisson data with a flexible correlation structure and rates. NORTA is based on simulating data from a multivariate Normal distribution and converting it into an arbitrary continuous distribution with a specific correlation matrix. We show that our method is both highly accurate and computationally efficient. We also show the managerial advantages of generating multivariate Poisson data over univariate Poisson or multivariate Normal data. Copyright © 2011 John Wiley & Sons, Ltd.
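
The basic NORTA idea described in this abstract (draw correlated Normal variates, map them to uniforms, then apply the target inverse CDF) can be sketched as follows. This is a minimal illustration of plain, uncorrected NORTA for two Poisson margins; it is not the correction proposed in the paper. The function names, the rates, and the latent correlation `rho_z` are assumptions made for the example. The correlation of the resulting counts generally differs from `rho_z`, and no attempt is made here to adjust for that.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def poisson_quantile(u, rate):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(rate)."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1.0, which has no finite quantile
    k = 0
    pmf = math.exp(-rate)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= rate / k
        cdf += pmf
    return k


def norta_poisson_pair(rate1, rate2, rho_z, rng):
    """Draw one Poisson pair from latent Normals with correlation rho_z."""
    z1 = rng.gauss(0.0, 1.0)
    # Two-dimensional Cholesky step: z2 has unit variance and corr(z1, z2) = rho_z.
    z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
    u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
    return poisson_quantile(u1, rate1), poisson_quantile(u2, rate2)


rng = random.Random(0)  # fixed seed so the run is reproducible
pairs = [norta_poisson_pair(2.0, 5.0, 0.7, rng) for _ in range(5000)]
mean1 = sum(x for x, _ in pairs) / len(pairs)
mean2 = sum(y for _, y in pairs) / len(pairs)
print("sample means:", mean1, mean2)
```

The sample means should land close to the requested rates; comparing the sample correlation of `pairs` with `rho_z` shows the mismatch that motivates a correction step.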

5.
Metamodels are used in many disciplines to replace simulation models of complex multivariate systems. To assess a metamodel's quality of fit for simulation, simple average-based statistics, such as the root-mean-square error (RMSE), are often used. The sample of points used in determining these averages is restricted in size, especially for simulation models of complex multivariate systems. Decisions based on average values can be misleading when the sample size is not adequate, so the contribution made by each individual data point in such samples needs to be examined. This paper presents methods that can be used to assess a metamodel's quality of fit graphically by means of two-dimensional plots. Three plot types are presented: the so-called circle plots, marksman plots, and ordinal plots. Such plots can be used to facilitate visual inspection of the effect on metamodel accuracy of each individual point in the data sample used for metamodel validation. The proposed methods can be used to complement quantitative validation statistics, in particular in situations where there is not enough validation data or the validation data is too expensive to generate.
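
The concern about average-based statistics can be illustrated with a small numerical sketch. The data below are made up for the example, and the helper names are hypothetical; the point is only that a single validation point can dominate the RMSE of a small sample, which is why inspecting per-point errors is useful. The circle, marksman, and ordinal plots themselves are not reproduced here.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error over paired observations and predictions."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pointwise_errors(observed, predicted):
    """Signed error of each individual validation point."""
    return [p - o for o, p in zip(observed, predicted)]


# Hypothetical validation sample: four good predictions and one poor one.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.0, 4.1, 8.0]

errors = pointwise_errors(observed, predicted)
worst = max(range(len(errors)), key=lambda i: abs(errors[i]))

print("RMSE over all points:", rmse(observed, predicted))
print("RMSE without the worst point:",
      rmse(observed[:worst] + observed[worst + 1:],
           predicted[:worst] + predicted[worst + 1:]))
print("index of worst point:", worst)
```

Dropping the single worst point changes the RMSE by more than an order of magnitude, something the average alone would not reveal.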

6.
The forward search is a powerful general method for detecting multiple masked outliers and for determining their effect on inferences about models fitted to data. From the monitoring of a series of statistics based on subsets of data of increasing size we obtain multiple views of any hidden structure. One of the problems of the forward search has always been the lack of an automatic link among the great variety of plots which are monitored. Often, interesting features emerge unexpectedly during the progression of the forward search only when a specific combination of forward plots is inspected at the same time. Thus, the analyst should be able to interact with the plots and redefine or refine the links among them. In the absence of dynamic linking and interaction tools, the analyst risks missing relevant hidden information. In this paper we fill this gap and provide the user with a set of new robust graphical tools whose power will be demonstrated on several regression problems. Through the analysis of real and simulated data we give a series of examples where dynamic interaction with different “robust plots” is used to highlight the presence of groups of outliers and regression mixtures and appraise the effect that these hidden groups exert on the fitted model.

7.
In this paper we suggest a simple graphical device for assessing multivariate normality. The method is based on the characteristic that linear combinations of the sample mean and sample covariance matrix are independent if and only if the random variable is normally distributed. We demonstrate the usage of the suggested method and compare it to the classical Q-Q plot by using some multivariate data sets.

8.
Temporal data are information measured in the context of time. This contextual structure provides components that need to be explored to understand the data and that can form the basis of interactions applied to the plots. In multivariate time series, we expect to see temporal dependence, long-term and seasonal trends, and cross-correlations. In longitudinal data, we also expect within- and between-subject dependence. Time series and longitudinal data, although analyzed differently, are often plotted using similar displays. We provide a taxonomy of interactions on plots that can enable exploring temporal components of these data types, and describe how to build these interactions using data transformations. Because temporal data are often accompanied by other types of data, we also describe how to link the temporal plots with other displays of data. The ideas are conceptualized into a data pipeline for temporal data and implemented in the R package cranvas. This package provides many different types of interactive graphics that can be used together to explore data or diagnose a model fit.

9.
The search for structures in real datasets, for example, in the form of bumps, components, classes, or clusters, is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without prespecifying their total number. A number of related methods already exist, yet are challenged in the context of high-dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality ($p \gg n$ case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer microarray dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online.

10.
An Efficient Exact Algorithm for Constraint Bipartite Vertex Cover (cited 2 times in total: 0 self-citations, 2 citations by others)
The constraint bipartite vertex cover problem (CBVC for short) is as follows: given a bipartite graph $G$ with $n$ vertices and two positive integers $k_1$, $k_2$, is there a vertex cover taking at most $k_1$ vertices from one and at most $k_2$ vertices from the other vertex set of $G$? CBVC is NP-complete. It formalizes the spare allocation problem for reconfigurable arrays, an important problem from VLSI manufacturing. We provide a nontrivial so-called fixed parameter algorithm for CBVC, running in $O(1.3999^{k_1 + k_2} + (k_1 + k_2)n)$ time. Our algorithm is efficient and practical for small values of $k_1$ and $k_2$, as occurring in applications. The analysis of the search tree is based on a novel bonus point system: after the processing of the search tree (which takes time exponential in $k$), a polynomial-time final analysis follows. Parts of the computation that would be normally done within the search-tree phase can be postponed; nevertheless, knowledge about the size of those parts can be used to reduce the length of the search paths (and hence the depth of the search tree as a whole) by a sort of bonus points.
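
To make the CBVC problem statement concrete, here is a naive brute-force decision procedure. It is emphatically not the fixed-parameter search-tree algorithm of the paper: it simply enumerates subsets of one side, so it is only usable on tiny instances. The function name and the example graph are assumptions made for illustration; edges are given as (left vertex, right vertex) pairs.

```python
from itertools import combinations


def has_constrained_cover(left, edges, k1, k2):
    """Decide CBVC by brute force.

    Returns True if some vertex cover uses at most k1 vertices from
    `left` and at most k2 vertices from the right side.
    """
    for size in range(min(k1, len(left)) + 1):
        for chosen_left in combinations(left, size):
            chosen = set(chosen_left)
            # Any edge not covered on the left must be covered by its right endpoint,
            # so the right-hand part of the cover is forced once the left part is fixed.
            needed_right = {v for u, v in edges if u not in chosen}
            if len(needed_right) <= k2:
                return True
    return False


# Hypothetical instance: vertex "a" is adjacent to both "x" and "y".
left = ["a", "b", "c"]
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "y")]

print(has_constrained_cover(left, edges, 1, 1))  # no cover with one vertex per side
print(has_constrained_cover(left, edges, 0, 2))  # taking both right vertices works
```

The example shows why the two budgets matter separately: a total budget of two vertices suffices when both come from the right side, but not when it is split one per side.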

11.
A positive basis is a minimal set of vectors whose nonnegative linear combinations span the entire space $\mathbb{R}^{n}$. Interest in positive bases was revived in the late nineties by the introduction and analysis of some classes of direct search optimization algorithms. It is easily shown that the cardinality of every positive basis is bounded below by $n + 1$. There are proofs in the literature that $2n$ is a valid upper bound for the cardinality, but these proofs are quite technical and require several pages. The purpose of this note is to provide a simple demonstration that relies on a fundamental property of basic feasible solutions in linear programming theory.
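
The lower bound $n + 1$ is attained by the standard example $e_1, \dots, e_n, -(e_1 + \dots + e_n)$. The sketch below builds that set and writes an arbitrary vector as a nonnegative combination of its members. The function names are made up for this illustration, and the code does not reproduce the linear programming argument of the note.

```python
def minimal_positive_basis(n):
    """Return e_1, ..., e_n and -(e_1 + ... + e_n): n + 1 vectors in R^n."""
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    basis.append([-1.0] * n)
    return basis


def nonnegative_coefficients(v):
    """Coefficients c >= 0 with sum_i c_i * b_i = v for the basis above.

    Shifting every coordinate up by s = max(0, -min(v)) makes it nonnegative;
    the last basis vector, weighted by s, subtracts the shift again.
    """
    shift = max(0.0, -min(v))
    return [x + shift for x in v] + [shift]


def combine(basis, coeffs):
    """Evaluate the linear combination sum_i coeffs[i] * basis[i]."""
    n = len(basis[0])
    return [sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(n)]


v = [3.0, -2.0, 0.5]
basis = minimal_positive_basis(len(v))
coeffs = nonnegative_coefficients(v)
print(coeffs)
print(combine(basis, coeffs))
```

For the upper bound, the set $\pm e_1, \dots, \pm e_n$ is the usual example of a positive basis with $2n$ elements.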

12.
Development of methods for visualisation of high-dimensional data where the number of observations, $n$, is small compared to the number of variables, $p$, is of increasing importance. One major application is the burgeoning field of microarray (gene expression) experiments. Because of their high cost, the number of chips ($n$) is $O(10 - 10^{2})$ while the number ($p$) of genes (including expressed sequence tags) on each chip is $O(10^{3} - 10^{4})$. Based on synthetic data simulated in accord with current biological interpretation of microarray data, we have adapted the biplot that simultaneously plots the genes and the chips to display relevant experimental information. Other ordination techniques are also useful for visually exploring microarray data. The biological information that can be revealed by applying these exploratory, visual techniques is illustrated using data from gene expression experiments. When ordination methods, or dimension reduction methods such as PCA and its many variants, are used in association with gene selection methods, it is well known that “selection bias” can result. We show an application of bootstrap methodology to ordination methods that can be used to account for this bias. Such methods are invaluable when visualisation methods are used for pattern recognition, such as when identifying previously unknown sub-classes of tumours in molecular classification. A colour version of the paper is available at: DOI:. The sample numbers shown on the plots can also be used for identifying the different classes if a colour version is not available. The sample numbers for the ALL B-cells are 1, 4, 5, 7, 8, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, and 27. Those for the ALL T-cells are 2, 3, 6, 9, 10, 11, 14, and 23, and those for the AML samples are 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, and 38.

13.
This article proposes an algorithm for generating over-dispersed and under-dispersed binomial variates with specified mean and variance. The over-dispersed/under-dispersed distributions are derived from correlated binary variables with an underlying continuous multivariate distribution. Different multivariate distributions or different correlation matrices result in different over-dispersed (or under-dispersed) distributions. The over-dispersed binomial distributions that are generated from three different correlation matrices of a multivariate normal are compared with the beta-binomial distribution for various mean and over-dispersion parameters by quantile-quantile (Q-Q) plots. The two distributions appear to be similar. The under-dispersed binomial distribution is simulated to model an example data set that exhibits under-dispersed binomial variation.
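
The general idea of obtaining an over-dispersed binomial count from correlated binary variables with a latent multivariate normal can be sketched as below. This is an assumption-laden illustration, not the article's algorithm: it uses one particular correlation structure (equicorrelated latent normals built from a shared factor), it does not solve for a specified variance, and it does not cover the under-dispersed case. The function name and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def overdispersed_binomial(n, p, rho, rng):
    """Sum of n Bernoulli(p) variables driven by equicorrelated latent normals.

    Each latent variable is sqrt(rho) * shared + sqrt(1 - rho) * own noise,
    so it is standard normal and any two latents have correlation rho.
    """
    threshold = NormalDist().inv_cdf(p)  # P(latent < threshold) = p
    shared = rng.gauss(0.0, 1.0)
    count = 0
    for _ in range(n):
        latent = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        if latent < threshold:
            count += 1
    return count


rng = random.Random(1)  # fixed seed so the run is reproducible
n, p, rho = 10, 0.3, 0.5
draws = [overdispersed_binomial(n, p, rho, rng) for _ in range(5000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print("sample mean:", sample_mean, "binomial mean:", n * p)
print("sample variance:", sample_var, "binomial variance:", n * p * (1 - p))
```

The mean stays near $np$ because each binary variable still has success probability $p$, while the positive correlation inflates the variance above the binomial value $np(1-p)$.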

14.
This paper presents a framework for analyzing and comparing sub-optimal performance of local search algorithms for hard discrete optimization problems. The β-acceptable solution probability is introduced that captures how effectively an algorithm has performed to date and how effectively an algorithm can be expected to perform in the future. Using this probability, the necessary conditions for a local search algorithm to converge in probability to β-acceptable solutions are derived. To evaluate and compare the effectiveness of local search algorithms, two estimators for the expected number of iterations to visit a β-acceptable solution are obtained. Computational experiments are reported with simulated annealing and tabu search applied to four small traveling salesman problem instances, and the Lin-Kernighan-Helsgaun algorithm applied to eight medium to large traveling salesman problem instances (all with known optimal solutions), to illustrate the application of these estimators.

15.
This research is partially a continuation of a 2007 paper by the author. Growth estimates for generalized logarithmic derivatives of Blaschke products are provided under the assumption that the zero sequences are either uniformly separated or exponential. Such Blaschke products are known as interpolating Blaschke products. The growth estimates are then proven to be sharp in a rather strong sense. The sharpness discussion yields a solution to an open problem posed by E. Fricain and J. Mashreghi in 2008. Finally, several aspects are pointed out to illustrate that interpolating Blaschke products appear naturally in studying the oscillation of solutions of a differential equation $f''+A(z)f=0$, where $A(z)$ is analytic in the unit disc. In particular, a unit disc analogue of a 1988 result due to S. Bank on prescribed zero sequences for entire solutions is obtained, and a more careful analysis of a 1955 example due to B. Schwarz on the case $A(z)=\frac{1+4\gamma^{2}}{(1-z^{2})^{2}}$ reveals that an infinite zero sequence is always a union of two exponential sequences.

16.
A weighted multivariate signed-rank test is introduced for an analysis of multivariate clustered data. Observations in different clusters may then get different weights. The test provides a robust and efficient alternative to normal theory based methods. Asymptotic theory is developed to find the approximate p-value as well as to calculate the limiting Pitman efficiency of the test. A conditionally distribution-free version of the test is also discussed. The finite-sample behavior of different versions of the test statistic is explored by simulations and the new test is compared to the unweighted and weighted versions of Hotelling’s $T^{2}$ test and the multivariate spatial sign test introduced in [D. Larocque, J. Nevalainen, H. Oja, A weighted multivariate sign test for cluster-correlated data, Biometrika 94 (2007) 267-283]. Finally, a real data example is used to illustrate the theory.

17.
An extension of univariate quantiles to the multivariate set-up is proposed and studied. The proposed approach is affine equivariant, and it is based on an adaptive transformation-retransformation procedure. Bahadur-type linear representations of the proposed quantiles are established and, consequently, asymptotic distributions are also derived. As applications of these multivariate quantiles, we develop some affine equivariant quantile contour plots which can be used to study the geometry of the data cloud as well as the underlying probability distribution and to detect outliers. These quantiles can also be used to construct affine invariant versions of multivariate Q-Q plots which are useful in checking how well a given multivariate probability distribution fits the data and for comparing the distributions of two data sets. We illustrate these applications with some simulated and real data sets. We also indicate a way of extending the notion of univariate L-estimates and trimmed means to the multivariate set-up using these affine equivariant quantiles.

18.
We investigate the relationships between smooth and strongly smooth points of the unit ball of an order continuous symmetric function space $E$, and of the unit ball of the space of τ-measurable operators $E(\mathcal{M},\tau)$ associated to a semifinite von Neumann algebra $(\mathcal{M}, \tau)$. We prove that $x$ is a smooth point of the unit ball in $E(\mathcal{M}, \tau)$ if and only if the decreasing rearrangement μ(x) of the operator $x$ is a smooth point of the unit ball in $E$, and either μ(∞; f) = 0, for the function $f\in S_{E^{\times}}$ supporting μ(x), or $s(x^{*}) = 1$. Under the assumption that the trace τ on $\mathcal{M}$ is σ-finite, we show that $x$ is a strongly smooth point of the unit ball in $E(\mathcal{M}, \tau)$ if and only if its decreasing rearrangement μ(x) is a strongly smooth point of the unit ball in $E$. Consequently, for a symmetric function space $E$, we obtain corresponding relations between smoothness or strong smoothness of the function $f$ and its decreasing rearrangement μ(f). Finally, under suitable assumptions, we state results relating the global properties such as smoothness and Fréchet smoothness of the spaces $E$ and $E(\mathcal{M},\tau)$.

19.
Many graphical methods for displaying multivariate data consist of arrangements of multiple displays of one or two variables; scatterplot matrices and parallel coordinates plots are two such methods. In principle these methods generalize to arbitrary numbers of variables but become difficult to interpret for even moderate numbers of variables. This article demonstrates that the impact of high dimensions is much less severe when the component displays are clustered together according to some index of merit. Effectively, this clustering reduces the dimensionality and makes interpretation easier. For scatterplot matrices and parallel coordinates plots clustering of component displays is achieved by finding suitable permutations of the variables. I discuss algorithms based on cluster analysis for finding permutations, and present examples using various indices of merit.

20.
The paper considers the problem of packing the maximal number of congruent nD hyperspheres of given radius into a larger nD hypersphere of given radius, where n = 2, 3, ..., 24. Solving the problem is reduced to solving a sequence of packing subproblems in which the radii of the hyperspheres are variable. Mathematical models of the subproblems are constructed, and characteristics of the mathematical models are investigated. On the basis of these characteristics we offer a solution approach. For n ≤ 3 starting points are generated either in accordance with the lattice packing of circles and spheres or in a random way. For n > 3 starting points are generated in a random way. A procedure of perturbation of lattice packings is applied to improve convergence. We use the Zoutendijk feasible direction method to search for local maxima of the subproblems. To compute an approximation to a global maximum of the problem we perform a non-exhaustive search of local maxima. Our results are compared with the benchmark results for n = 2. A number of numerical results for 2 ≤ n ≤ 24 are given.
