Similar articles
1.
Abstract

This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.
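
As a rough illustration of the basic mosaic display for a two-way table, the sketch below splits the unit square into tiles whose areas are proportional to the cell counts. The function name `mosaic_tiles`, the tile layout (columns of tiles, one per row category), and the sample counts are assumptions made for this example; they are not taken from the article, and the sketch assumes every row total is positive.

```python
def mosaic_tiles(table):
    """Return (x, y, width, height) tiles in the unit square for a two-way table.

    Hypothetical helper: widths follow the row marginal proportions and
    heights follow the conditional proportions within each row, so each
    tile's area equals its cell's share of the grand total.
    """
    total = sum(sum(row) for row in table)
    tiles = []
    x = 0.0
    for row in table:
        row_total = sum(row)
        width = row_total / total        # marginal proportion of this row category
        y = 0.0
        tile_row = []
        for count in row:
            height = count / row_total   # conditional proportion within the row
            tile_row.append((x, y, width, height))
            y += height
        tiles.append(tile_row)
        x += width
    return tiles


# Made-up 2 x 2 table of counts, for illustration only.
counts = [[10, 30], [20, 40]]
tiles = mosaic_tiles(counts)
```

Under independence the tiles in every column would have the same heights, so departures from aligned tile boundaries are what the display makes visible.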

2.
Abstract

A method of statistical graphics consists of two parts: a selection of statistical information to be displayed and a selection of a visual display method to encode the information. Some display methods lead to efficient, accurate visual decoding of encoded information, and others lead to inefficient, inaccurate decoding. It is only through rigorous studies of visual decoding that informed judgments can be made about how to choose display methods. A model has been developed to provide a framework for the study of visual decoding. The model consists of three parts: (1) a two-way classification of information on displays: quantitative-scale, quantitative-physical, categorical-scale, and categorical-physical; (2) a division of the visual processing of graphical displays into pattern perception and table look-up; (3) a specification of visual operations that are employed to carry out pattern perception and table look-up. Display methods are assessed by studying the visual operations to which they lead. Studies use the theory and experimental technique of various areas of vision research including psychophysics, cognitive psychology, and computational vision. This process is illustrated by studies of three display methods: visual reference grids for graphs with juxtaposed panels and common scales, encoding a categorical variable on a scatterplot by the type of plotting symbol, and choosing the aspect ratio of a factor-response graph.

3.
Abstract

Graphical selection of data views is a fundamental task in interactive statistical graphics. Linked plots provide a form of indirect selection, where direct manipulation of objects displayed in one plot indirectly selects objects in other plots. A pointing device or brush is typically used for direct manipulation, and so this indirect selection is commonly known as linked brushing. Most commonly, linked brushing is applied to two or more scatterplots showing various pairs of variables from a multivariable dataset. This article describes a generalization of linked brushing for the setting where plots display different, though related, datasets. With this form of linking, we can graphically explore relationships between datasets. Our linking system is extensible and handles any kind of display of any kind of dataset, as well as arbitrary relationships between those datasets.

4.
Abstract

XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system (GIS) ArcView, and the interactive multidimensional scaling program XGvis. XGobi is implemented in the X Window System for portability as well as the ability to run across a network.

5.
Abstract

There are many examples of text databases, including literary corpora and computer source code, in which statistics are associated with each line. A visualization technique for this class of data represents the text lines as thin colored rows within columns. The position, length, and indentation of each row correspond to those of the text. The color of each row is determined by a statistic associated with each line. The display looks like a miniature picture of the text with the color showing the spatial distribution of the statistic within the text. Using this technique, SeeSoft, a dynamic graphics software tool, can easily display 50,000 lines of text simultaneously on a high-resolution monitor.
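
The line-to-row encoding described above can be sketched as follows. This is a minimal illustration, not SeeSoft's implementation: the `Row` record, the `text_to_rows` function, the linear rescaling of the statistic to a shade in [0, 1], and the sample data are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Row:
    indent: int    # leading spaces of the text line, in characters
    length: int    # length of the visible text, in characters
    shade: float   # per-line statistic rescaled to [0, 1] for a color ramp


def text_to_rows(lines, stats):
    """Map each text line and its statistic to one thin colored row."""
    if len(lines) != len(stats):
        raise ValueError("need exactly one statistic per line")
    lo, hi = min(stats), max(stats)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all statistics are equal
    rows = []
    for line, stat in zip(lines, stats):
        stripped = line.lstrip(" ")
        rows.append(Row(indent=len(line) - len(stripped),
                        length=len(stripped.rstrip()),
                        shade=(stat - lo) / span))
    return rows


source = ["def f(x):", "    y = x + 1", "    return y"]
ages = [10, 250, 130]  # hypothetical statistic, e.g. days since each line last changed
rows = text_to_rows(source, ages)
```

A renderer would then draw each `Row` as a one- or two-pixel-high bar, which is what lets tens of thousands of lines fit on one screen.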

6.
Abstract

This article describes constructing interactive and dynamic linked data views using the Java programming language. The data views are designed for data that have a multivariate component. The approach to displaying data comes from earlier research on building statistical graphics based on data pipelines, in which different aspects of data processing and graphical rendering are organized conceptually into segments of a pipeline. The software design takes advantage of the object-oriented nature of the Java language to open up the data pipeline, allowing developers to have greater control over their visualization applications. Importantly, new types of data views coded to adhere to a few simple design requirements can easily be integrated with existing pipe sections. This allows access to sophisticated linking and dynamic interaction across all (new and existing) view types. Pipe segments can be accessed from data analysis packages such as Omegahat or R, providing a tight coupling of visual and numerical methods.

7.
Abstract

Statistical software systems include modules for manipulating data sets, model fitting, and graphics. Because plots display data, and models are fit to data, both the model-fitting and graphics modules depend on the data. Today's statistical environments allow the analyst to choose or even build a suitable data structure for storing the data and to implement new kinds of plots. The multiplicity problem caused by many plot varieties and many data representations is avoided by constructing a plot-data interface. The interface is a convention by which plots communicate with data sets, allowing plots to be independent of the actual data representation. This article describes the components of such a plot-data interface. The same strategy may be used to deal with the dependence of model-fitting procedures on data.
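
One way to picture such a plot-data interface is a small protocol that every data representation implements and every plot consumes. The sketch below is only an assumption about what a convention of this kind might look like: the names `PlotData`, `ColumnStore`, `RowStore`, and `scatter_points` are hypothetical and do not come from the article.

```python
from typing import Protocol, Sequence


class PlotData(Protocol):
    """Hypothetical convention: the only things a plot may ask of a data set."""

    def variable_names(self) -> Sequence[str]: ...

    def values(self, name: str) -> Sequence[float]: ...


class ColumnStore:
    """Data held as a mapping from variable name to a column of values."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def variable_names(self):
        return list(self._columns)

    def values(self, name):
        return list(self._columns[name])


class RowStore:
    """Data held as a list of records, one dictionary per observation."""

    def __init__(self, records):
        self._records = list(records)

    def variable_names(self):
        return list(self._records[0]) if self._records else []

    def values(self, name):
        return [record[name] for record in self._records]


def scatter_points(data: PlotData, x: str, y: str):
    """A 'plot' that talks to the data only through the interface."""
    return list(zip(data.values(x), data.values(y)))


by_column = ColumnStore({"height": [1.0, 2.0], "weight": [3.0, 4.0]})
by_row = RowStore([{"height": 1.0, "weight": 3.0}, {"height": 2.0, "weight": 4.0}])
```

With P plot types and D data representations, each side is written once against the interface (P + D pieces) rather than once per combination (P × D pieces).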

8.
9.
Abstract

Hidden Markov models (HMMs) can be applied to the study of time-varying unobserved categorical variables for which only indirect measurements are available. An S-Plus module to fit HMMs in continuous time to this type of longitudinal data is presented. Covariates affecting the transition intensities of the hidden Markov process or the conditional distribution of the measured response (given the hidden states of the process) are handled under a generalized regression framework. Users can provide C subroutines specifying the parameterization of the model to adapt the software to a wide variety of data types. HMM analysis using the S-Plus module is illustrated on a dataset from a prospective study of human papillomavirus infection in young women and on simulated data.

10.
This article develops a generalization of the scatterplot matrix based on the recognition that most datasets include both categorical and quantitative information. Traditional grids of scatterplots often obscure important features of the data when one or more variables are categorical but coded as numerical. The generalized pairs plot offers a range of displays of paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. A side-by-side boxplot, stripplot, faceted histogram, or density plot helps visualize a categorical and a quantitative variable. A traditional scatterplot is suitable for displaying a pair of numerical variables, but options also support density contours or annotating summary statistics such as the correlation and number of missing values. By combining these, the generalized pairs plot may help to reveal structure in multivariate data that otherwise might go unnoticed in the process of exploratory data analysis. Two R packages, gpairs and GGally, provide implementations of the generalized pairs plot. Supplementary materials for this article are available online on the journal web site.
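
The panel-selection idea can be sketched as a dispatch on the types of the two variables in each cell of the matrix. This is an illustrative assumption, not the logic of gpairs or GGally: the functions `is_categorical`, `panel_type`, and `pairs_layout` are hypothetical, and only one display per type combination is chosen from the options the abstract lists. Note that this naive type test would still treat a categorical variable coded as numbers as quantitative, which is exactly the pitfall the abstract warns about, so real data would need explicit type declarations.

```python
from numbers import Number


def is_categorical(values):
    """Treat a variable as categorical unless every value is a (non-boolean) number."""
    return not all(isinstance(v, Number) and not isinstance(v, bool) for v in values)


def panel_type(x, y):
    """Pick one display per combination of variable types."""
    kinds = (is_categorical(x), is_categorical(y))
    if kinds == (True, True):
        return "mosaic plot"
    if kinds == (False, False):
        return "scatterplot"
    return "side-by-side boxplot"


def pairs_layout(data):
    """Map every ordered pair of distinct variables to a panel type."""
    names = list(data)
    return {(a, b): panel_type(data[a], data[b])
            for a in names for b in names if a != b}


# Made-up data set with two categorical and two quantitative variables.
data = {
    "species": ["a", "b", "a"],
    "site": ["n", "s", "s"],
    "length": [1.2, 3.4, 2.2],
    "mass": [10, 12, 11],
}
layout = pairs_layout(data)
```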

11.
Interactive graphics provide an important tool that facilitates exploratory data and model analysis, a crucial step in real-world applied statistics. Only a very limited set of software provides truly interactive graphics for data analysis, partly because such graphics are not easy to implement. Very often specialized software is created to offer graphics for a particular problem, but many fundamental plots are omitted because they are not considered new research. In this paper we discuss a general framework for creating interactive graphics software on a sound foundation that offers a consistent user interface, fast prototyping of new plots, and extensibility to support interactive models. In addition, we discuss one implementation of the general framework: iPlots eXtreme, next-generation interactive graphics for analysis of large data in R. It provides most fundamental plot types and allows new interactive plots to be created. The implementation raises interactive graphics performance to an entirely new level. We briefly discuss several methods that allowed us to achieve this goal and illustrate the use of advanced programmability features in conjunction with R.

12.
In ridge regression and related shrinkage methods, the ridge trace plot, a plot of estimated coefficients against a shrinkage parameter, is a common graphical adjunct to help determine a favorable trade-off of bias against precision (inverse variance) of the estimates. However, standard unidimensional versions of this plot are ill-suited for this purpose because they show only bias directly and ignore the multidimensional nature of the problem.

A generalized version of the ridge trace plot is introduced, showing covariance ellipsoids in parameter space, whose centers show bias and whose size and shape show variance and covariance, respectively, in relation to the criteria for which these methods were developed. These provide a direct visualization of both bias and precision. Even two-dimensional bivariate versions of this plot show interesting features not revealed in the standard univariate version. Low-rank versions of this plot, based on an orthogonal transformation of predictor space, extend these ideas to larger numbers of predictor variables by focusing on the dimensions in the space of predictors that are likely to be most informative about the nature of bias and precision. Two well-known datasets are used to illustrate these graphical methods. The genridge package for R implements computation and display.
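
The quantities behind such a plot can be sketched for the two-predictor case. For shrinkage constant k, the ridge estimate is (X'X + kI)^(-1) X'y and its covariance is sigma^2 (X'X + kI)^(-1) X'X (X'X + kI)^(-1); the estimate gives an ellipse's center and the covariance its size and shape. The code below is a minimal sketch under stated assumptions, not the genridge implementation: the function names, the restriction to two centered predictors, the omission of the sigma^2 factor, and the toy data are all choices made for this example.

```python
def matmul2(p, q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(p[i][m] * q[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def ridge_2d(x1, x2, y, k):
    """Ridge coefficients and covariance (up to sigma^2) for two centered predictors."""
    a = sum(u * u for u in x1)                # entries of X'X
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    g = [sum(u * t for u, t in zip(x1, y)),   # entries of X'y
         sum(v * t for v, t in zip(x2, y))]
    det = (a + k) * (d + k) - b * b
    inv = [[(d + k) / det, -b / det],         # (X'X + kI)^(-1)
           [-b / det, (a + k) / det]]
    beta = [inv[0][0] * g[0] + inv[0][1] * g[1],
            inv[1][0] * g[0] + inv[1][1] * g[1]]
    cov = matmul2(matmul2(inv, [[a, b], [b, d]]), inv)
    return beta, cov


# Toy centered data, constructed so that y = 2 * x1 - x2 exactly.
x1 = [-2, -1, 0, 1, 2]
x2 = [-1, -2, 1, 0, 2]
y = [-3, 0, -1, 2, 2]
trace = {k: ridge_2d(x1, x2, y, k) for k in (0.0, 1.0, 10.0)}
```

At k = 0 the center is the least-squares estimate; as k grows the centers move toward the origin (more bias) while the covariance matrices shrink (more precision), which is the trade-off the ellipses are meant to show together.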

13.
Metamodels are used in many disciplines to replace simulation models of complex multivariate systems. To assess a metamodel's quality of fit for simulation, simple information returned by average-based statistics, such as the root-mean-square error (RMSE), is often used. The sample of points used in determining these averages is restricted in size, especially for simulation models of complex multivariate systems. Decisions based on average values can be misleading when the sample size is not adequate, so the contribution made by each individual data point in such samples needs to be examined. This paper presents methods that can be used to assess a metamodel's quality of fit graphically by means of two-dimensional plots. Three plot types are presented: the so-called circle plots, marksman plots, and ordinal plots. Such plots can be used to facilitate visual inspection of the effect on metamodel accuracy of each individual point in the data sample used for metamodel validation. The proposed methods can be used to complement quantitative validation statistics, in particular in situations where there is not enough validation data or the validation data are too expensive to generate.

14.
The traffic on an Internet link is a packet stream: packets of varying sizes arriving for transmission on the link. Each packet has an arrival time, and contained within the packet are headers that carry many critical variables. Packet traces, which consist of captured headers and measurements of the arrival times, convey substantial information about the Internet: security, usage, network performance, and the performance of engineering protocols. This article discusses strategies for the analysis of very large databases of packet traces, and the architecture of a software system that facilitates the use of these strategies. The system has a pipeline: (1) raw packet traces; (2) a database with objects tailored to ensuing analyses; and (3) an environment with tools for data analysis: statistical methods, model fitting, and visualization. The pipeline addresses the full set of tasks in the study of packet streams, from the initial processing of raw packet traces to the final output, often a visual display. S-Net, an extensible, open-source software implementation of this architecture, is based on the R implementation of the S language for graphics and data analysis, and has been developed on Linux.

15.
This paper deals with the issue of estimating a production frontier and measuring efficiency from a panel data set. First, it proposes an alternative method for the estimation of a production frontier on a short panel data set. The method is based on the so-called mean-and-covariance structure analysis, which is closely related to the generalized method of moments. One advantage of the method is that it allows us to investigate the presence of correlations between individual effects and exogenous variables without requiring instruments that are uncorrelated with the individual effects, as instrumental variable estimation does. Another advantage is that the method is well suited to a panel data set with a small number of periods. Second, the paper considers the question of recovering individual efficiency levels from the estimates obtained from the mean-and-covariance structure analysis. Since individual effects are here viewed as latent variables, they can be estimated as factor scores, i.e., weighted sums of the observed variables. We illustrate the proposed methods with the estimation of a stochastic production frontier on a short panel data set of French fruit growers.

16.
Abstract

Simple superposition of individual curves, as arise in longitudinal studies, makes the detection of structure difficult due to clutter. This is especially true when the number of curves is large, but can even be exhibited in moderate sample settings when variability is high. Jones and Rice, as well as Diggle, Liang, and Zeger, proposed methods for identifying representative curves from a large collection that display the form and extent of variation of the curves. Here we propose the use of tree-structured regression, as adapted to longitudinal data, for this purpose. Properties of each method are described. A re-examination of atmospheric ozone data, analyzed by Jones and Rice, is also presented.

17.
The lattice add-on package implements Trellis graphics in R. One of the major recent changes to the package API has been to make all high-level functions generic, with the traditional implementations available as the “formula” method. This allows for cleaner and more flexible implementations for certain uses that were permitted in the original S-PLUS version on a one-off basis. For example, dotplot could be used to display one-way tables, but the new approach naturally extends to multi-way tables as well. More importantly, it opens up the possibility of new Trellis displays specifically designed for previously unsupported classes. We present some examples of such extensions and describe a few issues generally relevant to the development of new Trellis-style visualizations using the lattice infrastructure.

18.
Abstract

The spread-location plot has often been used as a diagnostic plot suitable for many types of fitted statistical models. The spread-location plot, which plots the absolute residual or square-root absolute residual versus fitted value along with a robust loess smooth, is a useful replacement for the customary practice of plotting residuals versus fitted values. In this note, we show that neither the absolute residual nor the square-root absolute residual is always appropriate for error distributions likely to be encountered in actual applications. Hence we recommend a multipanel display showing a suitable transformation of the absolute residual versus fitted value along with a boxplot to judge the symmetry achieved by the transformation. We conclude with an illustrative example.
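
The coordinates of a spread-location plot, and a numeric stand-in for the boxplot symmetry check, can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the power transformation with exponent `power` is one possible family of transformations, and the quartile-based skewness measure is used here only as a rough substitute for judging symmetry by eye from a boxplot. The loess smooth is omitted.

```python
import statistics


def spread_location(fitted, residuals, power=0.5):
    """Return (fitted value, |residual| ** power) pairs, ordered by fitted value."""
    pairs = [(f, abs(r) ** power) for f, r in zip(fitted, residuals)]
    return sorted(pairs)


def quartile_skew(values):
    """Quartile skewness: 0 when the quartiles are symmetric about the median."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Made-up fitted values and residuals, for illustration only.
fitted = [3, 1, 2, 4]
residuals = [-4, 1, -9, 16]
points = spread_location(fitted, residuals)          # square-root absolute residuals
raw_skew = quartile_skew([abs(r) for r in residuals])
root_skew = quartile_skew([spread for _, spread in points])
```

Comparing `quartile_skew` across several values of `power` is one way to look for a transformation that makes the distribution of spreads roughly symmetric.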

19.
Abstract

Recognition and extraction of features in a nonparametric density estimate are highly dependent on correct calibration. The data-driven choice of bandwidth h in kernel density estimation is a difficult one that is compounded by the fact that the globally optimal h is not generally optimal for all values of x. In recognition of this fact a new type of graphical tool, the mode tree, is proposed. The basic mode tree plot relates the locations of modes in density estimates with the bandwidths of those estimates. Additional information can be included on the plot indicating factors such as the size of modes, how modes split, and the locations of antimodes and bumps. The use of a mode tree in adaptive multimodality investigations is proposed, and an example is given to show the value of using a normal kernel, as opposed to the biweight or other kernels, in such investigations. Examples of such investigations are provided for Ahrens's chondrite data and van Winkle's Hidalgo stamp data. Finally, the bivariate mode tree is introduced, together with an example using Scott's lipid data.
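
The data behind a basic mode tree can be sketched by locating the modes of a normal-kernel density estimate at each of several bandwidths. The code below is a minimal sketch, not the authors' algorithm: the function names, the fixed evaluation grid, the detection of modes as strict local maxima on that grid, and the sample data are assumptions made for this example.

```python
import math


def kde(x, data, h):
    """Normal-kernel density estimate at x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) / norm


def modes(data, h, lo, hi, steps=200):
    """Grid locations where the estimate is a strict local maximum."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = [kde(x, data, h) for x in grid]
    return [grid[i] for i in range(1, steps)
            if dens[i - 1] < dens[i] > dens[i + 1]]


def mode_tree(data, bandwidths, lo, hi):
    """Map each bandwidth to the mode locations found at that bandwidth."""
    return {h: modes(data, h, lo, hi) for h in bandwidths}


# Made-up sample with two well-separated clusters.
sample = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
tree = mode_tree(sample, [0.5, 3.0], -5.0, 5.0)
```

Plotting mode location against bandwidth for a fine sequence of bandwidths gives the tree; with a normal kernel the number of modes does not increase as the bandwidth grows, so branches only merge when read from small to large h.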

20.
Abstract

Scatterplots are the method of choice for displaying the distribution of points in two dimensions. They are used to discover patterns such as holes, outliers, modes, and association between the two variables. A common problem is overstriking, the overlap on the plotting surface of glyphs representing individual observations. Overstriking can create a misleading impression of the data distribution. The variable resolution bivariate plots (Varebi plots) proposed in this article deal with the problem of overstriking by mixing display of a density estimate and display of individual observations. The idea is to determine the display format by analyzing the actual amount of overstriking on the screen. Thus, the display format will depend on the sample size, the distribution of the observations, the size and shape of individual icons, and the size of the window. It may change automatically when the window is resized. Varebi plots reveal detail wherever possible, and show the overall trend when displaying detail is not feasible.
