Similar literature
 20 similar documents found (search time: 15 ms)
1.
Abstract

The grand tour and projection pursuit are two methods for exploring multivariate data. We show how to combine them into a dynamic graphical tool for exploratory data analysis, called a projection pursuit guided tour. This tool assists in clustering data when clusters are oddly shaped and in finding general low-dimensional structure in high-dimensional, and in particular, sparse data. An example shows that the method, which is projection-based, can be quite powerful in situations that may cause grief for methods based on kernel smoothing. The projection pursuit guided tour is also useful for comparing and developing projection pursuit indexes and illustrating some types of asymptotic results.
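To make the idea of optimizing a projection pursuit index concrete, here is a minimal sketch. It is not the guided tour described in the abstract: the index (negative kurtosis), the toy data, and the exhaustive search over 1-D directions are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def kurtosis(values):
    """Fourth standardized moment of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)

def project(points, angle_deg):
    """Project 2-D points onto the unit vector at angle_deg degrees."""
    t = math.radians(angle_deg)
    return [x * math.cos(t) + y * math.sin(t) for x, y in points]

def pursuit_index(points, angle_deg):
    """Assumed index: negative kurtosis, so bimodal projections score high."""
    return -kurtosis(project(points, angle_deg))

# Toy data (assumption): two clusters separated along the x-axis.
points = [(cx + 0.2 * i, 0.5 * j)
          for cx in (-3.0, 3.0)
          for i in range(-2, 3)
          for j in range(-2, 3)]

# An exhaustive search over directions stands in for a real optimizer.
best_angle = max(range(180), key=lambda a: pursuit_index(points, a))
```

In this toy setup the search settles on the direction that separates the two clusters. A guided tour would instead move smoothly between projections while the index is being optimized, so the analyst can watch the structure emerge.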

2.
Temporal data are information measured in the context of time. This contextual structure provides components that need to be explored to understand the data and that can form the basis of interactions applied to the plots. In multivariate time series, we expect to see temporal dependence, long term and seasonal trends, and cross-correlations. In longitudinal data, we also expect within and between subject dependence. Time series and longitudinal data, although analyzed differently, are often plotted using similar displays. We provide a taxonomy of interactions on plots that can enable exploring temporal components of these data types, and describe how to build these interactions using data transformations. Because temporal data are often accompanied by other types of data, we also describe how to link the temporal plots with other displays of data. The ideas are conceptualized into a data pipeline for temporal data and implemented into the R package cranvas. This package provides many different types of interactive graphics that can be used together to explore data or diagnose a model fit.

3.
Abstract

XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system (GIS) ArcView, and the interactive multidimensional scaling program XGvis. XGobi is implemented in the X Window System for portability as well as the ability to run across a network.

4.
Abstract

Projections of high-dimensional data onto low-dimensional subspaces provide insightful views for understanding multivariate relationships. This article discusses how to manually control the variable contributions to the projection. The user has control of the way a particular variable contributes to the viewed projection and can interactively adjust the variable's contribution. These manual controls complement the automatic views provided by a grand tour, or a guided tour, and give greatly improved flexibility to data analysts.

5.
Abstract

We propose a rudimentary taxonomy of interactive data visualization based on a triad of data analytic tasks: finding Gestalt, posing queries, and making comparisons. These tasks are supported by three classes of interactive view manipulations: focusing, linking, and arranging views. This discussion extends earlier work on the principles of focusing and linking and sets them on a firmer base. Next, we give a high-level introduction to a particular system for multivariate data visualization, XGobi. This introduction is not comprehensive but emphasizes XGobi tools that are examples of focusing, linking, and arranging views; namely, high-dimensional projections, linked scatterplot brushing, and matrices of conditional plots. Finally, in a series of case studies in data visualization, we show the powers and limitations of particular focusing, linking, and arranging tools. The discussion is dominated by high-dimensional projections that form an extremely well-developed part of XGobi. Of particular interest are the illustration of asymptotic normality of high-dimensional projections (a theorem of Diaconis and Freedman), the use of high-dimensional cubes for visualizing factorial experiments, and a method for interactively generating matrices of conditional plots with high-dimensional projections. Although there is a unifying theme to this article, each section, in particular the case studies, can be read separately.

6.
Abstract

This article introduces a new form of empirical distribution function (EDF) called the flipped empirical distribution function (FEDF), to represent univariate data graphically. Because the plot shows the location of individual points, it may be useful when we need to manipulate specific data points as with dynamic graphics. The article introduces several methods to explore multidimensional data using the FEDF. They are called a parallel FEDF, an FEDF scatterplot matrix, and an FEDF starplot. Usefulness of these plots in exploring multidimensional data becomes more prominent when they are implemented with the methods of dynamic graphics such as selecting, deleting, linking, locating, and identifying a group of data points.

7.
Abstract

We describe methods developed to interpolate and project flight paths of aircraft in controlled airspace over the continental United States from aperiodic position reports. There are a number of unusual features of the dynamic displays we have developed. Our visualizations can be viewed from either a fixed or moving viewpoint. The direction and distance of the focal point from the viewing point are under program control (allowing viewing in a direction other than the direction of motion of the viewpoint). The maximum and minimum depth of field are under program control (allowing viewing of selected local subsets of the data).
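As a rough illustration of interpolating a flight path from aperiodic position reports, the sketch below uses plain linear interpolation between the two reports that bracket a query time. The abstract does not say which interpolation method the authors use, so linear interpolation, the `(time, lat, lon, alt)` tuple layout, and the function name are all assumptions.

```python
from bisect import bisect_right

def interpolate_position(reports, t):
    """Linearly interpolate (lat, lon, alt) at time t.

    reports: list of (time, lat, lon, alt) tuples sorted by time, with
    irregular spacing allowed. Linear interpolation is an assumption here.
    """
    times = [r[0] for r in reports]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the reported time range")
    hi = bisect_right(times, t)
    if hi == len(times):  # t equals the last report time
        return tuple(reports[-1][1:])
    lo = hi - 1
    t0, t1 = times[lo], times[hi]
    w = (t - t0) / (t1 - t0)  # fraction of the way from report lo to report hi
    return tuple(a + w * (b - a)
                 for a, b in zip(reports[lo][1:], reports[hi][1:]))

# Made-up reports at irregular times (seconds, degrees, degrees, feet).
reports = [(0, 40.0, -100.0, 30000.0),
           (10, 41.0, -98.0, 32000.0),
           (40, 41.5, -95.0, 32000.0)]
midpoint = interpolate_position(reports, 5)
```

Projecting a path beyond the last report would need an extrapolation rule, which this sketch deliberately leaves out.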

8.
Abstract

We present dynamic and static graphs for exploratory analysis of survival data. These graphs are based on a smooth semiparametric estimate of the survival probability as a function of time and a covariate. We overlay a contour plot of the conditional survival distribution on a scatterplot of time and covariate. This is augmented by plots of the estimated survival function at particular covariate values and the receiver operating characteristic curve at particular time points. In our XLisp-Stat implementation these plots are linked and the time and covariate values for the augmenting plots can be varied dynamically. The methods are illustrated on data from a clinical study of liver disease.

9.
This paper describes progress towards developing a platform for rapid prototyping of interactive data visualizations, using R, GGobi, rggobi and RGtk2. GGobi is a software tool for multivariate interactive graphics. At the core of GGobi is a data pipeline that incrementally transforms data through a series of stages into a plot and maps user interaction with the plot back to the data. The GGobi pipeline is extensible and mutable at runtime. The rggobi package, an interface from the R language to GGobi, has been augmented with a low-level interface that supports the customization of interactive data visualizations through the extension and manipulation of the GGobi pipeline. The large size of the GGobi API has motivated the use of the RGtk2 code generation system to create the low-level interface between R and GGobi. The software is demonstrated through an application to interactive network visualization.

10.
A spreadplot is a visualization that simultaneously shows several different views of a dataset or model. The individual views can be dynamic, can support high-interaction direct manipulation, and can be algebraically linked with each other, possibly via an underlying statistical model. Thus, when a data analyst changes the information shown in one view of a statistical model, the changes can be processed by the model and instantly represented in the other views. Spreadplots simplify the analyst's task when many different plots are relevant to the analysis at hand, as is the case in regression analysis, where there are many plots that can be used for model building and diagnosis. On the other hand, the development of a visualization involving many dynamic, highly interactive, directly manipulable graphics is not a trivial task. This article discusses a software architecture which simplifies the spreadplot developer's task. The architecture addresses the two main problems in constructing a spreadplot, simplifying the layout of the plots and structuring the communication between them.
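The communication problem mentioned above can be sketched with a shared model that notifies every attached view when its state changes. This is a generic observer pattern offered as an illustration, not the architecture from the article; the class names `LinkedModel` and `View` are hypothetical.

```python
class LinkedModel:
    """Shared state that notifies every attached view when it changes."""

    def __init__(self, selected=None):
        self.selected = set(selected or [])
        self._views = []

    def attach(self, view):
        """Register a view and bring it in sync with the current state."""
        self._views.append(view)
        view.refresh(self.selected)

    def select(self, indices):
        """Update the selection and push it to all views."""
        self.selected = set(indices)
        for view in self._views:
            view.refresh(self.selected)


class View:
    """Stand-in for a plot; it only records what it would highlight."""

    def __init__(self, name):
        self.name = name
        self.highlighted = set()

    def refresh(self, selected):
        self.highlighted = set(selected)


model = LinkedModel()
scatter, residuals = View("scatterplot"), View("residual plot")
model.attach(scatter)
model.attach(residuals)
model.select([2, 5])  # a brush in one view is routed through the model
```

Routing every change through the model means views never talk to each other directly, so a new plot can be added without touching the existing ones.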

11.
Abstract

Statistical environments such as S, R, XLisp-Stat, and others have had a dramatic effect on the way we, statistics practitioners, think about data and statistical methodology. However, the possibilities and challenges introduced by recent technological developments and the general ways we use computing conflict with the computational model of these systems. This article explores some of these challenges and identifies the need to support easy integration of functionality from other domains, and to export statistical methodology to other audiences and applications, both statically and dynamically. Existing systems can be improved in these domains with some already implemented extensions (see Section 5). However, the development of a new environment and computational model that exploits modern tools designed to handle many general aspects of these challenges appears more promising as a long-term approach. We present the architecture for such a new model named Omegahat. It lends itself to entirely new statistical computing paradigms. It is highly extensible at both the user and programmer level, and also encourages the development of new environments for different user groups. The Omegahat interactive language offers a continuity between the different programming tasks and levels via optional type checking and seamless access between the interpreted user language and the implementation language, Java. Parallel and distributed computing, network and database access, interactive graphics, and many other aspects of statistical computing are directly accessible to the user as a consequence of this seamless access. We describe the benefits of using Java as the implementation language for the environment and several innovative features of the user-level language which promise to assist development of software that can be used in many contexts. We also outline how this architecture can be integrated with existing environments such as R and S.

The ideas are drawn from work within the Omega Project for Statistical Computing. The project provides open-source software for researching and developing next generation statistical computing tools.

12.
Abstract

This article presents aspects of the implementation of a bidirectional link between the Geographic Information System (GIS) ArcView and the interactive dynamic statistical graphics program XGobi. We describe the main functionality of the link, the underlying remote procedure call (RPC) mechanism, and internal data structures, and discuss topics such as security, concurrency, and linked brushing. We think that these topics are of particular interest to software authors intending to link similar software packages, and software users learning about strengths (and weaknesses) of the implementation of our link.

13.
Interactive web graphics are great for communication and knowledge sharing, but are difficult to leverage during the exploratory phase of a data science workflow. Even before the web, interactive graphics helped data analysts quickly gather insight from data, discover the unexpected, and develop better model diagnostics. Although web technologies make interactive graphics more accessible, they are not designed to fit inside an exploratory data analysis (EDA) workflow where rapid iteration between data manipulation, modeling, and visualization must occur. To better facilitate exploratory web graphics that are easily distributed, we need better interfaces between statistical computing environments (e.g., the R language) and client-side web technologies. We propose the R package animint for rapid creation of linked and animated web graphics through a simple extension of ggplot2’s implementation of the Grammar of Graphics. The extension allows one to write ggplot2 code and produce a standalone web page with multiple linked views. Supplementary material for this article is available online.

14.
Abstract

There are many examples of text databases, including literary corpora and computer source code, in which statistics are associated with each line. A visualization technique for this class of data represents the text lines as thin colored rows within columns. The position, length, and indentation of each row correspond to those of the text. The color of each row is determined by a statistic associated with each line. The display looks like a miniature picture of the text with the color showing the spatial distribution of the statistic within the text. Using this technique, SeeSoft, a dynamic graphics software tool, can easily display 50,000 lines of text simultaneously on a high-resolution monitor.

15.
Abstract

Visualization is a critical technology for understanding complex, data-rich systems. Effective visualizations make important features of the data immediately recognizable and enable the user to discover interesting and useful results by highlighting patterns. A key element of such systems is the ability to interact with displays of data by selecting a subset for further investigation. This operation is needed for use in linked views systems and in drill-down analysis. It is a common manipulation in many other systems and is as ubiquitous as selecting icons in a desktop graphical user interface (GUI). It is therefore surprising to note that little research has been done on how selection can be implemented. This article addresses this omission, presenting a taxonomy for selection mechanisms and discussing the interactions between branches of the taxonomy.

16.
Abstract

In this article, we introduce a new class of contractive mappings and study analytical and computational aspects of a special case of the Jungck-Khan iterative algorithm generated by this class of mappings. In particular, we improve upon strong convergence, rate of convergence and data dependence results existing in the current literature. Analytical as well as numerical illustrative examples are given to support the new results.

17.
Abstract

This note extends the construction of the design matrix used for estimating cell probabilities with ignorable missing data described by Lipsitz, Parzen, and Molenberghs. A reformulation for the general case of an n-way table is described and implemented in a SAS macro program. The macro constructs this design matrix and offset variable, estimates the cell probabilities, and returns a table with the estimates, their standard errors, and fitted cell frequencies.

18.
Abstract

This article describes estimation of the cell probabilities in an R × C contingency table with ignorable missing data. Popular methods for maximizing the incomplete data likelihood are the EM-algorithm and the Newton-Raphson algorithm. Both of these methods require some modification of existing statistical software to get the MLEs of the cell probabilities as well as the variance estimates. We make the connection between the multinomial and Poisson likelihoods to show that the MLEs can be obtained in any generalized linear models program without additional programming or iteration loops.

19.
Abstract

Projection pursuit describes a procedure for searching high-dimensional data for “interesting” low-dimensional projections via the optimization of a criterion function called the projection pursuit index. By empirically examining the optimization process for several projection pursuit indexes, we observed differences in the types of structure that maximized each index. We were especially curious about differences between two indexes based on expansions in terms of orthogonal polynomials, the Legendre index, and the Hermite index. Being fast to compute, these indexes are ideally suited for dynamic graphics implementations.

Both Legendre and Hermite indexes are weighted L² distances between the density of the projected data and a standard normal density. A general form for this type of index is introduced that encompasses both indexes. The form clarifies the effects of the weight function on the index's sensitivity to differences from normality, highlighting some conceptual problems with the Legendre and Hermite indexes. A new index, called the Natural Hermite index, which alleviates some of these problems, is introduced.
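For orientation, the weighted distance family described above can be written as follows. This is a sketch based on the commonly stated definitions of these indexes, not a quotation from the article; the symbols are assumptions, with f the density of the projected (standardized) data, φ the standard normal density, and w the weight function.

```latex
I_w(f) = \int_{-\infty}^{\infty} \bigl(f(x) - \phi(x)\bigr)^2 \, w(x)\, dx,
\qquad
w(x) =
\begin{cases}
1/\bigl(2\phi(x)\bigr) & \text{(Legendre)} \\
1 & \text{(Hermite)} \\
\phi(x) & \text{(Natural Hermite)}
\end{cases}
```

The Legendre weight follows from the change of variable y = 2Φ(x) − 1, which maps a standard normal to a uniform density on [−1, 1]. Read this way, the Legendre weight emphasizes the tails, the Hermite weight is flat, and the Natural Hermite weight emphasizes the center of the distribution.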

A polynomial expansion of the data density reduces the form of the index to a sum of squares of the coefficients used in the expansion. This drew our attention to examining these coefficients as indexes in their own right. We found that the first two coefficients, and the lowest-order indexes produced by them, are the most useful ones for practical data exploration because they respond to structure that can be analytically identified, and because they have “long-sighted” vision that enables them to “see” large structure from a distance. Complementing this low-order behavior, the higher-order indexes are “short-sighted.” They are able to see intricate structure, but only when they are close to it.

We also show some practical use of projection pursuit using the polynomial indexes, including a discovery of previously unseen structure in a set of telephone usage data, and two cautionary examples which illustrate that structure found is not always meaningful.

20.
Longitudinal inspections of thickness at particular locations along a pipeline provide useful information to assess the remaining life of the pipeline. In applications with different mechanisms of corrosion processes, we have observed various types of general degradation paths. We present two applications of fitting a degradation model to describe the corrosion initiation and growth behavior in a pipeline. We use a Bayesian approach for parameter estimation for the degradation model. The failure‐time and remaining lifetime distributions are derived from the degradation model, and we compute Bayesian estimates and credible intervals of the failure‐time and remaining lifetime distributions for both individual segments and for the entire pipeline circuit.
