Similar literature
 20 similar articles found
1.
Graphics play a crucial role in statistical analysis and data mining. Quantifying the structure in data that is visible in plots, and how people read that structure from plots, is an ongoing challenge. The lineup protocol provides a formal framework for data plots, making inference possible. The data plot is treated like a test statistic, and the lineup protocol acts like a comparison with the sampling distribution of the nulls. This article describes metrics for describing structure in data plots and evaluates them in relation to the choices that human readers made during several large Amazon Turk studies using lineups. The metrics that were more specific to the plot types tended to match subject choices better than generic metrics. The process that we followed to evaluate metrics will be useful for the general development of numerical measures of structure in plots, and also in future experiments on lineups for choosing blocks of pictures. Supplementary materials for this article are available online.

2.
Abstract

The grand tour and projection pursuit are two methods for exploring multivariate data. We show how to combine them into a dynamic graphical tool for exploratory data analysis, called a projection pursuit guided tour. This tool assists in clustering data when clusters are oddly shaped and in finding general low-dimensional structure in high-dimensional, and in particular, sparse data. An example shows that the method, which is projection-based, can be quite powerful in situations that may cause grief for methods based on kernel smoothing. The projection pursuit guided tour is also useful for comparing and developing projection pursuit indexes and illustrating some types of asymptotic results.

3.
Temporal data are information measured in the context of time. This contextual structure provides components that need to be explored to understand the data and that can form the basis of interactions applied to the plots. In multivariate time series, we expect to see temporal dependence, long-term and seasonal trends, and cross-correlations. In longitudinal data, we also expect within- and between-subject dependence. Time series and longitudinal data, although analyzed differently, are often plotted using similar displays. We provide a taxonomy of interactions on plots that can enable exploring temporal components of these data types, and describe how to build these interactions using data transformations. Because temporal data are often accompanied by other types of data, we also describe how to link the temporal plots with other displays of data. The ideas are conceptualized into a data pipeline for temporal data and implemented in the R package cranvas. This package provides many different types of interactive graphics that can be used together to explore data or diagnose a model fit.

4.
This article develops a generalization of the scatterplot matrix based on the recognition that most datasets include both categorical and quantitative information. Traditional grids of scatterplots often obscure important features of the data when one or more variables are categorical but coded as numerical. The generalized pairs plot offers a range of displays of paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. A side-by-side boxplot, stripplot, faceted histogram, or density plot helps visualize a categorical and a quantitative variable. A traditional scatterplot is suitable for displaying a pair of numerical variables, but options also support density contours or annotation with summary statistics such as the correlation and the number of missing values. By combining these, the generalized pairs plot may help to reveal structure in multivariate data that otherwise might go unnoticed in the process of exploratory data analysis. Two different R packages provide implementations of the generalized pairs plot, gpairs and GGally. Supplementary materials for this article are available online on the journal web site.

5.
Abstract

XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system (GIS) Arc View, and the interactive multidimensional scaling program XGvis. XGobi is implemented in the X Window System for portability as well as the ability to run across a network.

6.
The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.

7.
Summary: The paper introduces the idea of generalising a cumulative frequency curve to show arbitrary cumulative counts. For example, in demographic studies generalised cumulative curves can represent the distribution of population or area. Generalised cumulative curves can be a valuable instrument for exploratory data analysis. The use of cumulative curves in an investigation of population statistics in Northwest England allowed us to discover interesting facts about relationships between the distribution of national minorities and the degree of deprivation. We detected that, while a high concentration of national minorities occurs, in general, in underprivileged districts, there are some differences related to the origin of the minorities. The paper sets out the applicability conditions for generalised cumulative curves and compares them with other graphical tools for exploratory data analysis.
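
One possible reading of the idea in this abstract can be sketched in a few lines of Python. The sketch assumes that a generalised cumulative curve orders the units (for example, districts) by an attribute and accumulates an arbitrary count over them. The function name `generalised_cumulative_curve` and the sample figures are hypothetical and are not taken from the paper.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def generalised_cumulative_curve(
    attribute: Sequence[float], counts: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (x, y) points of a cumulative curve.

    x: attribute values in ascending order (e.g. a deprivation score).
    y: running total of `counts` (e.g. population) over all units whose
       attribute value is less than or equal to x.
    This is an assumed interpretation, not the paper's definition.
    """
    pairs = sorted(zip(attribute, counts))
    xs = [a for a, _ in pairs]
    ys = list(accumulate(c for _, c in pairs))
    return xs, ys


# Hypothetical data: three districts with a deprivation score and a population.
deprivation = [3.0, 1.0, 2.0]
population = [10, 20, 30]
xs, ys = generalised_cumulative_curve(deprivation, population)
print(xs, ys)
```

Replacing the population column with another count, such as area or the size of a minority group, gives a different curve over the same ordering, which is what makes such curves comparable.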

8.
Interactive web graphics are great for communication and knowledge sharing, but are difficult to leverage during the exploratory phase of a data science workflow. Even before the web, interactive graphics helped data analysts quickly gather insight from data, discover the unexpected, and develop better model diagnostics. Although web technologies make interactive graphics more accessible, they are not designed to fit inside an exploratory data analysis (EDA) workflow where rapid iteration between data manipulation, modeling, and visualization must occur. To better facilitate exploratory web graphics that are easily distributed, we need better interfaces between statistical computing environments (e.g., the R language) and client-side web technologies. We propose the R package animint for rapid creation of linked and animated web graphics through a simple extension of ggplot2’s implementation of the Grammar of Graphics. The extension allows one to write ggplot2 code and produce a standalone web page with multiple linked views. Supplementary material for this article is available online.

9.
This paper describes progress towards developing a platform for rapid prototyping of interactive data visualizations, using R, GGobi, rggobi and RGtk2. GGobi is a software tool for multivariate interactive graphics. At the core of GGobi is a data pipeline that incrementally transforms data through a series of stages into a plot and maps user interaction with the plot back to the data. The GGobi pipeline is extensible and mutable at runtime. The rggobi package, an interface from the R language to GGobi, has been augmented with a low-level interface that supports the customization of interactive data visualizations through the extension and manipulation of the GGobi pipeline. The large size of the GGobi API has motivated the use of the RGtk2 code generation system to create the low-level interface between R and GGobi. The software is demonstrated through an application to interactive network visualization.

10.
This article presents individual conditional expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. Classical partial dependence plots (PDPs) help visualize the average partial relationship between the predicted response and one or more features. In the presence of substantial interaction effects, the partial response relationship can be heterogeneous. Thus, an average curve, such as the PDP, can obfuscate the complexity of the modeled relationship. Accordingly, ICE plots refine the PDP by graphing the functional relationship between the predicted response and the feature for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate, suggesting where and to what extent heterogeneities might exist. In addition to providing a plotting suite for exploratory analysis, we include a visual test for additive structure in the data-generating model. Through simulated examples and real datasets, we demonstrate how ICE plots can shed light on estimated models in ways PDPs cannot. Procedures outlined are available in the R package ICEbox.
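
The relationship between ICE curves and the PDP described in this abstract can be illustrated with a minimal Python sketch. It is not the ICEbox API. The helpers `ice_curves` and `partial_dependence` and the model `toy_model` are hypothetical names introduced here, and the sketch assumes the PDP is computed as the pointwise average of the ICE curves.

```python
from typing import Callable, List, Sequence


def ice_curves(
    predict: Callable[[List[float]], float],
    rows: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> List[List[float]]:
    """One curve per observation: sweep `feature` over `grid` while
    holding that observation's other feature values fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """PDP taken as the pointwise average of the ICE curves (assumption)."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def toy_model(x: List[float]) -> float:
    # Hypothetical fitted model with a pure interaction:
    # the effect of x[0] changes sign with x[1].
    return x[0] * x[1]


rows = [[0.0, 1.0], [0.0, -1.0]]
grid = [-1.0, 0.0, 1.0]
curves = ice_curves(toy_model, rows, feature=0, grid=grid)
pdp = partial_dependence(curves)
print(curves)
print(pdp)
```

In this toy case the two ICE curves slope in opposite directions while their average is flat, which is the kind of heterogeneity an averaged curve can hide.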

11.
The traffic on an Internet link is a packet stream: packets of varying sizes arriving for transmission on the link. Each packet has an arrival time, and contained within the packet are headers that carry many critical variables. Packet traces, which consist of captured headers and measurements of the arrival times, convey substantial information about the Internet: security, usage, network performance, and the performance of engineering protocols. This article discusses strategies for the analysis of very large databases of packet traces, and the architecture of a software system that facilitates the use of these strategies. The system has a pipeline: (1) raw packet traces; (2) a database with objects tailored to ensuing analyses; and (3) an environment with tools for data analysis: statistical methods, model fitting, and visualization. The pipeline addresses the full set of tasks in the study of packet streams, from the initial processing of raw packet traces to the final output, often a visual display. S-Net, an extensible, open-source software implementation of this architecture, is based on the R implementation of the S language for graphics and data analysis, and has been developed on Linux.

12.
Traditionally, supplier selection models are based on cardinal data, with less emphasis on ordinal data. However, with the widespread use of manufacturing philosophies such as just-in-time (JIT), emphasis has shifted to the simultaneous consideration of cardinal and ordinal data in the supplier selection process. To select the best suppliers in the presence of both cardinal and ordinal data, this paper proposes an innovative method based on imprecise data envelopment analysis (IDEA). A numerical example demonstrates the application of the proposed method.

13.
Data concerning the vaginal bleeding patterns of women using different forms of fertility regulation are presented. The data were collected in the form of diaries which were completed by the women themselves by recording the presence or absence of vaginal bleeding on a daily basis. The object of the paper is to invite suggestions for suitable methods of presentation and analysis of such data so that the results of studies on methods of fertility regulation can be better summarized and interpreted.

14.
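This paper addresses the design of the soft-landing trajectory and control strategy for Chang'e-3. Under reasonable assumptions, a dynamics model is established and solved to obtain the velocities at the perilune and apolune of the Chang'e-3 landing preparation orbit. For the six stages of the soft-landing process, a differential equation model of the motion of Chang'e-3 is built from a force analysis. With minimum fuel consumption as the optimization objective and the start and end states of each stage as constraints, the optimal design of the soft-landing trajectory is transformed into a functional extremum problem for the main engine thrust, and its control function is converted into an approximate polynomial function optimization problem. The problem is solved with a fourth-order Runge-Kutta difference iteration method, yielding the optimal control function and control strategy for each stage. The results show that the Chang'e-3 soft landing takes 695 s and consumes 1269.1 kg of fuel.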
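
The fourth-order Runge-Kutta scheme named in this abstract can be sketched generically as follows. This is the classical textbook RK4 step, not the authors' landing model. The test problem (free fall from 100 m under a constant acceleration of 1.62 m/s², a rounded lunar surface gravity) and the names `rk4_step`, `integrate`, and `free_fall` are illustrative assumptions.

```python
from typing import Callable, List

State = List[float]
Deriv = Callable[[float, State], State]


def rk4_step(f: Deriv, t: float, y: State, h: float) -> State:
    """Advance the system y' = f(t, y) by one classical RK4 step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f: Deriv, t0: float, y0: State, t_end: float, steps: int) -> State:
    """Apply `steps` equal RK4 steps from t0 to t_end and return the final state."""
    h = (t_end - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


G_MOON = 1.62  # m/s^2, rounded lunar surface gravity (illustrative constant)


def free_fall(t: float, y: State) -> State:
    # State is [height, vertical velocity]; no thrust in this toy problem.
    height, velocity = y
    return [velocity, -G_MOON]


final_state = integrate(free_fall, 0.0, [100.0, 0.0], 10.0, steps=100)
print(final_state)
```

A staged landing model would replace `free_fall` with the equations of motion for each stage, including a thrust term, and would wrap the integrator in an optimization over the control function.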

15.
Computation of maximum likelihood (ML) estimates for the parameters of a negative binomial distribution from grouped data is considered. For this problem the scoring, Newton-Raphson, and EM algorithms are derived. Using simulated data, the performance of the algorithms is compared with respect to convergence, number of iterations, and computing time. Finally, an empirical example drawn from actuarial science is given.

16.
In this article, approximate solutions of nonlinear heat diffusion and heat transfer equations are developed via the homotopy analysis method (HAM). This method is a strong and easy-to-use analytic tool for investigating nonlinear problems, and it does not need small parameters. HAM contains the auxiliary parameter ℏ, which provides us with a simple way to adjust and control the convergence region of the solution series. By a suitable choice of the auxiliary parameter ℏ, we can obtain reasonable solutions for large modulus. In this study, we compare HAM results with those of the homotopy perturbation method and the exact solutions. The first differential equation to be solved describes a straight fin with a temperature-dependent thermal conductivity, and the second one covers the two- and three-dimensional unsteady diffusion problems. © 2009 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq 2010

17.
We consider the Calderón problem in an infinite cylindrical domain, whose cross section is a bounded domain of the plane. We prove log–log stability in the determination of the isotropic periodic conductivity coefficient from partial Dirichlet data and partial Neumann boundary observations of the solution. Copyright © 2017 John Wiley & Sons, Ltd.

18.
For a fixed value of a parameter k≥2, the Maximum k-Edge-Colorable Subgraph Problem consists in finding k edge-disjoint matchings in a simple graph, with the goal of maximising the total number of edges used. The problem is known to be -hard for all k, but there exist polynomial time approximation algorithms with approximation ratios tending to 1 as k tends to infinity. Herein we propose improved approximation algorithms for the cases of k=2 and k=3, having approximation ratios of 5/6 and 4/5, respectively.

19.
The convergence of three L1 spline methods for scattered data interpolation and fitting using bivariate spline spaces is studied in this paper. Specifically, L1 interpolatory splines, splines of least absolute deviation, and L1 smoothing splines are shown to converge to the given data function under some conditions; hence the surfaces produced by these three methods will resemble the given data values.

20.
This paper underlines the role of directional compactness in the scalarization of graphical derivatives of set-valued maps taking values in infinite-dimensional spaces. Two main theorems are given. The first one states the equivalence of contingent epiderivatives and τw-contingent epiderivatives for directionally compact maps. The second main result proves a variational characterization for the contingent epiderivative of stable and directionally compact maps taking values in general image spaces, extending known results in finite-dimensional and reflexive Banach spaces. The hypotheses given are minimal, as is shown by means of several examples. Connections of these theorems with other results in the literature are also provided.
