Similar Documents
A total of 20 similar documents were found.
1.
Online auctions have been the subject of many empirical research efforts in the fields of economics and information systems. These efforts are often based on analyzing data from Web sites such as eBay.com, which provide public information about sequences of bids in closed auctions, typically in the form of tables on HTML pages. The existing literature on online auctions relies on summary statistics and more formal statistical methods such as regression models. However, there is a clear void in this growing body of literature when it comes to appropriate visualization tools. This is quite surprising, given that the sheer amount of data found on sites such as eBay.com is overwhelming and often cannot be displayed informatively using standard statistical graphics. In this article we introduce graphical methods for visualizing online auction data in ways that are informative and relevant to the research questions of interest. We start with profile plots that reveal aspects of an auction such as bid values, bidding intensity, and bidder strategies. We then introduce the concept of statistical zooming (STAT-zoom), which scales up to visualizing large numbers of auctions by letting the analyst view data summaries at various time scales interactively. Finally, we develop auction calendars and auction scene visualizations for viewing many concurrent auctions. The different visualization methods are demonstrated using data on multiple auctions collected from eBay.com.
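
To make the idea concrete, here is a rough sketch (with an invented bid history, not eBay data) of a single-auction profile plot: the current high bid as a step function of time, with rug marks conveying bidding intensity.

```python
# A sketch of an auction "profile plot": current high bid as a step function
# of time, with tick marks showing bidding intensity. The bid history below
# is invented purely for illustration.
import matplotlib.pyplot as plt

# (hours since auction opened, bid amount in $), hypothetical data
bids = [(0.5, 5.0), (2.0, 7.5), (30.0, 8.0), (68.0, 12.5),
        (71.5, 13.0), (71.9, 18.0)]

times, amounts = zip(*bids)
fig, ax = plt.subplots(figsize=(6, 3))
ax.step(times, amounts, where="post")          # price path over the auction
ax.plot(times, amounts, "o")                   # individual bids
for t in times:                                # rug marks: bidding intensity
    ax.axvline(t, ymax=0.05, linewidth=0.5)
ax.set_xlabel("hours since auction opened")
ax.set_ylabel("bid amount ($)")
ax.set_title("Profile plot of a single auction (hypothetical data)")
plt.tight_layout()
plt.show()
```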

2.
This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.
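
A rough sketch of what a mosaic matrix might look like in code, using the mosaic function from statsmodels on invented categorical data; the pairwise loop, not the particular dataset, is the point.

```python
# A sketch of the "mosaic matrix" idea: a grid of pairwise mosaic displays,
# the categorical analog of a scatterplot matrix. The toy data are invented.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

# hypothetical categorical dataset, 100 observations
df = pd.DataFrame({
    "sex":   ["M", "F"] * 50,
    "smoke": ["yes"] * 30 + ["no"] * 70,
    "age":   ["young"] * 45 + ["old"] * 55,
})

cols = list(df.columns)
k = len(cols)
fig, axes = plt.subplots(k, k, figsize=(8, 8))
for i, vi in enumerate(cols):
    for j, vj in enumerate(cols):
        ax = axes[i][j]
        if i == j:
            ax.text(0.5, 0.5, vi, ha="center", va="center")
            ax.set_axis_off()
        else:
            # bivariate marginal relation between variable i and variable j
            mosaic(df, [vi, vj], ax=ax, axes_label=False, gap=0.01)
plt.tight_layout()
plt.show()
```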

3.
4.
5.
More than 50 years ago, John Tukey called for a reformation of academic statistics. In “The Future of Data Analysis,” he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or “data analysis.” Ten to 20 years ago, John Chambers, Jeff Wu, Bill Cleveland, and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland and Wu even suggested the catchy name “data science” for this envisioned field. A recent and growing phenomenon has been the emergence of “data science” programs at major universities, including UC Berkeley, NYU, MIT, and most prominently, the University of Michigan, which in September 2015 announced a $100M “Data Science Initiative” that aims to hire 35 new faculty. Teaching in these new programs has significant overlap in curricular subject matter with traditional statistics courses; yet many academic statisticians perceive the new programs as “cultural appropriation.” This article reviews some ingredients of the current “data science moment,” including recent commentary about data science in the popular media, and about how/whether data science is really different from statistics. The now-contemplated field of data science amounts to a superset of the fields of statistics and machine learning, which adds some technology for “scaling up” to “big data.” This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next 50 years. Because all of science itself will soon become data that can be mined, the imminent revolution in data science is not about mere “scaling up,” but instead the emergence of scientific studies of data analysis science-wide. In the future, we will be able to predict how a proposal to change data analysis workflows would impact the validity of data analysis across all of science, even predicting the impacts field-by-field. Drawing on work by Tukey, Cleveland, Chambers, and Breiman, I present a vision of data science based on the activities of people who are “learning from data,” and I describe an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today’s data science initiatives, while being able to accommodate the same short-term goals. Based on a presentation at the Tukey Centennial Workshop, Princeton, NJ, September 18, 2015.

6.
In 1984, Cleveland suggested that statisticians have an important role in changing the use of graphics in science for the better. Thirty years later, we compared graphs published in top-rated applied science and statistics journals, evaluated for overall quality and against five principles of graphical excellence. Nearly 40% of the 97 graphs we sampled were rated as poor, with no striking differences between the applied science and statistics graphs. Better use of graphs requires better definition of variables, units of measurement, scales, groups, and other graphical elements, and more routine use of grid lines on a “standard” set of graphical forms. Progress over the next 30 years needs to be supported by changes in software defaults.
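
The closing call for better software defaults can be made concrete. The snippet below is one hypothetical illustration using matplotlib's rcParams, not a recommendation taken from the article: grid lines on by default, drawn lightly beneath the data, with variables and units named explicitly.

```python
# A sketch of the kind of software-default changes the authors call for.
# The rcParams keys are matplotlib's; the plotted numbers are invented.
import matplotlib.pyplot as plt

plt.rcParams.update({
    "axes.grid": True,          # grid lines on by default
    "grid.alpha": 0.3,          # light grid that doesn't dominate the data
    "axes.axisbelow": True,     # draw grid beneath the data, not over it
})

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1.2, 1.9, 3.1, 3.9], "o-")
ax.set_xlabel("dose (mg)")      # variable and unit named explicitly
ax.set_ylabel("response (mmol/L)")
plt.show()
```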

7.
Hard Times is a satire against mid-Victorian statisticians, those whom Dickens called ‘the representatives of the wickedest and most enormous vice of this time’. Historians of mathematics have seen the novel as a cruel parody of statistical determinism, a fatalistic movement which swept the continent in the 1860s and 1870s. But to see it as such is to credit Dickens with a better understanding of contemporary mathematics than he in fact possessed. The statistics in Hard Times are not the probabilistic theories of continental academics; they are the mundane facts and figures of the much more prosaic English statistical movement.

8.
A spreadplot is a visualization that simultaneously shows several different views of a dataset or model. The individual views can be dynamic, can support high-interaction direct manipulation, and can be algebraically linked with each other, possibly via an underlying statistical model. Thus, when a data analyst changes the information shown in one view of a statistical model, the changes can be processed by the model and instantly represented in the other views. Spreadplots simplify the analyst's task when many different plots are relevant to the analysis at hand, as in regression analysis, where many plots can be used for model building and diagnosis. On the other hand, developing a visualization involving many dynamic, highly interactive, directly manipulable graphics is not a trivial task. This article discusses a software architecture that simplifies the spreadplot developer's task. The architecture addresses the two main problems in constructing a spreadplot: laying out the plots and structuring the communication between them.
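
The communication problem the architecture addresses can be sketched with a plain observer pattern. All class and method names below are hypothetical, not the article's actual design: views register with a shared model, and a change made through any one view is pushed to the rest.

```python
# A minimal sketch of linked views: several plots subscribe to a shared
# model, and a change made through any view is broadcast to all others.
from typing import Callable, List


class SharedModel:
    """Holds the state (e.g., a fitted regression) and notifies views."""

    def __init__(self) -> None:
        self._observers: List[Callable[[dict], None]] = []
        self.state: dict = {}

    def register(self, redraw: Callable[[dict], None]) -> None:
        self._observers.append(redraw)

    def update(self, **changes) -> None:
        # e.g., a view drags a point, or toggles a term in the model
        self.state.update(changes)
        for redraw in self._observers:   # every linked plot refreshes
            redraw(self.state)


model = SharedModel()
model.register(lambda s: print("scatterplot redraws with", s))
model.register(lambda s: print("residual plot redraws with", s))
model.update(excluded_point=17)          # one manipulation, all views respond
```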

9.
On the accuracy of statistical procedures in Microsoft Excel 2010
Every version of Microsoft Excel up to and including Excel 2007 has been criticized by statisticians for several reasons, including the accuracy of statistical functions, the properties of the random number generator, the quality of statistical add-ins, the weakness of the Solver for nonlinear regression, and the graphical representation of data. Until recently, Microsoft made no attempt to fix these errors and continued to market a product containing known defects. We provide an update of these studies for the recently released Excel 2010, adding OpenOffice.org Calc 3.3 and Gnumeric 1.10.16 to the analysis for comparison. The conclusion is that the stream of papers, mainly in Computational Statistics and Data Analysis, has started to pay off: Microsoft has partially improved the statistical aspects of Excel, essentially the statistical functions and the random number generator.
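
The defects such studies probe are often simple numerical-stability failures. As an illustration (not drawn from the article), the sketch below contrasts the one-pass "calculator" variance formula, a known weakness of early Excel versions, with the stable two-pass formula; the data are invented.

```python
# The one-pass variance formula suffers catastrophic cancellation on data
# with a large mean; the two-pass formula does not. Illustrative data only.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]  # huge offset, small spread
n = len(data)
mean = sum(data) / n

# one-pass "calculator" formula: (sum of squares - n*mean^2)/(n-1); unstable
naive = (sum(x * x for x in data) - n * mean * mean) / (n - 1)

# two-pass formula: subtract the mean first; stable
stable = sum((x - mean) ** 2 for x in data) / (n - 1)

print(naive, stable)   # the exact sample variance of (4, 7, 13, 16) is 30.0
```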

10.
Analysis of means (ANOM), which, like a Shewhart control chart, exhibits individual mean effects on a graphical display, is an attractive alternative to the analysis of variance (ANOVA) for testing means. The procedure is primarily used to analyze experimental data from designs with only fixed effects. The recently introduced ANOM procedure based on the q-distribution (the ANOMQ procedure) generalizes the ANOM approach to random effects models. This article shows that applying the ANOM and ANOMQ procedures in advanced designs, such as hierarchically nested and split-plot designs with fixed, random, and mixed effects, enhances the data visualization aspect of graphical testing. Data from two real-world experiments illustrate the proposed procedure and demonstrate, from the practitioner's point of view, the visualization advantage of the ANOM procedures over ANOVA.
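
A rough sketch of the basic ANOM display for a balanced one-way layout, plotting group means against the decision limits grand mean ± h · s · sqrt((k-1)/(k·n)). The exact critical value h comes from ANOM tables; it is approximated here by a conservative Bonferroni-adjusted t quantile, and the data are invented.

```python
# Basic ANOM chart for a balanced one-way design; illustrative data only.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

groups = [np.array([12.1, 11.8, 12.5, 12.0]),
          np.array([12.9, 13.1, 12.7, 13.3]),
          np.array([11.9, 12.2, 12.0, 12.3])]          # k=3 groups, n=4 each
k, n = len(groups), len(groups[0])
means = np.array([g.mean() for g in groups])
grand = means.mean()
mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (k * (n - 1))
se = np.sqrt(mse) * np.sqrt((k - 1) / (k * n))

alpha = 0.05
h = stats.t.ppf(1 - alpha / (2 * k), df=k * (n - 1))    # Bonferroni approx.
ldl, udl = grand - h * se, grand + h * se

plt.plot(range(1, k + 1), means, "o-")
for y, label in [(grand, "grand mean"), (ldl, "LDL"), (udl, "UDL")]:
    plt.axhline(y, linestyle="--")
    plt.text(k + 0.1, y, label, va="center")
plt.xlabel("group")
plt.ylabel("mean response")
plt.title("Analysis of means (ANOM) chart, sketch")
plt.show()
```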

11.
Almost every U.S.-based statistician working on problems motivated by atmospheric science is connected to the statistics program at the National Center for Atmospheric Research (NCAR). Through its permanent staff scientists, postdoctoral researchers, visitors, seminars, workshops, and published work, NCAR has made a profound impact on the community of statisticians working in the atmospheric and climate sciences. This past year saw a reorganization of statistics at NCAR. This article looks back at more than 20 years of statistics there.

12.
Increasing attention has been given over the last decade by the statistics, mathematics and science education communities to the development of statistical literacy and numeracy skills of all citizens and the enhancement of statistics education at all levels. This paper introduces the emerging discipline of statistics education and considers its role in the development of these important skills. The paper begins with information on the growing importance of statistics in today's society, schools and colleges, summarizes unique challenges students face as they learn statistics, and makes a case for the importance of collaboration between mathematicians and statisticians in preparing teachers to teach students how to understand and reason about data. We discuss the differences and interrelations between statistics and mathematics, recognizing that mathematics is the discipline that has traditionally included instruction in statistics. We conclude with an argument that statistics should be viewed as a bridge between mathematics and science and should be taught in both disciplines.

13.
Many statistical multiple integration problems involve integrands that have a dominant peak. In applying numerical methods to solve these problems, statisticians have paid relatively little attention to existing quadrature methods and available software developed in the numerical analysis literature. One reason these methods have been largely overlooked, even though they are known to be more efficient than Monte Carlo for well-behaved problems of low dimensionality, may be that when applied naively they are poorly suited for peaked-integrand problems. In this article we use transformations based on “split t” distributions to allow the integrals to be efficiently computed using a subregion-adaptive numerical integration algorithm. Our split t distributions are modifications of those suggested by Geweke and may also be used to define Monte Carlo importance functions. We then compare our approach to Monte Carlo. In the several examples we examine here, we find subregion-adaptive integration to be substantially more efficient than importance sampling.
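
The comparison can be caricatured in one dimension. The sketch below integrates a sharply peaked function by adaptive quadrature (scipy's QUADPACK wrapper, told where the peak lies) and by importance sampling from a heavy-tailed Student-t proposal centered at the peak, in the spirit of, though far simpler than, the split-t construction; the integrand and tuning values are invented.

```python
# Adaptive quadrature vs. importance sampling for a peaked integrand.
import numpy as np
from scipy import integrate, stats

peak, width = 3.0, 0.01

def f(x):
    return np.exp(-0.5 * ((x - peak) / width) ** 2)  # dominant peak
# true value: width * sqrt(2*pi) ≈ 0.0250663

# (a) adaptive quadrature, with the peak flagged as a difficult point
quad_val, quad_err = integrate.quad(f, 0.0, 6.0, points=[peak])

# (b) importance sampling from a t(4) proposal matched to the peak
rng = np.random.default_rng(0)
proposal = stats.t(df=4, loc=peak, scale=width)
x = proposal.rvs(size=100_000, random_state=rng)
is_val = np.mean(f(x) / proposal.pdf(x))

print(quad_val, is_val)   # both should be close to 0.0250663
```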

14.
We present CARTscans, a graphical tool that displays predicted values across a four-dimensional subspace. We show how these plots are useful for understanding the structure and relationships between variables in a wide variety of models, including (but not limited to) regression trees, ensembles of trees, and linear regressions with varying degrees of interactions. In addition, the common visualization framework allows diverse complex models to be visually compared in a way that illuminates the similarities and differences in the underlying methods, facilitates the choice of a particular model structure, and provides a useful check for implausible predictions of future observations in regions with little or no data.
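
As a loose illustration of the display (not the authors' implementation), the sketch below shows predictions over a four-dimensional subspace as a grid of heatmaps: the panel grid steps through x3 and x4 while each heatmap varies x1 and x2, with a common color scale so panels are comparable. The prediction function is a made-up stand-in for a fitted model.

```python
# Predictions over a 4-D subspace as a grid of heatmaps; the prediction
# function is a hypothetical stand-in for a fitted tree ensemble or model.
import numpy as np
import matplotlib.pyplot as plt

def predict(x1, x2, x3, x4):            # hypothetical fitted surface
    return x1 * x2 + np.where(x3 > 0.5, 2.0, 0.0) * x4

g = np.linspace(0, 1, 50)
X1, X2 = np.meshgrid(g, g)
x3_vals = [0.25, 0.75]                  # panel rows
x4_vals = [0.25, 0.75]                  # panel columns

fig, axes = plt.subplots(2, 2, figsize=(7, 6), sharex=True, sharey=True)
for i, x3 in enumerate(x3_vals):
    for j, x4 in enumerate(x4_vals):
        ax = axes[i][j]
        im = ax.imshow(predict(X1, X2, x3, x4), origin="lower",
                       extent=(0, 1, 0, 1), vmin=0, vmax=3)
        ax.set_title(f"x3={x3}, x4={x4}")
fig.colorbar(im, ax=axes, shrink=0.8)   # common scale across panels
fig.supxlabel("x1")
fig.supylabel("x2")
plt.show()
```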

15.
“Exploratory” and “confirmatory” data analysis can both be viewed as methods for comparing observed data to what would be obtained under an implicit or explicit statistical model. For example, many of Tukey's methods can be interpreted as checks against hypothetical linear models and Poisson distributions. In more complex situations, Bayesian methods can be useful for constructing reference distributions for various plots that are useful in exploratory data analysis. This article proposes an approach to unify exploratory data analysis with more formal statistical methods based on probability models. These ideas are developed in the context of examples from fields including psychology, medicine, and social science.
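
A minimal sketch of the core idea, with invented data: plot the observed dataset alongside replicated datasets simulated from a fitted model, so that the exploratory display doubles as a model check. The Poisson model and the layout are illustrative choices, not the article's examples.

```python
# Judging an exploratory plot against a model-based reference distribution.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
observed = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 5, 7, 9])  # hypothetical
lam = observed.mean()                    # fitted Poisson rate

fig, axes = plt.subplots(3, 3, figsize=(7, 6), sharex=True, sharey=True)
axes_flat = axes.ravel()
axes_flat[0].hist(observed, bins=range(0, 12))
axes_flat[0].set_title("observed")
for ax in axes_flat[1:]:                 # 8 replicates under the model
    rep = rng.poisson(lam, size=observed.size)
    ax.hist(rep, bins=range(0, 12))
    ax.set_title("replicate")
plt.tight_layout()
plt.show()
# If the observed panel stands out from the replicates (here, the long
# right tail), the exploratory display doubles as a model check.
```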

16.
Belief networks provide an important bridge between statistical modeling and expert systems. This article presents methods for visualizing probabilistic “evidence flows” in belief networks, thereby enabling belief networks to explain their behavior. Building on earlier research on explanation in expert systems, we present a hierarchy of explanations, ranging from simple colorings to detailed displays. Our approach complements parallel work on textual explanations in belief networks. Graphical-Belief, Mathsoft Inc.'s belief network software, implements the methods.
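
As a toy illustration of the "simple colorings" end of such a hierarchy (invented, and not Graphical-Belief's actual method), the sketch below propagates one finding through a two-node network and colors the node by its weight of evidence, a standard explanation measure from the expert-systems literature.

```python
# A toy "evidence flow" coloring for a two-node network Disease -> Test.
# Probabilities and the coloring rule are invented for illustration.
import math

p_d = 0.01                          # prior P(disease)
p_t_d, p_t_nd = 0.95, 0.05          # P(test+ | disease), P(test+ | no disease)

# posterior after observing a positive test, by Bayes' rule
post = p_t_d * p_d / (p_t_d * p_d + p_t_nd * (1 - p_d))

# weight of evidence (log likelihood ratio), a standard explanation measure
woe = math.log(p_t_d / p_t_nd)

def shade(delta):                   # map belief change to a display color
    return "red" if delta > 1 else "orange" if delta > 0.1 else "gray"

print(f"P(disease | test+) = {post:.3f}")
print(f"weight of evidence = {woe:.2f} nats -> node colored {shade(woe)}")
```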

17.
On the Statistical Analysis of High-Dimensional, Dependent, and Incomplete Data
Li Guoying, Advances in Mathematics (数学进展), 2002, 31(3): 193–199
This paper consists of three parts. The first briefly reviews the development of statistics and the challenges it faces, arguing that the statistical analysis of high-dimensional, dependent, and incomplete data is a difficult problem that arises throughout modern science, technology, and the socio-economy. The second surveys the results that Chinese scholars have obtained in the related areas. The paper closes with a personal view of current research trends in the field.

18.
Programming environments such as S and Lisp-Stat provide languages for performing computations, mechanisms for data storage, and a graphical interface. These languages offer an invaluable interactive interface to data analysis. To take full advantage of these programming environments, statisticians must understand the differences between them. Ihaka and Gentleman introduced R, a version of S that uses a different scoping regime. In some ways this makes R behave more like Lisp-Stat. This article discusses the concept of scoping rules and shows how lexical scope can enhance the functionality of a language.
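
The scoping point carries over to other languages. The sketch below replays it with Python closures rather than R or S (an analogy, not the article's code): under lexical scope, a function can capture and update state from its defining environment.

```python
# Lexical scope lets closures share and mutate their defining environment,
# enabling stateful function factories, as in the classic R "account" example.
def make_account(balance: float):
    def deposit(amount: float) -> float:
        nonlocal balance            # refers to the defining environment
        balance += amount
        return balance
    def withdraw(amount: float) -> float:
        nonlocal balance
        balance -= amount
        return balance
    return deposit, withdraw

deposit, withdraw = make_account(100.0)
print(deposit(50.0))    # 150.0, state persists between calls
print(withdraw(30.0))   # 120.0, both closures share one environment
```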

19.
Medical practitioners were largely responsible for the development and application of vital statistics in the mid-nineteenth century, whilst mathematicians established the discipline of mathematical statistics at the end of the nineteenth century in Victorian Britain. The ground-breaking work of such vital statisticians as T. R. Malthus, William Farr, Edwin Chadwick and Florence Nightingale is examined. Charles Darwin's emphasis on continuous individual biological variation, which played a pivotal role in the epistemic transition from vital to mathematical statistics, is assessed in the context of the innovative work of these mathematical statisticians: Francis Galton, W. F. R. Weldon, and primarily Karl Pearson, with contributions from Francis Ysidro Edgeworth, George Udny Yule and William Sealy Gosset.

20.
This article proposes a new approach to principal component analysis (PCA) for interval-valued data. Unlike classical observations, which are represented by single points in p-dimensional space ℝ^p, interval-valued observations are represented by hyper-rectangles in ℝ^p, and as such, have an internal structure that does not exist in classical observations. As a consequence, statistical methods for classical data must be modified to account for the structure of the hyper-rectangles before they can be applied to interval-valued data. This article extends the classical PCA method to interval-valued data by using the so-called symbolic covariance to determine the principal component (PC) space to reflect the total variation of interval-valued data. The article also provides a new approach to constructing the observations in a PC space for better visualization. This new representation of the observations reflects their true structure in the PC space. Supplementary materials for this article are available online.
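
For contrast with the article's symbolic-covariance approach, here is a sketch of the simpler "centers method" for interval PCA: classical PCA on interval midpoints, with each hyper-rectangle's vertices projected afterward to give interval-valued scores. The data and details are illustrative only.

```python
# "Centers method" PCA for interval-valued data; invented toy data.
import itertools
import numpy as np

# each observation is [[lo, hi] per variable]; 3 observations, 2 variables
intervals = np.array([[[1.0, 3.0], [2.0, 4.0]],
                      [[4.0, 6.0], [1.0, 2.0]],
                      [[2.0, 5.0], [5.0, 9.0]]])

centers = intervals.mean(axis=2)               # midpoints, shape (3, 2)
grand = centers.mean(axis=0)
_, _, vt = np.linalg.svd(centers - grand, full_matrices=False)
pcs = vt.T                                     # principal directions

for obs in intervals:                          # project rectangle vertices
    verts = np.array(list(itertools.product(*obs)))   # 2^p corner points
    scores = (verts - grand) @ pcs
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    print("PC-space rectangle:", list(zip(lo.round(2), hi.round(2))))
```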
