Similar Documents
20 similar documents found.
1.
Online auctions have been the subject of many empirical research efforts in the fields of economics and information systems. These efforts are often based on analyzing data from Web sites such as eBay.com, which provide public information about sequences of bids in closed auctions, typically in the form of tables on HTML pages. The existing literature on online auctions focuses on summary statistics and more formal statistical methods such as regression models; there is a clear void in this growing body of literature when it comes to appropriate visualization tools. This is surprising, given that the sheer amount of data on sites such as eBay.com is overwhelming and often cannot be displayed informatively using standard statistical graphics. In this article we introduce graphical methods for visualizing online auction data in ways that are informative and relevant to the research questions of interest. We start with profile plots that reveal aspects of an auction such as bid values, bidding intensity, and bidder strategies. We then introduce the concept of statistical zooming (STAT-zoom), which scales up to large numbers of auctions by letting the analyst inspect data summaries at various time scales interactively. Finally, we develop auction calendar and auction scene visualizations for viewing a set of many concurrent auctions. The different visualization methods are demonstrated using data on multiple auctions collected from eBay.com.
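A minimal sketch (not taken from the article) of the profile-plot idea, using matplotlib and two invented bid histories; real inputs would be scraped from the bid tables of closed eBay.com auctions:

```python
import matplotlib.pyplot as plt

# Hypothetical bid histories: (hours since auction start, standing bid in USD).
auctions = {
    "auction A": [(1, 5.0), (30, 7.5), (95, 12.0), (166, 21.5), (167, 22.0)],
    "auction B": [(12, 10.0), (80, 11.0), (150, 15.5), (168, 31.0)],
}

fig, ax = plt.subplots()
for label, bids in auctions.items():
    times, values = zip(*bids)
    # A step plot shows the standing high bid between bids; the markers show
    # bidding intensity (clusters of points indicate bursts of activity).
    ax.step(times, values, where="post", marker="o", label=label)
ax.set_xlabel("Hours since auction start")
ax.set_ylabel("Bid value (USD)")
ax.set_title("Auction profile plot")
ax.legend()
plt.show()
```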

2.
In this paper we examine the relationship between a newly developed local dependence measure, the local Gaussian correlation, and standard copula theory. We are able to describe characteristics of the dependence structure in different copula models in terms of the local Gaussian correlation. Further, we construct a goodness-of-fit test for bivariate copula models. An essential ingredient of this test is the use of a canonical local Gaussian correlation and Gaussian pseudo-observations which make the test independent of the margins, so that it is a genuine test of the copula structure. A Monte Carlo study reveals that the test performs very well compared to a commonly used alternative test. We also propose two types of diagnostic plots which can be used to investigate the cause of a rejected null. Finally, our methods are applied to a “classical” insurance data set.
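One ingredient named in the abstract, the Gaussian pseudo-observations, is easy to sketch; the helper below is an illustration under our own naming, not the authors' code:

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_pseudo_observations(x):
    """Map a sample to N(0,1) pseudo-observations via its ranks.

    Using ranks makes the result invariant to the (unknown) margins, so any
    downstream test targets the copula structure only.
    """
    x = np.asarray(x, dtype=float)
    u = rankdata(x) / (len(x) + 1.0)   # pseudo-uniforms in (0, 1)
    return norm.ppf(u)                 # Gaussian pseudo-observations

# z1, z2 = gaussian_pseudo_observations(x1), gaussian_pseudo_observations(x2)
# ...then estimate the local Gaussian correlation on the pairs (z1, z2).
```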

3.
We introduce methods for visualization of data structured along trees, especially hierarchically structured collections of time series. To this end, we identify questions that often emerge when working with hierarchical data and provide an R package to simplify their investigation. Our key contribution is the adaptation of the visualization principles of focus-plus-context and linking to the study of tree-structured data. Our motivating application is to the analysis of bacterial time series, where an evolutionary tree relating bacteria is available a priori. However, we have identified common problem types where, if a tree is not directly available, it can be constructed from data and then studied using our techniques. We perform detailed case studies to describe the alternative use cases, interpretations, and utility of the proposed visualization methods.

4.
Statistical process control (SPC) applies statistical methods to monitor every stage of a process, with the aim of improving and assuring quality. This paper studies several important problems at the frontier of the field, including the monitoring and diagnosis of profile data, control charts for detecting drift, multivariate process control, and detection and diagnosis for multistage processes. We introduce and develop a range of new statistical techniques, tightly coupled with computational algorithms, to attack these difficult open problems in current quality-control research.

5.
In this paper we consider the question of solving equilibrium problems, formulated as complementarity problems and, more generally, mathematical programs with equilibrium constraints (MPECs), as nonlinear programs, using an interior-point approach. These problems pose theoretical difficulties for nonlinear solvers, including interior-point methods. We examine the use of penalty methods to get around these difficulties and provide substantial numerical results. We go on to show that penalty methods can resolve some of the problems that interior-point algorithms encounter in general. An erratum to this article is available.
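A toy sketch of the penalty idea for a complementarity constraint; the problem, starting point, and penalty schedule are all invented for illustration, and this is a plain penalty loop with scipy, not the paper's interior-point solver:

```python
import numpy as np
from scipy.optimize import minimize

def solve_subproblem(rho, v0):
    """One penalty subproblem: the complementarity condition x*y = 0 is moved
    into the objective with weight rho; the bounds keep x, y >= 0."""
    obj = lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2 + rho * v[0] * v[1]
    return minimize(obj, v0, bounds=[(0, None), (0, None)]).x

# Toy MPEC: minimize (x-1)^2 + (y-2)^2 subject to x, y >= 0 and x*y = 0.
v = np.array([0.5, 0.5])
for rho in [1.0, 10.0, 100.0, 1000.0]:   # classical increasing-penalty loop
    v = solve_subproblem(rho, v)         # warm-start from the last solution
print(v)                                 # -> approx. (0, 2), a point with x*y = 0
```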

6.
The two-dimensional orthogonal packing problem of packing identical rectangles into a large containing rectangle is important in pallet loading and has recently received much attention in O.R. publications. In this paper, we examine the conditions under which the set of feasible layouts remains unchanged and show that these conditions can be represented by a series of planes in three-dimensional space. We call this representation the three-dimensional pallet chart because it extends the two-dimensional pallet charts presently used in many practical situations. The strength of this result is demonstrated by three examples of its use. Accurate two-dimensional charts are produced from the three-dimensional version with a minimum of calculation, and a complete sensitivity analysis to changes in box and pallet dimensions can be carried out visually by viewing the chart at different angles. Finally, the result is used to generate a new procedure for determining the maximum number of rectangles that can be fitted. This method is shown to be accurate for over 90% of observations in a random sample of 5000, an improvement of 20% over previous methods.
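For contrast with the chart-based approach, a simple strip-splitting heuristic for counting identical rectangles is sketched below; the function name and the pallet example are ours, and the heuristic is far weaker than the paper's procedure:

```python
def max_boxes_simple(L, W, a, b):
    """Lower bound on the number of a-by-b boxes on an L-by-W pallet.

    Tries every straight split of the pallet into two vertical strips, one
    packed with boxes in each orientation -- a crude heuristic only.
    """
    best = 0
    for k in range(int(L // a) + 1):             # k columns with the a-side along L
        used = k * a
        n1 = k * int(W // b)
        n2 = int((L - used) // b) * int(W // a)  # remaining strip, boxes rotated
        best = max(best, n1 + n2)
    return best

print(max_boxes_simple(1200, 800, 300, 200))     # Euro-pallet example -> 16
```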

7.
Models of environmental processes must often be constructed without the use of extensive data sets. This can occur because the exercise is preliminary (aimed at guiding future data collection) or because the requisite data are extremely difficult, expensive, or even impossible to obtain. In such cases traditional, statistically based methods for estimating parameters in the model cannot be applied; in fact, parameter estimation cannot be accomplished in a rigorous way at all. We examine the use of a regionalized sensitivity analysis procedure to select appropriate values for parameters in cases where only sparse, imprecise data are available. The utility of the method is examined in the context of equilibrium and dynamic models for describing water quality and hydrological data in a small catchment in Shenandoah National Park, Virginia. Results demonstrate that (1) models can be “tentatively calibrated” using this procedure; (2) the data most likely to provide a stringent test of the model can be identified; and (3) potential problems with model identifiability can be exposed in a preliminary analysis.
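A compact sketch of a regionalized (Hornberger-Spear-Young style) sensitivity analysis loop; the two-parameter "model" and its behaviour criterion are hypothetical stand-ins for a real simulation run:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

def behaviour(params):
    """Hypothetical classifier: does the model, run with these parameters,
    reproduce the qualitative behaviour seen in the sparse field data?"""
    k1, k2 = params
    return 0.5 < k1 * k2 < 2.0   # stand-in for a real model run plus criterion

# Sample parameters from prior ranges, split into behaviour / non-behaviour,
# then compare each parameter's marginal distribution across the split.
samples = rng.uniform([0.1, 0.1], [3.0, 3.0], size=(5000, 2))
mask = np.array([behaviour(p) for p in samples])

for j, name in enumerate(["k1", "k2"]):
    stat, pval = ks_2samp(samples[mask, j], samples[~mask, j])
    print(f"{name}: KS={stat:.3f}, p={pval:.2g}")  # small p => parameter matters
```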

8.
Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious, and very few methods are available, when the data are very high in dimension. Statistical significance of clustering (SigClust) is a recently developed cluster evaluation tool for high-dimensional low sample size (HDLSS) data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires the estimation of the eigenvalues of the covariance matrix for the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of Type I error, in the important case where there are a few very large eigenvalues. This article addresses this critical challenge using a novel likelihood-based soft thresholding approach to estimate these eigenvalues, which leads to a much improved SigClust. Major improvements in SigClust performance are shown by both mathematical analysis, based on the new notion of theoretical cluster index (TCI), and extensive simulation studies. Applications to some cancer genomic data further demonstrate the usefulness of these improvements.
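A sketch of the SigClust mechanics: a 2-means cluster index compared against simulations from a single-Gaussian null. The paper's contribution, the soft-thresholded eigenvalue estimates, would enter through the `eigenvalues` argument; everything else here is the generic recipe with invented demo data:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_index(X):
    """2-means cluster index: within-cluster SS over total SS about the mean.
    Values well below those from Gaussian null data indicate real clusters."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    return km.inertia_ / total_ss

def sigclust_pvalue(X, eigenvalues, n_sim=200, seed=0):
    """Monte Carlo SigClust-style p-value under a single multivariate Gaussian
    null with the given covariance eigenvalues. The cluster index is rotation
    invariant, so simulating from a diagonal covariance suffices."""
    rng = np.random.default_rng(seed)
    ci_obs = cluster_index(X)
    null_ci = np.array([
        cluster_index(rng.normal(size=X.shape) * np.sqrt(eigenvalues))
        for _ in range(n_sim)
    ])
    return float((null_ci <= ci_obs).mean())

# Null demo (no real clusters): the p-value should not be small.
eigs = np.linspace(5.0, 1.0, 100)
X = np.random.default_rng(1).normal(size=(40, 100)) * np.sqrt(eigs)
print(sigclust_pvalue(X, eigenvalues=eigs))
```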

9.
In this paper, we introduce a constructive rigorous numerical method to compute smooth manifolds implicitly defined by infinite-dimensional nonlinear operators. We compute a simplicial triangulation of the manifold using a multi-parameter continuation method on a finite-dimensional projection. The triangulation is then used to construct local charts and an atlas of the manifold in the infinite-dimensional domain of the operator. The idea behind the construction of the smooth charts is to use the radii polynomial approach to verify the hypotheses of the uniform contraction principle over a simplex. The construction of the manifold is globalized by proving smoothness along the edge of adjacent simplices. We apply the method to compute portions of a two-dimensional manifold of equilibria of the Cahn–Hilliard equation.

10.
We discuss statistical estimation of the model parameters of a component lifetime distribution based on system lifetime data with known system structure. We propose the use of the stochastic expectation-maximization (SEM) algorithm for obtaining the maximum likelihood estimates of the model parameters based on complete and censored system lifetimes, and we study different ways of implementing the SEM algorithm. We show that the proposed methods are feasible and easy to implement for various families of component lifetime distributions. The methodologies are illustrated with two popular lifetime models, the Weibull and Birnbaum-Saunders distributions. Monte Carlo simulation is used to compare the performance of the proposed methods with the corresponding estimation by direct maximization. Finally, two illustrative examples are presented along with some concluding remarks.
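A sketch of the SEM idea for the simplest tractable case, a series system of i.i.d. exponential components, where memorylessness makes the imputation step one line; the paper works with Weibull and Birnbaum-Saunders components instead, and all names and numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def sem_series_exponential(system_times, m, n_iter=200):
    """Stochastic EM for i.i.d. Exponential(lam) components in a series
    system: each observed system lifetime t means one component failed at t
    and the other m-1 were still working (censored at t).

    S-step: impute censored lifetimes from the conditional distribution given
    survival past t (memorylessness: t + Exp(lam)).
    M-step: closed-form exponential MLE on the completed data.
    """
    t = np.asarray(system_times, dtype=float)
    lam = 1.0 / t.mean()                      # crude starting value
    for _ in range(n_iter):
        imputed = t[:, None] + rng.exponential(1.0 / lam, size=(len(t), m - 1))
        total = t.sum() + imputed.sum()       # failed plus imputed lifetimes
        lam = (len(t) * m) / total            # exponential MLE, completed data
    return lam

true_lam, m = 0.5, 3
data = rng.exponential(1.0 / (m * true_lam), size=400)  # series lifetime ~ Exp(m*lam)
print(sem_series_exponential(data, m))                   # approx. 0.5
```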

11.
We revisit Hawkins’ (Comput Stat 9(3):233–247, 1994) algorithm for fitting monotonic polynomials and discuss some practical issues that we encountered using this algorithm, for example when fitting high-degree polynomials or in situations with a sparse design matrix but multiple observations per x-value. As an alternative, we describe a new approach to fitting monotone polynomials to data, based on different characterisations of monotone polynomials and using a Levenberg–Marquardt type algorithm. We consider different parameterisations, examine effective starting values for the non-linear algorithms, and discuss some limitations. We illustrate our methodology with examples of simulated and real-world data. All algorithms discussed in this paper are available in the R package MonoPoly (R Development Core Team, 2011).
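A sketch of one characterisation-based fit: the derivative is parameterised as a sum of two squares (hence nonnegative everywhere), and a Levenberg-Marquardt solver fits the integrated polynomial. The parameter layout and data are invented for illustration, and this is not the MonoPoly implementation:

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import least_squares

def monotone_poly(theta, deg_q):
    """Coefficients (constant first) of a nondecreasing polynomial.

    p is nondecreasing on the real line iff p' is nonnegative there, and a
    nonnegative polynomial is a sum of two squares, so we parameterise
    p'(x) = q1(x)^2 + q2(x)^2 and integrate.
    """
    c0, rest = theta[0], theta[1:]
    q1, q2 = rest[: deg_q + 1], rest[deg_q + 1 :]
    dp = P.polyadd(P.polymul(q1, q1), P.polymul(q2, q2))
    p = P.polyint(dp)                # slot p[0] is the integration constant
    p[0] = c0
    return p

def residuals(theta, x, y, deg_q):
    return P.polyval(x, monotone_poly(theta, deg_q)) - y

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 80)
y = x ** 3 + 0.5 * x + rng.normal(scale=0.1, size=x.size)  # monotone signal

deg_q = 1                                   # q1, q2 linear => cubic fit
theta0 = rng.normal(size=1 + 2 * (deg_q + 1))
fit = least_squares(residuals, theta0, args=(x, y, deg_q), method="lm")
print(monotone_poly(fit.x, deg_q))          # fitted coefficients, constant first
```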

12.
Statistical software systems include modules for manipulating data sets, model fitting, and graphics. Because plots display data, and models are fit to data, both the model-fitting and graphics modules depend on the data. Today's statistical environments allow the analyst to choose or even build a suitable data structure for storing the data and to implement new kinds of plots. The multiplicity problem caused by many plot varieties and many data representations is avoided by constructing a plot-data interface. The interface is a convention by which plots communicate with data sets, allowing plots to be independent of the actual data representation. This article describes the components of such a plot-data interface. The same strategy may be used to deal with the dependence of model-fitting procedures on data.
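A sketch of the interface idea in Python (the article's own environment and method names differ): plots program against a tiny protocol, so any data representation that implements it works with every plot:

```python
from typing import Protocol, Sequence

class PlotData(Protocol):
    """The plot-data interface: the only surface plots may touch."""
    def variables(self) -> Sequence[str]: ...
    def values(self, name: str) -> Sequence[float]: ...

class ColumnDict:
    """One concrete representation: a plain dict of columns."""
    def __init__(self, columns: dict[str, list[float]]):
        self._columns = columns
    def variables(self):
        return list(self._columns)
    def values(self, name):
        return self._columns[name]

def scatter(data: PlotData, x: str, y: str):
    """A 'plot' that depends only on the interface, never the representation."""
    for xi, yi in zip(data.values(x), data.values(y)):
        print(f"({xi}, {yi})")

scatter(ColumnDict({"speed": [1.0, 2.0], "dist": [4.0, 9.0]}), "speed", "dist")
```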

13.
Many statistical multiple integration problems involve integrands that have a dominant peak. In applying numerical methods to solve these problems, statisticians have paid relatively little attention to existing quadrature methods and available software developed in the numerical analysis literature. One reason these methods have been largely overlooked, even though they are known to be more efficient than Monte Carlo for well-behaved problems of low dimensionality, may be that when applied naively they are poorly suited for peaked-integrand problems. In this article we use transformations based on “split t” distributions to allow the integrals to be efficiently computed using a subregion-adaptive numerical integration algorithm. Our split t distributions are modifications of those suggested by Geweke and may also be used to define Monte Carlo importance functions. We then compare our approach to Monte Carlo. In the several examples we examine here, we find subregion-adaptive integration to be substantially more efficient than importance sampling.
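A sketch of the mode-plus-curvature construction that split-t methods refine, used here as a plain multivariate-t importance sampler; the integrand, degrees of freedom, and sample size are illustrative, and no "split" (direction-specific) scale adjustment is made:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_t

cov = np.array([[1.0, 0.8], [0.8, 1.0]])
cov_inv = np.linalg.inv(cov)

def log_f(x):
    """Stand-in for a peaked integrand's log (an unnormalised Gaussian)."""
    return -0.5 * x @ cov_inv @ x

# Centre a heavy-tailed t proposal at the mode, scaled by local curvature.
opt = minimize(lambda x: -log_f(x), x0=np.array([2.0, -1.0]))  # BFGS by default
proposal = multivariate_t(loc=opt.x, shape=opt.hess_inv, df=5)

rng = np.random.default_rng(0)
draws = proposal.rvs(size=20_000, random_state=rng)
log_w = np.apply_along_axis(log_f, 1, draws) - proposal.logpdf(draws)
print(np.exp(log_w).mean())   # IS estimate of the integral; exact value here
                              # is 2*pi*sqrt(det(cov)) ~ 3.77
```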

14.
National statistical agencies and autonomous institutions are keenly interested in producing information for areas smaller than those for which a survey was originally designed, so small area estimation and its application are valuable in research on official statistics. A wide range of methods is available for producing estimates at the small-area level, and it is reasonable to require that these estimates add up to the published design-based estimate for the large area containing the small areas; this requirement is known as benchmarking. In this paper we introduce several algorithms, all based on distances between the original and the modified data, intended to satisfy the benchmarking property. We provide rules for applying the proposed calibrated methods according to user criteria, using goal programming with priorities to represent user preferences. The result is a collection of interdependent network flow problems, some of which require the development of ad hoc methods. The introduced methods are assessed by a Monte Carlo simulation study using the Spanish Labour Force Survey in the Canary Islands. The results also show that the consistency of the estimator does not depend on the calibrated method used, but it does depend on the benchmarking weights.
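The simplest distance-based benchmarking adjustment admits a closed form, sketched below with invented numbers; the paper's calibrated methods generalise this with user priorities and network-flow formulations:

```python
import numpy as np

def benchmark(estimates, weights, target):
    """Minimally perturb small-area estimates (in least-squares distance) so
    they aggregate to the published large-area figure. Closed form via one
    Lagrange multiplier: y* = y + w (target - w.y) / (w.w).
    """
    y = np.asarray(estimates, dtype=float)
    w = np.asarray(weights, dtype=float)   # e.g. design-based area weights
    gap = target - w @ y
    return y + gap * w / (w @ w)           # satisfies w @ y* = target exactly

y_star = benchmark([120.0, 80.0, 50.0], [1.0, 1.0, 1.0], target=260.0)
print(y_star, y_star.sum())                # adds up to 260 exactly
```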

15.
Statistical methods can be useful in rating non-elite tennis players. This paper shows how a club can use simple optimization techniques to rate its players in doubles competitions. Even though a single club lacks all relevant information, the effects of home advantage, position played, partner, and strength of team opposition can be taken into account and evaluated. The results from all competing clubs, available to the body organizing the competition, are used to rate all players in the competition and to validate the results obtained with limited information. We show that a home-ground advantage exists in non-elite doubles tennis. A simple exponential smoothing method for rating players is then tested and shown to produce reasonable results.
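A sketch of an exponential-smoothing rating update for a doubles pair, with a fixed home-advantage term; the numbers and the update rule's details are illustrative rather than the paper's exact scheme:

```python
def update_rating(rating, observed_margin, expected_margin, alpha=0.1):
    """Exponential-smoothing update: move the rating a fraction alpha toward
    what this match suggests, so recent form outweighs old results."""
    return rating + alpha * (observed_margin - expected_margin)

# Hypothetical doubles match: pair ratings are summed, and a fixed
# home-advantage term is added to the home pair's expected margin.
home, away, home_adv = (3.1 + 2.4), (2.9 + 2.7), 0.3
expected = (home + home_adv) - away     # expected games margin for home pair
observed = 4.0                          # home pair actually won by 4 games
print(update_rating(home, observed, expected))   # pair rating drifts upward
```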

16.
In both manufacturing and service operations, effective scheduling plays an important role in achieving delivery performance and in utilizing resources economically. Classical scheduling theory takes a narrow, static view of performance; in reality the assessment of scheduling performance is a particularly difficult task. Typically scheduling is an activity that takes place repeatedly over time in the context of an overall planning and control architecture, so scheduling may be viewed as an activity within a process, and Statistical Process Control (SPC) provides an attractive option for monitoring performance. In this paper we investigate the potential of applying SPC control charts in this context. The feasibility of monitoring flow time in a single-processor model using control charts is studied using simulation. The application of control charts to monitor time-related measures in operational systems raises fundamental statistical problems: approaches that are robust with respect to data correlation and lack of normality are shown to be an essential requirement. Residual-based approaches and the Exponentially Weighted Moving Average (EWMA) chart are shown to be reasonably effective in avoiding false alarms and in detecting process shifts. The applicability of the single-processor model to more complex operational systems is discussed, as are the implications of the work for the design of performance monitoring and continuous improvement systems for time-related measures in manufacturing and service operations. A number of areas are highlighted for further theoretical and practical studies.
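A sketch of the residual-based EWMA approach the abstract evaluates: fit a crude AR(1) on a Phase I window, chart the one-step residuals with an EWMA, and flag breaches of the asymptotic limits. The simulated flow times and tuning constants are illustrative:

```python
import numpy as np

def residual_ewma(x, n_train=100, lam=0.2, L=3.0):
    """Residual-based EWMA chart for autocorrelated flow times (a sketch)."""
    x = np.asarray(x, dtype=float)
    train = x[:n_train]                           # Phase I: in-control data
    mu = train.mean()
    phi = np.corrcoef(train[:-1], train[1:])[0, 1]
    resid = x[1:] - mu - phi * (x[:-1] - mu)      # one-step-ahead residuals
    sigma = resid[: n_train - 1].std(ddof=1)      # in-control residual scale
    limit = L * sigma * np.sqrt(lam / (2.0 - lam))
    z, signals = 0.0, []
    for t, r in enumerate(resid):
        z = lam * r + (1.0 - lam) * z             # exponentially weighted average
        if abs(z) > limit:
            signals.append(t + 1)                 # +1: residuals start at time 1
    return signals

rng = np.random.default_rng(0)
flow = 5.0 + rng.normal(size=300)                 # in-control flow times
flow[200:] += 1.0                                 # sustained one-sigma shift
print(residual_ewma(flow)[:5])                    # signals should start soon after t = 200
```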

17.
An obstacle often encountered in preparing for an astronomical viewing evening is the difficulty of locating appropriate and easy-to-use star charts. This article describes, briefly, two types of star charts commonly available. Both are less than ideal for use by school students because of their complexity or lack of direct applicability to the observer's location. A strategy is outlined by which teachers can have students prepare, using a computer spreadsheet, their own easy-to-use star charts which are appropriate for the location, date, and time of viewing.
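The core formula such a spreadsheet needs, equatorial (RA/Dec) to horizontal (Alt/Az) coordinates, sketched as one small function; the example star and observing circumstances are invented, and azimuth here is measured from north:

```python
import math

def alt_az(ra_h, dec_deg, lat_deg, lst_h):
    """RA/Dec to Alt/Az. ra_h and lst_h in hours, dec and lat in degrees."""
    H = math.radians((lst_h - ra_h) * 15.0)       # hour angle
    dec, lat = math.radians(dec_deg), math.radians(lat_deg)
    alt = math.asin(math.sin(dec) * math.sin(lat)
                    + math.cos(dec) * math.cos(lat) * math.cos(H))
    az = math.atan2(-math.cos(dec) * math.sin(H),
                    math.sin(dec) * math.cos(lat)
                    - math.cos(dec) * math.sin(lat) * math.cos(H))
    return math.degrees(alt), math.degrees(az) % 360.0

# Sirius (RA 6.75 h, Dec -16.7 deg) from latitude 35 deg S at sidereal time 7 h:
print(alt_az(6.75, -16.7, -35.0, 7.0))
```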

18.
19.
On Exponential Representations of Log-Spacings of Extreme Order Statistics
In Beirlant et al. (1999) and Feuerverger and Hall (1999) an exponential regression model (ERM) was introduced on the basis of scaled log-spacings between subsequent extreme order statistics from a Pareto-type distribution. This led to the construction of new bias-corrected estimators for the tail index. In this note, under quite general conditions, asymptotic justification is given for this regression model as well as for the resulting tail index estimators. We also discuss diagnostic methods for adaptive selection of the threshold when using the Hill (1975) estimator which follow from the ERM approach. We show how the diagnostic presented in Guillou and Hall (2001) is linked to the ERM, and we suggest a new proposal. We also provide some small-sample comparisons with other existing methods.
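The scaled log-spacings and the Hill estimator they average to are easy to sketch (the ERM itself would model these Z_j as approximately exponential); the data below are simulated for illustration:

```python
import numpy as np

def hill_and_spacings(x, k):
    """Hill estimator of the tail index from the k largest observations,
    written via the scaled log-spacings Z_j = j (log X_{n-j+1} - log X_{n-j})."""
    xs = np.sort(np.asarray(x, dtype=float))
    logs = np.log(xs[-(k + 1):])              # log X_{n-k}, ..., log X_n
    j = np.arange(k, 0, -1)                   # weights k, ..., 1
    Z = j * np.diff(logs)                     # scaled log-spacings
    return Z.mean(), Z                        # Hill estimator = mean of the Z_j

rng = np.random.default_rng(0)
pareto = rng.pareto(2.0, size=2000) + 1.0     # Pareto tail, gamma = 1/2
gamma_hat, Z = hill_and_spacings(pareto, k=200)
print(gamma_hat)                              # approx. 0.5
```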

20.
The univariate Birnbaum–Saunders distribution has been used quite effectively to model positively skewed data, especially lifetime data and crack growth data. In this paper, we introduce a bivariate Birnbaum–Saunders distribution, an absolutely continuous distribution whose marginals are univariate Birnbaum–Saunders distributions. Different properties of this bivariate distribution are then discussed. The new family has five unknown parameters, and it is shown that the maximum likelihood estimators can be obtained by solving two non-linear equations. We also propose simple modified moment estimators for the unknown parameters which are explicit and can therefore be used effectively as initial guesses in the computation of the maximum likelihood estimators. We then present the asymptotic distributions of the maximum likelihood estimators and use them to construct confidence intervals for the parameters. We also discuss likelihood ratio tests for some hypotheses of interest. Monte Carlo simulations are carried out to examine the performance of the proposed estimators. Finally, a numerical data analysis is performed to illustrate all the methods of inference discussed here.
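The explicit modified moment estimators in the univariate case are simple enough to sketch, using the standard formulas beta = sqrt(s*r) and alpha = sqrt(2(sqrt(s/r) - 1)), with s the arithmetic and r the harmonic mean; the simulated data are illustrative:

```python
import numpy as np

def bs_modified_moment_estimates(t):
    """Modified moment estimators for the univariate Birnbaum-Saunders
    distribution, of the kind the abstract recommends as explicit starting
    values for the maximum likelihood search."""
    t = np.asarray(t, dtype=float)
    s = t.mean()
    r = 1.0 / np.mean(1.0 / t)                 # harmonic mean
    beta = np.sqrt(s * r)
    alpha = np.sqrt(2.0 * (np.sqrt(s / r) - 1.0))
    return alpha, beta

rng = np.random.default_rng(0)
z = rng.normal(size=5000)
alpha0, beta0 = 0.5, 2.0                       # simulate BS(alpha, beta) data:
t = beta0 * (alpha0 * z / 2 + np.sqrt((alpha0 * z / 2) ** 2 + 1)) ** 2
print(bs_modified_moment_estimates(t))         # approx. (0.5, 2.0)
```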
