Similar Articles
20 similar articles found (search time: 15 ms)
1.
Changepoint models are widely used to model the heterogeneity of sequential data. We present a novel sequential Monte Carlo (SMC) online expectation–maximization (EM) algorithm for estimating the static parameters of such models. The SMC online EM algorithm has a cost per time step that is linear in the number of particles, which can be particularly important when the data are representable as a long sequence of observations, since it drastically reduces the computational requirements for implementation. We present an asymptotic analysis of the stability of the SMC estimates used in the online EM algorithm and demonstrate the performance of this scheme using both simulated and real data originating from DNA analysis. The supplementary materials for the article are available online.
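As a rough illustration of the online EM idea behind this approach, the following Python sketch applies running-average (stochastic-approximation) updates of the E-step sufficient statistics to a toy two-component Gaussian mixture observed as a stream. It is a simplified stand-in, not the paper's SMC online EM for changepoint models; the step-size schedule, the unit variances, and the two-component model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stream from a two-component Gaussian mixture with unit variances
# (a toy stand-in; the paper's setting is changepoint models fitted with SMC).
true_means = np.array([-2.0, 3.0])
stream = rng.normal(true_means[rng.integers(0, 2, size=5000)], 1.0)

# Online EM: keep running averages of the E-step sufficient statistics and
# re-maximize after each observation arrives.
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
s_count = weights.copy()          # running average of E[1{z = k}]
s_sum = weights * means           # running average of E[x 1{z = k}]

for t, x in enumerate(stream, start=1):
    gamma = t ** -0.6             # stochastic-approximation step size
    # E-step for the new observation under the current parameters.
    resp = weights * np.exp(-0.5 * (x - means) ** 2)
    resp /= resp.sum()
    # Running-average update of the sufficient statistics.
    s_count = (1 - gamma) * s_count + gamma * resp
    s_sum = (1 - gamma) * s_sum + gamma * resp * x
    # M-step from the running statistics.
    weights = s_count / s_count.sum()
    means = s_sum / s_count

print("estimated weights:", weights, "estimated means:", means)
```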

2.
Ordinary differential equations (ODEs) are equalities involving a function and its derivatives that define the evolution of the function over a prespecified domain. The applications of ODEs range from simulation and prediction to control and diagnosis in diverse fields such as engineering, physics, medicine, and finance. Parameter estimation is often required to calibrate these theoretical models to data. While there are many methods for estimating ODE parameters from partially observed data, they are invariably subject to several problems including high computational cost, complex estimation procedures, biased estimates, and large sampling variance. We propose a method that overcomes these issues and produces estimates of the ODE parameters that have less bias, a smaller sampling variance, and a 10-fold improvement in computational efficiency. The package GenPen containing the Matlab code to perform the methods described in this article is available online.
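For readers unfamiliar with ODE parameter estimation, the Python sketch below shows the classical trajectory-matching baseline (nonlinear least squares on the solved ODE) that methods such as this one improve upon. The logistic-growth model, the synthetic data, and the optimizer settings are illustrative assumptions; the paper's penalized method and the GenPen Matlab package are not reproduced here.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Toy ODE: logistic growth dx/dt = r * x * (1 - x / K), parameters theta = (r, K).
def rhs(t, x, r, K):
    return r * x * (1.0 - x / K)

def trajectory(theta, t_obs, x0=0.1):
    r, K = theta
    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), [x0], t_eval=t_obs, args=(r, K))
    return sol.y[0]

# Synthetic noisy observations from the true parameters (0.8, 2.0).
rng = np.random.default_rng(1)
t_obs = np.linspace(0.0, 10.0, 40)
y_obs = trajectory((0.8, 2.0), t_obs) + rng.normal(0.0, 0.05, size=t_obs.size)

# Classical trajectory matching: minimize residuals between the solved ODE and
# the data (the baseline whose cost and bias issues motivate the paper).
fit = least_squares(lambda th: trajectory(th, t_obs) - y_obs,
                    x0=(0.3, 1.0), bounds=(1e-6, np.inf))
print("estimated (r, K):", fit.x)
```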

3.
In this article, we propose an improvement on the sequential updating and greedy search (SUGS) algorithm for fast fitting of Dirichlet process mixture models. The SUGS algorithm provides very fast approximate Bayesian inference for mixture data, which is particularly useful when datasets are so large that many standard Markov chain Monte Carlo (MCMC) algorithms cannot be applied efficiently, or take a prohibitively long time to converge. In particular, these ideas are used to initially interrogate the data and to refine models such that exact data analysis can potentially be applied later on. SUGS relies upon sequentially allocating data to clusters and proceeding with an update of the posterior on the subsequent allocations and parameters that assumes this allocation is correct. Our modification softens this approach by providing a probability distribution over allocations, with a similar computational cost; this approach has an interpretation as a variational Bayes procedure and hence we term it variational SUGS (VSUGS). Simulated examples show that VSUGS can outperform a version of the SUGS algorithm in terms of density estimation and classification in many scenarios. In addition, we present analyses of flow cytometry data and of SNP data via a three-class Dirichlet process mixture model, illustrating the apparent improvement over the original SUGS algorithm.
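A minimal sketch of the soft sequential-allocation idea is given below for a Dirichlet process mixture of univariate Gaussians with known observation variance. Instead of assigning each new point to the single most probable cluster (as in SUGS), the sketch spreads it over clusters in proportion to the allocation probabilities, in the spirit of VSUGS. The conjugate Normal model, the threshold for opening a new cluster, and all constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-3, 1, 150), rng.normal(2, 1, 150)])
rng.shuffle(data)

alpha, prior_mean, prior_var, obs_var = 1.0, 0.0, 10.0, 1.0

# Per-cluster counts and sums are fractional because each observation is spread
# over clusters according to its allocation probabilities (soft allocation).
counts, sums = [], []

for x in data:
    # Predictive density of x under each existing cluster (Normal-Normal model)
    # and under a brand-new cluster drawn from the prior.
    log_w = []
    for n_k, s_k in zip(counts, sums):
        post_var = 1.0 / (1.0 / prior_var + n_k / obs_var)
        post_mean = post_var * (prior_mean / prior_var + s_k / obs_var)
        log_w.append(np.log(n_k) + norm.logpdf(x, post_mean, np.sqrt(post_var + obs_var)))
    log_w.append(np.log(alpha) + norm.logpdf(x, prior_mean, np.sqrt(prior_var + obs_var)))
    w = np.exp(np.array(log_w) - np.max(log_w))
    w /= w.sum()
    # SUGS would assign x to argmax(w); the soft version keeps the whole vector.
    if w[-1] > 1e-3:                  # open a new cluster only if it receives mass
        counts.append(0.0)
        sums.append(0.0)
    else:
        w = w[:-1] / w[:-1].sum()
    for k, wk in enumerate(w):
        counts[k] += wk
        sums[k] += wk * x

print("soft cluster sizes:", np.round(counts, 1))
```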

4.
We introduce a class of spatiotemporal models for Gaussian areal data. These models assume a latent random field process that evolves through time with random field convolutions; the convolving fields follow proper Gaussian Markov random field (PGMRF) processes. At each time, the latent random field process is linearly related to observations through an observational equation with errors that also follow a PGMRF. The use of PGMRF errors brings modeling and computational advantages. With respect to modeling, it allows more flexible model structures such as different but interacting temporal trends for each region, as well as distinct temporal gradients for each region. Computationally, building upon the fact that PGMRF errors have proper density functions, we have developed an efficient Bayesian estimation procedure based on Markov chain Monte Carlo with an embedded forward information filter backward sampler (FIFBS) algorithm. We show that, when compared with the traditional one-at-a-time Gibbs sampler, our novel FIFBS-based algorithm explores the posterior distribution much more efficiently. Finally, we have developed a simulation-based conditional Bayes factor suitable for the comparison of nonnested spatiotemporal models. An analysis of the number of homicides in Rio de Janeiro State illustrates the power of the proposed spatiotemporal framework.

Supplemental materials for this article are available online on the journal’s webpage.
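The forward-filter backward-sampler idea can be illustrated on a much simpler model. The Python sketch below runs a standard forward filtering backward sampling (FFBS) pass for a univariate local-level state-space model, drawing the whole latent path in one block; it is a generic illustration under assumed toy parameters, not the PGMRF-based FIFBS algorithm of the article.

```python
import numpy as np

rng = np.random.default_rng(3)

# Local-level model: state x_t = x_{t-1} + w_t, observation y_t = x_t + v_t.
T, q, r = 200, 0.1, 0.5                   # length, state and observation noise variances
x = np.cumsum(rng.normal(0, np.sqrt(q), T))
y = x + rng.normal(0, np.sqrt(r), T)

# Forward (Kalman) filter: store filtered means and variances.
m, v = np.zeros(T), np.zeros(T)
m_pred, v_pred = 0.0, 10.0                # vague prior for the first state
for t in range(T):
    k = v_pred / (v_pred + r)             # Kalman gain
    m[t] = m_pred + k * (y[t] - m_pred)
    v[t] = (1 - k) * v_pred
    m_pred, v_pred = m[t], v[t] + q       # one-step-ahead prediction

# Backward sampler: draw x_T, then x_{T-1}, ..., x_1, giving one joint posterior draw.
draw = np.zeros(T)
draw[-1] = rng.normal(m[-1], np.sqrt(v[-1]))
for t in range(T - 2, -1, -1):
    g = v[t] / (v[t] + q)                 # backward gain
    mean = m[t] + g * (draw[t + 1] - m[t])
    var = (1 - g) * v[t]
    draw[t] = rng.normal(mean, np.sqrt(var))

print("posterior draw, first 5 states:", np.round(draw[:5], 2))
```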

5.
Stochastic epidemic models describe the dynamics of an epidemic as a disease spreads through a population. Typically, only a fraction of cases are observed at a set of discrete times. The absence of complete information about the time evolution of an epidemic gives rise to a complicated latent variable problem in which the state space size of the epidemic grows large as the population size increases. This makes analytically integrating over the missing data infeasible for populations of even moderate size. We present a data augmentation Markov chain Monte Carlo (MCMC) framework for Bayesian estimation of stochastic epidemic model parameters, in which measurements are augmented with subject-level disease histories. In our MCMC algorithm, we propose each new subject-level path, conditional on the data, using a time-inhomogeneous continuous-time Markov process with rates determined by the infection histories of other individuals. The method is general, and may be applied to a broad class of epidemic models with only minimal modifications to the model dynamics and/or emission distribution. We present our algorithm in the context of multiple stochastic epidemic models in which the data are binomially sampled prevalence counts, and apply our method to data from an outbreak of influenza in a British boarding school. Supplementary material for this article is available online.

6.
The Gaussian geostatistical model has been widely used for modeling spatial data. However, this model suffers from a severe computational difficulty: it requires users to invert a large covariance matrix, which is infeasible when the number of observations is large. In this article, we propose an auxiliary lattice-based approach for tackling this difficulty. By introducing an auxiliary lattice to the space of observations and defining a Gaussian Markov random field on the auxiliary lattice, our model completely avoids the requirement of matrix inversion. Remarkably, the computational complexity of our method is only O(n), where n is the number of observations. Hence, our method can be applied to very large datasets with reasonable computational (CPU) times. The numerical results indicate that our model approximates Gaussian random fields very well in terms of predictions, even for those with long correlation lengths. In real data examples, our model generally outperforms conventional Gaussian random field models in both prediction errors and CPU times. Supplemental materials for the article are available online.

7.
The article develops a hybrid variational Bayes (VB) algorithm that combines the mean-field and stochastic linear regression fixed-form VB methods. The new estimation algorithm can be used to approximate any posterior without relying on conjugate priors. We propose a divide and recombine strategy for the analysis of large datasets, which partitions a large dataset into smaller subsets and then combines the variational distributions that have been learned in parallel on each separate subset using the hybrid VB algorithm. We also describe an efficient model selection strategy using cross-validation, which is straightforward to implement as a by-product of the parallel run. The proposed method is applied to fitting generalized linear mixed models. The computational efficiency of the parallel and hybrid VB algorithm is demonstrated on several simulated and real datasets. Supplementary material for this article is available online.
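One common way to recombine Gaussian approximations learned in parallel on data subsets is to multiply the subset posteriors and divide out the extra copies of the prior, working in natural (precision) parameters. The sketch below shows that recombination step only; it is a hedged illustration of the general divide-and-recombine idea, not the authors' hybrid VB algorithm, and the dimensions and inputs are made up.

```python
import numpy as np

# Suppose each of K data subsets was processed in parallel, yielding a Gaussian
# variational approximation N(mu_k, Sigma_k) of the posterior given that subset
# alone, with the same prior N(mu0, Sigma0) used in every run.
def recombine(mus, Sigmas, mu0, Sigma0):
    K = len(mus)
    P0 = np.linalg.inv(Sigma0)
    # Product of subset posteriors divided by prior^(K-1), in natural parameters.
    P = sum(np.linalg.inv(S) for S in Sigmas) - (K - 1) * P0
    h = sum(np.linalg.inv(S) @ m for S, m in zip(Sigmas, mus)) - (K - 1) * (P0 @ mu0)
    Sigma = np.linalg.inv(P)
    return Sigma @ h, Sigma

# Tiny illustration with two subsets in two dimensions.
mu0, Sigma0 = np.zeros(2), 10.0 * np.eye(2)
mus = [np.array([1.0, 0.5]), np.array([1.2, 0.3])]
Sigmas = [0.2 * np.eye(2), 0.3 * np.eye(2)]
mu, Sigma = recombine(mus, Sigmas, mu0, Sigma0)
print("recombined mean:", mu)
```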

8.
We develop algorithms for performing semiparametric regression analysis in real time, with data processed as it is collected and made immediately available via modern telecommunications technologies. Our definition of semiparametric regression is quite broad and includes, as special cases, generalized linear mixed models, generalized additive models, geostatistical models, wavelet nonparametric regression models and their various combinations. Fast updating of regression fits is achieved by couching semiparametric regression into a Bayesian hierarchical model or, equivalently, graphical model framework and employing online mean field variational ideas. An Internet site attached to this article, realtime-semiparametric-regression.net, illustrates the methodology for continually arriving stock market, real estate, and airline data. Flexible real-time analyses based on increasingly ubiquitous streaming data sources stand to benefit. This article has online supplementary material.
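The flavor of real-time updating can be conveyed with a conjugate toy model. The sketch below performs online Bayesian linear regression with known noise variance, updating the posterior in natural (precision) form as each observation arrives; the semiparametric models and online mean field variational methods of the article are far more general, and all quantities here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

p, sigma2 = 3, 0.25
beta_true = np.array([1.0, -2.0, 0.5])

# Gaussian prior on the coefficients, stored in natural (precision) form so that
# each arriving observation is a cheap rank-one update.
precision = np.eye(p) / 100.0             # prior precision
shift = np.zeros(p)                       # prior precision times prior mean (= 0)

for t in range(1000):                     # streaming observations
    x = rng.normal(size=p)
    y = x @ beta_true + rng.normal(0, np.sqrt(sigma2))
    precision += np.outer(x, x) / sigma2
    shift += x * y / sigma2
    if (t + 1) % 250 == 0:
        post_mean = np.linalg.solve(precision, shift)
        print(f"after {t + 1} observations:", np.round(post_mean, 3))
```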

9.
Three-dimensional data arrays (collections of individual data matrices) are increasingly prevalent in modern data and pose unique challenges to pattern extraction and visualization. This article introduces a biclustering technique for exploration and pattern detection in such complex structured data. The proposed framework couples the popular plaid model together with tools from functional data analysis to guide the estimation of bicluster responses over the array. We present an efficient algorithm that first detects biclusters that exhibit strong deviations for some data matrices, and then estimates their responses over the entire data array. Altogether, the framework is useful to home in on and display underlying structure and its evolution over conditions/time. The methods are scalable to large datasets, and can accommodate a variety of dynamic patterns. The proposed techniques are illustrated on gene expression data and bilateral trade networks. Supplementary materials are available online.

10.
We extend the definition of functional data registration to encompass a larger class of registration models. In contrast to traditional registration models, we allow for registered functions that have more than one primary direction of variation. The proposed Bayesian hierarchical model simultaneously registers the observed functions and estimates the two primary factors that characterize variation in the registered functions. Each registered function is assumed to be predominantly composed of a linear combination of these two primary factors, and the function-specific weights for each observation are estimated within the registration model. We show how these estimated weights can easily be used to classify functions after registration using both simulated data and a juggling dataset. Supplementary materials for this article are available online.

11.
Online auctions have been the subject of many empirical research efforts in the fields of economics and information systems. These research efforts are often based on analyzing data from Web sites such as eBay.com, which provide public information about sequences of bids in closed auctions, typically in the form of tables on HTML pages. The existing literature on online auctions focuses on tools like summary statistics and more formal statistical methods such as regression models. However, there is a clear void in this growing body of literature in developing appropriate visualization tools. This is quite surprising, given that the sheer amount of data that can be found on sites such as eBay.com is overwhelming and often cannot be displayed informatively using standard statistical graphics. In this article we introduce graphical methods for visualizing online auction data in ways that are informative and relevant to the research questions of interest. We start by using profile plots that reveal aspects of an auction such as bid values, bidding intensity, and bidder strategies. We then introduce the concept of statistical zooming (STAT-zoom), which can scale up for visualizing large numbers of auctions. STAT-zoom adds the capability of looking at data summaries at various time scales interactively. Finally, we develop auction calendars and auction scene visualizations for viewing a set of many concurrent auctions. The different visualization methods are demonstrated using data on multiple auctions collected from eBay.com.

12.
Gaussian process models have been widely used in spatial statistics but face tremendous modeling and computational challenges for very large nonstationary spatial datasets. To address these challenges, we develop a Bayesian modeling approach using a nonstationary covariance function constructed from adaptively selected partitions. The partitioned nonstationary class allows one to knit together local covariance parameters into a valid global nonstationary covariance for prediction, where the local covariance parameters are allowed to be estimated within each partition to reduce computational cost. To further facilitate the computations in local covariance estimation and global prediction, we use the full-scale covariance approximation (FSA) approach for the Bayesian inference of our model. One of our contributions is to model the partitions stochastically by embedding a modified treed partitioning process into the hierarchical model, which leads to automated partitioning and substantial computational benefits. We illustrate the utility of our method with simulation studies and the global Total Ozone Mapping Spectrometer (TOMS) data. Supplementary materials for this article are available online.

13.
Variational approximations have the potential to scale Bayesian computations to large datasets and highly parameterized models. Gaussian approximations are popular, but can be computationally burdensome when an unrestricted covariance matrix is employed and the dimension of the model parameter is high. To circumvent this problem, we consider a factor covariance structure as a parsimonious representation. General stochastic gradient ascent methods are described for efficient implementation, with gradient estimates obtained using the so-called “reparameterization trick.” The end result is a flexible and efficient approach to high-dimensional Gaussian variational approximation. We illustrate using robust P-spline regression and logistic regression models. For the latter, we consider eight real datasets, including datasets with many more covariates than observations, and another with mixed effects. In all cases, our variational method provides fast and accurate estimates. Supplementary material for this article is available online.
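A minimal sketch of the two ingredients named in the abstract, a factor covariance structure and the reparameterization trick, is given below for Bayesian logistic regression. To keep it short, only the variational mean is updated by stochastic gradient ascent while the factor loadings B and idiosyncratic standard deviations d are held fixed; the paper's method also learns B and d, and the data, prior variance, and step size here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic logistic-regression data.
n, p, k = 500, 20, 2                        # observations, coefficients, factors
X = rng.normal(size=(n, p))
beta_true = rng.normal(scale=0.5, size=p)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

tau2 = 10.0                                 # prior variance of the coefficients
mu = np.zeros(p)                            # variational mean (the only quantity updated)
B = 0.1 * rng.normal(size=(p, k))           # factor loadings (held fixed for brevity)
d = 0.1 * np.ones(p)                        # idiosyncratic std. deviations (held fixed)

def grad_log_joint(theta):
    # Gradient of log p(y, theta) for logistic regression with a N(0, tau2 I) prior.
    prob = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (y - prob) - theta / tau2

for it in range(2000):
    # Reparameterized draw: theta = mu + B z + d * eps, covariance B B^T + diag(d^2).
    z, eps = rng.normal(size=k), rng.normal(size=p)
    theta = mu + B @ z + d * eps
    # Unbiased ELBO gradient with respect to mu (the Gaussian entropy does not depend on mu).
    mu += 0.01 * grad_log_joint(theta)

print("variational mean vs truth (first 5 coefficients):")
print(np.round(mu[:5], 2), np.round(beta_true[:5], 2))
```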

14.
Continuous threshold regression is a common type of nonlinear regression that is attractive to many practitioners for its easy interpretability. More widespread adoption of threshold regression faces two challenges: (i) the computational complexity of fitting threshold regression models and (ii) obtaining correct coverage of confidence intervals under model misspecification. Both challenges result from the nonsmooth and nonconvex nature of the threshold regression model likelihood function. In this article, we first show that these two issues together make the ideal approach to model-robust inference in continuous threshold linear regression impractical. The need for a faster way of fitting continuous threshold linear models motivated us to develop a fast grid search method. The new method, based on the simple yet powerful dynamic programming principle, improves performance by several orders of magnitude. Supplementary materials for this article are available online.
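The baseline that motivates the paper's fast method can be sketched directly: for each candidate threshold the model is linear in the remaining coefficients, so a brute-force grid search fits a least-squares hinge model at every candidate and keeps the best one. The sketch below implements that baseline only, not the dynamic-programming acceleration, and the simulated model is an assumption.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate a continuous ("hinge") threshold model: the slope changes at x = 1.5
# while the regression function stays continuous there.
n = 400
x = rng.uniform(0, 4, n)
y = 1.0 + 0.5 * x + 2.0 * np.maximum(x - 1.5, 0.0) + rng.normal(0, 0.3, n)

def rss_at_threshold(e):
    # With the threshold fixed, the model is linear in the remaining coefficients.
    design = np.column_stack([np.ones(n), x, np.maximum(x - e, 0.0)])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return np.sum((y - design @ coef) ** 2)

# Brute-force grid search over the observed x values as candidate thresholds.
candidates = np.sort(x)[10:-10]            # avoid thresholds at the extremes
rss = np.array([rss_at_threshold(e) for e in candidates])
best = candidates[np.argmin(rss)]
print("estimated threshold:", round(best, 3))
```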

15.
Spatial climate data are often presented as summaries of areal regions such as grid cells, either because they are the output of numerical climate models or to facilitate comparison with numerical climate model output. Extreme value analysis can benefit greatly from spatial methods that borrow information across regions. For Gaussian outcomes, a host of methods that respect the areal nature of the data are available, including conditional and simultaneous autoregressive models. However, to our knowledge, there is no such method in the spatial extreme value analysis literature. In this article, we propose a new method for areal extremes that accounts for spatial dependence using latent clustering of neighboring regions. We show that the proposed model has desirable asymptotic dependence properties and leads to relatively simple computation. Applying the proposed method to North American climate data reveals several local and continental-scale changes in the distribution of precipitation and temperature extremes over time. Supplementary material for this article is available online.

16.
The challenges of understanding the impacts of air pollution require detailed information on the state of air quality. While many modeling approaches attempt to treat this problem, physically based deterministic methods are often overlooked due to their costly computational requirements and complicated implementation. In this work we extend a non-intrusive reduced basis data assimilation method (known as PBDW state estimation) to large pollutant dispersion case studies relying on the equations of chemical transport models for air quality modeling. The goal is to render methods based on parameterized partial differential equations (PDEs) feasible in air quality modeling applications requiring quasi-real-time approximation and correction of model error in imperfect models. Reduced basis methods (RBM) aim to compute a cheap and accurate approximation of a physical state using approximation spaces made of a suitable sample of solutions to the model. One of the keys of these techniques is the decomposition of the computational work into an expensive one-time offline stage and a low-cost parameter-dependent online stage. Traditional RBMs require modifying the assembly routines of the computational code, an intrusive procedure which may be impossible in the case of operational model codes. We propose a less intrusive reduced order method using data assimilation of measured pollution concentrations, adapted to the scale of, and specifically to, exterior pollutant dispersion as found in urban air quality studies. Common statistical techniques of data assimilation in use in these applications require large historical datasets or time-consuming iterative methods. The method proposed here avoids both disadvantages. In the case studies presented in this work, the method makes it possible to correct for unmodeled physics and to treat cases of unknown parameter values, all while significantly reducing online computational time.

17.
Gibbs random fields play an important role in statistics. However, they are complicated to work with due to the intractability of the likelihood function, and much work has been devoted to finding computational algorithms that allow Bayesian inference to be conducted for such so-called doubly intractable distributions. This article extends this work and addresses the issue of estimating the evidence and Bayes factor for such models. The approach that we develop is shown to yield good performance. Supplementary materials for this article are available online.

18.
We propose an algorithm, semismooth Newton coordinate descent (SNCD), for elastic-net penalized Huber loss regression and quantile regression in high-dimensional settings. Unlike existing coordinate descent type algorithms, SNCD updates a regression coefficient and its corresponding subgradient simultaneously in each iteration. It combines the strengths of coordinate descent and the semismooth Newton algorithm, and effectively solves the computational challenges posed by dimensionality and nonsmoothness. We establish the convergence properties of the algorithm. In addition, we present an adaptive version of the “strong rule” for screening predictors to gain extra efficiency. Through numerical experiments, we demonstrate that the proposed algorithm is very efficient and scalable to ultrahigh dimensions. We illustrate the application via a real data example. Supplementary materials for this article are available online.
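For orientation, the sketch below implements plain cyclic coordinate descent for the squared-error elastic net with standardized predictors, the smooth-loss baseline that SNCD extends to the nonsmooth Huber and quantile losses by updating coefficients and subgradients jointly. It is not the SNCD algorithm itself, and the penalty values and simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Cyclic coordinate descent for the squared-error elastic net
# (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2),
# assuming the columns of X are standardized (mean 0, variance 1).
def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                                  # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            rho = (X[:, j] @ r) / n + b[j]        # partial-residual correlation
            b_new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1 - alpha))
            r += X[:, j] * (b[j] - b_new)         # keep the residual current in O(n)
            b[j] = b_new
    return b

n, p = 200, 50
X = rng.normal(size=(n, p))
X = (X - X.mean(0)) / X.std(0)
beta_true = np.zeros(p)
beta_true[:5] = [2, -1.5, 1, 0.8, -0.5]
y = X @ beta_true + rng.normal(0, 0.5, n)
print("nonzero coefficients found:", np.flatnonzero(elastic_net_cd(X, y, lam=0.1, alpha=0.9)))
```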

19.
Models with intractable likelihood functions arise in areas including network analysis and spatial statistics, especially those involving Gibbs random fields. Posterior parameter estimation in these settings is termed a doubly intractable problem because both the likelihood function and the posterior distribution are intractable. The comparison of Bayesian models is often based on the statistical evidence, the integral of the un-normalized posterior distribution over the model parameters, which is rarely available in closed form. For doubly intractable models, estimating the evidence adds another layer of difficulty. Consequently, selecting the model that best describes an observed network among a collection of exponential random graph models for network analysis is a daunting task. Pseudolikelihoods offer a tractable approximation to the likelihood but should be treated with caution because they can lead to unreasonable inferences. This article specifies a method to adjust pseudolikelihoods to obtain a reasonable, yet tractable, approximation to the likelihood. This allows implementation of widely used computational methods for evidence estimation and the pursuit of Bayesian model selection of exponential random graph models for the analysis of social networks. Empirical comparisons to existing methods show that our procedure yields similar evidence estimates, but at a lower computational cost. Supplementary material for this article is available online.
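The pseudolikelihood idea is easy to state for a simple Gibbs random field: replace the joint likelihood by the product of full conditionals. The sketch below does this for a toy autologistic (Ising-type) lattice and recovers the interaction parameter by maximizing the log pseudolikelihood over a grid; the exponential random graph models and the pseudolikelihood adjustment developed in the article are not shown, and all settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)

def neighbor_sum(grid):
    # Sum of the four nearest neighbors at each site (zero padding at the border).
    s = np.zeros_like(grid, dtype=float)
    s[1:, :] += grid[:-1, :]; s[:-1, :] += grid[1:, :]
    s[:, 1:] += grid[:, :-1]; s[:, :-1] += grid[:, 1:]
    return s

def log_pseudolikelihood(grid, beta):
    # Product over sites of P(x_i | neighbors): logistic in beta * neighbor sum
    # for a {0,1}-valued autologistic (Ising-type) field.
    eta = beta * neighbor_sum(grid)
    return np.sum(grid * eta - np.log1p(np.exp(eta)))

# Generate a rough draw from the model with checkerboard Gibbs updates
# (sites of one color are conditionally independent given the other color).
grid = rng.integers(0, 2, size=(40, 40)).astype(float)
beta_true = 0.4
mask = (np.add.outer(np.arange(40), np.arange(40)) % 2).astype(bool)
for _ in range(200):
    for color in (mask, ~mask):
        prob = 1.0 / (1.0 + np.exp(-beta_true * neighbor_sum(grid)))
        grid[color] = (rng.random(grid.shape)[color] < prob[color]).astype(float)

# Maximum pseudolikelihood estimate by a simple grid search.
betas = np.linspace(0.0, 1.0, 101)
mple = betas[np.argmax([log_pseudolikelihood(grid, b) for b in betas])]
print("pseudolikelihood estimate of beta:", mple)
```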

20.
Statistical analysis of large datasets offers new opportunities to better understand underlying processes. Yet, data accumulation often implies relaxing acquisition procedures or compounding diverse sources. As a consequence, datasets often contain mixed data, that is, both quantitative and qualitative, and many missing values. Furthermore, aggregated data present a natural multilevel structure, where individuals or samples are nested within different sites, such as countries or hospitals. Imputation of multilevel data has therefore drawn some attention recently, but current solutions are not designed to handle mixed data and suffer from important drawbacks, such as their computational cost. In this article, we propose a single imputation method for multilevel data, which can be used to complete quantitative, categorical, or mixed data. The method is based on multilevel singular value decomposition (SVD), which consists of decomposing the variability of the data into two components, the between- and within-group variability, and performing an SVD on both parts. We show in a simulation study that, in comparison to competitors, the method has the advantages of handling datasets of various sizes and of being computationally faster. Furthermore, it is the first such method to handle mixed data. We apply the method to impute a medical dataset resulting from the aggregation of several hospital datasets. This application falls within the framework of a larger project on trauma patients. To overcome obstacles associated with the aggregation of medical data, we turn to distributed computation. The method is implemented in the R package missMDA. Supplementary materials for this article are available online.
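The single-level building block of SVD-based imputation can be sketched in a few lines: fill missing cells with column means, then alternate between a truncated SVD fit and refreshing the missing cells from that fit. The sketch below shows this iterative-SVD loop for purely quantitative data; the article's method additionally separates between- and within-group variability (the multilevel part) and handles mixed data, neither of which is implemented here, and the rank and missingness rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

# Low-rank quantitative data with roughly 20% of entries missing at random.
n, p, rank = 100, 10, 2
truth = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
X = truth + rng.normal(0, 0.1, size=(n, p))
miss = rng.random((n, p)) < 0.2
X_obs = np.where(miss, np.nan, X)

def svd_impute(X_obs, rank, n_iter=100):
    miss = np.isnan(X_obs)
    X_hat = np.where(miss, np.nanmean(X_obs, axis=0), X_obs)   # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # truncated SVD fit
        X_hat[miss] = low_rank[miss]                             # refresh only the missing cells
    return X_hat

X_imp = svd_impute(X_obs, rank=2)
rmse = np.sqrt(np.mean((X_imp[miss] - X[miss]) ** 2))
print("imputation RMSE on the held-out cells:", round(rmse, 3))
```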
