Similar Documents
20 similar documents found (search time: 890 ms).
1.
The article develops a hybrid variational Bayes (VB) algorithm that combines the mean-field and stochastic linear regression fixed-form VB methods. The new estimation algorithm can be used to approximate any posterior without relying on conjugate priors. We propose a divide and recombine strategy for the analysis of large datasets, which partitions a large dataset into smaller subsets and then combines the variational distributions that have been learned in parallel on each separate subset using the hybrid VB algorithm. We also describe an efficient model selection strategy using cross-validation, which is straightforward to implement as a by-product of the parallel run. The proposed method is applied to fitting generalized linear mixed models. The computational efficiency of the parallel and hybrid VB algorithm is demonstrated on several simulated and real datasets. Supplementary material for this article is available online.
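As a rough illustration of the recombine step, suppose each subset posterior has been approximated by a multivariate Gaussian variational distribution; the subset approximations can then be multiplied in natural-parameter form, dividing out the surplus copies of the prior. This is a generic sketch under that Gaussian assumption (the function name and interface are illustrative, not the article's):

```python
import numpy as np

def combine_gaussians(means, covs, prior_mean, prior_cov):
    """Multiply K Gaussian subset posteriors and divide out the K-1 extra
    copies of the prior (each subset run used the full prior)."""
    K = len(means)
    prior_prec = np.linalg.inv(prior_cov)
    # Natural parameters (precision, precision @ mean) add across subsets.
    prec = sum(np.linalg.inv(S) for S in covs) - (K - 1) * prior_prec
    shift = sum(np.linalg.inv(S) @ m for S, m in zip(covs, means)) \
            - (K - 1) * prior_prec @ prior_mean
    cov = np.linalg.inv(prec)
    return cov @ shift, cov  # mean and covariance of the combined Gaussian
```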

2.
We consider the problem of estimating the local smoothness of a spatially inhomogeneous function from noisy data within the framework of smoothing splines. Most existing studies of this problem rely on a single smoothing parameter, or on partially local smoothing parameters, which may not efficiently characterize the varying degrees of smoothness of the underlying function. In this paper, we propose a new nonparametric method to estimate the local smoothness of the function, based on moving local risk minimization coupled with spatially adaptive smoothing splines. The proposed method provides full information about the local smoothness at every location of the data domain, making it possible to assess the degree of spatial inhomogeneity of the function. A successful estimate of the local smoothness is useful for identifying abrupt changes in the smoothness of the data, performing functional clustering, and improving the uniformity of coverage of smoothing spline confidence intervals. We further consider a nontrivial extension to the local smoothness of inhomogeneous two-dimensional functions, or spatial fields. The empirical performance of the proposed method is evaluated through numerical examples, which demonstrate promising results.
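A loose sketch of the moving local risk idea in one dimension: at every location, choose the spline smoothing parameter that minimizes a hold-out risk computed over a moving window (fit on even-indexed points, validate on odd-indexed ones). The window size, grid, and hold-out scheme are illustrative choices, not the authors' exact criterion:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def local_smoothing_params(x, y, half=25, s_grid=np.logspace(-2, 2, 20)):
    """Per-location smoothing parameters for sorted, noisy 1-D data (x, y)."""
    s_hat = np.empty(len(x))
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        xs, ys = x[lo:hi], y[lo:hi]
        # Hold-out risk: fit on even-indexed points, score on odd-indexed ones.
        risks = [np.mean((ys[1::2] - UnivariateSpline(xs[::2], ys[::2], s=s)(xs[1::2])) ** 2)
                 for s in s_grid]
        s_hat[i] = s_grid[int(np.argmin(risks))]
    return s_hat
```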

3.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
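The flavor of the approach can be sketched with scikit-learn Gaussian mixtures: fit a small model to a random sample, extend it to the full data by further EM iterations, then repeatedly seed a new component from the observations the current model fits worst. The BIC stopping rule, sample size, and worst-fit fraction below are our illustrative choices:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def incremental_gmm(X, start=2, max_components=10, frac_worst=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sub = X[rng.choice(len(X), size=min(1000, len(X)), replace=False)]
    model = GaussianMixture(start, warm_start=True).fit(sub)  # fit on a sample
    model.fit(X)                           # extend to the full data with more EM
    for k in range(start + 1, max_components + 1):
        # Seed a new component from the observations with the lowest likelihood.
        worst = X[np.argsort(model.score_samples(X))[: int(frac_worst * len(X))]]
        means = np.vstack([model.means_, worst.mean(axis=0)])
        cand = GaussianMixture(k, means_init=means).fit(X)
        if cand.bic(X) >= model.bic(X):    # stop when BIC no longer improves
            break
        model = cand
    return model
```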

4.
We consider local smoothing of datasets where the design space is the d-dimensional (d≥1) torus and the response variable is real-valued. Our purpose is to extend least squares local polynomial fitting to this situation. We give both theoretical and empirical results.
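In the one-dimensional case (the circle), the extension amounts to wrapping distances before applying weighted least squares. A minimal local linear sketch, assuming angles in radians (the kernel and bandwidth are illustrative):

```python
import numpy as np

def local_linear_circle(theta, y, theta0, h):
    """Local linear fit at theta0 on the circle: wrap distances to (-pi, pi],
    then solve a kernel-weighted least squares problem."""
    d = (theta - theta0 + np.pi) % (2 * np.pi) - np.pi  # wrapped signed distance
    w = np.exp(-0.5 * (d / h) ** 2)                     # Gaussian kernel weights
    Xd = np.column_stack([np.ones_like(d), d])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta[0]                                      # fitted value at theta0
```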

5.
This article uses projection depth (PD) for robust classification of multivariate data. Here we consider two types of classifiers, namely, the maximum depth classifier and the modified depth-based classifier. The latter involves kernel density estimation, where one needs to choose the associated scale of smoothing. We consider both the single scale and the multi-scale versions of kernel density estimation, and investigate the large sample properties of the resulting classifiers under appropriate regularity conditions. Some simulated and real data sets are analyzed to evaluate the finite sample performance of these classification tools.
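For intuition, projection depth can be approximated by a Monte Carlo search over random unit directions, with PD(x) = 1 / (1 + O(x)) and O(x) the worst-case standardized projection; the maximum depth classifier then assigns x to the class under which it is deepest. A sketch (the random-direction search only approximates the supremum):

```python
import numpy as np

def projection_outlyingness(x, X, n_dirs=500, seed=0):
    """Monte Carlo approximation of O(x) = sup_u |u'x - med(u'X)| / MAD(u'X)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # random unit directions
    proj = X @ U.T                                  # (n, n_dirs) projections
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    mad = np.where(mad > 0, mad, np.inf)            # skip degenerate directions
    return np.max(np.abs(U @ x - med) / mad)

def max_depth_classify(x, class_samples):
    """Assign x to the class under whose sample it is deepest."""
    depths = [1.0 / (1.0 + projection_outlyingness(x, X)) for X in class_samples]
    return int(np.argmax(depths))
```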

6.
A current challenge for many Bayesian analyses is determining when to terminate high-dimensional Markov chain Monte Carlo simulations. To this end, we propose using an automated sequential stopping procedure that terminates the simulation when the computational uncertainty is small relative to the posterior uncertainty. Further, we show this stopping rule is equivalent to stopping when the effective sample size is sufficiently large. Such a stopping rule has previously been shown to work well in settings with posteriors of moderate dimension. In this article, we illustrate its utility in high-dimensional simulations while overcoming some current computational issues. As examples, we consider two complex Bayesian analyses on spatially and temporally correlated datasets. The first involves a dynamic space-time model on weather station data and the second a spatial variable selection model on fMRI brain imaging data. Our results show the sequential stopping rule is easy to implement, provides uncertainty estimates, and performs well in high-dimensional settings. Supplementary materials for this article are available online.
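The rule can be sketched as: extend the simulation in batches and stop once the estimated effective sample size of every coordinate exceeds a target. The batch means variance estimator and the `sampler` interface below are illustrative assumptions, not necessarily the authors' implementation:

```python
import numpy as np

def ess_batch_means(chain):
    """Univariate ESS with the asymptotic variance estimated by batch means."""
    chain = np.asarray(chain)
    n = len(chain)
    b = int(np.sqrt(n))                        # batch size ~ sqrt(n)
    a = n // b
    means = chain[: a * b].reshape(a, b).mean(axis=1)
    sigma2 = b * means.var(ddof=1)             # asymptotic variance estimate
    return n * chain.var(ddof=1) / sigma2

def run_until_ess(sampler, min_ess=5000, batch=10_000, max_iter=10**6):
    """sampler(m) is assumed to return the next m draws as an (m, d) array."""
    draws = sampler(batch)
    while len(draws) < max_iter:
        if all(ess_batch_means(draws[:, j]) >= min_ess
               for j in range(draws.shape[1])):
            break                              # computational uncertainty small enough
        draws = np.vstack([draws, sampler(batch)])
    return draws
```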

7.
In this paper, the rotated cone fitting problem is considered. When the measured data are generally accurate and the surface must be fitted within an expected error bound, it is more appropriate to use the l∞ norm than the l2 norm. Fitting rotated cones in the l∞ norm requires minimizing, under some bound constraints, the maximum of nonsmooth functions involving both absolute value and square root functions. Although this is a low-dimensional problem, in some practical applications a large number of cones must be fitted repeatedly; moreover, when a large amount of measured data is fitted to one rotated cone, the number of components in the maximum function is large. It is therefore necessary to develop efficient solution methods. To solve such optimization problems efficiently, a truncated smoothing Newton method is presented. First, by applying an aggregate smoothing technique to the maximum function and the absolute value function, and a smoothing function to the square root function, a monotone and uniformly smooth approximation of the objective function is constructed. Using this smooth approximation, a smoothing Newton method can be used to solve the problem. Then, to reduce the computational cost, a truncated aggregate smoothing technique is applied, yielding the truncated smoothing Newton method, in which only a small subset of the component functions is aggregated at each iterate, so the computational cost is considerably reduced.
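The standard aggregate (log-sum-exp) smoothing the abstract alludes to can be written down directly; the paper's exact smoothing functions may differ, but the canonical construction, which overestimates an m-term maximum by at most ln(m)/p, looks like this:

```python
import numpy as np
from scipy.special import logsumexp

def smooth_max(f, p):
    """Aggregate smoothing of max_i f_i: the error is bounded by ln(m)/p,
    uniformly in the f_i, and shrinks as p grows."""
    return logsumexp(p * np.asarray(f)) / p

def smooth_abs(t, p):
    """|t| = max(t, -t), smoothed with the same aggregate function."""
    return logsumexp(p * np.array([t, -t])) / p

def smooth_sqrt(t, eps):
    """A smooth approximation to sqrt(t) that is differentiable at t = 0."""
    return np.sqrt(t + eps)
```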

9.
Multi-label classification assigns more than one label to each instance; when the labels are ordered in a predefined structure, the task is called Hierarchical Multi-label Classification (HMC). In HMC there are global and local approaches. Global approaches treat the problem as a whole but tend to become computationally intractable on large datasets. Local approaches divide the problem into local subproblems, but usually do not exploit the information in the hierarchy. This paper addresses the problem of HMC for both tree and Directed Acyclic Graph (DAG) structures whose labels do not necessarily reach a leaf node. A local classifier is trained per parent node, incorporating the prediction of the parent node(s) as an additional attribute to capture the relations between classes. In the classification phase, branches with a low probability of occurring are pruned, performing non-mandatory leaf-node prediction. Our method evaluates each possible path from the root of the hierarchy, taking into account the prediction values and the levels of the nodes, and selects the path (or paths, in the case of DAGs) with the highest score. We tested our method on 20 datasets with tree- and DAG-structured hierarchies against a number of state-of-the-art methods. Our method obtained superior results when dealing with deep and populated hierarchies.
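A toy sketch of the prediction phase: given per-node probabilities from the local classifiers, prune low-probability branches and score every surviving root-to-node path. Averaging the probabilities along a path is our simplification; the paper's score also weights the node levels:

```python
def score_paths(children, probs, node=0, path=(), total=0.0, out=None, thr=0.3):
    """Enumerate root-to-node paths in a tree, pruning unlikely branches.
    children: dict node -> list of child nodes; probs: per-node probabilities."""
    if out is None:
        out = {}
    path, total = path + (node,), total + probs[node]
    out[path] = total / len(path)          # path score: mean node probability
    for c in children.get(node, []):
        if probs[c] >= thr:                # prune branches unlikely to occur
            score_paths(children, probs, c, path, total, out, thr)
    return out

# Usage: scores = score_paths(children, probs); best = max(scores, key=scores.get)
```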

10.
This paper presents a simulation-based framework for sequential inference from partially and discretely observed point process models with static parameters. Taking a Bayesian perspective on the static parameters, we build upon sequential Monte Carlo (SMC) methods, investigating the problems of performing sequential filtering and smoothing in complex examples where current methods often fail. We consider various approaches for approximating posterior distributions using SMC. Our approaches, with some theoretical discussion, are illustrated on a doubly stochastic point process applied in the context of finance.

11.
This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies that generate many very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software designs can lead to vast speed-ups and, critically, enable statistical analyses that presently are not performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context.

12.
Estimating equation approaches have been widely used in statistical inference. Important examples of estimating equations are the likelihood equations. Since its introduction by Sir R. A. Fisher almost a century ago, maximum likelihood estimation (MLE) is still the most popular method for fitting probability distributions to data, including fitting lifetime distributions with censored data. However, MLE may produce substantial bias, and may even fail to yield valid confidence intervals, when the sample size is not large enough or the data are censored. In this paper, based on nonlinear combinations of order statistics, we propose new estimating equation approaches for a class of probability distributions, which are particularly effective for skewed distributions with small sample sizes and censored data. The proposed approaches may possess a number of attractive properties such as consistency, sufficiency, and uniqueness. Asymptotic normality of these new estimators is derived. The construction of the new estimating equations and their numerical performance under different censoring schemes are detailed for the Weibull distribution and the generalized exponential distribution.

13.
A method is described for fitting cubic smoothing splines to samples of equally spaced data. The method is based on the canonical decomposition of the linear transformation from the data to the fitted values. Techniques for estimating the required amount of smoothing, including generalized cross validation, may easily be integrated into the calculations. For large samples the method is fast and does not require prohibitively large data storage.
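For equally spaced data the computation can be caricatured with a discrete second-difference penalty and GCV; note that this dense O(n^3) version is precisely what the canonical decomposition avoids, so it illustrates the criterion rather than the fast algorithm:

```python
import numpy as np

def gcv_smooth(y, lams=np.logspace(-3, 5, 40)):
    """Penalized least squares fit = (I + lam * D'D)^{-1} y with a
    second-difference penalty; lam is chosen by generalized cross validation."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)               # (n-2, n) second differences
    best = None
    for lam in lams:
        S = np.linalg.inv(np.eye(n) + lam * D.T @ D)  # smoother matrix
        fit = S @ y
        gcv = n * np.sum((y - fit) ** 2) / (n - np.trace(S)) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, fit)
    return best[1], best[2]                           # chosen lam and fitted values
```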

14.
In the statistical and biometric sciences, one often uses predictive linear models. The initial form of such a model is usually obtained by fitting the coefficients of the model to a set of observed data according to the classical least squares method. Newborn models obtained in this way will be referred to as raw models. Such raw models are often the subject of efforts to improve their predictive performance on external datasets. Several methods can be followed to fine-tune raw models, thus leading to a variety of model building strategies. In this paper, the idea of so-called victory rates is introduced to compare the performance of these building strategies with one another.

15.
We study conditions for the existence of a solution of a periodic problem for a model nonlinear equation in the spatially multidimensional case and consider various types of large time asymptotics (exponential and oscillating) for such solutions. The generalized Kolmogorov-Petrovskii-Piskunov equation, the nonlinear Schrödinger equation, and some other partial differential equations are special cases of this equation. We analyze the solution smoothing phenomenon under certain conditions on the linear part of the equation and study the case of nonsmall initial data for a nonlinearity of special form. The leading asymptotic term is presented, and the remainder in the asymptotics of the solution is estimated in a spatially uniform metric.

16.
Li Dong & Guohui Zhao, Optimization, 2016, 65(4): 729-749
Homotopy methods are globally convergent under weak conditions and robust; however, the efficiency of a homotopy method is closely tied to the construction of the homotopy map and the path-tracing algorithm. Different homotopies may perform very differently even though all are theoretically convergent. In this paper, a spline smoothing homotopy method for nonconvex nonlinear programming is developed, using cubic splines to smooth the max function of the constraints. Some properties of the spline smoothing function are discussed, and the global convergence of the spline smoothing homotopy under the weak normal cone condition is proven. The spline smoothing technique uses one smooth constraint instead of m constraints and also acts as an active set technique, so the spline smoothing homotopy method is more efficient than previous homotopy methods such as the combined homotopy interior point method, the aggregate constraint homotopy method, and other probability-one homotopy methods. Numerical tests, with comparisons to several other methods, show that the new method is very efficient for nonlinear programming with a large number of complicated constraints.

17.
This article presents a likelihood-based boosting approach for fitting binary and ordinal mixed models. In contrast to common procedures, this approach can be used in high-dimensional settings where a large number of potentially influential explanatory variables are available. Constructed as a componentwise boosting method, it is able to perform variable selection with the complexity of the resulting estimator being determined by information criteria. The method is investigated in simulation studies both for cumulative and sequential models and is illustrated by using real datasets. The supplementary materials for the article are available online.
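The componentwise principle can be sketched on a plain logistic model: at each boosting step only the single covariate that most improves the fit is updated, which performs implicit variable selection. This is a generic gradient-style sketch, not the paper's exact likelihood-based update for mixed models:

```python
import numpy as np

def componentwise_logit_boost(X, y, steps=100, nu=0.1):
    """Componentwise boosting for y in {0, 1}: weak learner = one covariate."""
    n, p = X.shape
    beta, intercept = np.zeros(p), 0.0
    for _ in range(steps):
        resid = y - 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))  # neg. gradient
        intercept += nu * resid.mean()
        # Score each covariate by its least squares fit to the residuals.
        scores = (X.T @ resid) ** 2 / (X ** 2).sum(axis=0)
        j = int(np.argmax(scores))                 # best single covariate
        beta[j] += nu * (X[:, j] @ resid) / (X[:, j] @ X[:, j])
    return intercept, beta
```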

18.
Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM's conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online.

19.
We present a comparison of different multigrid approaches for the solution of systems arising from high-order continuous finite element discretizations of elliptic partial differential equations on complex geometries. We consider the pointwise Jacobi, the Chebyshev-accelerated Jacobi, and the symmetric successive over-relaxation smoothers, as well as elementwise block Jacobi smoothing. Three approaches for the multigrid hierarchy are compared: (1) high-order h-multigrid, which uses high-order interpolation and restriction between geometrically coarsened meshes; (2) p-multigrid, in which the polynomial order is reduced while the mesh remains unchanged, and the interpolation and restriction incorporate the different-order basis functions; and (3) a first-order approximation multigrid preconditioner constructed using the nodes of the high-order discretization. This latter approach is often combined with algebraic multigrid for the low-order operator and is attractive for high-order discretizations on unstructured meshes, where geometric coarsening is difficult. Based on a simple performance model, we compare the computational cost of the different approaches. Using scalar test problems in two and three dimensions with constant and varying coefficients, we compare the performance of the different multigrid approaches for polynomial orders up to 16. Overall, both h-multigrid and p-multigrid work well; the first-order approximation is less efficient. For constant coefficients, all smoothers work well. For variable coefficients, Chebyshev and symmetric successive over-relaxation smoothing outperform Jacobi smoothing. While all of the tested methods converge in a mesh-independent number of iterations, none of them behaves completely independently of the polynomial order. When multigrid is used as a preconditioner in a Krylov method, the number of iterations decreases significantly compared with using multigrid as a solver.
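For concreteness, the simplest of the compared smoothers, damped pointwise Jacobi, is only a few lines; the Chebyshev-accelerated variant replaces the fixed damping factor with a sequence of step sizes derived from Chebyshev polynomials on an estimated eigenvalue interval. A minimal sketch:

```python
import numpy as np

def jacobi_smooth(A, b, x, sweeps=3, omega=2.0 / 3.0):
    """Damped Jacobi sweeps x <- x + omega * D^{-1} (b - A x); omega = 2/3 is
    the classical damping choice for attenuating high-frequency error."""
    d_inv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        x = x + omega * d_inv * (b - A @ x)
    return x
```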

20.
Progress in information technology has made it possible to apply computer-intensive methods to statistical analysis. In time series modeling, the sequential Monte Carlo method was developed for general nonlinear, non-Gaussian state-space models, and it enables the consideration of very complex nonlinear, non-Gaussian models for real-world problems. In this paper, we consider several computational problems associated with the sequential Monte Carlo filter and smoother, such as the use of a huge number of particles, the two-filter formula for smoothing, and parallel computation. The posterior mean smoother and the Gaussian-sum smoother are also considered.
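As a point of reference for the filtering step being scaled up, a minimal bootstrap particle filter with multinomial resampling is sketched below for a scalar state; `init`, `transition`, and `loglik` are assumed user-supplied model functions, not a library API:

```python
import numpy as np

def bootstrap_filter(y, n_particles, init, transition, loglik, seed=0):
    """init(rng, n) draws initial particles, transition(rng, x) propagates them,
    and loglik(y_t, x) evaluates the observation log-density per particle."""
    rng = np.random.default_rng(seed)
    x = init(rng, n_particles)
    filtered_means = []
    for y_t in y:
        x = transition(rng, x)                      # propagate particles
        logw = loglik(y_t, x)
        w = np.exp(logw - logw.max())               # stabilized weights
        w /= w.sum()
        filtered_means.append(float(w @ x))         # filtered posterior mean
        x = x[rng.choice(n_particles, n_particles, p=w)]  # multinomial resampling
    return np.array(filtered_means)
```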

