Similar Literature (20 results)
1.
Having the ability to work with complex models can be highly beneficial. However, complex models often have intractable likelihoods, so methods that involve evaluation of the likelihood function are infeasible. In these situations, the benefits of working with likelihood-free methods become apparent. Likelihood-free methods, such as parametric Bayesian indirect likelihood, which uses the likelihood of an alternative parametric auxiliary model, have been explored throughout the literature as a viable alternative when the model of interest is complex. One of these methods is the synthetic likelihood (SL), which uses a multivariate normal approximation of the distribution of a set of summary statistics. This article explores the accuracy and computational efficiency of the Bayesian version of the synthetic likelihood (BSL) approach in comparison to a competitor known as approximate Bayesian computation (ABC), together with its sensitivity to its tuning parameters and assumptions. We relate BSL to pseudo-marginal methods and propose an alternative SL based on an unbiased estimator of the SL when the summary statistics have a multivariate normal distribution. Several applications of varying complexity are considered to illustrate the findings of this article. Supplemental materials are available online. Computer code for implementing the methods on all examples is available at https://github.com/cdrovandi/Bayesian-Synthetic-Likelihood.
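A minimal sketch of the synthetic likelihood computation the abstract describes: simulate the model repeatedly at a parameter value, fit a multivariate normal to the simulated summaries, and evaluate the observed summaries under that fit. Here `simulate` and `summarize` are hypothetical user-supplied stand-ins for the intractable model, not part of the cited software.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_log_likelihood(theta, s_obs, simulate, summarize, n_sim=100, rng=None):
    """Synthetic log-likelihood of observed summaries s_obs at parameter theta.

    simulate(theta, rng) -> a simulated dataset; summarize(data) -> a summary
    statistic vector. Both are placeholders for a user-supplied model.
    """
    rng = np.random.default_rng(rng)
    # Simulate n_sim datasets at theta and reduce each to its summary statistics.
    S = np.array([summarize(simulate(theta, rng)) for _ in range(n_sim)])
    mu = S.mean(axis=0)               # sample mean of the simulated summaries
    Sigma = np.cov(S, rowvar=False)   # sample covariance of the summaries
    # SL: evaluate s_obs under the multivariate normal fitted to the summaries.
    return multivariate_normal.logpdf(s_obs, mean=mu, cov=Sigma)
```

In a pseudo-marginal MCMC scheme, this estimate would be recomputed at each proposed theta; the article's unbiased variant replaces the plug-in normal density above.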

2.
An empirical Bayes method to select basis functions and knots in multivariate adaptive regression splines (MARS) is proposed, which combines the advantages of frequentist model selection approaches and Bayesian approaches. A penalized likelihood is maximized to estimate the regression coefficients of the selected basis functions, and an approximate marginal likelihood is maximized to select the knots and the variables involved in the basis functions. Moreover, the Akaike Bayes information criterion (ABIC) is used to determine the number of basis functions. It is shown that the proposed method yields estimates of the regression structure that are relatively parsimonious and more stable on several example data sets.

3.
We introduce a new technique to select the number of components of a mixture model with spatial dependence. The method consists of an estimation of the integrated completed likelihood based on a Laplace approximation and a new technique to deal with the intractability of the normalizing constant of the hidden Potts model. Our proposal is applied to a real satellite image. Supplementary materials are available online.

4.
Models with intractable likelihood functions arise in areas including network analysis and spatial statistics, especially those involving Gibbs random fields. Posterior parameter estimation in these settings is termed a doubly intractable problem because both the likelihood function and the posterior distribution are intractable. The comparison of Bayesian models is often based on the statistical evidence, the integral of the un-normalized posterior distribution over the model parameters, which is rarely available in closed form. For doubly intractable models, estimating the evidence adds another layer of difficulty. Consequently, selecting the model that best describes an observed network among a collection of exponential random graph models is a daunting task. Pseudolikelihoods offer a tractable approximation to the likelihood but should be treated with caution because they can lead to unreasonable inferences. This article specifies a method to adjust pseudolikelihoods to obtain a reasonable, yet tractable, approximation to the likelihood. This allows implementation of widely used computational methods for evidence estimation and the pursuit of Bayesian model selection of exponential random graph models for the analysis of social networks. Empirical comparisons with existing methods show that our procedure yields similar evidence estimates at a lower computational cost. Supplementary material for this article is available online.
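The pseudolikelihood idea can be illustrated on a toy Gibbs random field (a 2D Ising model rather than an exponential random graph model, as an assumed simplification): each site's conditional distribution given its neighbours is tractable, even though the joint likelihood involves an intractable normalizing constant.

```python
import numpy as np

def ising_log_pseudolikelihood(x, beta):
    """Log-pseudolikelihood of a +/-1 Ising configuration x on a 2D grid:
    the product of each site's conditional given its neighbours. This is
    tractable, unlike the true likelihood with its normalizing constant."""
    # Sum of neighbouring spins via shifted copies (zero-padded boundary).
    s = np.zeros_like(x, dtype=float)
    s[1:, :] += x[:-1, :]; s[:-1, :] += x[1:, :]
    s[:, 1:] += x[:, :-1]; s[:, :-1] += x[:, 1:]
    # P(x_i | neighbours) = sigmoid(2 * beta * x_i * s_i);
    # log sigmoid(z) = -log(1 + exp(-z)).
    return float(np.sum(-np.log1p(np.exp(-2.0 * beta * x * s))))
```

The article's contribution is precisely an adjustment of such pseudolikelihoods so that they can stand in for the likelihood in evidence estimation.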

5.
Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models and raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution; however, for stochastic blockmodels, the assumption that edges are conditionally independent given the communities of their endpoints is usually violated in practice. We therefore propose the composite likelihood BIC (CL-BIC) for selecting the number of communities and show that it is robust against possible misspecifications of the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.

6.
When the data have heavy tails or contain outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Using Bayesian inference, we study the Bayesian variable selection problem for median linear models. A Bayesian estimation method is proposed by combining Bayesian model selection theory with a spike-and-slab prior on the regression coefficients, and an efficient posterior Gibbs sampling procedure is given. Extensive numerical simulations and an analysis of the Boston house price data illustrate the effectiveness of the proposed method.
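A minimal sketch of the spike-and-slab mechanism for a single predictor under a Gaussian working likelihood. The paper itself treats median regression with a full Gibbs sampler; the one-predictor conjugate setup and the function name here are illustrative assumptions, showing how a spike (point mass at zero) and a slab (diffuse normal) compete via their marginal likelihoods.

```python
import numpy as np

def inclusion_probability(y, x, sigma2=1.0, tau2=10.0, prior_incl=0.5):
    """Posterior probability that a single coefficient lies in the 'slab'
    for y = x * beta + noise, with beta ~ point mass at 0 (spike) or
    beta ~ N(0, tau2) (slab). Hypothetical one-predictor illustration."""
    n = len(y)
    xtx = x @ x
    xty = x @ y
    # Slab marginal: y ~ N(0, sigma2*I + tau2*x x^T), evaluated via the
    # rank-one (Sherman-Morrison / matrix determinant lemma) identities.
    log_m1 = (-0.5 * n * np.log(2 * np.pi * sigma2)
              - 0.5 * np.log1p(tau2 * xtx / sigma2)
              - 0.5 * (y @ y) / sigma2
              + 0.5 * tau2 * xty**2 / (sigma2 * (sigma2 + tau2 * xtx)))
    # Spike marginal: y ~ N(0, sigma2*I), i.e., beta excluded.
    log_m0 = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (y @ y) / sigma2
    bf = np.exp(log_m1 - log_m0)          # Bayes factor slab vs. spike
    return prior_incl * bf / (prior_incl * bf + 1 - prior_incl)
```

In the full sampler, an analogous conditional inclusion probability is drawn for each coefficient inside the Gibbs sweep.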

7.
Pair-copula Bayesian networks (PCBNs) are a novel class of multivariate statistical models, which combine the distributional flexibility of pair-copula constructions (PCCs) with the parsimony of conditional independence models associated with directed acyclic graphs (DAGs). We are the first to provide generic algorithms for random sampling and likelihood inference in arbitrary PCBNs, as well as for selecting orderings of the parents of the vertices in the underlying graphs. Model selection of the DAG is facilitated using a version of the well-known PC algorithm based on a novel test for conditional independence of random variables tailored to the PCC framework. A simulation study shows the PC algorithm's high aptitude for structure estimation in non-Gaussian PCBNs. The proposed methods are finally applied to modeling financial return data. Supplementary materials for this article are available online.

8.
One of the main advantages of Bayesian approaches is that they offer principled methods of inference in models of varying, and even infinite, dimensionality. What is less widely appreciated is how sensitive model inference is to prior distributions, and therefore how priors should be set for real problems. In this paper, prior sensitivity is considered with respect to the problem of inference in Gaussian mixture models. Two distinct Bayesian approaches have been proposed: the first uses Bayesian model selection based on the marginal likelihood; the second uses an infinite mixture model that side-steps model selection. Explanations for the prior sensitivity are given in order to offer practitioners guidance in setting prior distributions. In particular, the use of conditionally conjugate prior distributions instead of purely conjugate ones is advocated as a method for investigating the prior sensitivity of the mean and variance individually.

9.
We examine three Bayesian case influence measures, the φ-divergence, Cook's posterior mode distance, and Cook's posterior mean distance, for identifying a set of influential observations in a variety of statistical models, including models for longitudinal data and latent variable models, in the absence or presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three measures using the Laplace approximation formula and examine their application to the identification of influential sets. All of the computations for the first-order approximations can be done easily using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology. Supplemental materials for the article are available online.

10.
11.

12.
This paper studies statistical inference theory and methods for a class of semiparametric varying-coefficient partially linear models under several types of complex data. First, for complex data such as longitudinal data and data with measurement errors, we study empirical likelihood inference for the model and propose grouped and bias-corrected empirical likelihood methods, respectively. These methods effectively handle the difficulty that within-group correlation in longitudinal data creates for constructing the empirical likelihood ratio function. Second, for complex data such as measurement-error data and missing data, we study variable selection for the model and propose a bias-corrected method and an imputation-based method, respectively. These variable selection methods can simultaneously select the important variables in both the parametric and nonparametric components, with variable selection and estimation of the regression coefficients carried out at the same time. With a suitable choice of the penalty parameter, the variable selection methods are shown to identify the true model consistently, and the resulting regularized estimators possess the oracle property.

13.
Block clustering aims to reveal homogeneous block structures in a data table. Among the different approaches to block clustering, we consider here a model-based method: the Gaussian latent block model for continuous data, an extension of the Gaussian mixture model for one-way clustering. For a given data table, several candidate models are usually examined, differing, for example, in the number of clusters, so model selection becomes a critical issue. To this end, we develop a criterion based on an approximation of the integrated classification likelihood for the Gaussian latent block model and propose a Bayesian information criterion-like variant following the same pattern. We also propose a non-asymptotic exact criterion, thus circumventing the controversial definition of the asymptotic regime arising from the dual nature of the rows and columns in co-clustering. The experimental results show steady performance of these criteria for medium to large data tables.

14.
A computationally simple approach to inference in state space models is proposed, using approximate Bayesian computation (ABC). ABC avoids evaluation of an intractable likelihood by matching summary statistics for the observed data with statistics computed from data simulated from the true process, based on parameter draws from the prior. Draws that produce a "match" between observed and simulated summaries are retained and used to estimate the inaccessible posterior. With no reduction to a low-dimensional set of sufficient statistics being possible in the state space setting, we define the summaries as the maximizer of an auxiliary likelihood function, and thereby exploit the asymptotic sufficiency of this estimator for the auxiliary parameter vector. We derive conditions under which this approach, including a computationally efficient version based on the auxiliary score, achieves Bayesian consistency. To reduce the well-documented inaccuracy of ABC in multiparameter settings, we propose the separate treatment of each parameter dimension using an integrated likelihood technique. Three stochastic volatility models for which exact Bayesian inference is either computationally challenging or infeasible are used for illustration. We demonstrate that our approach compares favorably against an extensive set of approximate and exact comparators. An empirical illustration completes the article. Supplementary materials for this article are available online.
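The retention step the abstract describes can be sketched as plain rejection ABC (the article's method additionally uses auxiliary-likelihood summaries and integrated likelihoods; the function names and the quantile-based tolerance here are illustrative assumptions):

```python
import numpy as np

def abc_rejection(s_obs, prior_sample, simulate, summarize,
                  n_draws=10000, keep=0.01, rng=None):
    """Plain rejection ABC: draw parameters from the prior, simulate data,
    and retain the draws whose simulated summaries fall closest to s_obs."""
    rng = np.random.default_rng(rng)
    thetas = np.array([prior_sample(rng) for _ in range(n_draws)])
    dists = np.array([np.linalg.norm(summarize(simulate(t, rng)) - s_obs)
                      for t in thetas])
    eps = np.quantile(dists, keep)   # tolerance = the `keep` quantile of distances
    return thetas[dists <= eps]      # approximate posterior sample
```

The retained draws approximate the posterior; shrinking `keep` trades Monte Carlo error against approximation error.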

15.
The marginal likelihood of the data computed using Bayesian score metrics is at the core of score+search methods for learning Bayesian networks from data. However, common formulations of these Bayesian score metrics rely on free parameters that are hard to assess. Recent theoretical and experimental work has also shown that the commonly employed BDe score metric is strongly biased by the particular assignment of its free parameter, known as the equivalent sample size. This sensitivity means that poor choices of this parameter lead to inferred BN models whose structure and parameters do not properly represent the distribution generating the data, even for large sample sizes. In this paper we argue that the problem is that the BDe metric rests on assumptions about the distribution of the BN model parameters generating the data that are too strict and do not hold in real settings. To overcome this issue, we introduce an approach that marginalizes the meta-parameter locally, aiming to embrace a wider set of assumptions about these parameters. It is shown experimentally that this approach offers robust performance, as good as that of the standard BDe metric with an optimal selection of its free parameter, and consequently prevents the choice of wrong settings for this widely applied Bayesian score metric.
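To make the free parameter concrete, here is a textbook BDe(u)-style local log-score with its equivalent-sample-size parameter `ess` (a standard formulation, assumed here for illustration; it is the baseline the paper's locally marginalized variant addresses, not the paper's own method):

```python
import numpy as np
from scipy.special import gammaln

def bdeu_local_score(data, child, parents, arities, ess=1.0):
    """BDe(u)-style local log-score of `child` given `parents` on
    integer-coded data (rows = cases). `ess` is the equivalent sample
    size; Dirichlet hyperparameters are alpha_jk = ess / (q * r)."""
    r = arities[child]                                   # child states
    q = int(np.prod([arities[p] for p in parents])) if parents else 1
    alpha_jk = ess / (q * r)                             # per-cell prior count
    alpha_j = ess / q                                    # per-parent-config total
    counts = np.zeros((q, r))
    for row in data:
        j = 0
        for p in parents:                                # mixed-radix config index
            j = j * arities[p] + row[p]
        counts[j, row[child]] += 1
    n_j = counts.sum(axis=1)
    return float(np.sum(gammaln(alpha_j) - gammaln(alpha_j + n_j))
                 + np.sum(gammaln(alpha_jk + counts) - gammaln(alpha_jk)))
```

Changing `ess` changes the score, and hence potentially the selected structure, which is exactly the sensitivity the abstract discusses.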

16.
In multivariate nonparametric models, the choice of bandwidth and polynomial order is crucial to the performance of local polynomial estimators. This paper proposes an adaptive Bayesian bandwidth selection method based on the cross-validation criterion. For a given error density function, the method derives the corresponding likelihood function and constructs the posterior density of the bandwidth parameters; estimates of both the order and the bandwidth are then obtained simultaneously from the posterior expectation of the bandwidth. Numerical simulations show that the method is not only more accurate than the rule-of-thumb approach but also less time-consuming than cross-validation. Moreover, compared with the Nadaraya-Watson estimator, the proposed bandwidth selection method adapts better to multivariate nonparametric models. Finally, a real data set illustrates the good finite-sample performance of the proposed Bayesian bandwidth selection.
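The cross-validation criterion that the Bayesian method builds on can be sketched for the one-dimensional Nadaraya-Watson (local constant) case with a Gaussian kernel; the grid-search selector here is the classical baseline, not the paper's posterior-expectation estimator.

```python
import numpy as np

def loo_cv_score(x, y, h):
    """Leave-one-out CV error of a Nadaraya-Watson estimator with a
    Gaussian kernel and bandwidth h."""
    d2 = (x[:, None] - x[None, :]) ** 2
    w = np.exp(-0.5 * d2 / h**2)
    np.fill_diagonal(w, 0.0)          # leave each point out of its own fit
    yhat = w @ y / w.sum(axis=1)
    return np.mean((y - yhat) ** 2)

def select_bandwidth(x, y, grid):
    """Pick the bandwidth on the grid minimizing the CV error."""
    return min(grid, key=lambda h: loo_cv_score(x, y, h))
```

The Bayesian approach replaces this grid search with a posterior over h, which is where its speed advantage over repeated cross-validation comes from.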

17.
Pawlak’s attribute dependency degree model is applicable to feature selection in pattern recognition. However, the dependency degrees given by the model are often inadequately computed as a result of the indiscernibility relation. This paper discusses an improvement to Pawlak’s model and presents a new attribute dependency function. The proposed model is based on decision-relative discernibility matrices and measures, by referring to the matrix, how many times condition attributes are used to determine the decision value. The proposed dependency degree is computed by distinguishing the two cases in which two decision values are equal or unequal. A feature of the proposed model is that its attribute dependency degrees have significant properties related to Armstrong’s axioms. An advantage of the proposed model is that data efficiency is taken into account when computing dependency degrees. Examples show that the proposed model computes dependency degrees more strictly than Pawlak’s model.
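For context, the classical Pawlak dependency degree that the paper refines is simple to compute: it is the fraction of objects whose equivalence class under the condition attributes determines the decision value uniquely. The sketch below implements that baseline (not the paper's discernibility-matrix model); the table encoding is an assumption for illustration.

```python
from collections import defaultdict

def dependency_degree(table, cond_attrs, dec_attr):
    """Pawlak's attribute dependency degree gamma: the fraction of objects
    in the positive region, i.e., objects whose condition-attribute class
    has a single decision value. `table` is a list of tuples; attributes
    are addressed by index."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in cond_attrs)].append(i)
    positive = sum(len(idxs) for idxs in classes.values()
                   if len({table[i][dec_attr] for i in idxs}) == 1)
    return positive / len(table)
```

The indiscernibility relation drives this computation entirely through the equivalence classes, which is the source of the inadequacy the paper addresses.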

18.
We study the law of the iterated logarithm (LIL) for the maximum likelihood estimation of the parameters (as a convex optimization problem) in generalized linear models with independent or weakly dependent (ρ-mixing) responses under mild conditions. The LIL is useful for deriving asymptotic bounds on the discrepancy between the empirical process of the log-likelihood function and the true log-likelihood. The strong consistency of some penalized likelihood-based model selection criteria can be shown as an application of the LIL. Under some regularity conditions, the model selection criterion selects the simplest correct model almost surely when the penalty term increases with the model dimension and has an order higher than O(log log n) but lower than O(n). Simulation studies are implemented to verify the selection consistency of the Bayesian information criterion.
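For reference, the classical Hartman-Wintner form of the LIL, which this work extends to the GLM setting, states that for i.i.d. mean-zero variables \(X_i\) with variance \(\sigma^2\),

```latex
\limsup_{n\to\infty} \frac{S_n}{\sqrt{2n\log\log n}} = \sigma \quad \text{a.s.},
\qquad S_n = \sum_{i=1}^{n} X_i,
```

with the corresponding \(\liminf\) equal to \(-\sigma\) almost surely. The log-log rate of these almost-sure fluctuations is what a penalty of order higher than O(log log n) must dominate for the selection criterion to be strongly consistent.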

19.
In this article, we introduce the Bayesian change point and variable selection algorithm that uses dynamic programming recursions to draw direct samples from a very high-dimensional space in a computationally efficient manner, and apply this algorithm to a geoscience problem that concerns the Earth's history of glaciation. Strong evidence exists for at least two changes in the behavior of the Earth's glaciers over the last five million years. Around 2.7 Ma, the extent of glacial cover on the Earth increased, but the frequency of glacial melting events remained constant at 41 kyr. A more dramatic change occurred around 1 Ma. For over three decades, the "Mid-Pleistocene Transition" has been described in the geoscience literature not only by a further increase in the magnitude of glacial cover, but also as the dividing point between the 41 kyr and the 100 kyr glacial worlds. Given such striking changes in the glacial record, it is clear that a model whose parameters can change through time is essential for the analysis of these data. The Bayesian change point algorithm provides a probabilistic solution to a data segmentation problem, while the exact Bayesian inference in regression procedure performs variable selection within each regime delineated by the change points. Together, they can model a time series in which the predictor variables as well as the parameters of the model are allowed to change with time. Our algorithm allows one to simultaneously perform variable selection and change point analysis in a computationally efficient manner. Supplementary materials including MATLAB code for the Bayesian change point and variable selection algorithm and the datasets described in this article are available online or by contacting the first author.
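The data segmentation problem can be illustrated in its simplest form: a least-squares scan for a single change in mean. This is a deliberately simplified, non-Bayesian sketch; the article's algorithm instead samples multiple change points via dynamic programming recursions.

```python
import numpy as np

def best_changepoint(y, min_seg=2):
    """Least-squares scan for a single change in mean: return the split
    index t minimizing the total within-segment sum of squares of
    y[:t] and y[t:]."""
    n = len(y)
    best_t, best_cost = None, np.inf
    for t in range(min_seg, n - min_seg + 1):
        left, right = y[:t], y[t:]
        cost = (((left - left.mean()) ** 2).sum()
                + ((right - right.mean()) ** 2).sum())
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

Extending this scan to k change points by dynamic programming, and placing a posterior over segmentations, leads to the class of algorithm the article develops.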

20.
Bayesian approaches to prediction and to assessing predictive uncertainty in generalized linear models are often based on averaging predictions over different models, which requires methods for accounting for model uncertainty. When there are linear dependencies among potential predictor variables, existing Markov chain Monte Carlo algorithms for sampling from the posterior distribution over the model and parameter space in Bayesian variable selection problems may not work well. This article describes a sampling algorithm, based on the Swendsen-Wang algorithm for the Ising model, that works well when the predictors are far from orthogonality. In variable selection for generalized linear models, different models can be indexed by a binary vector in which each entry indicates whether a given predictor variable is included. The posterior distribution on the model space is then a distribution over these binary strings, and by viewing it as a binary spatial field we can sample from it with a scheme inspired by the Swendsen-Wang algorithm. The algorithm extends a similar algorithm for variable selection problems in linear models. Its benefits are demonstrated on both real and simulated data.
