Similar literature
20 similar records found.
1.
Bayesian predictive densities for the 2-dimensional Wishart model are investigated. The performance of predictive densities is evaluated by using the Kullback–Leibler divergence. It is proved that a Bayesian predictive density based on a prior exactly dominates that based on the Jeffreys prior if the prior density satisfies some geometric conditions. An orthogonally invariant prior is introduced and it is shown that the Bayesian predictive density based on the prior is minimax and dominates that based on the right invariant prior with respect to the triangular group.
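The evaluation criterion used here can be stated generically; the following is the standard Kullback–Leibler risk of a predictive density, written for a generic parameter $\theta$ and observation/future-observation pair $(x, y)$ (a reference formulation, not a formula quoted from this particular paper):

$$ R(\theta, \hat p) = \mathbb{E}_{x \mid \theta}\left[ \int p(y \mid \theta) \log \frac{p(y \mid \theta)}{\hat p(y \mid x)} \, dy \right], $$

and a predictive density $\hat p_1$ is said to dominate $\hat p_2$ if $R(\theta, \hat p_1) \le R(\theta, \hat p_2)$ for every $\theta$, with strict inequality for some $\theta$.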

2.
This work presents a new scheme to obtain the prior distribution parameters in the framework of Rufo et al. (Comput Stat 21:621–637, 2006). Firstly, an analytical expression of the proposed Kullback–Leibler divergence is derived for each distribution in the considered family. Therefore, no preliminary simulation technique is needed to estimate integrals, and the error related to this procedure is avoided. Secondly, a global optimization algorithm based on interval arithmetic is applied to obtain the prior parameters from the derived expression. The main advantage of this approach is that all solutions are found and correctly bounded. Finally, an application comparing this strategy with the previous one illustrates the proposal.
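As a rough illustration of why a closed-form divergence removes the simulation error, here is a hedged sketch (the family and the interval-arithmetic optimizer of Rufo et al. are not reproduced; the exponential distributions below are only a stand-in): the Kullback–Leibler divergence between two exponential densities has a simple analytic expression, which can be checked against the numerical integration a simulation-based approach would have to approximate.

import numpy as np
from scipy.integrate import quad

def kl_exponential(lam1, lam2):
    # Closed-form KL( Exp(lam1) || Exp(lam2) ) for rate parameters lam1, lam2 > 0.
    return np.log(lam1 / lam2) + lam2 / lam1 - 1.0

def kl_numerical(lam1, lam2):
    # The same divergence obtained by numerical integration, i.e. what a
    # simulation/quadrature-based approach would have to approximate.
    integrand = lambda x: lam1 * np.exp(-lam1 * x) * (np.log(lam1 / lam2) + (lam2 - lam1) * x)
    return quad(integrand, 0.0, np.inf)[0]

print(kl_exponential(2.0, 0.5), kl_numerical(2.0, 0.5))  # the two values should agree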

3.
The EM algorithm is a sophisticated method for estimating statistical models with hidden variables, based on the Kullback–Leibler divergence. A natural extension of the Kullback–Leibler divergence is given by the class of Bregman divergences, which in general enjoy robustness to contaminated data in statistical inference. In this paper, a modification of the EM algorithm based on the Bregman divergence is proposed for estimating finite mixture models. The proposed algorithm is geometrically interpreted as a sequence of projections induced by the Bregman divergence. Since a rigorous algorithm includes a nonlinear optimization procedure, two simplification methods for reducing the computational difficulty are also discussed from a geometrical viewpoint. Numerical experiments on a toy problem are carried out to confirm the appropriateness of the simplifications.
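For reference, a Bregman divergence is generated by a convex function F via D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>, and the (generalized) Kullback–Leibler divergence is recovered from F(x) = x log x - x applied coordinate-wise. The sketch below only illustrates this definition, not the paper's modified EM iteration.

import numpy as np

def bregman(F, gradF, p, q):
    # Bregman divergence D_F(p, q) = F(p) - F(q) - <gradF(q), p - q>.
    return np.sum(F(p) - F(q) - gradF(q) * (p - q))

# F(x) = x*log(x) - x (coordinate-wise) generates the generalized KL divergence.
F = lambda x: x * np.log(x) - x
gradF = lambda x: np.log(x)

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
print(bregman(F, gradF, p, q))      # generalized KL divergence between p and q
print(np.sum(p * np.log(p / q)))    # equals the usual KL here because p and q both sum to 1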

4.
This paper uses a decision theoretic approach for updating a probability measure representing beliefs about an unknown parameter. A cumulative loss function is considered, which is the sum of two terms: one depends on the prior belief and the other one on further information obtained about the parameter. Such information is thus converted to a probability measure and the key to this process is shown to be the Kullback–Leibler divergence. The Bayesian approach can be derived as a natural special case. Some illustrations are presented.
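This construction is commonly written as minimizing a cumulative loss over probability measures $\mu$; the following is a hedged sketch of that formulation (the notation is mine, not the paper's):

$$ L(\mu) = \int \ell(\theta; x)\, \mu(d\theta) + d_{\mathrm{KL}}(\mu, \pi), \qquad \mu^{*}(d\theta) \propto \exp\{-\ell(\theta; x)\}\, \pi(d\theta), $$

so that with the negative log-likelihood loss $\ell(\theta; x) = -\log f(x \mid \theta)$ the minimizer $\mu^{*}$ is exactly the Bayesian posterior, recovering the Bayesian approach as a special case.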

5.
Estimators based on the mode are introduced and shown empirically to have smaller Kullback–Leibler risk than the maximum likelihood estimator. For one of these, the midpoint modal estimator (MME), we prove that the Kullback–Leibler risk is below 1/2, while for the MLE the risk is above 1/2, for a wide range of success probabilities that approaches the unit interval as the sample size grows to infinity. The MME is related to the mean of Fisher's fiducial estimator and to the rule of succession for Jeffreys' noninformative prior.

6.
This paper addresses the problem of estimating the density of a future outcome from a multivariate normal model. We propose a class of empirical Bayes predictive densities and evaluate their performances under the Kullback–Leibler (KL) divergence. We show that these empirical Bayes predictive densities dominate the Bayesian predictive density under the uniform prior and thus are minimax under some general conditions. We also establish the asymptotic optimality of these empirical Bayes predictive densities in infinite-dimensional parameter spaces through an oracle inequality.

7.
The problem of selecting between semi-parametric and proportional hazards models is considered. We propose to make this choice based on the expectation of the log-likelihood (ELL), which can be estimated by the likelihood cross-validation (LCV) criterion. The criterion is used to choose an estimator in families of semi-parametric estimators defined by the penalized likelihood. A simulation study shows that the ELL criterion performs nearly as well in this problem as the optimal Kullback–Leibler criterion in terms of Kullback–Leibler distance, and that LCV performs reasonably well. The approach is applied to a model of age-specific risk of dementia as a function of sex and educational level, using data from a large cohort study.
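Likelihood cross-validation is generic enough to sketch outside the hazard-model setting of the paper: the leave-one-out log-likelihood estimates the expected log-likelihood (ELL) and is maximized over the smoothing parameter. The sketch below uses Gaussian kernel density estimation purely as a stand-in estimator family, not the penalized-likelihood hazard estimators of the paper.

import numpy as np

def loo_log_likelihood(x, h):
    # Leave-one-out log-likelihood of a Gaussian kernel density estimate with
    # bandwidth h; an empirical estimate of the expected log-likelihood (ELL).
    n = len(x)
    diffs = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * diffs ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)                 # remove each point from its own density estimate
    f_loo = k.sum(axis=1) / ((n - 1) * h)
    return np.mean(np.log(f_loo))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
bandwidths = np.linspace(0.05, 1.0, 40)
scores = [loo_log_likelihood(x, h) for h in bandwidths]
print("LCV-selected bandwidth:", bandwidths[int(np.argmax(scores))])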

8.
The prediction problem for a multivariate normal distribution is considered where both mean and variance are unknown. When the Kullback–Leibler loss is used, the Bayesian predictive density based on the right invariant prior, which turns out to be a density of a multivariate t-distribution, is the best invariant and minimax predictive density. In this paper, we introduce an improper shrinkage prior and show that the Bayesian predictive density against the shrinkage prior improves upon the best invariant predictive density when the dimension is greater than or equal to three.

9.
The identification of different dynamics in sequential data has become an everyday need in scientific fields such as marketing, bioinformatics, finance, or social sciences. Contrary to cross-sectional or static data, such observations (also known as stream data, temporal data, longitudinal data or repeated measures) are more challenging, as one has to incorporate data dependency in the clustering process. In this research we focus on clustering categorical sequences. The method proposed here combines model-based and heuristic clustering. In the first step, the categorical sequences are transformed by an extension of the hidden Markov model into a probabilistic space, where a symmetric Kullback–Leibler distance can operate. Then, in the second step, the sequences are clustered using hierarchical clustering on the matrix of distances. This paper illustrates the enormous potential of this type of hybrid approach using a synthetic data set as well as the well-known Microsoft dataset with website users' search patterns and a survey on job career dynamics.
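A hedged sketch of the second (heuristic) step only; the HMM-based embedding of the first step is not reproduced, and the toy probability vectors below simply stand in for its output.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def sym_kl(p, q, eps=1e-12):
    # Symmetric Kullback-Leibler distance between two probability vectors.
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

# Stand-in for the probabilistic representation of four categorical sequences.
profiles = np.array([[0.7, 0.2, 0.1],
                     [0.6, 0.3, 0.1],
                     [0.1, 0.2, 0.7],
                     [0.2, 0.1, 0.7]])

n = len(profiles)
dist = np.array([[sym_kl(profiles[i], profiles[j]) for j in range(n)] for i in range(n)])
tree = linkage(squareform(dist, checks=False), method="average")
print(fcluster(tree, t=2, criterion="maxclust"))   # expected: first two sequences in one cluster, last two in the other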

10.
We study the asymptotic behavior of the Bayesian estimator for a deterministic signal in additive Gaussian white noise, in the case where the set of minima of the Kullback–Leibler information is a submanifold of the parameter space. This problem includes as a special case the study of the asymptotic behavior of the nonlinear filter, when the state equation is noise-free, and when the limiting deterministic system is nonobservable. As the noise intensity goes to zero, the posterior probability distribution of the parameter asymptotically concentrates on the submanifold of minima of the Kullback–Leibler information. We give an explicit expression of the limit, and we study the rate of convergence. We apply these results to a practical example where nonidentifiability occurs.

11.
Information criteria based on the expected Kullback–Leibler information are presented by means of the asymptotic expansions derived with the Malliavin calculus. We consider the evaluation problem of statistical models for diffusion processes with small noise. The correction terms are essentially different from the ones for ergodic diffusion models presented in Uchida and Yoshida [34, 35].

12.
We present information criteria for statistical model evaluation problems for stochastic processes. The emphasis is put on the use of the asymptotic expansion of the distribution of an estimator based on the conditional Kullback–Leibler divergence for stochastic processes. Asymptotic properties of information criteria and their improvement are discussed. An application to a diffusion process is presented.

13.
Numerous empirical results have shown that combining regression procedures can be a very efficient method. This work provides PAC bounds for the L2 generalization error of such methods. The interest of these bounds is twofold. First, they give, for any aggregating procedure, a bound on the expected risk depending on the empirical risk and on the empirical complexity, measured by the Kullback–Leibler divergence between the aggregating distribution and a prior distribution π and by the empirical mean of the variance of the regression functions under that distribution. Secondly, by structural risk minimization, we derive an aggregating procedure which takes advantage of the unknown properties of the best mixture: when the best convex combination of d regression functions belongs to the d initial functions (i.e. when combining does not make the bias decrease), the convergence rate is of order (log d)/N. In the worst case, our combining procedure achieves a convergence rate known to be optimal in a uniform sense (see [A. Nemirovski, in: Probability Summer School, Saint Flour, 1998; Y. Yang, Aggregating regression procedures for a better performance, 2001]). As in AdaBoost, our aggregating distribution tends to favor functions which disagree with the mixture on mispredicted points. Our algorithm is tested on artificial classification data (which have also been used for testing other boosting methods, such as AdaBoost).
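Aggregating distributions of this kind are typically exponential-weights (Gibbs) updates of the prior π by the empirical risk; the sketch below shows that generic construction only, not the paper's exact procedure, its variance term, or its PAC bound.

import numpy as np

def aggregate(predictions, y, prior, beta=1.0):
    # predictions: (d, N) array, one row of predictions per base regression function.
    # Exponential-weights aggregating distribution: rho_j proportional to prior_j * exp(-beta * N * empirical_risk_j).
    risks = np.mean((predictions - y) ** 2, axis=1)
    logw = np.log(prior) - beta * len(y) * risks
    logw -= logw.max()                       # numerical stabilization before exponentiating
    rho = np.exp(logw)
    rho /= rho.sum()
    return rho, rho @ predictions            # mixture weights and the aggregated prediction

rng = np.random.default_rng(1)
y = rng.normal(size=50)
preds = np.vstack([y + rng.normal(scale=s, size=50) for s in (0.1, 0.5, 2.0)])
rho, combined = aggregate(preds, y, prior=np.ones(3) / 3)
print(rho)   # most of the weight should fall on the least noisy base predictor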

14.
We study the large deviation principle for M-estimators (and maximum likelihood estimators in particular). We obtain the rate function of the large deviation principle for M-estimators. For exponential families, this rate function agrees with the Kullback–Leibler information number. However, for location or scale families this rate function is smaller than the Kullback–Leibler information number. We apply our results to obtain confidence regions of minimum size whose coverage probability converges to one exponentially. In the case of full exponential families, the constructed confidence regions agree with the ones obtained by inverting the likelihood ratio test with a simple null hypothesis.

15.
When learning processes depend on samples but not on the order of the information in the sample, the Bernoulli distribution is relevant and Bernstein polynomials enter into the analysis. We derive estimates of the approximation of the entropy function x log x that are sharper than the bounds from Voronovskaja's theorem. In this way we get the correct asymptotics for the Kullback–Leibler distance for an encoding problem.
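As a concrete illustration (mine, not the paper's): the Bernstein polynomial B_n(f; x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k) of f(x) = x log x can be compared numerically with the classical Voronovskaja-type correction x(1-x) f''(x)/(2n) = (1-x)/(2n).

import numpy as np
from scipy.stats import binom

def f(x):
    # Entropy-type function x*log(x), with the convention 0*log(0) = 0.
    return np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0)), 0.0)

def bernstein(g, n, x):
    # Bernstein polynomial B_n(g; x) = sum_k g(k/n) * C(n,k) * x^k * (1-x)^(n-k).
    k = np.arange(n + 1)
    return binom.pmf(k, n, x) @ g(k / n)

n, x = 50, 0.3
approx_error = bernstein(f, n, x) - f(x)
voronovskaja = (1.0 - x) / (2.0 * n)       # x(1-x) f''(x) / (2n), using f''(x) = 1/x
print(approx_error, voronovskaja)          # the two quantities should be close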

16.
Motivated by the bandwidth selection problem in local likelihood density estimation and by the problem of assessing a final model chosen by a certain model selection procedure, we consider estimation of the Kullback–Leibler divergence. It is known that the best bandwidth choice for the local likelihood density estimator depends on the distance between the true density and the ‘vehicle’ parametric model. Also, the Kullback–Leibler divergence may be a useful measure based on which one judges how far the true density is away from a parametric family. We propose two estimators of the Kullback–Leibler divergence. We derive their asymptotic distributions and compare their finite-sample properties. Research of Young Kyung Lee was supported by the Brain Korea 21 Projects in 2004. Byeong U. Park’s research was supported by KOSEF through Statistical Research Center for Complex Systems at Seoul National University.
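A hedged sketch of one simple plug-in estimator of the divergence between the true density and a fitted parametric ('vehicle') model (not necessarily either of the two estimators proposed in the paper): replace the unknown true density by a kernel density estimate and average the log-ratio over the sample.

import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(2)
x = rng.standard_t(df=4, size=500)        # data whose true density lies outside the vehicle family

# Fit the parametric vehicle model (here a normal) and a kernel density estimate.
mu, sigma = x.mean(), x.std(ddof=1)
kde = gaussian_kde(x)

# Plug-in estimate of KL(true || vehicle): sample average of log( f_hat(X_i) / f_theta(X_i) ).
kl_hat = np.mean(np.log(kde(x)) - norm.logpdf(x, loc=mu, scale=sigma))
print(kl_hat)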

17.
We apply the cross-entropy (CE) method to problems in clustering and vector quantization. The CE algorithm for clustering involves the following iterative steps: (a) generate random clusters according to a specified parametric probability distribution, (b) update the parameters of this distribution according to the Kullback–Leibler cross-entropy. Through various numerical experiments, we demonstrate the high accuracy of the CE algorithm and show that it can generate near-optimal clusters for fairly large data sets. We compare the CE method with well-known clustering and vector quantization methods such as K-means, fuzzy K-means and linear vector quantization, and apply each method to benchmark and image analysis data.
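A hedged sketch of the cross-entropy iteration for clustering in one of its simplest forms: candidate centroid sets are sampled from independent Gaussians, scored by the distortion they induce, and the Gaussian parameters are refitted to the elite candidates. The parametrization, scoring, and data sets used in the paper may differ.

import numpy as np

def ce_cluster(data, k, n_samples=200, elite_frac=0.1, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    mu = np.tile(data.mean(axis=0), (k, 1))      # sampling distribution: one Gaussian per centroid
    sd = np.full((k, d), data.std())
    for _ in range(n_iter):
        # (a) generate random candidate centroid sets from the current parametric distribution
        cand = rng.normal(mu, sd, size=(n_samples, k, d))
        dists = np.linalg.norm(data[None, :, None, :] - cand[:, None, :, :], axis=-1)
        scores = dists.min(axis=2).sum(axis=1)   # k-means-type distortion of each candidate set
        # (b) update the parameters from the elite (lowest-distortion) candidates
        elite = cand[np.argsort(scores)[: int(elite_frac * n_samples)]]
        mu, sd = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
print(ce_cluster(data, k=2))   # should recover centroids near (-3, -3) and (3, 3), in some order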

18.
The paper revisits the problem of selecting priors for a regular one-parameter family of distributions. The goal is to find some “objective” or “default” prior by approximate maximization of the distance between the prior and the posterior under a general divergence criterion as introduced by Amari (Ann Stat 10:357–387, 1982) and Cressie and Read (J R Stat Soc Ser B 46:440–464, 1984). The maximization is based on an asymptotic expansion of this distance. The Kullback–Leibler, Bhattacharyya–Hellinger and Chi-square divergences are special cases of this general divergence criterion. It is shown that, with the exception of one particular case, namely the Chi-square divergence, the general divergence criterion yields Jeffreys’ prior. For the Chi-square divergence, we obtain a prior different from that of Jeffreys and also from that of Clarke and Sun (Sankhya Ser A 59:215–231, 1997).
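For reference, the Jeffreys prior that the general divergence criterion recovers (outside the Chi-square case) is, in the regular one-parameter setting,

$$ \pi_J(\theta) \propto \sqrt{I(\theta)}, \qquad I(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^{2}\right], $$

i.e. the square root of the Fisher information.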

19.
A class of shrinkage priors for multivariate location-scale models is introduced. We consider Bayesian predictive densities for location-scale models and evaluate their performance using the Kullback–Leibler divergence. We show that Bayesian predictive densities based on priors in the introduced class asymptotically dominate the best invariant predictive density.

20.
We present a version of O. Catoni's “progressive mixture estimator” (1999) suited to a general regression framework. Essentially following Catoni's steps, we derive strong non-asymptotic upper bounds for the Kullback–Leibler risk in this framework. We give a more explicit form of this bound when the models considered are regression trees, present a modified version of the estimator in an extended framework, and propose an approximate computation using a Metropolis algorithm.

