Similar Articles
20 similar articles found (search time: 15 ms)
1.
While mixtures of Gaussian distributions have been studied for more than a century, the construction of a reference Bayesian analysis of those models remains unsolved, with a general prohibition of improper priors due to the ill-posed nature of such statistical objects. This difficulty is usually bypassed by an empirical Bayes resolution. By creating a new parameterization centered on the mean and possibly the variance of the mixture distribution itself, we develop a weakly informative prior for a wide class of mixtures with an arbitrary number of components. We demonstrate that some posterior distributions associated with this prior and a minimal sample size are proper. We provide Markov chain Monte Carlo (MCMC) implementations that exhibit the expected exchangeability. We study here only the univariate case; the extension to multivariate location-scale mixtures is currently under study. An R package called Ultimixt is associated with this article. Supplementary material for this article is available online.

2.
Properties of finite mixtures of normal distributions are considered. Their behavioral similarities and differences relative to normal distributions are studied. A practical application of finite mixtures of normal distributions for simulating the noise of neurophysiological signals is described. It is shown that the Aitken estimate can be used for the source amplitudes in the considered model.
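The simulation use-case above can be sketched directly: a finite normal mixture with a small wide "burst" component produces heavier-than-normal tails of the kind used to mimic neurophysiological noise. The weights and component parameters below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_normal_mixture(n, weights, means, sds, rng):
    """Draw n samples from a finite mixture of normal distributions."""
    weights = np.asarray(weights, dtype=float)
    comp = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])

# Two-component mixture: mostly narrow noise, occasional wide bursts.
x = sample_normal_mixture(10_000, [0.9, 0.1], [0.0, 0.0], [1.0, 5.0], rng)
# Variance of the mixture is 0.9*1 + 0.1*25 = 3.4, but with excess kurtosis
# relative to a single normal with the same variance.
```

Varying the minority weight and spread controls how heavy-tailed the simulated noise is.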

3.
Problems of specification of discrete bivariate statistical models by a modified power series conditional distribution and a regression function are studied. An identifiability result for a wide class of such mixtures with infinite support is obtained. Also the finite support case within a more specific model is considered. Applications for Poisson, (truncated) geometric, and binomial mixtures are given. From the viewpoint of Bayesian analysis, unique determination of the prior by a Bayes estimate of the mean for modified power series mixtures is investigated.

4.
Markov chain Monte Carlo (MCMC) methods for Bayesian computation are mostly used when the dominating measure is the Lebesgue measure, the counting measure, or a product of these. Many Bayesian problems give rise to distributions that are not dominated by the Lebesgue measure or the counting measure alone. In this article we introduce a simple framework for using MCMC algorithms in Bayesian computation with mixtures of mutually singular distributions. The idea is to find a common dominating measure that allows the use of traditional Metropolis-Hastings algorithms. In particular, using our formulation, the Gibbs sampler can be used whenever the full conditionals are available. We compare our formulation with the reversible jump approach and show that the two are closely related. We give results for three examples, involving testing a normal mean, variable selection in regression, and hypothesis testing for differential gene expression under multiple conditions. This allows us to compare the three methods considered: Metropolis-Hastings with mutually singular distributions, Gibbs sampler with mutually singular distributions, and reversible jump. In our examples, we found the Gibbs sampler to be more precise and to need considerably less computer time than the other methods. In addition, the full conditionals used in the Gibbs sampler can be used to further improve the estimates of the model posterior probabilities via Rao-Blackwellization, at no extra cost.
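The normal-mean-testing example can be sketched in miniature (this is an illustrative reconstruction, not the paper's implementation): the posterior mixes a point mass at μ = 0 with a continuous alternative, and Metropolis-Hastings ratios are formed from densities taken with respect to the common dominating measure, namely counting measure on the atom {0} plus Lebesgue measure. All data and tuning constants are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical data: n observations with unit variance and unknown mean mu.
n, tau = 50, 1.0
y = rng.normal(0.3, 1.0, size=n)
ybar, sd_prop = y.mean(), 1.0 / np.sqrt(n)

def loglik(mu):
    return norm.logpdf(y, mu, 1.0).sum()

# Log posterior density w.r.t. the common dominating measure:
# counting measure on {mu = 0} plus Lebesgue measure elsewhere.
def logtarget(k, mu):
    if k == 0:                                   # H0: mu = 0, prior mass 1/2
        return np.log(0.5) + loglik(0.0)
    return np.log(0.5) + norm.logpdf(mu, 0.0, tau) + loglik(mu)  # H1

k, mu, kept = 1, ybar, []
for _ in range(20_000):
    if rng.random() < 0.5:                       # jump between components
        if k == 0:
            mu_new = rng.normal(ybar, sd_prop)   # independence proposal
            loga = (logtarget(1, mu_new) - logtarget(0, 0.0)
                    - norm.logpdf(mu_new, ybar, sd_prop))
            if np.log(rng.random()) < loga:
                k, mu = 1, mu_new
        else:
            loga = (logtarget(0, 0.0) - logtarget(1, mu)
                    + norm.logpdf(mu, ybar, sd_prop))
            if np.log(rng.random()) < loga:
                k, mu = 0, 0.0
    elif k == 1:                                 # within-component random walk
        mu_new = mu + rng.normal(0.0, 0.3)
        if np.log(rng.random()) < logtarget(1, mu_new) - logtarget(1, mu):
            mu = mu_new
    kept.append(k)

p0_mcmc = 1.0 - np.mean(kept)
# Closed-form check: P(H0 | y) via the marginal likelihoods of ybar.
m0 = norm.pdf(ybar, 0.0, np.sqrt(1.0 / n))
m1 = norm.pdf(ybar, 0.0, np.sqrt(tau**2 + 1.0 / n))
p0_exact = m0 / (m0 + m1)
```

Because this toy problem has a closed-form posterior model probability, the chain's estimate can be checked directly against it.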

5.
This article proposes a Bayesian density estimation method based upon mixtures of gamma distributions. It considers both the case of known mixture size, using a Gibbs sampling scheme with a Metropolis step, and that of unknown mixture size, using a reversible jump technique that allows us to move from one mixture size to another. We illustrate our methods using a number of simulated datasets, generated from distributions covering a wide range of cases: single distributions, mixtures of distributions with equal means and different variances, mixtures of distributions with different means and small variances and, finally, a distribution contaminated by low-weighted distributions with different means and equal, small variances. An application to estimation of some quantities for an M/G/1 queue is given, using real e-mail data from CNR-IAMI.

6.
In model-based clustering, the density of each cluster is usually assumed to be a certain basic parametric distribution, for example, the normal distribution. In practice, it is often difficult to decide which parametric distribution is suitable to characterize a cluster, especially for multivariate data. Moreover, the densities of individual clusters may be multimodal themselves, and therefore cannot be accurately modeled by basic parametric distributions. This article explores a clustering approach that models each cluster by a mixture of normals. The resulting overall model is a multilayer mixture of normals. Algorithms to estimate the model and perform clustering are developed based on the classification maximum likelihood (CML) and mixture maximum likelihood (MML) criteria. BIC and ICL-BIC are examined for choosing the number of normal components per cluster. Experiments on both simulated and real data are presented.
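Choosing the number of normal components for a single multimodal cluster via BIC can be sketched with scikit-learn as an illustrative stand-in (the paper's own algorithms are CML/MML-based; the data here are toy).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# A bimodal "cluster": data that no single normal captures well.
X = np.concatenate([rng.normal(-2, 0.5, 300),
                    rng.normal(2, 0.5, 300)]).reshape(-1, 1)

# Fit mixtures of 1-3 normals and pick the number of components by BIC.
bics = {k: GaussianMixture(k, random_state=0).fit(X).bic(X) for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)
```

For well-separated modes like these, BIC decisively rejects the single-normal fit while penalizing the over-parameterized three-component model.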

7.

In model-based clustering, mixture models are used to group data points into clusters. A useful concept, introduced for Gaussian mixtures by Malsiner Walli et al. (Stat Comput 26:303–324, 2016), is that of sparse finite mixtures, where the prior distribution on the weight distribution of a mixture with K components is chosen in such a way that, a priori, the number of clusters in the data is random and is allowed to be smaller than K with high probability. The number of clusters is then inferred a posteriori from the data. The present paper makes the following contributions in the context of sparse finite mixture modelling. First, it is illustrated that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters. Second, sparse finite mixtures are compared to Dirichlet process mixtures with respect to their ability to identify the number of clusters. For both model classes, a random hyperprior is considered for the parameters determining the weight distribution. By suitably matching these priors, it is shown that the choice of this hyperprior is far more influential on the cluster solution than whether a sparse finite mixture or a Dirichlet process mixture is used.

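The mechanism behind sparse finite mixtures can be illustrated by simulation: with a symmetric Dirichlet prior Dir(e0, ..., e0) on the weights of K components, a small e0 leaves most components empty a priori, so the number of data-occupied clusters is random and typically far below K. All constants below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n, draws = 10, 100, 2000

def prior_num_clusters(e0):
    """A priori expected number of non-empty components when n points
    are allocated under weights drawn from Dir(e0, ..., e0)."""
    counts = []
    for _ in range(draws):
        w = rng.dirichlet(np.full(K, e0))
        z = rng.choice(K, size=n, p=w)          # component allocations
        counts.append(len(np.unique(z)))
    return float(np.mean(counts))

sparse = prior_num_clusters(0.01)   # shrinking prior: very few filled clusters
dense = prior_num_clusters(4.0)     # large e0: nearly all K components filled
```

This is exactly the lever the paper's hyperprior on e0 controls, which is why matching that hyperprior matters so much when comparing with Dirichlet process mixtures.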

8.
A Bayesian semiparametric procedure for the confirmatory factor analysis model is proposed to address the heterogeneity of multivariate responses. The approach relies on a prior over the space of mixing distributions with finitely many components. A blocked Gibbs sampler is implemented for the posterior analysis. For model comparison, a predictive measure and the Bayes factor are developed. A generalized weighted Chinese restaurant algorithm is suggested to compute the likelihood of the data. Empirical results are presented to illustrate the effectiveness of the methodologies.

9.
Within the framework of Bayesian inference, when observations are exchangeable and take values in a finite space X, a prior P is approximated (in the Prokhorov metric) to any precision by explicitly constructed mixtures of Dirichlet distributions. Likewise, the posteriors are approximated to some precision by the posteriors of these mixtures of Dirichlet distributions. Approximations in the uniform metric for distribution functions are also given. These results are applied to obtain a method for eliciting prior beliefs and to approximate both the predictive distribution (in the variational metric) and the posterior distribution function (in the Lévy metric) of a random probability having distribution P.

10.
A proof of the topological denseness of finite Gamma mixtures
We first prove that finite mixtures of Erlang distributions are dense in the set of all probability distributions on the positive real line, and then extend this conclusion to mixtures of Gamma distributions. This shows that finite Gamma mixtures are broadly applicable and can be used to model an arbitrary random variable on the positive real line.

11.
Zellner's g-prior and its recent hierarchical extensions are the most popular default prior choices in the Bayesian variable selection context. These prior setups can be expressed as power-priors with a fixed set of imaginary data. In this article, we borrow ideas from the power-expected-posterior (PEP) priors to introduce, under the g-prior approach, an extra hierarchical level that accounts for the uncertainty in the imaginary data. For normal regression variable selection problems, the resulting power-conditional-expected-posterior (PCEP) prior is a conjugate normal-inverse gamma prior that provides a consistent variable selection procedure and supports more parsimonious models than the g-prior and the hyper-g prior for finite samples. Detailed illustrations and comparisons of the variable selection procedures using the proposed method, the g-prior, and the hyper-g prior are provided using both simulated and real data examples. Supplementary materials for this article are available online.
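Under the baseline g-prior, the Bayes factor of a normal linear model against the intercept-only null has a closed form in n, the model size k, and R². A sketch with simulated data and the unit-information choice g = n (this illustrates the g-prior benchmark the article compares against, not the PCEP prior itself):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)          # x2 is irrelevant

def log_bf_vs_null(y, X, g):
    """Log Bayes factor of the model with design X (plus intercept) against
    the intercept-only model under Zellner's g-prior:
    (n-1-k)/2 * log(1+g) - (n-1)/2 * log(1 + g*(1 - R^2))."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    r2 = 1.0 - np.sum((y - Xc @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

bf_true = log_bf_vs_null(y, x1[:, None], g=n)                # model {x1}
bf_full = log_bf_vs_null(y, np.column_stack([x1, x2]), g=n)  # model {x1, x2}
```

With these settings the true single-predictor model receives overwhelming support over the null, and the extra log(1+g)/2 penalty per predictor typically favors it over the model that adds the irrelevant x2, the kind of parsimony pressure the PCEP prior is designed to strengthen.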

12.
This article introduces a novel and flexible framework for investigating the roles of actors within a network. Particular interest is in roles as defined by local network connectivity patterns, identified using the ego-networks extracted from the network. A mixture of exponential-family random graph models (ERGM) is developed for these ego-networks to cluster the nodes into roles. We refer to this model as the ego-ERGM. An expectation-maximization algorithm is developed to infer the unobserved cluster assignments and to estimate the mixture model parameters using a maximum pseudo-likelihood approximation. We demonstrate the flexibility and utility of the method using examples of simulated and real networks.

13.
In this article, we model multivariate categorical (binary and ordinal) response data using a very rich class of scale mixture of multivariate normal (SMMVN) link functions to accommodate heavy-tailed distributions. We consider both noninformative and informative prior distributions for SMMVN-link models. The notion of informative prior elicitation is based on available similar historical studies. The main objectives of this article are (i) to derive theoretical properties of noninformative and informative priors as well as the resulting posteriors and (ii) to develop an efficient Markov chain Monte Carlo algorithm to sample from the resulting posterior distribution. A real data example from prostate cancer studies is used to illustrate the proposed methodologies.

14.
There are a number of cases where the moments of a distribution are easily obtained, but theoretical distributions are not available in closed form. This paper shows how to use moment methods to approximate a theoretical univariate distribution with mixtures of known distributions. The methods are illustrated with gamma mixtures. It is shown that for a certain class of mixture distributions, which includes the normal and gamma mixture families, one can solve for a p-point mixing distribution such that the corresponding mixture has exactly the same first 2p moments as the targeted univariate distribution. The gamma mixture approximation to the distribution of a positive weighted sum of independent central χ² variables is demonstrated and compared with a number of existing approximations. The numerical results show that the new approximation is generally superior to these alternatives.
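For p = 1 the recipe reduces to a familiar Satterthwaite-type step: match a single gamma to the first two moments of the target. A sketch for a weighted sum of independent central χ²₁ variables (the weights are hypothetical; the paper's p-point constructions go further):

```python
import numpy as np
from scipy import stats

# Target: Q = sum_i a_i * Z_i^2, a weighted chi-square sum.
a = np.array([1.0, 2.0, 3.0])
mean, var = a.sum(), 2.0 * (a**2).sum()       # E[chi2_1] = 1, Var[chi2_1] = 2

# p = 1: one gamma component matched to the first 2p = 2 moments.
shape, scale = mean**2 / var, var / mean

# Check the approximation at the simulated 95th percentile of Q.
rng = np.random.default_rng(5)
q = (a * rng.normal(size=(100_000, 3)) ** 2).sum(axis=1)
approx_tail = 1.0 - stats.gamma.cdf(np.quantile(q, 0.95), shape, scale=scale)
```

The matched gamma reproduces the target mean and variance exactly, and its tail probability at the empirical 95th percentile lands close to the nominal 0.05.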

15.
An estimator of the number of components of a finite mixture of k-dimensional distributions is given on the basis of a one-dimensional independent random sample obtained by a transformation of a k-dimensional independent random sample. Consistency of the estimator is shown. Some simulation results are given for finite mixtures of two-dimensional normal distributions.

16.
This paper considers the problem of learning multinomial distributions from a sample of independent observations. The Bayesian approach usually assumes a prior Dirichlet distribution over the probabilities of the different possible values. However, there is no consensus on the parameters of this Dirichlet distribution. Here, it will be shown that this is not a simple problem, providing examples in which different selection criteria are reasonable. To address it, the Imprecise Dirichlet Model (IDM) was introduced. But this model has important drawbacks, such as the problems associated with learning from indirect observations. As an alternative approach, the Imprecise Sample Size Dirichlet Model (ISSDM) is introduced and its properties are studied. The prior distribution over the parameters of a multinomial distribution is the basis for learning Bayesian networks using Bayesian scores. Here, we will show that the ISSDM can be used to learn imprecise Bayesian networks, also called credal networks, when all the distributions share a common graphical structure. Some experiments are reported on the use of the ISSDM to learn the structure of a graphical model and to build supervised classifiers.
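The IDM itself is easy to state: with prior-strength hyperparameter s, the posterior expectation of a category's probability is only bounded, lying in [n_i/(N+s), (n_i+s)/(N+s)]. A minimal sketch of that interval (the names here are illustrative; the paper's ISSDM generalizes this model):

```python
def idm_interval(count, total, s=2.0):
    """Walley's Imprecise Dirichlet Model: lower and upper posterior
    expectations of a category's probability after observing `count`
    occurrences out of `total`, with prior-strength hyperparameter s."""
    return count / (total + s), (count + s) / (total + s)

lo, hi = idm_interval(7, 10)     # interval around the frequency 0.7
vac = idm_interval(0, 0)         # no data: the vacuous interval (0, 1)
```

As data accumulate the interval shrinks toward the relative frequency, which is how the IDM sidesteps committing to any single Dirichlet parameter choice.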

17.
We propose an algorithm for nonparametric estimation for finite mixtures of multivariate random vectors that strongly resembles a true EM algorithm. The vectors are assumed to have independent coordinates conditional upon knowing from which mixture component they come, but otherwise their density functions are completely unspecified. Sometimes, the density functions may be partially specified by Euclidean parameters, a case we call semiparametric. Our algorithm is much more flexible and easily applicable than existing algorithms in the literature; it can be extended to any number of mixture components and any number of vector coordinates of the multivariate observations. Thus it may be applied even in situations where the model is not identifiable, so care is called for when using it in situations for which identifiability is difficult to establish conclusively. Our algorithm yields much smaller mean integrated squared errors than an alternative algorithm in a simulation study. In another example using a real dataset, it provides new insights that extend previous analyses. Finally, we present two different variations of our algorithm, one stochastic and one deterministic, and find anecdotal evidence that there is not a great deal of difference between the performance of these two variants. The computer code and data used in this article are available online.

18.
Parameter estimation for model-based clustering using a finite mixture of normal inverse Gaussian (NIG) distributions is achieved through variational Bayes approximations. Univariate NIG mixtures and multivariate NIG mixtures are considered. The use of variational Bayes approximations here is a substantial departure from the traditional EM approach and alleviates some of the associated computational complexities and uncertainties. Our variational algorithm is applied to simulated and real data. The paper concludes with discussion and suggestions for future work.

19.
High-dimensional data are prevalent across many application areas, and generate an ever-increasing demand for statistical methods of dimension reduction, such as cluster and significance analysis. One application area that has recently received much interest is the analysis of microarray gene expression data.

The results of cluster analysis are open to subjective interpretation. To facilitate the objective inference of such analyses, we use flexible parameterizations of the cluster means, paired with model selection, to generate sparse and easy-to-interpret representations of each cluster. Model selection in cluster analysis is combinatorial in the numbers of clusters and data dimensions, and thus presents a computationally challenging task.

In this article we introduce a model selection method based on rate-distortion theory, which allows us to turn the combinatorial model selection problem into a fast and simultaneous selection across clusters. The method is also applicable to model selection in significance analysis.

We show that simultaneous model selection for cluster analysis generates objectively interpretable cluster models, and that the selection performance is competitive with a combinatorial search, at a fraction of the computational cost. Moreover, we show that the rate-distortion based significance analysis substantially increases the power compared with standard methods.

This article has supplementary material online.

20.
A mixture approach to clustering is an important technique in cluster analysis. A mixture of multivariate multinomial distributions is usually used to analyze categorical data with a latent class model. Parameter estimation is an important step for a mixture distribution. Described here are four approaches to estimating the parameters of a mixture of multivariate multinomial distributions. The first approach is an extended maximum likelihood (ML) method. The second approach is based on the well-known expectation maximization (EM) algorithm. The third approach is the classification maximum likelihood (CML) algorithm. In this paper, we propose a new approach using the so-called fuzzy class model and then create the fuzzy classification maximum likelihood (FCML) approach for categorical data. The accuracy, robustness and effectiveness of these four types of algorithms for estimating the parameters of multivariate binomial mixtures are compared using real empirical data and samples drawn from multivariate binomial mixtures of two classes. The results show that the proposed FCML algorithm has better accuracy, robustness and effectiveness, and is superior overall to the ML, EM and CML algorithms. Thus, we recommend FCML as another good tool for estimating the parameters of multivariate multinomial mixture models.
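As a point of reference for the EM approach discussed above, here is a minimal EM for a two-class mixture of multivariate Bernoulli variables (the binomial case with one trial per coordinate). The data and settings are toy, and this is the plain EM baseline, not the paper's FCML algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy binary data from two latent classes with per-coordinate
# success probabilities 0.8 and 0.2 (hypothetical values).
n, d = 400, 6
z = rng.random(n) < 0.5
X = (rng.random((n, d)) < np.where(z[:, None], 0.8, 0.2)).astype(float)

def em_bernoulli_mixture(X, K=2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(K, 1.0 / K)                       # class weights
    P = rng.uniform(0.3, 0.7, size=(K, d))        # class-conditional probs
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain for stability.
        logr = np.log(w) + X @ np.log(P).T + (1 - X) @ np.log(1 - P).T
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and probabilities from soft counts.
        w = r.mean(axis=0)
        P = np.clip((r.T @ X) / r.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return w, P

w, P = em_bernoulli_mixture(X)
```

With well-separated classes, the soft-assignment updates recover the two probability profiles and roughly equal class weights.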


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号