首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 612 毫秒
1.
This paper considers the problem of learning multinomial distributions from a sample of independent observations. The Bayesian approach usually assumes a prior Dirichlet distribution about the probabilities of the different possible values. However, there is no consensus on the parameters of this Dirichlet distribution. Here, it will be shown that this is not a simple problem, providing examples in which different selection criteria are reasonable. To solve it the Imprecise Dirichlet Model (IDM) was introduced. But this model has important drawbacks, as the problems associated to learning from indirect observations. As an alternative approach, the Imprecise Sample Size Dirichlet Model (ISSDM) is introduced and its properties are studied. The prior distribution over the parameters of a multinomial distribution is the basis to learn Bayesian networks using Bayesian scores. Here, we will show that the ISSDM can be used to learn imprecise Bayesian networks, also called credal networks when all the distributions share a common graphical structure. Some experiments are reported on the use of the ISSDM to learn the structure of a graphical model and to build supervised classifiers.  相似文献   

2.
In this paper, we address the problem of learning discrete Bayesian networks from noisy data. A graphical model based on a mixture of Gaussian distributions with categorical mixing structure coming from a discrete Bayesian network is considered. The network learning is formulated as a maximum likelihood estimation problem and performed by employing an EM algorithm. The proposed approach is relevant to a variety of statistical problems for which Bayesian network models are suitable—from simple regression analysis to learning gene/protein regulatory networks from microarray data.  相似文献   

3.
This paper presents a new algorithm for learning the structure of a special type of Bayesian network. The conditional phase-type (C-Ph) distribution is a Bayesian network that models the probabilistic causal relationships between a skewed continuous variable, modelled by the Coxian phase-type distribution, a special type of Markov model, and a set of interacting discrete variables. The algorithm takes a data set as input and produces the structure, parameters and graphical representations of the fit of the C-Ph distribution as output. The algorithm, which uses a greedy-search technique and has been implemented in MATLAB, is evaluated using a simulated data set consisting of 20,000 cases. The results show that the original C-Ph distribution is recaptured and the fit of the network to the data is discussed.  相似文献   

4.
Models with intractable likelihood functions arise in areas including network analysis and spatial statistics, especially those involving Gibbs random fields. Posterior parameter estimation in these settings is termed a doubly intractable problem because both the likelihood function and the posterior distribution are intractable. The comparison of Bayesian models is often based on the statistical evidence, the integral of the un-normalized posterior distribution over the model parameters which is rarely available in closed form. For doubly intractable models, estimating the evidence adds another layer of difficulty. Consequently, the selection of the model that best describes an observed network among a collection of exponential random graph models for network analysis is a daunting task. Pseudolikelihoods offer a tractable approximation to the likelihood but should be treated with caution because they can lead to an unreasonable inference. This article specifies a method to adjust pseudolikelihoods to obtain a reasonable, yet tractable, approximation to the likelihood. This allows implementation of widely used computational methods for evidence estimation and pursuit of Bayesian model selection of exponential random graph models for the analysis of social networks. Empirical comparisons to existing methods show that our procedure yields similar evidence estimates, but at a lower computational cost. Supplementary material for this article is available online.  相似文献   

5.
The theory of Gaussian graphical models is a powerful tool for independence analysis between continuous variables. In this framework, various methods have been conceived to infer independence relations from data samples. However, most of them result in stepwise, deterministic, descent algorithms that are inadequate for solving this issue. More recent developments have focused on stochastic procedures, yet they all base their research on strong a priori knowledge and are unable to perform model selection among the set of all possible models. Moreover, convergence of the corresponding algorithms is slow, precluding applications on a large scale. In this paper, we propose a novel Bayesian strategy to deal with structure learning. Relating graphs to their supports, we convert the problem of model selection into that of parameter estimation. Use of non-informative priors and asymptotic results yield a posterior probability for independence graph supports in closed form. Gibbs sampling is then applied to approximate the full joint posterior density. We finally give three examples of structure learning, one from synthetic data, and the two others from real data.  相似文献   

6.
Bayesian networks are graphical models that represent the joint distribution of a set of variables using directed acyclic graphs. The graph can be manually built by domain experts according to their knowledge. However, when the dependence structure is unknown (or partially known) the network has to be estimated from data by using suitable learning algorithms. In this paper, we deal with a constraint-based method to perform Bayesian networks structural learning in the presence of ordinal variables. We propose an alternative version of the PC algorithm, which is one of the most known procedures, with the aim to infer the network by accounting for additional information inherent to ordinal data. The proposal is based on a nonparametric test, appropriate for ordinal variables. A comparative study shows that, in some situations, the proposal discussed here is a slightly more efficient solution than the PC algorithm.  相似文献   

7.
An estimation of distribution algorithm for nurse scheduling   总被引:2,自引:0,他引:2  
Schedules can be built in a similar way to a human scheduler by using a set of rules that involve domain knowledge. This paper presents an Estimation of Distribution Algorithm (EDA) for the nurse scheduling problem, which involves choosing a suitable scheduling rule from a set for the assignment of each nurse. Unlike previous work that used Genetic Algorithms (GAs) to implement implicit learning, the learning in the proposed algorithm is explicit, i.e. we identify and mix building blocks directly. The EDA is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated by using the corresponding conditional probabilities, until all variables have been generated, i.e. in our case, a new rule string has been obtained. Another set of rule strings will be generated in this way, some of which will replace previous strings based on fitness selection. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. Computational results from 52 real data instances demonstrate the success of this approach. It is also suggested that the learning mechanism in the proposed approach might be suitable for other scheduling problems.  相似文献   

8.
We develop a methodology to efficiently implement the reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms of Green, applicable for example to model selection inference in a Bayesian framework, which builds on the “dragging fast variables” ideas of Neal. We call such algorithms annealed importance sampling reversible jump (aisRJ). The proposed procedures can be thought of as being exact approximations of idealized RJ algorithms which in a model selection problem would sample the model labels only, but cannot be implemented. Central to the methodology is the idea of bridging different models with fictitious intermediate models, whose role is to introduce smooth intermodel transitions and, as we shall see, improve performance. Efficiency of the resulting algorithms is demonstrated on two standard model selection problems and we show that despite the additional computational effort incurred, the approach can be highly competitive computationally. Supplementary materials for the article are available online.  相似文献   

9.
Bayesian networks (BNs) provide a powerful graphical model for encoding the probabilistic relationships among a set of variables, and hence can naturally be used for classification. However, Bayesian network classifiers (BNCs) learned in the common way using likelihood scores usually tend to achieve only mediocre classification accuracy because these scores are less specific to classification, but rather suit a general inference problem. We propose risk minimization by cross validation (RMCV) using the 0/1 loss function, which is a classification-oriented score for unrestricted BNCs. RMCV is an extension of classification-oriented scores commonly used in learning restricted BNCs and non-BN classifiers. Using small real and synthetic problems, allowing for learning all possible graphs, we empirically demonstrate RMCV superiority to marginal and class-conditional likelihood-based scores with respect to classification accuracy. Experiments using twenty-two real-world datasets show that BNCs learned using an RMCV-based algorithm significantly outperform the naive Bayesian classifier (NBC), tree augmented NBC (TAN), and other BNCs learned using marginal or conditional likelihood scores and are on par with non-BN state of the art classifiers, such as support vector machine, neural network, and classification tree. These experiments also show that an optimized version of RMCV is faster than all unrestricted BNCs and comparable with the neural network with respect to run-time. The main conclusion from our experiments is that unrestricted BNCs, when learned properly, can be a good alternative to restricted BNCs and traditional machine-learning classifiers with respect to both accuracy and efficiency.  相似文献   

10.
Summary New Bayesian cohort models designed to resolve the identification problem in cohort analysis are proposed in this paper. At first, the basic cohort model which represents the statistical structure of time-series social survey data in terms of age, period and cohort effects is explained. The logit cohort model for qualitative data from a binomial distribution and the normal-type cohort model for quantitative data from a normal distribution are considered as two special cases of the basic model. In order to overcome the identification problem in cohort analysis, a Bayesian approach is adopted, based on the assumption that the effect parameters change gradually. A Bayesian information criterion ABIC is introduced for the selection of the optimal model. This approach is so flexible that both the logit and the normal-type cohort models can be made applicable, not only to standard cohort tables but also to general cohort tables in which the range of age group is not equal to the interval between periods. The practical utility of the proposed models is demonstrated by analysing two data sets from the literature on cohort analysis. The Institute of Statistical Mathematics  相似文献   

11.
We focus on purchase incidence modelling for a European direct mail company. Response models based on statistical and neural network techniques are contrasted. The evidence framework of MacKay is used as an example implementation of Bayesian neural network learning, a method that is fairly robust with respect to problems typically encountered when implementing neural networks. The automatic relevance determination (ARD) method, an integrated feature of this framework, allows us to assess the relative importance of the inputs. The basic response models use operationalisations of the traditionally discussed Recency, Frequency and Monetary (RFM) predictor categories. In a second experiment, the RFM response framework is enriched by the inclusion of other (non-RFM) customer profiling predictors. We contribute to the literature by providing experimental evidence that: (1) Bayesian neural networks offer a viable alternative for purchase incidence modelling; (2) a combined use of all three RFM predictor categories is advocated by the ARD method; (3) the inclusion of non-RFM variables allows to significantly augment the predictive power of the constructed RFM classifiers; (4) this rise is mainly attributed to the inclusion of customer/company interaction variables and a variable measuring whether a customer uses the credit facilities of the direct mailing company.  相似文献   

12.
The marginal likelihood of the data computed using Bayesian score metrics is at the core of score+search methods when learning Bayesian networks from data. However, common formulations of those Bayesian score metrics rely on free parameters which are hard to assess. Recent theoretical and experimental works have also shown that the commonly employed BDe score metric is strongly biased by the particular assignments of its free parameter known as the equivalent sample size. This sensitivity means that poor choices of this parameter lead to inferred BN models whose structure and parameters do not properly represent the distribution generating the data even for large sample sizes. In this paper we argue that the problem is that the BDe metric is based on assumptions about the BN model parameters distribution assumed to generate the data which are too strict and do not hold in real settings. To overcome this issue we introduce here an approach that tries to marginalize the meta-parameter locally, aiming to embrace a wider set of assumptions about these parameters. It is shown experimentally that this approach offers a robust performance, as good as that of the standard BDe metric with an optimum selection of its free parameter and, in consequence, this method prevents the choice of wrong settings for this widely applied Bayesian score metric.  相似文献   

13.
One of the main advantages of Bayesian approaches is that they offer principled methods of inference in models of varying dimensionality and of models of infinite dimensionality. What is less widely appreciated is how the model inference is sensitive to prior distributions and therefore how priors should be set for real problems. In this paper prior sensitivity is considered with respect to the problem of inference in Gaussian mixture models. Two distinct Bayesian approaches have been proposed. The first is to use Bayesian model selection based upon the marginal likelihood; the second is to use an infinite mixture model which ‘side steps’ model selection. Explanations for the prior sensitivity are given in order to give practitioners guidance in setting prior distributions. In particular the use of conditionally conjugate prior distributions instead of purely conjugate prior distributions are advocated as a method for investigating prior sensitivity of the mean and variance individually.  相似文献   

14.
Bayesian approaches to prediction and the assessment of predictive uncertainty in generalized linear models are often based on averaging predictions over different models, and this requires methods for accounting for model uncertainty. When there are linear dependencies among potential predictor variables in a generalized linear model, existing Markov chain Monte Carlo algorithms for sampling from the posterior distribution on the model and parameter space in Bayesian variable selection problems may not work well. This article describes a sampling algorithm based on the Swendsen-Wang algorithm for the Ising model, and which works well when the predictors are far from orthogonality. In problems of variable selection for generalized linear models we can index different models by a binary parameter vector, where each binary variable indicates whether or not a given predictor variable is included in the model. The posterior distribution on the model is a distribution on this collection of binary strings, and by thinking of this posterior distribution as a binary spatial field we apply a sampling scheme inspired by the Swendsen-Wang algorithm for the Ising model in order to sample from the model posterior distribution. The algorithm we describe extends a similar algorithm for variable selection problems in linear models. The benefits of the algorithm are demonstrated for both real and simulated data.  相似文献   

15.
With uncorrelated Gaussian factors extended to mutually independent factors beyond Gaussian, the conventional factor analysis is extended to what is recently called independent factor analysis. Typically, it is called binary factor analysis (BFA) when the factors are binary and called non-Gaussian factor analysis (NFA) when the factors are from real non-Gaussian distributions. A crucial issue in both BFA and NFA is the determination of the number of factors. In the literature of statistics, there are a number of model selection criteria that can be used for this purpose. Also, the Bayesian Ying-Yang (BYY) harmony learning provides a new principle for this purpose. This paper further investigates BYY harmony learning in comparison with existing typical criteria, including Akaik’s information criterion (AIC), the consistent Akaike’s information criterion (CAIC), the Bayesian inference criterion (BIC), and the cross-validation (CV) criterion on selection of the number of factors. This comparative study is made via experiments on the data sets with different sample sizes, data space dimensions, noise variances, and hidden factors numbers. Experiments have shown that for both BFA and NFA, in most cases BIC outperforms AIC, CAIC, and CV while the BYY criterion is either comparable with or better than BIC. In consideration of the fact that the selection by these criteria has to be implemented at the second stage based on a set of candidate models which have to be obtained at the first stage of parameter learning, while BYY harmony learning can provide not only a new class of criteria implemented in a similar way but also a new family of algorithms that perform parameter learning at the first stage with automated model selection, BYY harmony learning is more preferred since computing costs can be saved significantly.  相似文献   

16.
We consider optimal decision-making problems in an uncertain environment. In particular, we consider the case in which the distribution of the input is unknown, yet there is some historical data drawn from the distribution. In this paper, we propose a new type of distributionally robust optimization model called the likelihood robust optimization (LRO) model for this class of problems. In contrast to previous work on distributionally robust optimization that focuses on certain parameters (e.g., mean, variance, etc.) of the input distribution, we exploit the historical data and define the accessible distribution set to contain only those distributions that make the observed data achieve a certain level of likelihood. Then we formulate the targeting problem as one of optimizing the expected value of the objective function under the worst-case distribution in that set. Our model avoids the over-conservativeness of some prior robust approaches by ruling out unrealistic distributions while maintaining robustness of the solution for any statistically likely outcomes. We present statistical analyses of our model using Bayesian statistics and empirical likelihood theory. Specifically, we prove the asymptotic behavior of our distribution set and establish the relationship between our model and other distributionally robust models. To test the performance of our model, we apply it to the newsvendor problem and the portfolio selection problem. The test results show that the solutions of our model indeed have desirable performance.  相似文献   

17.
In this article, we propose a novel Bayesian nonparametric clustering algorithm based on a Dirichlet process mixture of Dirichlet distributions which have been shown to be very flexible for modeling proportional data. The idea is to let the number of mixture components increases as new data to cluster arrive in such a manner that the model selection problem (i.e. determination of the number of clusters) can be answered without recourse to classic selection criteria. Thus, the proposed model can be considered as an infinite Dirichlet mixture model. An expectation propagation inference framework is developed to learn this model by obtaining a full posterior distribution on its parameters. Within this learning framework, the model complexity and all the involved parameters are evaluated simultaneously. To show the practical relevance and efficiency of our model, we perform a detailed analysis using extensive simulations based on both synthetic and real data. In particular, real data are generated from three challenging applications namely images categorization, anomaly intrusion detection and videos summarization.  相似文献   

18.
There exists a wide variety of models for return, and the chosen model determines the tool required to calculate the value at risk (VaR). This paper introduces an alternative methodology to model‐based simulation by using a Monte Carlo simulation of the Dirichlet process. The model is constructed in a Bayesian framework, using properties initially described by Ferguson. A notable advantage of this model is that, on average, the random draws are sampled from a mixed distribution that consists of a prior guess by an expert and the empirical process based on a random sample of historical asset returns. The method is relatively automatic and similar to machine learning tools, e.g. the estimate is updated as new data arrive.  相似文献   

19.
Many reduced-order models are neither robust with respect to parameter changes nor cost-effective enough for handling the nonlinear dependence of complex dynamical systems. In this study, we put forth a robust machine learning framework for projection-based reduced-order modeling of such nonlinear and nonstationary systems. As a demonstration, we focus on a nonlinear advection-diffusion system given by the viscous Burgers equation, which is a prototypical setting of more realistic fluid dynamics applications due to its quadratic nonlinearity. In our proposed methodology the effects of truncated modes are modeled using a single layer feed-forward neural network architecture. The neural network architecture is trained by utilizing both the Bayesian regularization and extreme learning machine approaches, where the latter one is found to be more computationally efficient. A significant emphasis is laid on the selection of basis functions through the use of both Fourier bases and proper orthogonal decomposition. It is shown that the proposed model yields significant improvements in accuracy over the standard Galerkin projection methodology with a negligibly small computational overhead and provide reliable predictions with respect to parameter changes.  相似文献   

20.
Approximate Bayesian inference by importance sampling derives probabilistic statements from a Bayesian network, an essential part of evidential reasoning with the network and an important aspect of many Bayesian methods. A critical problem in importance sampling on Bayesian networks is the selection of a good importance function to sample a network’s prior and posterior probability distribution. The initially optimal importance functions eventually start deviating from the optimal function when sampling a network’s posterior distribution given evidence, even when adaptive methods are used that adjust an importance function to the evidence by learning. In this article we propose a new family of Refractor Importance Sampling (RIS) algorithms for adaptive importance sampling under evidential reasoning. RIS applies “arc refractors” to a Bayesian network by adding new arcs and refining the conditional probability tables. The goal of RIS is to optimize the importance function for the posterior distribution and reduce the error variance of sampling. Our experimental results show a significant improvement of RIS over state-of-the-art adaptive importance sampling algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号