Similar articles
 Found 20 similar articles (search time: 328 ms)
1.
Compositional model theory serves as an alternative approach to multidimensional probability distribution representation and processing. Every compositional model over a finite non-empty set of variables N is uniquely defined by its generating sequence - an ordered set of low-dimensional probability distributions. A generating sequence structure induces a system of conditional independence statements over N valid for every multidimensional distribution represented by a compositional model with this structure. The equivalence problem is how to characterise whether all independence statements induced by structure P are induced by a second structure P′ and vice versa. This problem can be solved in several ways. A partial solution of the so-called direct characterisation of an equivalence problem is presented here. We deduce and describe three properties of equivalent structures necessary for equivalence of the respective structures. We call them characteristic properties of classes of equivalent structures.

2.
It is well known that a conditional independence statement for discrete variables is equivalent to constraining to zero a suitable set of log–linear interactions. In this paper we show that this is also equivalent to zero constraints on suitable sets of marginal log–linear interactions, that can be formulated within a class of smooth marginal log–linear models. This result allows much more flexibility than known until now in combining several conditional independencies into a smooth marginal model. This result is the basis for a procedure that can search for such a marginal parameterization, so that, if one exists, the model is smooth.
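The classical equivalence in the first sentence can be illustrated with three discrete variables. This is a textbook-style sketch: the variable names A, B, C, the λ notation, and the usual identifiability constraints on the λ terms are assumptions made here and are not taken from the abstract. A strictly positive joint distribution is assumed.

```latex
% Saturated log-linear model for three discrete variables A, B, C
\log p(a,b,c) = \lambda
  + \lambda^{A}_{a} + \lambda^{B}_{b} + \lambda^{C}_{c}
  + \lambda^{AB}_{ab} + \lambda^{AC}_{ac} + \lambda^{BC}_{bc}
  + \lambda^{ABC}_{abc}

% Conditional independence as zero constraints on interactions
A \perp\!\!\!\perp B \mid C
\iff
\lambda^{AB}_{ab} = 0 \ \text{and} \ \lambda^{ABC}_{abc} = 0
\quad \text{for all } a, b, c
```

With these two sets of interactions removed, log p(a,b,c) splits into a function of (a,c) plus a function of (b,c), which is exactly the factorisation that defines conditional independence of A and B given C.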

3.
Log-linear models are the popular workhorses of analyzing contingency tables. A log-linear parameterization of an interaction model can be more expressive than a direct parameterization based on probabilities, leading to a powerful way of defining restrictions derived from marginal, conditional and context-specific independence. However, parameter estimation is often simpler under a direct parameterization, provided that the model enjoys certain decomposability properties. Here we introduce a cyclical projection algorithm for obtaining maximum likelihood estimates of log-linear parameters under an arbitrary context-specific graphical log-linear model, which need not satisfy criteria of decomposability. We illustrate that lifting the restriction of decomposability makes the models more expressive, such that additional context-specific independencies embedded in real data can be identified. It is also shown how a context-specific graphical model can correspond to a non-hierarchical log-linear parameterization with a concise interpretation. This observation can pave the way to further development of non-hierarchical log-linear models, which have been largely neglected due to their believed lack of interpretability.

4.
Exploiting independencies to compute semigraphoid and graphoid structures (total citations: 1; self-citations: 0; citations by others: 1)
We deal with conditional independencies, which have a fundamental role in probability and multivariate statistics. The structure of probabilistic independencies is described by semigraphoids or, for strictly positive probabilities, by graphoids. In this paper, given a set of independencies compatible with a probability, the attention is focused on the problem of efficiently computing the closure with respect to the semigraphoid and graphoid structures. We introduce a suitable notion of projection in order to provide a new method which properly uses conditional independence statements.
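For reference, the closure mentioned above is taken with respect to the standard inference rules below. The set names X, Y, Z, W and the notation are assumptions of this sketch (the abstract does not fix a notation); the rules themselves are the usual semigraphoid axioms, with intersection added for graphoids.

```latex
% Semigraphoid axioms for disjoint sets of variables X, Y, Z, W
\begin{align*}
\text{Symmetry:}      \quad & X \perp\!\!\!\perp Y \mid Z \;\Rightarrow\; Y \perp\!\!\!\perp X \mid Z \\
\text{Decomposition:} \quad & X \perp\!\!\!\perp (Y \cup W) \mid Z \;\Rightarrow\; X \perp\!\!\!\perp Y \mid Z \\
\text{Weak union:}    \quad & X \perp\!\!\!\perp (Y \cup W) \mid Z \;\Rightarrow\; X \perp\!\!\!\perp Y \mid (Z \cup W) \\
\text{Contraction:}   \quad & X \perp\!\!\!\perp Y \mid Z \ \text{and}\ X \perp\!\!\!\perp W \mid (Z \cup Y)
                              \;\Rightarrow\; X \perp\!\!\!\perp (Y \cup W) \mid Z
\end{align*}

% Additional graphoid axiom (holds for strictly positive distributions)
\begin{align*}
\text{Intersection:}  \quad & X \perp\!\!\!\perp Y \mid (Z \cup W) \ \text{and}\ X \perp\!\!\!\perp W \mid (Z \cup Y)
                              \;\Rightarrow\; X \perp\!\!\!\perp (Y \cup W) \mid Z
\end{align*}
```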

5.
While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for datasets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation dataset (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables for the main presentation, we also show that the proposed methodology can be easily extended to general discrete variables.

6.
When applying any technique of multidimensional models to problems of practice, one always has to cope with two problems: the necessity to represent the models with a “reasonable” number of parameters and to have sufficiently efficient computational procedures at one’s disposal. When considering graphical Markov models in probability theory, both of these conditions are fulfilled; various computational procedures for decomposable models are based on the ideas of local computations, whose theoretical foundations were laid by Lauritzen and Spiegelhalter. The presented contribution studies the possibility of transferring these ideas from probability theory into the Dempster-Shafer theory of evidence. The paper recalls decomposable models, discusses the connection of the model structure with the corresponding system of conditional independence relations, and shows that under special additional conditions, one can locally compute specific basic assignments which can be considered to be conditional.

7.
For a number of situations, a Bayesian network can be split into a core network consisting of a set of latent variables describing the status of a system, and a set of fragments relating the status variables to observable evidence that could be collected about the system state. This situation arises frequently in educational testing, where the status variables represent the student proficiency and the evidence models (graph fragments linking competency variables to observable outcomes) relate to assessment tasks that can be used to assess that proficiency. The traditional approach to knowledge engineering in this situation would be to maintain a library of fragments, where the graphical structure is specified using a graphical editor and then the probabilities are entered using a separate spreadsheet for each node. If many evidence model fragments employ the same design pattern, a lot of repetitive data entry is required. As the parameter values that determine the strength of the evidence can be buried on interior screens of an interface, it can be difficult for a design team to get an impression of the total evidence provided by a collection of evidence models for the system variables, and to identify holes in the data collection scheme. A Q-matrix - an incidence matrix whose rows represent observable outcomes from assessment tasks and whose columns represent competency variables - provides the graphical structure of the evidence models. The Q-matrix can be augmented to provide details of relationship strengths and provide a high level overview of the kind of evidence available. The relationships among the status variables can be represented with an inverse covariance matrix; this is particularly useful in models from the social sciences as often the domain experts’ knowledge about the system states comes from factor analyses and similar procedures that naturally produce covariance matrixes. 
The representation of the model using matrixes means that the bulk of the specification work can be done using a desktop spreadsheet program and does not require specialized software, facilitating collaboration with external experts. The design idea is illustrated with some examples from prior assessment design projects.
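The way a Q-matrix supplies the graphical structure of the evidence models can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the task and skill labels, and the 0/1 encoding are hypothetical and do not come from the abstract, which also allows the matrix to be augmented with relationship strengths.

```python
from typing import List, Sequence, Tuple


def q_matrix_to_edges(
    q_matrix: Sequence[Sequence[int]],
    outcomes: Sequence[str],
    skills: Sequence[str],
) -> List[Tuple[str, str]]:
    """Turn a Q-matrix into directed edges (competency -> observable outcome).

    Rows are observable outcomes and columns are competency variables,
    as described in the abstract. A nonzero entry means the outcome
    depends on that competency.
    """
    edges = []
    for outcome, row in zip(outcomes, q_matrix):
        for skill, entry in zip(skills, row):
            if entry:
                edges.append((skill, outcome))
    return edges


def uncovered_skills(
    q_matrix: Sequence[Sequence[int]], skills: Sequence[str]
) -> List[str]:
    """List competencies that no outcome provides evidence about (all-zero columns)."""
    return [
        skill
        for j, skill in enumerate(skills)
        if not any(row[j] for row in q_matrix)
    ]


# Hypothetical example: three tasks, three competencies.
skills = ["algebra", "geometry", "reading"]
outcomes = ["task1", "task2", "task3"]
q = [
    [1, 0, 0],  # task1 depends on algebra only
    [1, 1, 0],  # task2 depends on algebra and geometry
    [0, 1, 0],  # task3 depends on geometry only
]
edges = q_matrix_to_edges(q, outcomes, skills)
holes = uncovered_skills(q, skills)
```

In this toy layout the all-zero "reading" column is the kind of hole in the data collection scheme that the abstract says is hard to spot when parameters are buried in an editor interface.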

8.
Gaussian graphical models represent the underlying graph structure of conditional dependence between random variables, which can be determined using their partial correlation or precision matrix. In a high-dimensional setting, the precision matrix is estimated using penalized likelihood by adding a penalization term, which controls the amount of sparsity in the precision matrix and totally characterizes the complexity and structure of the graph. The most commonly used penalization term is the L1 norm of the precision matrix scaled by the regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data. In this article, we propose several procedures to select the regularization parameter in the estimation of graphical models that focus on recovering reliably the appropriate network structure of the graph. We conduct an extensive simulation study to show that the proposed methods produce useful results for different network topologies. The approaches are also applied in a high-dimensional case study of gene expression data with the aim to discover the genes relevant to colon cancer. Using these data, we find graph structures, which are verified to display significant biological gene associations. Supplementary material is available online.
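The L1-penalized likelihood described above is commonly written in the following form (often called the graphical lasso). The symbols S, Θ and ρ are assumptions of this sketch, and some formulations penalize only the off-diagonal entries; the abstract does not say which variant the authors use.

```latex
% S      : sample covariance matrix of the data
% \Theta : precision (inverse covariance) matrix, constrained to be positive definite
% \rho   : regularization parameter, \rho \ge 0
\hat{\Theta}(\rho)
  = \arg\max_{\Theta \succ 0}
    \Bigl\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \rho\,\lVert\Theta\rVert_{1} \Bigr\},
\qquad
\lVert\Theta\rVert_{1} = \sum_{i,j} \lvert\Theta_{ij}\rvert
```

Larger values of ρ force more entries of the estimate to zero and therefore give a sparser graph, since an edge between variables i and j is present exactly when the (i, j) entry of the estimate is nonzero.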

9.
We define a new class of coloured graphical models, called regulatory graphs. These graphs have their own distinctive formal semantics and can directly represent typical qualitative hypotheses about regulatory processes like those described by various biological mechanisms. They admit an embellishment into classes of probabilistic statistical models and so standard Bayesian methods of model selection can be used to choose promising candidate explanations of regulation. Regulation is modelled by the existence of a deterministic relationship between the longitudinal series of observations labelled by the receiving vertex and the donating one. This class contains longitudinal cluster models as a degenerate graph. Edge colours directly distinguish important features of the mechanism like inhibition and excitation and graphs are often cyclic. With appropriate distributional assumptions, because the regulatory relationships map onto each other through a group structure, it is possible to define a conditional conjugate analysis. This means that even when the model space is huge it is nevertheless feasible, using a Bayesian MAP search, to discover a regulatory network with a high Bayes Factor score. We also show that, like the class of Bayesian Networks, regulatory graphs admit a formal but distinctive causal algebra. The topology of the graph then represents collections of hypotheses about the predicted effect of controlling the process by tearing out message passers or forcing them to transmit certain signals. We illustrate our methods on a microarray experiment measuring the expression of thousands of genes as a longitudinal series where the scientific interest lies in the circadian regulation of these plants.

10.
The chain graph (CG) is a general model among graphical Markov models. Different chain graphs may describe the same conditional independence structure; such CGs are said to be Markov equivalent. In 1990 Frydenberg showed that every class of Markov equivalent CGs contains a CG with the greatest number of lines, called the largest chain graph. This paper presents an efficient algorithm for finding the largest chain graph of the Markov equivalence class of a given CG. The computational complexity of the algorithm is O(n^3), which is more efficient than the O(n!) complexity of the existing algorithms. A more intuitive graphical characterization of the largest chain graph is also provided, based on the algorithm in this paper.

11.
Graphical models are widely used to describe conditional dependence relationships among interacting random variables. Among statistical inference problems of a graphical model, one of particular interest is utilizing its interaction structure to reduce model complexity. As an important approach to utilizing structural information, decomposition allows a statistical inference problem to be divided into some sub-problems with lower complexities. In this paper, to investigate decomposition of covariate-dependent graphical models, we propose some useful definitions of decomposition of covariate-dependent graphical models with categorical data in the form of contingency tables. Based on such a decomposition, a covariate-dependent graphical model can be split into some sub-models, and the maximum likelihood estimation of this model can be factorized into the maximum likelihood estimations of the sub-models. Moreover, some sufficient and necessary conditions of the proposed definitions of decomposition are studied.

12.
Analysis of large dimensional contingency tables is rather difficult. Fienberg and Kim (1999, Journal of American Statistical Association, 94, 229–239) studied the problem of combining conditional (on single variable) log-linear structures for graphical models to obtain partial information about the full graphical log-linear model. In this paper, we consider the general log-linear models and obtain explicit representation for the log-linear parameters of the full model based on that of conditional structures. As a consequence, we give conditions under which a particular log-linear parameter is present or not in the full model. Some of the main results of Fienberg and Kim follow from our results. The explicit relationships between full model and the conditional structures are also presented. The connections between conditional structures and the layer structures are pointed out. We investigate also the hierarchical nature of the full model, based on conditional structures. Kim (2006, Computational Statistics and Data Analysis, 50, 2044–2064) analyzed graphical log-linear models based on conditional log-linear structures, when a set of variables is conditioned. For this case, we employ the Möbius inversion technique to obtain the interaction parameters of the full log-linear model, and discuss their properties. The hierarchical nature of the full model is also studied based on conditional structures. This result could be effectively used for the model selection also. As applications of our results, we have discussed several typical examples, including a real-life example.

13.
The time-evolving precision matrix of a piecewise-constant Gaussian graphical model encodes the dynamic conditional dependency structure of a multivariate time-series. Traditionally, graphical models are estimated under the assumption that data are drawn identically from a generating distribution. Introducing sparsity and sparse-difference inducing priors, we relax these assumptions and propose a novel regularized M-estimator to jointly estimate both the graph and changepoint structure. The resulting estimator can therefore favor sparse dependency structures and/or smoothly evolving graph structures, as required. Moreover, our approach extends current methods to allow estimation of changepoints that are grouped across multiple dependencies in a system. An efficient algorithm for estimating structure is proposed. We study the empirical recovery properties in a synthetic setting. The qualitative effect of grouped changepoint estimation is then demonstrated by applying the method on a genetic time-course dataset. Supplementary material for this article is available online.

14.
Modeling dependence in high-dimensional systems has become an increasingly important topic. Most approaches rely on the assumption of a multivariate Gaussian distribution such as statistical models on directed acyclic graphs (DAGs). They are based on modeling conditional independencies and are scalable to high dimensions. In contrast, vine copula models accommodate more elaborate features like tail dependence and asymmetry, as well as independent modeling of the marginals. This flexibility comes however at the cost of exponentially increasing complexity for model selection and estimation. We show a novel connection between DAGs with limited number of parents and truncated vine copulas under sufficient conditions. This motivates a more general procedure exploiting the fast model selection and estimation of sparse DAGs while allowing for non-Gaussian dependence using vine copulas. By numerical examples in hundreds of dimensions, we demonstrate that our approach outperforms the standard method for vine structure selection. Supplementary material for this article is available online.

15.
Bayesian networks (BNs) have attained widespread use in data analysis and decision making. Well-studied topics include efficient inference, evidence propagation, parameter learning from data for complete and incomplete data scenarios, expert elicitation for calibrating BN probabilities, and structure learning. It is common for the researcher to assume the structure of the BN or to glean the structure from expert elicitation or domain knowledge. In this scenario, the model may be calibrated through learning the parameters from relevant data. There is a lack of work on model diagnostics for fitted BNs; this is the contribution of this article. We key on the definition of (conditional) independence to develop a graphical diagnostic that indicates whether the conditional independence assumptions imposed, when one assumes the structure of the BN, are supported by the data. We develop the approach theoretically and describe a Monte Carlo method to generate uncertainty measures for the consistency of the data with conditional independence assumptions under the model structure. We describe how this theoretical information and the data are presented in a graphical diagnostic tool. We demonstrate the approach through data simulated from BNs under different conditional independence assumptions. We also apply the diagnostic to a real-world dataset. The results presented in this article show that this approach is most feasible for smaller BNs; this is not peculiar to the proposed diagnostic graphic, but rather is related to the general difficulty of combining large BNs with data in any manner (such as through parameter estimation). It is the authors’ hope that this article helps highlight the need for more research into BN model diagnostics. This article has supplementary materials online.

16.
We investigate the structure of a large precision matrix in Gaussian graphical models by decomposing it into a low rank component and a remainder part with sparse precision matrix. Based on the decomposition, we propose to estimate the large precision matrix by inverting a principal orthogonal decomposition (IPOD). The IPOD approach has appealing practical interpretations in conditional graphical models given the low rank component, and it connects to Gaussian graphical models with latent variables. Specifically, we show that the low rank component in the decomposition of the large precision matrix can be viewed as the contribution from the latent variables in a Gaussian graphical model. Compared with existing approaches for latent variable graphical models, the IPOD is conveniently feasible in practice where only inverting a low-dimensional matrix is required. To identify the number of latent variables, which is an objective of its own interest, we investigate and justify an approach by examining the ratios of adjacent eigenvalues of the sample covariance matrix. Theoretical properties, numerical examples, and a real data application demonstrate the merits of the IPOD approach in its convenience, performance, and interpretability.
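The idea of examining ratios of adjacent eigenvalues can be sketched as follows. This is one common form of an eigenvalue-ratio rule, given here as an assumption: the abstract does not state the exact criterion, and the function name and the `k_max` cap are hypothetical. The eigenvalues of the sample covariance matrix are taken as given input.

```python
from typing import Sequence


def estimate_num_latent(eigenvalues: Sequence[float], k_max: int) -> int:
    """Pick k in 1..k_max that maximizes the ratio lambda_k / lambda_{k+1}.

    `eigenvalues` are the eigenvalues of the sample covariance matrix;
    they are sorted in descending order before the ratios are formed.
    """
    lam = sorted(eigenvalues, reverse=True)
    if not 1 <= k_max < len(lam):
        raise ValueError("k_max must satisfy 1 <= k_max < number of eigenvalues")
    if lam[k_max] <= 0:
        raise ValueError("the first k_max + 1 eigenvalues must be positive")

    best_k = 1
    best_ratio = lam[0] / lam[1]
    for k in range(2, k_max + 1):
        ratio = lam[k - 1] / lam[k]  # lambda_k / lambda_{k+1} with 1-based k
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Toy spectrum with a clear gap after the second eigenvalue.
toy_spectrum = [10.0, 8.0, 0.5, 0.4, 0.3]
k_hat = estimate_num_latent(toy_spectrum, k_max=4)
```

In the toy spectrum the ratios are 1.25, 16, 1.25 and about 1.33, so the largest drop sits between the second and third eigenvalues and the rule selects two latent variables.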

17.
Directed acyclic graphs (DAGs) constitute a qualitative representation for conditional independence (CI) properties of a probability distribution. It is known that every CI statement implied by the topology of a DAG is witnessed over it under a graph-theoretic criterion of d-separation. Alternatively, all such implied CI statements are derivable from the local independencies encoded by a DAG using the so-called semi-graphoid axioms. We consider Labeled Directed Acyclic Graphs (LDAGs) modeling graphically scenarios exhibiting context-specific independence (CSI). Such CSI statements are modeled by labeled edges, where labels encode contexts in which the edge vanishes. We study the problem of identifying all independence statements implied by the structure and the labels of an LDAG. We show that this problem is coNP-hard for LDAGs and formulate a sound extension of the semi-graphoid axioms for the derivation of such implied independencies. Finally we connect our study to certain qualitative versions of independence ubiquitous in database theory and team semantics.
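For ordinary (unlabeled) DAGs, the d-separation criterion mentioned above can be checked with the classical moralization method: restrict to the ancestral set, connect co-parents, drop directions, and test whether the conditioning set cuts every path. The sketch below is a minimal illustration of that standard technique; the function name and the parent-map encoding are assumptions made here, and it does not handle the edge labels of LDAGs studied in the paper.

```python
from collections import deque
from typing import Dict, Iterable, Set


def d_separated(
    parents: Dict[str, Set[str]],
    xs: Iterable[str],
    ys: Iterable[str],
    zs: Iterable[str],
) -> bool:
    """Return True if xs and ys are d-separated given zs in the DAG.

    `parents` maps each node to the set of its parents; nodes without
    parents may be omitted from the map.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Step 1: collect xs, ys, zs and all of their ancestors.
    relevant: Set[str] = set()
    stack = list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralize the ancestral subgraph (link each child to its
    # parents and link every pair of co-parents), ignoring directions.
    adjacent: Dict[str, Set[str]] = {node: set() for node in relevant}
    for child in relevant:
        pars = list(parents.get(child, ()))
        for i, p in enumerate(pars):
            adjacent[child].add(p)
            adjacent[p].add(child)
            for q in pars[i + 1:]:
                adjacent[p].add(q)
                adjacent[q].add(p)

    # Step 3: search from xs without entering zs; reaching ys means
    # the sets are connected, hence not d-separated.
    frontier = deque(xs - zs)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False
        for nxt in adjacent[node]:
            if nxt not in zs and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


# Collider A -> C <- B and chain A -> B -> C as small test graphs.
collider = {"C": {"A", "B"}}
chain = {"B": {"A"}, "C": {"B"}}
```

On the collider, A and B are d-separated marginally but become connected once C is conditioned on; on the chain, conditioning on B separates A from C. Both behaviours are the ones the d-separation criterion prescribes.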

18.
An extended version of the Hatzopoulos and Haberman (2009) dynamic parametric model is proposed for analyzing mortality structures, incorporating the cohort effect. A one-factor parameterized exponential polynomial in age effects within the generalized linear models (GLM) framework is used. Sparse principal component analysis (SPCA) is then applied to time-dependent GLM parameter estimates and provides (marginal) estimates for a two-factor principal component (PC) approach structure. Modeling the two-factor residuals in the same way, in age-cohort effects, provides estimates for the (conditional) three-factor age-period-cohort model. The age-time and cohort related components are extrapolated using dynamic linear regression (DLR) models. An application is presented for England & Wales males (1841-2006).

19.
Probabilistic Decision Graphs (PDGs) are a class of graphical models that can naturally encode some context specific independencies that cannot always be efficiently captured by other popular models, such as Bayesian Networks. Furthermore, inference can be carried out efficiently over a PDG, in time linear in the size of the model. The problem of learning PDGs from data has been studied in the literature, but only for the case of complete data. We propose an algorithm for learning PDGs in the presence of missing data. The proposed method is based on the Expectation-Maximisation principle for estimating the structure of the model as well as the parameters. We test our proposal on both artificially generated data with different rates of missing cells and real incomplete data. We also compare the PDG models learnt by our approach to the commonly used Bayesian Network (BN) model. The results indicate that the PDG model is less sensitive to the rate of missing data than the BN model. Also, though the BN models usually attain higher likelihood, the PDGs are close to them also in size, which makes the learnt PDGs preferable for probabilistic inference purposes.

20.
Probabilistic Decision Graphs (PDGs) are probabilistic graphical models that represent a factorisation of a discrete joint probability distribution using a “decision graph”-like structure over local marginal parameters. The structure of a PDG enables the model to capture some context specific independence relations that are not representable in the structure of more commonly used graphical models such as Bayesian networks and Markov networks. This sometimes makes operations in PDGs more efficient than in alternative models. PDGs have previously been defined only in the discrete case, assuming a multinomial joint distribution over the variables in the model. We extend PDGs to incorporate continuous variables, by assuming a Conditional Gaussian (CG) joint distribution. We also show how inference can be carried out in an efficient way.
