Similar Literature
 20 similar documents retrieved.
1.
In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do not appropriately handle impossible combinations of variables, also known as structural zeros. Allowing nonzero probability for impossible combinations results in inaccurate estimates of joint and conditional probabilities, even for feasible combinations. We present an approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros. The basic idea is to treat the observed data as a truncated sample from an augmented dataset, thereby allowing us to exploit the conditional independence assumptions for computational expediency. As part of the approach, we develop an algorithm for collapsing a large set of structural zero combinations into a much smaller set of disjoint marginal conditions, which speeds up computation. We apply the approach to sample from a semiparametric version of the latent class model with structural zeros in the context of a key issue faced by national statistical agencies seeking to disseminate confidential data to the public: estimating the number of records in a sample that are unique in the population on a set of publicly available categorical variables. The latent class model offers remarkably accurate estimates of population uniqueness, even in the presence of a large number of structural zeros.
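The truncated-sample view described above can be illustrated with a minimal Python/numpy sketch: draws from an augmented latent class model that hit a structural zero are simply discarded. The class weights, within-class probabilities and the structural-zero list below are invented for illustration; the paper's actual posterior sampler and its collapsing algorithm are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 categorical variables, 2 latent classes.
pi = np.array([0.6, 0.4])                      # class weights
# phi[k][j] = within-class category probabilities for variable j
phi = [
    [np.array([0.7, 0.3]), np.array([0.2, 0.5, 0.3]), np.array([0.9, 0.1])],
    [np.array([0.1, 0.9]), np.array([0.6, 0.2, 0.2]), np.array([0.4, 0.6])],
]
# Structural zeros: combinations that cannot occur in the population, written
# as (variable index, level) conditions that are jointly impossible.
structural_zeros = [{0: 1, 2: 0}]              # e.g. x0 = 1 together with x2 = 0

def is_impossible(x):
    return any(all(x[j] == v for j, v in cond.items()) for cond in structural_zeros)

def sample_feasible(n):
    """Draw n records from the latent class model, treating the observed data
    as a truncated sample: augmented draws hitting a structural zero are discarded."""
    out = []
    while len(out) < n:
        k = rng.choice(len(pi), p=pi)
        x = [rng.choice(len(p), p=p) for p in phi[k]]
        if not is_impossible(x):
            out.append(x)
    return np.array(out)

print(sample_feasible(5))
```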

2.

Spatio-temporal data are common in practice. Existing methods for analyzing such data often employ parametric modelling with different sets of model assumptions. However, spatio-temporal data in practice often have complicated structures, including complex spatial and temporal data variation, latent spatio-temporal data correlation, and unknown data distribution. Because such data structures reflect the complicated impact of confounding variables, such as weather, demographic variables, lifestyles, and other cultural and environmental factors, they are usually too complicated to describe by parametric models. In this paper, we suggest a general modelling framework for estimating the mean and covariance functions of spatio-temporal data using a three-step local smoothing procedure. The suggested method accommodates the complicated structure of real spatio-temporal data well. Under some regularity conditions, the consistency of the proposed estimators is established. Both simulation studies and a real-data application show that our proposed method works well in practice.


3.
We propose a multivariate statistical framework for regional development assessment based on structural equation modelling with latent variables and show how such methods can be combined with non-parametric classification methods, such as cluster analysis, to obtain a development grouping of territorial units. This approach has several advantages over current approaches in the literature: it takes account of distributional issues such as departures from normality, which in turn enables the application of more powerful inferential techniques; it enables modelling of structural relationships among latent development dimensions and, subsequently, formal statistical testing of the model specification and of various hypotheses on the estimated parameters; it allows for a complex structure of the factor loadings in the measurement models for the latent variables, which can also be formally tested in the confirmatory framework; and it enables computation of latent variable scores that take into account the structural or causal relationships among latent variables and the complex structure of the factor loadings in the measurement models. We apply these methods to the regional development classification of Slovenia and Croatia.

4.
For a number of situations, a Bayesian network can be split into a core network consisting of a set of latent variables describing the status of a system, and a set of fragments relating the status variables to observable evidence that could be collected about the system state. This situation arises frequently in educational testing, where the status variables represent student proficiency and the evidence models (graph fragments linking competency variables to observable outcomes) relate to assessment tasks that can be used to assess that proficiency. The traditional approach to knowledge engineering in this situation would be to maintain a library of fragments, where the graphical structure is specified using a graphical editor and the probabilities are then entered using a separate spreadsheet for each node. If many evidence model fragments employ the same design pattern, a lot of repetitive data entry is required. As the parameter values that determine the strength of the evidence can be buried on interior screens of an interface, it can be difficult for a design team to get an impression of the total evidence provided by a collection of evidence models for the system variables, and to identify holes in the data collection scheme. A Q-matrix - an incidence matrix whose rows represent observable outcomes from assessment tasks and whose columns represent competency variables - provides the graphical structure of the evidence models. The Q-matrix can be augmented to provide details of relationship strengths and a high-level overview of the kind of evidence available. The relationships among the status variables can be represented with an inverse covariance matrix; this is particularly useful in models from the social sciences, as the domain experts' knowledge about the system states often comes from factor analyses and similar procedures that naturally produce covariance matrices. Representing the model using matrices means that the bulk of the specification work can be done in a desktop spreadsheet program and does not require specialized software, facilitating collaboration with external experts. The design idea is illustrated with examples from prior assessment design projects.
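As a concrete illustration of reading evidence-model structure off a Q-matrix, here is a minimal Python sketch. The proficiency names, task names and the matrix entries are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical Q-matrix: rows = observable outcomes, columns = proficiency variables.
# A 1 means that proficiency is a parent of the observable in its evidence-model fragment.
proficiencies = ["Skill_A", "Skill_B", "Skill_C"]
observables   = ["Task1_Obs", "Task2_Obs", "Task3_Obs"]
Q = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
])

# Derive the graphical structure of each evidence-model fragment from the Q-matrix.
for i, obs in enumerate(observables):
    parents = [p for j, p in enumerate(proficiencies) if Q[i, j] == 1]
    print(f"{obs} <- {parents}")

# Column sums give a high-level view of how much evidence each proficiency receives,
# which helps spot holes in the data-collection scheme.
print(dict(zip(proficiencies, Q.sum(axis=0))))
```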

5.
We consider models for the covariance between two blocks of variables. Such models are often used in situations where latent variables are believed to be present. In this paper we characterize exactly the set of distributions given by a class of models with one-dimensional latent variables. These models relate two blocks of observed variables, modeling only the cross-covariance matrix. We describe the relation of this model to the singular value decomposition of the cross-covariance matrix. We show that, although the model is underidentified, useful information may be extracted. We further consider an alternative parameterization in which one latent variable is associated with each block, and we extend the result to models with r-dimensional latent variables.
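A minimal numpy sketch of the connection to the singular value decomposition mentioned above: with a one-dimensional latent variable the cross-covariance between the two blocks is rank one, so its SVD should have a single dominant singular value. The simulated data and dimensions are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two blocks of observed variables driven by a shared
# one-dimensional latent variable.
n, p, q = 500, 4, 3
z = rng.normal(size=n)                       # latent variable
X = np.outer(z, rng.normal(size=p)) + rng.normal(size=(n, p))
Y = np.outer(z, rng.normal(size=q)) + rng.normal(size=(n, q))

# Sample cross-covariance between the two blocks.
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxy = Xc.T @ Yc / (n - 1)

# The model constrains only this cross-covariance; with a one-dimensional latent
# variable it is (approximately) rank one, so one singular value dominates.
U, s, Vt = np.linalg.svd(Sxy)
print("singular values:", np.round(s, 3))
```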

6.
We investigate the structure of a large precision matrix in Gaussian graphical models by decomposing it into a low rank component and a remainder part with sparse precision matrix. Based on the decomposition, we propose to estimate the large precision matrix by inverting a principal orthogonal decomposition (IPOD). The IPOD approach has appealing practical interpretations in conditional graphical models given the low rank component, and it connects to Gaussian graphical models with latent variables. Specifically, we show that the low rank component in the decomposition of the large precision matrix can be viewed as the contribution from the latent variables in a Gaussian graphical model. Compared with existing approaches for latent variable graphical models, the IPOD is conveniently feasible in practice where only inverting a low-dimensional matrix is required. To identify the number of latent variables, which is an objective of its own interest, we investigate and justify an approach by examining the ratios of adjacent eigenvalues of the sample covariance matrix. Theoretical properties, numerical examples, and a real data application demonstrate the merits of the IPOD approach in its convenience, performance, and interpretability.
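The eigenvalue-ratio idea for choosing the number of latent variables can be sketched in a few lines of Python. The data-generating setup and the cap on the number of candidate factors are assumptions made for illustration; the paper's exact criterion and its theory are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data with 2 latent factors plus noise.
n, p, k_true = 400, 30, 2
F = rng.normal(size=(n, k_true))
B = rng.normal(size=(k_true, p)) * 2.0
X = F @ B + rng.normal(size=(n, p))

# Eigenvalues of the sample covariance matrix, largest first.
S = np.cov(X, rowvar=False)
eig = np.sort(np.linalg.eigvalsh(S))[::-1]

# Estimate the number of latent variables by the largest ratio of adjacent
# eigenvalues, one common form of the eigenvalue-ratio idea mentioned above.
kmax = 10
ratios = eig[:kmax] / eig[1:kmax + 1]
k_hat = int(np.argmax(ratios)) + 1
print("estimated number of latent variables:", k_hat)
```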

7.
The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of the hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development, unbalanced designs are assumed and attention is given to problems that arise with missing cells.
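To see how different computing procedures test different hypotheses under imbalance, here is a hedged sketch using statsmodels (not software from the paper): sequential (Type I) and marginal (Type III) sums of squares are compared on an invented unbalanced two-factor layout with sum-to-zero contrasts.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Hypothetical unbalanced two-factor layout: unequal cell counts.
rows = []
counts = {("a1", "b1"): 6, ("a1", "b2"): 2, ("a2", "b1"): 3, ("a2", "b2"): 7}
for (a, b), m in counts.items():
    effect = 1.0 * (a == "a2") + 0.5 * (b == "b2")
    rows += [{"a": a, "b": b, "y": effect + rng.normal()} for _ in range(m)]
df = pd.DataFrame(rows)

# Sum-to-zero contrasts so the "Type III" tests correspond to the usual
# main-effect hypotheses in the cell-means parametrization.
fit = smf.ols("y ~ C(a, Sum) * C(b, Sum)", data=df).fit()

# With unbalanced data, sequential (Type I) and marginal (Type III)
# sums of squares generally test different hypotheses.
print(sm.stats.anova_lm(fit, typ=1))
print(sm.stats.anova_lm(fit, typ=3))
```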

8.
Several papers have already stressed the interest of latent root regression and its similarities to partial least squares regression. A new formulation of this method is discussed which makes it even simpler than the original method to set up a prediction model. Furthermore, it is shown how this method can be extended not only to the case where it is desired to predict several response variables from a set of predictors, but also to the multiblock setting where the aim is to predict one or several data sets from several other data sets. The interest of the method is illustrated on the basis of a data set pertaining to epidemiology.

9.
The Bradley–Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behavior, chess ranking, and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons, and random graphs. From a computational point of view, Hunter has proposed efficient iterative minorization-maximization (MM) algorithms to perform maximum likelihood estimation for these generalized Bradley–Terry models whereas Bayesian inference is typically performed using Markov chain Monte Carlo algorithms based on tailored Metropolis–Hastings proposals. We show here that these MM algorithms can be reinterpreted as special instances of expectation-maximization algorithms associated with suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
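For reference, a minimal Python sketch of Hunter's MM iteration for the basic Bradley-Terry model (without ties or group comparisons); the win counts below are invented, and the EM reinterpretation and the Gibbs samplers from the paper are not shown.

```python
import numpy as np

# Hypothetical win-count matrix: wins[i, j] = number of times item i beat item j.
wins = np.array([
    [0, 4, 6],
    [2, 0, 3],
    [1, 5, 0],
], dtype=float)

n_ij = wins + wins.T                # number of comparisons between each pair
W = wins.sum(axis=1)                # total wins of each item
lam = np.ones(len(W))               # Bradley-Terry strength parameters

# Hunter's MM update; each iteration increases the likelihood.
for _ in range(200):
    denom = np.array([
        sum(n_ij[i, j] / (lam[i] + lam[j]) for j in range(len(W)) if j != i)
        for i in range(len(W))
    ])
    lam = W / denom
    lam /= lam.sum()                # fix the scale: strengths are identified up to a constant

print(np.round(lam, 3))
```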

10.
This paper deals with the issue of estimating a production frontier and measuring efficiency from a panel data set. First, it proposes an alternative method for the estimation of a production frontier on a short panel data set. The method is based on the so-called mean-and-covariance structure analysis, which is closely related to the generalized method of moments. One advantage of the method is that it allows us to investigate the presence of correlations between individual effects and exogenous variables without requiring instruments uncorrelated with the individual effects, as in instrumental variable estimation. Another advantage is that the method is well suited to a panel data set with a small number of periods. Second, the paper considers the question of recovering individual efficiency levels from the estimates obtained from the mean-and-covariance structure analysis. Since individual effects are here viewed as latent variables, they can be estimated as factor scores, i.e., weighted sums of the observed variables. We illustrate the proposed methods with the estimation of a stochastic production frontier on a short panel of French fruit growers.

11.
It is natural to assume that a missing-data mechanism depends on latent variables in the analysis of incomplete data in latent variate modeling because latent variables are error-free and represent key notions investigated by applied researchers. Unfortunately, the missing-data mechanism is then not missing at random (NMAR). In this article, a new estimation method is proposed, which leads to consistent and asymptotically normal estimators for all parameters in a linear latent variate model, where the missing mechanism depends on the latent variables and no concrete functional form for the missing-data mechanism is used in estimation. The method to be proposed is a type of multi-sample analysis with or without mean structures, and hence, it is easy to implement. Complete-case analysis is shown to produce consistent estimators for some important parameters in the model.

12.
Graphical methods for the discovery of structural models from observational data provide interesting tools for applied researchers. A problem often faced in empirical studies is the presence of latent confounders which produce associations between the observed variables. Although causal inference algorithms exist which can cope with latent confounders, empirical applications assessing the performance of such algorithms are largely lacking. In this study, we apply the constraint-based Fast Causal Inference algorithm implemented in the software program TETRAD to a data set containing strategy and performance information about 608 business units. In contrast to the informative and reasonable results for the empirical data, simulation findings reveal problems in recovering some of the structural relations.

13.
Kernel methods and rough sets are two general pursuits in the domain of machine learning and intelligent systems. Kernel methods map data into a higher dimensional feature space, where the resulting structure of the classification task is linearly separable, while rough sets granulate the universe with the use of relations and employ the induced knowledge granules to approximate arbitrary concepts existing in the problem at hand. Although it seems there is no connection between these two methodologies, both kernel methods and rough sets explicitly or implicitly dwell on relation matrices to represent the structure of sample information. Based on this observation, we combine the two methodologies by incorporating the Gaussian kernel into fuzzy rough sets and propose a Gaussian kernel approximation based fuzzy rough set model. Fuzzy T-equivalence relations constitute the fundamentals of most fuzzy rough set models. It is proven that fuzzy relations defined with the Gaussian kernel are reflexive, symmetric and transitive. Gaussian kernels are introduced to acquire fuzzy relations between samples described by fuzzy or numeric attributes in order to carry out fuzzy rough data analysis. Moreover, we discuss information entropy to evaluate the kernel matrix and calculate the uncertainty of the approximation. Several functions are constructed for evaluating the significance of features based on kernel approximation and fuzzy entropy. Algorithms for feature ranking and reduction based on the proposed functions are designed. Results of experimental analysis are included to quantify the effectiveness of the proposed methods.
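A minimal Python sketch of the basic ingredients: a Gaussian kernel matrix used as a fuzzy similarity relation, and fuzzy-rough lower and upper approximations of a fuzzy concept. The data, the kernel width, and the particular implicator/t-norm pair are assumptions made for illustration and may differ from the operators used in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical numeric samples and a fuzzy decision concept A (membership degrees).
X = rng.normal(size=(8, 3))
A = rng.uniform(size=8)

# Gaussian kernel as a fuzzy similarity relation between samples; such relations
# are reflexive, symmetric and T-transitive, as discussed above.
sigma = 1.0
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
R = np.exp(-d2 / (2 * sigma ** 2))

# Standard fuzzy-rough approximations built from the relation (one common choice):
#   lower(A)(x) = min_y max(1 - R(x, y), A(y))
#   upper(A)(x) = max_y min(R(x, y), A(y))
lower = np.min(np.maximum(1 - R, A[None, :]), axis=1)
upper = np.max(np.minimum(R, A[None, :]), axis=1)

print(np.round(lower, 2))
print(np.round(upper, 2))
```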

14.
Latent or unobserved phenomena pose a significant difficulty in data analysis as they induce complicated and confounding dependencies among a collection of observed variables. Factor analysis is a prominent multivariate statistical modeling approach that addresses this challenge by identifying the effects of (a small number of) latent variables on a set of observed variables. However, the latent variables in a factor model are purely mathematical objects that are derived from the observed phenomena, and they do not have any interpretation associated with them. A natural approach for attributing semantic information to the latent variables in a factor model is to obtain measurements of some additional plausibly useful covariates that may be related to the original set of observed variables, and to associate these auxiliary covariates to the latent variables. In this paper, we describe a systematic approach for identifying such associations. Our method is based on solving computationally tractable convex optimization problems, and it can be viewed as a generalization of the minimum-trace factor analysis procedure for fitting factor models via convex optimization. We analyze the theoretical consistency of our approach in a high-dimensional setting as well as its utility in practice via experimental demonstrations with real data.
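For context, here is a hedged cvxpy sketch of classical minimum-trace factor analysis, the convex program that the approach above generalizes; the covariate-association machinery from the paper is not reproduced, and the simulated covariance matrix is purely illustrative.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)

# Hypothetical sample covariance from a 2-factor model plus diagonal noise.
n, p = 300, 6
L_true = rng.normal(size=(p, 2))
X = rng.normal(size=(n, 2)) @ L_true.T + rng.normal(size=(n, p)) * 0.5
S = np.cov(X, rowvar=False)

# Minimum-trace factor analysis: split S into a low-rank PSD part L plus a
# nonnegative diagonal D by minimizing trace(L).
L = cp.Variable((p, p), PSD=True)
d = cp.Variable(p, nonneg=True)
prob = cp.Problem(cp.Minimize(cp.trace(L)), [S == L + cp.diag(d)])
prob.solve()

eigs = np.linalg.eigvalsh(L.value)
print("approximate rank of L:", int((eigs > 1e-6 * eigs.max()).sum()))
```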

15.
Pair-copula constructions of multiple dependence
Building on the work of Bedford, Cooke and Joe, we show how multivariate data, which exhibit complex patterns of dependence in the tails, can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method for performing inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards the development of an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied automatically to huge data sets.
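A minimal Python sketch of a three-variable pair-copula (D-vine) density built from Gaussian pair-copulae, to make the cascade concrete; the copula family, parameter values, and vine structure are illustrative choices, not the specification used in the paper.

```python
import numpy as np
from scipy.stats import norm

def gauss_copula_logpdf(u, v, rho):
    """Log-density of the bivariate Gaussian copula, one possible pair-copula family."""
    z1, z2 = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1 - rho**2)
            + (2 * rho * z1 * z2 - rho**2 * (z1**2 + z2**2)) / (2 * (1 - rho**2)))

def h(u, v, rho):
    """Conditional CDF (h-function) of u given v under the Gaussian pair-copula."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def dvine3_logpdf(u1, u2, u3, rho12, rho23, rho13_2):
    """Log copula density of a 3-dimensional D-vine of Gaussian pair-copulae:
    c(u1,u2,u3) = c12(u1,u2) * c23(u2,u3) * c13|2(F(u1|u2), F(u3|u2)).
    All parameter values below are illustrative, not estimates from the paper."""
    return (gauss_copula_logpdf(u1, u2, rho12)
            + gauss_copula_logpdf(u2, u3, rho23)
            + gauss_copula_logpdf(h(u1, u2, rho12), h(u3, u2, rho23), rho13_2))

print(dvine3_logpdf(0.3, 0.7, 0.5, rho12=0.6, rho23=0.4, rho13_2=0.2))
```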

16.
The general multivariate analysis of variance model has been extensively studied in the statistical literature and successfully applied in many different fields for analyzing longitudinal data. In this article, we consider the extension of this model having two sets of regressors constituting a growth curve portion and a multivariate analysis of variance portion, respectively. Nowadays, the data collected in empirical studies have relatively complex structures while often demanding parsimonious modeling. This can be achieved, for example, by imposing rank constraints on the regression coefficient matrices. The reduced rank regression structure also provides a theoretical interpretation in terms of latent variables. We derive likelihood based estimators for the mean parameters and covariance matrix in this type of model. A numerical example is provided to illustrate the obtained results.

17.
A composite model of neural network and rough sets components was constructed to predict a sample of bank holding patterns. The final model was able to correctly classify 96% of a testing set of four types of bank holding structures. Holding structure is defined as the number of banks under common ownership. For this study, forms of bank holding structure include: banks that are not owned by another company, single banks that are held by another firm, pairs of banks that are held by another enterprise, and three or more banks that are held by another company. Initially, input to the neural network model was 28 financial ratios for more than 200 banks in Arkansas for 1992. The 28 ratios are organized by categories such as liquidity, credit risk, leverage, efficiency, and profitability. The ratios were constructed with 70 bank variables such as net worth, deposits, total assets, net loans, total operating income, etc. The first neural network model correctly classified 84% of the testing set at a tolerance level of 0.20. Another artificial intelligence (AI) procedure known as two-dimensional rough sets was then applied to the dataset. Rough sets reduced the number of input variables from 28 to 18, a drop of 36% in the number of input variables. This version of rough sets also eliminated a number of records, thereby reducing the information system (i.e., matrix) on both vertical and horizontal dimensions. A second neural network was trained with the reduced number of input variables and records. This network correctly classified 96% of the testing set at a tolerance level of 0.20, an increase of 11% in the accuracy of the prediction. By applying two-dimensional reducts to the dataset of financial ratios, the predictive accuracy of the neural network model was improved substantially. Banking institutions that are prime candidates for mergers or acquisitions can then be more accurately identified through the use of this hybrid decision support system (DSS) which combines different types of AI techniques for the purposes of data management and modeling.

18.
Based on interval mathematics, an interval analysis method for the sensitivity analysis of structures is proposed in this paper. The interval analysis method deals with the upper and lower bounds on eigenvalues of structures with uncertain-but-bounded (or interval) parameters. The stiffness matrix and the mass matrix of the structure, whose elements have initial errors, are unknown except for the fact that they belong to given bounded matrix sets. The set of possible matrices can be described by an interval matrix. In terms of the structural parameters, the stiffness matrix and the mass matrix admit a non-negative decomposition. By means of interval extension, the generalized interval eigenvalue problem for structures with uncertain-but-bounded parameters can be divided into two generalized eigenvalue problems for pairs of real symmetric matrices, using real analysis. Unlike the normal sensitivity analysis method, the interval analysis method obtains information on the response of the structure as the structural parameters (or design variables) change, without any partial differentiation. Low computational effort and a wide range of application are characteristics of the proposed method. Two numerical examples illustrate the efficiency of the interval analysis.
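A small Python/scipy sketch of the two generalized eigenvalue problems described above, applied to an invented two-degree-of-freedom system; the matrices and uncertainty radii are hypothetical, and the bound construction rests on the non-negative decomposition assumptions stated in the abstract.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical 2-DOF system: nominal stiffness/mass matrices and elementwise
# uncertainty radii (the interval matrices are [Kc - dK, Kc + dK], etc.).
Kc = np.array([[2.0, -1.0], [-1.0, 2.0]])
dK = 0.05 * np.abs(Kc)
Mc = np.array([[1.0, 0.0], [0.0, 1.5]])
dM = 0.02 * np.abs(Mc)

# Following the decomposition described above, the interval eigenvalue problem is
# replaced by two ordinary generalized eigenvalue problems for symmetric pairs:
# lower eigenvalue bounds from (Kc - dK, Mc + dM), upper from (Kc + dK, Mc - dM).
lam_lower = eigh(Kc - dK, Mc + dM, eigvals_only=True)
lam_upper = eigh(Kc + dK, Mc - dM, eigvals_only=True)

for lo, hi in zip(lam_lower, lam_upper):
    print(f"eigenvalue interval: [{lo:.4f}, {hi:.4f}]")
```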

19.
The aim of this paper is to develop an effective method for solving matrix games with payoffs of triangular fuzzy numbers (TFNs) which are arbitrary. The method ensures that the players' gain-floor and loss-ceiling have a common TFN-type fuzzy value, and hence that any matrix game with payoffs of TFNs has a TFN-type fuzzy value. Based on the duality theorem of linear programming (LP) and the representation theorem for fuzzy sets, the mean and the lower and upper limits of the TFN-type fuzzy value are easily computed by solving the derived LP models with data taken from the 1-cut set and 0-cut set of the fuzzy payoffs. The TFN-type fuzzy value of any matrix game with payoffs of TFNs can thereby be obtained explicitly. Moreover, we can easily compute the upper and lower bounds of any α-cut set of the TFN-type fuzzy value for any matrix game with payoffs of TFNs, together with the players' optimal mixed strategies, by solving the derived LP models at any specified confidence level α. The proposed method is demonstrated with a numerical example and compared with other methods to show its validity, applicability and superiority.
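A hedged Python/scipy sketch of the building block behind this construction: the standard LP for the value of a crisp matrix game, applied to the payoff matrices obtained at the 1-cut (modes) and 0-cut (lower and upper limits) of invented triangular fuzzy payoffs. The derived LP models in the paper may differ in detail.

```python
import numpy as np
from scipy.optimize import linprog

def game_value(A):
    """Value of a crisp matrix game (row player maximizes) via the standard LP:
    max v  s.t.  A^T x >= v * 1,  sum(x) = 1,  x >= 0."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                        # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])           # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Hypothetical triangular fuzzy payoffs (low, mode, high) for a 2x2 game.
low  = np.array([[1.0, 3.0], [4.0, 0.0]])
mode = np.array([[2.0, 4.0], [5.0, 1.0]])
high = np.array([[3.0, 5.0], [6.0, 2.0]])

# At the 1-cut the payoffs reduce to the modes; at the 0-cut the lower and upper
# limits of the fuzzy game value come from the low and high payoff matrices.
for name, M in [("lower (0-cut)", low), ("mean (1-cut)", mode), ("upper (0-cut)", high)]:
    v, x = game_value(M)
    print(name, "value:", round(v, 4), "strategy:", np.round(x, 4))
```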

20.
The semiparametric reproductive dispersion model is a generalization of both the reproductive dispersion model and the semiparametric regression model, and includes semiparametric generalized linear models and generalized partially linear models as special cases. This paper discusses Bayesian estimation of the parameters of this model, as well as model selection based on Bayes factors, when both the response and the covariates contain data that are missing not at random. In the analysis, penalized splines are used to estimate the nonparametric component of the model, and a hierarchical Bayesian model is constructed. To overcome the poor mixing caused by highly correlated parameters and the instability caused by increasing dimension in Gibbs sampling, latent variables are introduced as augmented data and a collapsed Gibbs sampling scheme is applied, improving convergence. In addition, to avoid computing multiple integrals, the Metropolis-Hastings algorithm is used to estimate the marginal density functions before computing the Bayes factors, providing a criterion for model selection and comparison. Finally, the effectiveness of the proposed method is verified through simulations and a real example.
