Similar Documents
20 similar documents found (search time: 31 ms)
1.
We propose a parsimonious extension of the classical latent class model that clusters categorical data by relaxing the conditional independence assumption. Under this new mixture model, named the conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution in which the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is spread uniformly over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, thus summarizing each class with relatively few characteristic levels. Model selection is carried out by a hybrid MCMC algorithm that does not require preliminary parameter estimation. Maximum likelihood estimation is then performed via an EM algorithm, only for the best model. The model properties are illustrated on simulated data and on three real data sets using the associated R package CoModes. The results show that this model reduces the biases induced by the conditional independence assumption while providing meaningful parameters.
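The parsimonious block distribution described above can be illustrated with a minimal sketch. It assumes that a block level is encoded as a tuple of variable values and that the mode levels are supplied as a dictionary of free probabilities. The helper `block_probability` is hypothetical and is not part of the CoModes package.

```python
from itertools import product


def block_probability(level, modes, n_levels):
    """Probability of one level of a block under a parsimonious multinomial.

    level    -- observed level of the block (here a tuple of variable values)
    modes    -- dict mapping each mode level to its free probability
    n_levels -- total number of levels the block can take
    """
    mode_mass = sum(modes.values())
    if not 0.0 <= mode_mass <= 1.0:
        raise ValueError("mode probabilities must sum to at most 1")
    if level in modes:
        return modes[level]
    # The remaining mass is shared equally by every non-mode level.
    n_other = n_levels - len(modes)
    return (1.0 - mode_mass) / n_other if n_other else 0.0


# A block of two ternary variables has 3 * 3 = 9 levels; two of them are modes.
levels = list(product(range(3), repeat=2))
modes = {(0, 0): 0.5, (2, 2): 0.3}
probs = {lvl: block_probability(lvl, modes, len(levels)) for lvl in levels}
print(probs[(0, 0)], probs[(1, 2)])
```

In this example, two free parameters replace the eight that an unrestricted multinomial on the block's nine levels would need.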

2.
We propose an algorithm for nonparametric estimation for finite mixtures of multivariate random vectors that strongly resembles a true EM algorithm. The vectors are assumed to have independent coordinates conditional upon knowing from which mixture component they come, but otherwise their density functions are completely unspecified. Sometimes, the density functions may be partially specified by Euclidean parameters, a case we call semiparametric. Our algorithm is much more flexible and easily applicable than existing algorithms in the literature; it can be extended to any number of mixture components and any number of vector coordinates of the multivariate observations. Thus it may be applied even in situations where the model is not identifiable, so care is called for when using it in situations for which identifiability is difficult to establish conclusively. Our algorithm yields much smaller mean integrated squared errors than an alternative algorithm in a simulation study. In another example using a real dataset, it provides new insights that extend previous analyses. Finally, we present two different variations of our algorithm, one stochastic and one deterministic, and find anecdotal evidence that there is not a great deal of difference between the performance of these two variants. The computer code and data used in this article are available online.
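The EM-like iteration can be sketched in simplified form. The sketch assumes a Gaussian kernel with a fixed bandwidth `h` and a user-supplied initial posterior. The E-step multiplies per-coordinate densities, which is where the conditional independence assumption enters. The M-step replaces parametric updates with posterior-weighted kernel density estimates. All names are hypothetical, and this is not the authors' published code.

```python
import math


def gauss_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_kde(x, column, weights, h):
    """Kernel density estimate at x, weighting each observation by its posterior."""
    total = sum(weights)
    return sum(w * gauss_kernel((x - xi) / h) for w, xi in zip(weights, column)) / (total * h)


def np_em(data, init_post, h=0.5, n_iter=20):
    """EM-like fit of a mixture whose coordinates are independent within components.

    data      -- list of observations, each a list of d coordinates
    init_post -- initial posterior probabilities, one list of m values per observation
    h         -- kernel bandwidth (assumed fixed and shared by all coordinates)
    """
    n, d, m = len(data), len(data[0]), len(init_post[0])
    columns = [[row[k] for row in data] for k in range(d)]
    post = [list(row) for row in init_post]
    lam = [0.0] * m
    for _ in range(n_iter):
        # M-step: mixing proportions and per-component KDE weights from the posteriors.
        weights = [[post[i][j] for i in range(n)] for j in range(m)]
        lam = [sum(w) / n for w in weights]
        # E-step: each component density factors into a product over coordinates.
        new_post = []
        for row in data:
            scores = [lam[j] * math.prod(weighted_kde(row[k], columns[k], weights[j], h)
                                         for k in range(d))
                      for j in range(m)]
            total = sum(scores)
            new_post.append([s / total for s in scores])
        post = new_post
    return lam, post


cluster_a = [[0.1 * i, 0.1 * i] for i in range(10)]
cluster_b = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(10)]
init = [[0.6, 0.4]] * 10 + [[0.4, 0.6]] * 10  # mildly informative start
lam, post = np_em(cluster_a + cluster_b, init)
print([round(v, 3) for v in lam], round(post[0][0], 3), round(post[-1][1], 3))
```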

3.
Recently, different mixture models have been proposed for multilevel data, generally requiring the local independence assumption. In this work, this assumption is relaxed by allowing each mixture component at the lower level of the hierarchical structure to be modeled according to a multivariate Gaussian distribution with a non-diagonal covariance matrix. For high-dimensional problems, this solution can lead to highly parameterized models. In this proposal, the trade-off between model parsimony and flexibility is governed by assuming a latent factor generative model.
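To make the parsimony trade-off concrete, assume that each lower-level component has the usual factor-analytic covariance ΛΛᵀ + Ψ, with a p × q loading matrix Λ and a diagonal Ψ. The abstract does not state the exact parameterization, so this form is an assumption. The hypothetical helper below counts the free covariance parameters per component.

```python
def covariance_params(p, q=None):
    """Free parameters in one component's covariance matrix.

    p -- number of observed variables
    q -- number of latent factors; None means an unrestricted covariance
    """
    if q is None:
        return p * (p + 1) // 2
    # Loadings (p * q), minus the rotational indeterminacy q * (q - 1) / 2,
    # plus the p diagonal uniquenesses.
    return p * q - q * (q - 1) // 2 + p


print(covariance_params(50), covariance_params(50, 3))  # 1275 197
```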

4.
In applied sciences, generalized linear mixed models have become one of the preferred tools to analyze a variety of longitudinal and clustered data. Due to software limitations, the analyses are often restricted to the setting in which the random effects terms follow a multivariate normal distribution. However, this assumption may be unrealistic, obscuring important features of among-unit variation. This work describes a widely applicable semiparametric Bayesian approach that relaxes the normality assumption by using a novel mixture of multivariate Polya trees prior to define a flexible nonparametric model for the random effects distribution. The nonparametric prior is centered on the commonly used parametric normal family. We allow this parametric family to hold only approximately, thereby providing a robust alternative for modeling. We discuss and implement practical procedures for addressing the computational challenges that arise under this approach. We illustrate the methodology by applying it to real-life examples.

Supplemental materials for this paper are available online.
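A finite, univariate Polya tree centered on a normal distribution can be sketched as follows. The sketch assumes dyadic partitions at the quantiles of N(μ, σ²) and Beta(c·m², c·m²) branch probabilities at level m, which is a common choice. The paper uses mixtures of multivariate Polya trees; the mixing over (μ, σ) and the multivariate partitions are omitted here, and the function name is hypothetical.

```python
import random
from statistics import NormalDist


def sample_polya_tree(depth, c, mu=0.0, sigma=1.0, rng=random):
    """Draw one random distribution from a finite univariate Polya tree.

    Level-m sets are intervals between the k / 2**m quantiles of N(mu, sigma**2),
    and each split sends a Beta(c*m**2, c*m**2) share of the mass to the left
    child, so the prior mean of the random distribution is N(mu, sigma**2).
    Returns the 2**depth leaf intervals and their random probabilities.
    """
    base = NormalDist(mu, sigma)
    probs = [1.0]
    for m in range(1, depth + 1):
        a = c * m * m
        new_probs = []
        for p in probs:
            y = rng.betavariate(a, a)  # share of this set's mass going left
            new_probs.extend([p * y, p * (1.0 - y)])
        probs = new_probs  # still ordered from left to right
    k = 2 ** depth
    cuts = [float("-inf")] + [base.inv_cdf(i / k) for i in range(1, k)] + [float("inf")]
    return list(zip(cuts[:-1], cuts[1:])), probs


intervals, probs = sample_polya_tree(depth=3, c=1.0, rng=random.Random(42))
for (lo, hi), p in zip(intervals, probs):
    print(f"({lo:6.2f}, {hi:6.2f}]  {p:.3f}")
```

Large values of `c` keep each draw close to the normal centering distribution, while small values allow larger departures from it.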

5.

Variable selection for multivariate nonparametric regression models usually involves a parameterized approximation of the nonparametric functions in the objective function. However, this parameterized approximation often increases the number of parameters significantly, leading to the “curse of dimensionality” and inaccurate estimation. In this paper, we propose a novel and easily implemented approach to variable selection in nonparametric models that avoids parameterized approximation and achieves selection consistency. The proposed method is applied to variable selection for additive models. A two-stage procedure with selection and adaptive estimation is proposed, and the properties of this method are investigated. This two-stage algorithm adapts to the smoothness of the underlying components, and the estimation consistency can reach a parametric rate if the underlying model is truly parametric. Simulation studies are conducted to examine the performance of the proposed method. Furthermore, a real data example is analyzed for illustration.


6.

We demonstrate that, in a regression setting with a Hilbertian predictor, a response variable is more likely to be more highly correlated with the leading principal components of the predictor than with trailing ones. This is despite the extraction procedure being unsupervised. Our results are established under the conditional independence model, which includes linear regression and single-index models as special cases, with some assumptions on the regression vector. These results are a generalisation of earlier work which showed that this phenomenon holds for predictors which are real random vectors. A simulation study is used to quantify the phenomenon.


7.
In this paper, two new tests for heteroscedasticity in nonparametric regression are presented and compared. The first test consists of nonparametrically estimating the unknown conditional variance function and then using a classical least-squares test for a general linear model to test whether this function is constant. The second test is based on an overall distance between a nonparametric estimator of the conditional variance function and a parametric estimator of the variance of the model under the assumption of homoscedasticity. A bootstrap algorithm is used to approximate the distribution of this test statistic. Both procedures are also extended in two directions: first, to dependent data, and second, to testing whether the variance function is a polynomial of a given degree. A broad simulation study is carried out to illustrate the finite-sample performance of both tests when the observations are independent and when they are dependent.
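Both tests start from a nonparametric estimate of the conditional variance function. The sketch below assumes Nadaraya-Watson smoothing with a Gaussian kernel and bandwidth `h`. It computes a simple mean squared distance between that estimate and the constant variance implied by homoscedasticity, in the spirit of the second test. The bootstrap calibration and the dependent-data extensions are omitted, and the function names are hypothetical.

```python
import math


def nw_smooth(xs, ys, x0, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def variance_distance(xs, ys, h=0.1):
    """Mean squared distance between a nonparametric variance curve and a constant.

    1. Smooth Y to get residuals e_i = Y_i - m_hat(X_i).
    2. Smooth e_i**2 to estimate the conditional variance sigma2_hat(x).
    3. Compare sigma2_hat(X_i) with the pooled residual variance.
    """
    resid = [y - nw_smooth(xs, ys, x, h) for x, y in zip(xs, ys)]
    sq = [e * e for e in resid]
    pooled = sum(sq) / len(sq)
    return sum((nw_smooth(xs, sq, x, h) - pooled) ** 2 for x in xs) / len(xs)


xs = [i / 100 for i in range(100)]
hom = [(-1) ** i * 1.0 for i in range(100)]                   # constant spread
het = [(-1) ** i * (0.2 + 2.0 * x) for i, x in enumerate(xs)]  # spread grows with x
print(variance_distance(xs, hom), variance_distance(xs, het))
```

A large statistic points away from homoscedasticity. Deciding how large is large enough is exactly what the bootstrap step in the abstract is for.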

8.
In this paper, we describe models for dependent multivariate survival data using finite mixtures of positive stable frailty distributions. We investigate the cross-ratio function as a local measure of association. We estimate the parameters in the stable mixture together with the parameters of the (conditional) proportional hazards model in a Bayesian framework using Markov chain Monte Carlo algorithms. We illustrate the methodology using data on kidney infections.
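Under a shared frailty W acting multiplicatively on proportional hazards, the joint survivor function is S(t1, t2) = L(H1(t1) + H2(t2)). Here L is the Laplace transform of W, and H1 and H2 are the cumulative hazards. The cross-ratio function then reduces to θ = L(s) L''(s) / L'(s)² evaluated at s = H1(t1) + H2(t2). A positive stable frailty with index α has L(s) = exp(-s^α), and a finite mixture of such frailties has the weighted sum of these transforms. The sketch below evaluates θ for such a mixture under these standard assumptions; the function name is hypothetical.

```python
import math


def stable_mixture_cross_ratio(s, weights, alphas):
    """Cross-ratio L(s) L''(s) / L'(s)**2 for a finite mixture of positive stable frailties.

    s       -- H1(t1) + H2(t2) > 0
    weights -- mixture weights summing to 1
    alphas  -- stable indices in (0, 1]; component k has Laplace transform exp(-s**alpha_k)
    """
    L = d1 = d2 = 0.0
    for w, a in zip(weights, alphas):
        e = math.exp(-s ** a)
        L += w * e
        d1 += w * (-a * s ** (a - 1) * e)                                   # first derivative
        d2 += w * (a * a * s ** (2 * a - 2) - a * (a - 1) * s ** (a - 2)) * e  # second derivative
    return L * d2 / (d1 * d1)


print(stable_mixture_cross_ratio(1.0, [0.5, 0.5], [0.3, 0.8]))
```

For a single component this gives θ = 1 + (1 - α)/(α s^α). The cross-ratio decreases toward 1 as s grows, so the association is strongest at small cumulative hazards. Setting α = 1 gives θ = 1, which corresponds to independence.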

9.
We propose a new test for independence of error and covariate in a nonparametric regression model. The test statistic is based on a kernel estimator for the L2-distance between the conditional distribution and the unconditional distribution of the covariates. Unlike the tests previously available in the literature, this test can be applied in the important case of multivariate covariates. It can also be adjusted for models with heteroscedastic variance. Asymptotic normality of the test statistic is shown. Simulation results and a real data example are presented.

10.
When dealing with risk models the typical assumption of independence among claim size distributions is not always satisfied. Here we consider the case when the claim sizes are exchangeable and study the implications when constructing aggregated claims through compound Poisson‐type processes. In particular, exchangeability is achieved through conditional independence, using parametric and nonparametric measures for the conditioning distribution. Bayes' theorem is employed to ensure an arbitrary but fixed marginal distribution for the claim sizes. A full Bayesian analysis of the proposed model is illustrated with a panel‐type data set coming from a Medical Expenditure Panel Survey (MEPS). Copyright © 2009 John Wiley & Sons, Ltd.
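Exchangeability through conditional independence can be illustrated by simulation. The abstract does not fix the conditioning distribution, so this sketch assumes a simple parametric choice: all claims in a portfolio are i.i.d. Exponential(θ) given a shared θ ~ Gamma(shape, rate). Mixing over θ makes each claim marginally Lomax (Pareto type II) with mean rate/(shape - 1). The nonparametric conditioning measures and the Bayes'-theorem step that fixes the marginal are not shown, and all names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold, k, acc = math.exp(-lam), 0, rng.random()
    while acc > threshold:
        k += 1
        acc *= rng.random()
    return k


def simulate_aggregate(n_portfolios, lam, shape, rate, seed=0):
    """Aggregate claims S = X_1 + ... + X_N for independent portfolios, N ~ Poisson(lam).

    Within a portfolio the claim sizes are i.i.d. Exponential(theta) given theta,
    and theta ~ Gamma(shape, rate) is shared by all of them, so the claims are
    exchangeable (conditionally independent) rather than independent.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_portfolios):
        theta = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        n_claims = poisson(rng, lam)
        totals.append(sum(rng.expovariate(theta) for _ in range(n_claims)))
    return totals


totals = simulate_aggregate(20_000, lam=5.0, shape=3.0, rate=2.0)
print(sum(totals) / len(totals))  # close to lam * rate / (shape - 1) = 5.0
```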

11.

In this article, we propose two classes of semiparametric mixture regression models with a single index for model-based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low-dimensional predictors, the new semiparametric models can easily incorporate high-dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models are in fact special cases of the new models. Backfitting estimates and the corresponding modified EM algorithms are proposed to achieve optimal convergence rates for both the parametric and nonparametric parts. We establish the identifiability of the two proposed models and investigate the asymptotic properties of the proposed estimation procedures. Simulation studies are conducted to demonstrate the finite-sample performance of the proposed models. Two real data applications using the new models reveal some interesting findings.


12.
When actuaries face the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or a homeowner’s insurance policy, they usually assume that types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different multivariate Poisson regression models in order to relax the independence assumption, including zero-inflated models to account for excess zeros and overdispersion. These models have been largely ignored to date, mainly because of their computational difficulties. Bayesian inference based on MCMC helps to resolve this problem (and also allows us to derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claim. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models together with their zero-inflated versions.
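A standard way to build positively correlated Poisson counts is the common-shock construction, which this sketch assumes. It sets N1 = Y1 + Y0 and N2 = Y2 + Y0 with independent Poisson Y's, so that Cov(N1, N2) = λ0. Zero inflation adds an extra point mass at (0, 0). For brevity the sketch uses two claim types instead of the three in the abstract, and omits the regression on covariates. The function names are hypothetical.

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam0):
    """P(N1 = x, N2 = y) for N1 = Y1 + Y0 and N2 = Y2 + Y0, with independent
    Y1 ~ Poisson(lam1), Y2 ~ Poisson(lam2), Y0 ~ Poisson(lam0) (common shock)."""
    total = 0.0
    for k in range(min(x, y) + 1):  # k = number of events coming from the shared shock
        total += (lam1 ** (x - k) * lam2 ** (y - k) * lam0 ** k
                  / (math.factorial(x - k) * math.factorial(y - k) * math.factorial(k)))
    return math.exp(-(lam1 + lam2 + lam0)) * total


def zi_bivariate_poisson_pmf(x, y, lam1, lam2, lam0, p_zero):
    """Zero-inflated version: extra probability mass p_zero at (0, 0)."""
    base = (1.0 - p_zero) * bivariate_poisson_pmf(x, y, lam1, lam2, lam0)
    return base + (p_zero if x == 0 and y == 0 else 0.0)


print(bivariate_poisson_pmf(0, 0, 0.3, 0.5, 0.2),
      zi_bivariate_poisson_pmf(0, 0, 0.3, 0.5, 0.2, 0.1))
```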

13.
We consider normal ≡ Gaussian seemingly unrelated regressions (SUR) with incomplete data (ID). Imposing a natural minimal set of conditional independence constraints, we find a restricted SUR/ID model whose likelihood function and parameter space factor into the product of the likelihood functions and the parameter spaces of standard complete data multivariate analysis of variance models. Hence, the restricted model has a unimodal likelihood and permits explicit likelihood inference. In the development of our methodology, we review and extend existing results for complete data SUR models and the multivariate ID problem.

14.
Bayesian networks are one of the most widely used tools for modeling multivariate systems. It has been demonstrated that more expressive models, which can capture additional structure in each conditional probability table (CPT), may enjoy improved predictive performance over traditional Bayesian networks despite having fewer parameters. Here we investigate this phenomenon for models of various degrees of expressiveness on both extensive synthetic and real data. To characterize the regularities within CPTs in terms of independence relations, we introduce the notion of partial conditional independence (PCI) as a generalization of the well-known concept of context-specific independence (CSI). To model the structure of the CPTs, we use different graph-based representations which are convenient from a learning perspective. In addition to the previously studied decision trees and graphs, we introduce the concept of PCI-trees as a natural extension of the CSI-based trees. To identify plausible models we use the Bayesian score in combination with a greedy search algorithm. A comparison against ordinary Bayesian networks shows that models with local structures generally enjoy parametric sparsity and improved out-of-sample predictive performance; however, it is often necessary to regulate the model fit with an appropriate prior on the model structure to avoid overfitting in the learning process. The tree structures, in particular, lead to high-quality models and suggest considerable potential for further exploration.
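Context-specific independence means that a child becomes independent of some parents once other parents take particular values, so several rows of the CPT share one distribution. In the hypothetical tree-structured CPT below, the variable names and probabilities are invented for illustration. The tree stores 4 parameters instead of the 8 of a full table. PCI-trees, which generalize this idea, are not shown.

```python
# Hypothetical CPT for a binary child X with binary parents A, B, C.
# Context-specific independence: when A = 1, X is independent of B and C.
TREE = {
    "split": "A",
    1: {"leaf": 0.9},                       # P(X = 1 | A = 1), whatever B and C are
    0: {"split": "B",
        1: {"leaf": 0.2},                   # P(X = 1 | A = 0, B = 1), whatever C is
        0: {"split": "C", 1: {"leaf": 0.6}, 0: {"leaf": 0.1}}},
}


def p_x1(tree, parents):
    """Walk the tree with the parent configuration until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree[parents[tree["split"]]]
    return tree["leaf"]


def n_leaves(tree):
    """Free parameters of the tree CPT: one per leaf for a binary child."""
    if "leaf" in tree:
        return 1
    return n_leaves(tree[0]) + n_leaves(tree[1])


print(p_x1(TREE, {"A": 1, "B": 0, "C": 1}), p_x1(TREE, {"A": 0, "B": 0, "C": 1}))  # 0.9 0.6
print(n_leaves(TREE), "free parameters instead of", 2 ** 3)  # 4 free parameters instead of 8
```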

15.
Exceedances over high thresholds are often modeled by fitting a generalized Pareto distribution (GPD) on R+. It is difficult to select the threshold above which the GPD assumption is sufficiently sound and enough data are available for inference. We suggest a new dynamically weighted mixture model, where one term of the mixture is the GPD and the other is a light-tailed density. The weight function varies on R+ in such a way that the GPD component is predominant for large values; the weight function thus takes over the role of threshold selection. The full data set is used for inference on the parameters present in the two component distributions and in the weight function. Maximum likelihood provides estimates with approximate standard deviations. Our approach has been successfully applied to simulated data and to the (previously studied) Danish fire loss data set. We compare the new dynamic mixture method to Dupuis' robust thresholding approach in peaks-over-threshold inference. We discuss robustness with respect to the choice of the light-tailed component and the form of the weight function. We present encouraging simulation results that indicate that the new approach can be useful in unsupervised tail estimation, especially in heavy-tailed situations and for small percentiles.
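The dynamic mixture density can be sketched numerically. The abstract does not fix the light-tailed component or the weight family. This sketch therefore assumes a Weibull density and a Cauchy-cdf weight w(x) that rises from near 0 toward 1, and it computes the normalizing constant with the trapezoid rule. All names and parameter values are illustrative, and no fitting is shown.

```python
import math


def weibull_pdf(x, shape, scale):
    """Light-tailed component (assumed Weibull, shape >= 1)."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def gpd_pdf(x, xi, sigma):
    """Generalized Pareto density on x >= 0 with xi > 0 (heavy tail)."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi - 1.0) / sigma


def cauchy_weight(x, loc, tau):
    """Weight that rises from near 0 toward 1 as x grows (assumed Cauchy cdf)."""
    return 0.5 + math.atan((x - loc) / tau) / math.pi


def integrate(f, upper=1e6, n=50_000):
    """Trapezoid rule on [0, upper] after substituting x = upper * t**4 (dense near 0)."""
    total = 0.0
    for i in range(n + 1):
        t = i / n
        g = f(upper * t ** 4) * 4.0 * upper * t ** 3
        total += 0.5 * g if i in (0, n) else g
    return total / n


def make_dynamic_mixture(p):
    """Return the normalized density and its normalizing constant."""
    def unnormalized(x):
        w = cauchy_weight(x, p["loc"], p["tau"])
        return ((1.0 - w) * weibull_pdf(x, p["shape"], p["scale"])
                + w * gpd_pdf(x, p["xi"], p["sigma"]))
    z = integrate(unnormalized)
    return (lambda x: unnormalized(x) / z), z


params = {"shape": 2.0, "scale": 1.0, "xi": 0.5, "sigma": 1.0, "loc": 2.0, "tau": 1.0}
pdf, z = make_dynamic_mixture(params)
print(z, pdf(1.0), pdf(50.0))
```

Because w(x) approaches 1 smoothly, no hard threshold is chosen. The tail of the fitted density is governed by the GPD term, while the bulk is governed by the light-tailed term.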

16.
This paper considers statistical modeling of the types of claim in a portfolio of insurance policies. For some classes of insurance contracts, in a particular period, it is possible to have a record of whether or not there is a claim on the policy, the types of claims made on the policy, and the amount of claims arising from each of the types. A typical example is automobile insurance where, in the event of a claim, we are able to observe the amounts that arise from, say, injury to oneself, damage to one’s own property, damage to a third party’s property, and injury to a third party. Modeling the frequency and the severity components of the claims can be handled using traditional actuarial procedures. However, modeling the claim-type component is less well established, and in this paper we recommend analyzing the distribution of these claim types using multivariate probit models, which can be viewed as latent variable threshold models for the analysis of multivariate binary data. A recent article by Valdez and Frees [Valdez, E.A., Frees, E.W., Longitudinal modeling of Singapore motor insurance. University of New South Wales and the University of Wisconsin-Madison. Working Paper. Dated 28 December 2005, available from: http://wwwdocs.fce.unsw.edu.au/actuarial/research/papers/2006/Valdez-Frees-2005.pdf] considered this decomposition to extend the traditional model by including the conditional claim-type component, and proposed the multinomial logit model to empirically estimate this component. However, it is well known in the literature that this type of model assumes independence across the different outcomes. We investigate the appropriateness of fitting a multivariate probit model to the conditional claim-type component in which the outcomes may in fact be correlated, with possible inclusion of important covariates.
Our estimation results show, first, that when the outcomes are correlated, the multinomial logit model produces predictions substantially different from the true ones; and second, through a simulation analysis, that even under ideal conditions in which the outcomes are independent, the multinomial logit model is still a poor approximation to the true underlying outcome probabilities relative to the multivariate probit model. The results of this paper serve to highlight the trade-off between tractability and flexibility when choosing the appropriate model.
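In a multivariate probit model, each claim-type indicator is the sign of a latent normal variable, and the latent errors are correlated. The sketch below simulates two claim types under illustrative parameter values, whereas the paper allows more types plus covariates. It uses the standard construction e2 = ρ·e1 + √(1 - ρ²)·u to obtain a correlated normal pair. The function name is hypothetical.

```python
import math
import random


def simulate_bivariate_probit(n, mu1, mu2, rho, seed=0):
    """Simulate two binary claim-type indicators from correlated latent normals.

    Y_j = 1 if mu_j + e_j > 0, where (e1, e2) is standard bivariate normal
    with correlation rho.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append((int(mu1 + e1 > 0), int(mu2 + e2 > 0)))
    return out


draws = simulate_bivariate_probit(50_000, 0.0, 0.0, 0.8)
p11 = sum(1 for a, b in draws if a and b) / len(draws)
exact = 0.25 + math.asin(0.8) / (2 * math.pi)  # orthant probability for rho = 0.8
print(round(p11, 3), round(exact, 3))
```

With ρ = 0.8 and zero means, the exact probability that both claim types occur is 1/4 + arcsin(0.8)/(2π) ≈ 0.398. Independence would imply 0.25 instead.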

17.
In this paper we raise the matter of considering a stochastic model of the surrender rate instead of the classical S-shaped deterministic curve (as a function of the spread) still used in almost all insurance companies. For extreme scenarios, due to the lack of data, it could be tempting to assume that surrenders are conditionally independent with respect to an S-curve disturbance. However, we explain why this conditional independence between policyholders' decisions, which has the advantage of being the simplest assumption, seems particularly ill-suited when the spread increases. Indeed, the correlation between policyholders' decisions is most likely to increase in this situation. We suggest and develop a simple model which integrates these phenomena. Using stochastic orders, this model can be compared qualitatively with the conditional independence approach. In a partially internal Solvency II model, we quantify the impact of the correlation phenomenon on a real-life portfolio for a global risk management strategy.
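The abstract does not give the model's formulas, so the following is only a hypothetical illustration of why correlated decisions matter. It uses a logistic S-curve with invented parameters. Under conditional independence the number of surrenders is binomial. A beta-binomial count with pairwise correlation ρ instead inflates the variance by the factor 1 + (n - 1)ρ.

```python
import math


def s_curve(spread, base=0.05, cap=0.40, slope=4.0, mid=1.0):
    """Hypothetical logistic S-curve: surrender rate as a function of the spread (in %)."""
    return base + (cap - base) / (1.0 + math.exp(-slope * (spread - mid)))


def surrender_variance(n, p, rho):
    """Variance of the number of surrenders among n policyholders with rate p.

    rho = 0 is conditional independence (binomial count); rho > 0 gives a
    beta-binomial count whose decisions have pairwise correlation rho.
    """
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


p = s_curve(3.0)
for rho in (0.0, 0.01, 0.05):
    print(rho, round(surrender_variance(1000, p, rho), 1))
```

With 1000 policyholders, even ρ = 0.01 multiplies the variance by almost 11. This is why a conditional independence assumption can understate the risk in high-spread scenarios.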

18.
In this article, we propose an unbiased estimating equation approach for a two-component mixture model with correlated response data. We adapt the mixture-of-experts model and a generalized linear model for the component distributions and the mixing proportion, respectively. The new approach only requires the marginal distributions of both the component densities and the latent variables. We use serial correlations from subjects’ subgroup memberships, which improves estimation efficiency and classification accuracy, and show that estimation consistency does not depend on the choice of the working correlation matrix. The proposed estimating equation is solved by an expectation-estimating-equation (EEE) algorithm. In the E-step of the EEE algorithm, we propose a joint imputation based on the conditional linear property of the multivariate Bernoulli distribution. In addition, we establish asymptotic properties of the proposed estimators and the convergence property of the EEE algorithm. Our method is compared with an existing competing mixture model approach in both simulation studies and an election data application. Supplementary materials for this article are available online.

19.
Pair-copula Bayesian networks (PCBNs) are a novel class of multivariate statistical models, which combine the distributional flexibility of pair-copula constructions (PCCs) with the parsimony of conditional independence models associated with directed acyclic graphs (DAGs). We are the first to provide generic algorithms for random sampling and likelihood inference in arbitrary PCBNs, as well as for selecting orderings of the parents of the vertices in the underlying graphs. Model selection of the DAG is facilitated using a version of the well-known PC algorithm that is based on a novel test for conditional independence of random variables tailored to the PCC framework. A simulation study shows the PC algorithm’s high aptitude for structure estimation in non-Gaussian PCBNs. The proposed methods are finally applied to modeling financial return data. Supplementary materials for this article are available online.
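The skeleton phase of the PC algorithm is generic in its conditional independence test, and that test is where a PCC-based test would be plugged in. The minimal sketch below takes the test as a callable. It is exercised with a hypothetical oracle for the chain A → B → C rather than with the paper's test. Edge orientation, the second phase of PC, is omitted.

```python
from itertools import combinations


def pc_skeleton(nodes, ci_test):
    """Skeleton phase of the PC algorithm.

    nodes   -- list of variable names
    ci_test -- callable ci_test(x, y, s) returning True when x and y are judged
               conditionally independent given the set s
    Returns the adjacency sets and the separating sets of removed edges.
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    level = 0
    # Continue while some vertex still has enough neighbours to form a conditioning set.
    while any(len(adj[v]) - 1 >= level for v in nodes):
        for x in nodes:
            for y in list(adj[x]):
                others = adj[x] - {y}
                if len(others) < level:
                    continue
                for s in combinations(sorted(others), level):
                    if ci_test(x, y, set(s)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        level += 1
    return adj, sepset


def oracle(x, y, s):
    """Hypothetical oracle for the chain A -> B -> C: only A and C are separated, by B."""
    return {x, y} == {"A", "C"} and "B" in s


adj, sepset = pc_skeleton(["A", "B", "C"], oracle)
print(adj, sepset)
```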

20.
Single-index models have found applications in econometrics and biometrics, where multidimensional regression models are often encountered. This article proposes a nonparametric estimation approach that combines wavelet methods for nonequispaced designs with Bayesian models. We consider a wavelet series expansion of the unknown regression function and set prior distributions for the wavelet coefficients and the other model parameters. To ensure model identifiability, the direction parameter is represented via its polar coordinates. We employ ad hoc hierarchical mixture priors that perform shrinkage on wavelet coefficients and use Markov chain Monte Carlo methods for a posteriori inference. We investigate an independence-type Metropolis-Hastings algorithm to produce samples for the direction parameter. Our method leads to simultaneous estimates of the link function and of the index parameters. We present results on both simulated and real data, including comparisons with other methods.
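Representing the direction parameter in polar (hyperspherical) coordinates builds in the unit-norm constraint automatically. The sketch below maps p - 1 angles to a unit vector β and forms the index βᵀx. It assumes the common convention of keeping the first angle in [0, π/2), so that β₁ > 0. The wavelet prior and the MCMC sampler are not shown, and the function names are hypothetical.

```python
import math


def direction_from_angles(angles):
    """Map p - 1 angles to a unit vector in R^p (hyperspherical coordinates).

    beta_1 = cos(a_1), beta_2 = sin(a_1) cos(a_2), ...,
    beta_p = sin(a_1) * ... * sin(a_{p-1}).
    Keeping a_1 in [0, pi/2) makes beta_1 > 0, one common identifiability
    convention (assumed here).
    """
    beta, sin_prod = [], 1.0
    for a in angles:
        beta.append(sin_prod * math.cos(a))
        sin_prod *= math.sin(a)
    beta.append(sin_prod)
    return beta


def single_index(x, angles):
    """Index beta'x to which the unknown link function is applied."""
    return sum(b * xi for b, xi in zip(direction_from_angles(angles), x))


beta = direction_from_angles([0.3, 1.1, 2.0])
print(beta, sum(b * b for b in beta))
```

Any set of angles yields a valid unit-norm direction. A sampler can therefore propose the angles freely within their ranges instead of enforcing the norm constraint separately.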
