Similar documents
20 similar documents found
1.
Regression density estimation is the problem of flexibly estimating a response distribution as a function of covariates. An important approach to regression density estimation uses finite mixture models, and our article considers flexible mixtures of heteroscedastic regression (MHR) models, where the response distribution is a normal mixture with the component means, variances, and mixture weights all varying as a function of covariates. Our article develops fast variational approximation (VA) methods for inference. Our motivation is that alternative, computationally intensive Markov chain Monte Carlo (MCMC) methods for fitting mixture models are difficult to apply when models must be fitted repeatedly in exploratory analysis and model choice. Our article makes three contributions. First, a VA for MHR models is described in which the variational lower bound is in closed form. Second, the basic approximation can be improved by using stochastic approximation (SA) methods to perturb the initial solution and attain higher accuracy. Third, the advantages of our approach for model choice and evaluation, compared with MCMC-based approaches, are illustrated. These advantages are particularly compelling for time series data, where repeated refitting for one-step-ahead prediction in model choice and diagnostics, and in rolling-window computations, is very common. Supplementary materials for the article are available online.

2.
A mixture approach to clustering is an important technique in cluster analysis. A mixture of multivariate multinomial distributions is usually used to analyze categorical data with a latent class model. Parameter estimation is an important step in fitting a mixture distribution. Described here are four approaches to estimating the parameters of a mixture of multivariate multinomial distributions. The first approach is an extended maximum likelihood (ML) method. The second approach is based on the well-known expectation maximization (EM) algorithm. The third approach is the classification maximum likelihood (CML) algorithm. In this paper, we propose a new approach using the so-called fuzzy class model and then create the fuzzy classification maximum likelihood (FCML) approach for categorical data. The accuracy, robustness, and effectiveness of these four types of algorithms for estimating the parameters of multivariate binomial mixtures are compared using real empirical data and samples drawn from two-class multivariate binomial mixtures. The results show that the proposed FCML algorithm achieves better accuracy, robustness, and effectiveness. Overall, the FCML algorithm is superior to the ML, EM, and CML algorithms. Thus, we recommend FCML as another good tool for estimating the parameters of mixtures of multivariate multinomial models.
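For reference, the EM baseline (the second of the four approaches) can be sketched for the binary case used in the comparison, i.e. a latent class model whose classes are products of independent Bernoulli items. This is a minimal illustration under that assumption, not the proposed FCML algorithm; the function name `em_latent_class` and its parameters are hypothetical.

```python
import math
import random


def em_latent_class(data, k, n_iter=100, seed=0, eps=1e-9):
    """EM for a mixture of independent Bernoulli vectors (binary latent class model).

    data: list of equal-length 0/1 lists; k: number of latent classes.
    Returns (weights, probs) with probs[j][d] = P(item d = 1 | class j).
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Random starting item probabilities, kept away from 0 and 1 to break symmetry.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(dim)] for _ in range(k)]
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities (log-sum-exp for stability).
        resp = []
        for x in data:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for d in range(dim):
                    p = min(max(probs[j][d], eps), 1.0 - eps)
                    lp += math.log(p) if x[d] else math.log(1.0 - p)
                logs.append(lp)
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: closed-form updates of class weights and item probabilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), eps)
            weights[j] = nj / n
            for d in range(dim):
                probs[j][d] = sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
    return weights, probs
```

EM uses soft (posterior) memberships; the CML variant would instead assign each observation to its most probable class before the M-step.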

3.
In model-based clustering and classification, the cluster-weighted model is a convenient approach when the random vector of interest consists of a response variable $Y$ and a vector $\boldsymbol{X}$ of $p$ covariates. However, its applicability may be limited when $p$ is large. To overcome this problem, this paper assumes a latent factor structure for $\boldsymbol{X}$ in each mixture component, under Gaussian assumptions. This leads to the cluster-weighted factor analyzers (CWFA) model. By imposing constraints on the variance of $Y$ and the covariance matrix of $\boldsymbol{X}$, a novel family of sixteen CWFA models is introduced for model-based clustering and classification. The alternating expectation-conditional maximization algorithm for maximum likelihood estimation of the parameters of all models in the family is described; to initialize the algorithm, a 5-step hierarchical procedure is proposed, which exploits the nested structure of the models within the family and thus guarantees the natural ranking among the sixteen likelihoods. Results on artificial and real data show that these models have very good clustering and classification performance and that the algorithm recovers the parameters very well.

4.
Finite mixture regression (FMR) models are frequently used in statistical modeling, often with many covariates of low significance. Variable selection techniques can be employed to identify covariates with little influence on the response. The problem of variable selection in FMR models is studied here. Penalized likelihood-based approaches are sensitive to data contamination, and their efficiency may be significantly reduced when the model is slightly misspecified. We propose a new robust variable selection procedure for FMR models. The proposed method is based on minimum-distance techniques, which seem to have some automatic robustness to model misspecification. We show that the proposed estimator possesses variable selection consistency and the oracle property. The finite-sample breakdown point of the estimator is established to demonstrate its robustness. We examine the small-sample and robustness properties of the estimator using a Monte Carlo study. We also analyze a real data set.

5.
Various random effects models have been developed for clustered binary data; however, traditional approaches to these models generally rely heavily on the specification of a continuous random effect distribution, such as a Gaussian or beta distribution. In this article, we introduce a new model that incorporates nonparametric unobserved random effects on the unit interval (0,1) into logistic regression, multiplicatively with the fixed effects. This multiplicative setup facilitates prediction of the nonparametric random effects and the corresponding model interpretation. A distinctive feature of our approach is that a closed-form expression has been derived for the predictor of the nonparametric random effects on the unit interval (0,1) in terms of known covariates and responses. A quasi-likelihood approach has been developed for estimating the model. Our results are robust against random effects distributions ranging from highly discrete binary to continuous beta distributions. We illustrate our method by analyzing recent large-scale stock crash data from China. The performance of our method is also evaluated through simulation studies.

6.
The paper proposes a latent class version of Combination of Uniform and (shifted) Binomial random variables (CUB) models for ordinal data to account for unobserved heterogeneity. The extension, called LC-CUB, is useful when the heterogeneity originates from clusters of respondents not identified by covariates: this may generate a multimodal response distribution, which cannot be adequately described by a standard CUB model. The LC-CUB model is a finite mixture of CUB models yielding a multimodal theoretical distribution. Model identification is achieved by constraining the uncertainty parameters to be constant across latent classes. A simulation experiment shows the performance of the maximum likelihood estimator, while the usefulness of the approach is illustrated by a case study on political self-placement measured on an ordinal scale.
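To make the mixture structure concrete, here is a minimal sketch of the LC-CUB probability mass function, assuming the usual CUB parametrization in which category $r \in \{1,\dots,m\}$ has probability $\pi \binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r} + (1-\pi)/m$, with the uncertainty weight $\pi$ shared across latent classes and only the feeling parameter $\xi$ varying by class. The function names are hypothetical, and no estimation is shown.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """CUB pmf on categories 1..m: pi * shifted Binomial(xi) + (1 - pi) * discrete Uniform."""
    probs = []
    for r in range(1, m + 1):
        shifted_binom = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        probs.append(pi * shifted_binom + (1 - pi) / m)
    return probs


def lc_cub_pmf(m, pi, xis, class_weights):
    """Latent class CUB: finite mixture of CUB pmfs with a common uncertainty weight pi."""
    mix = [0.0] * m
    for w, xi in zip(class_weights, xis):
        for i, p in enumerate(cub_pmf(m, pi, xi)):
            mix[i] += w * p
    return mix
```

With two classes whose feeling parameters sit at opposite ends (for example `xis=[0.85, 0.15]` on a 10-point scale), the mixture has one peak near each end of the scale, the kind of multimodal shape that a single CUB component does not produce.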

7.
Abstract

The primary model for cluster analysis is the latent class model. This model yields the mixture likelihood. Due to numerous local maxima, the success of the EM algorithm in maximizing the mixture likelihood depends on the initial starting point of the algorithm. In this article, good starting points for the EM algorithm are obtained by applying classification methods to randomly selected subsamples of the data. The performance of the resulting two-step algorithm, classification followed by EM, is compared to, and found superior to, the baseline algorithm of EM started from a random partition of the data. Though the algorithm is not complicated, comparing it to the baseline algorithm and assessing its performance with several classification methods is nontrivial. The strategy employed for comparing the algorithms is to identify canonical forms for the easiest and most difficult datasets to cluster within a large collection of cluster datasets and then to compare the performance of the two algorithms on these datasets. This has led to the discovery that, in the case of three homogeneous clusters, the most difficult datasets to cluster are those in which the clusters are arranged on a line and the easiest are those in which the clusters are arranged on an equilateral triangle. The performance of the two-step algorithm is assessed using several classification methods and is shown to be able to cluster large, difficult datasets consisting of three highly overlapping clusters arranged on a line with 10,000 observations and 8 variables.
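The two-step idea (classify a random subsample, then start EM on the full data from the resulting partition) can be sketched as follows. This is a simplified, one-dimensional Gaussian illustration, not the authors' implementation: k-means started from sample quantiles stands in for "a classification method", and the names `kmeans_1d` and `two_step_em` are hypothetical.

```python
import math
import random


def kmeans_1d(xs, k, n_iter=50):
    """Lloyd's k-means on 1-D data, started from evenly spaced sample quantiles."""
    s = sorted(xs)
    centers = [s[int((j + 0.5) * len(s) / k)] for j in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return groups


def log_normal_pdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def two_step_em(xs, k, subsample_size, n_iter=100, seed=0, min_var=1e-6):
    """Step 1: classify a random subsample; step 2: run EM on all data from that start."""
    rng = random.Random(seed)
    groups = kmeans_1d(rng.sample(xs, subsample_size), k)
    weights, means, variances = [], [], []
    for g in groups:
        pts = g if g else xs  # an empty cluster falls back to the whole sample
        mu = sum(pts) / len(pts)
        weights.append(len(g) + 1.0)  # add-one smoothing keeps every weight positive
        means.append(mu)
        variances.append(max(sum((x - mu) ** 2 for x in pts) / len(pts), min_var))
    total = sum(weights)
    weights = [w / total for w in weights]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: responsibilities, computed with log-sum-exp for stability.
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: weighted proportions, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances
```

The baseline in the abstract would replace step 1 with a random partition of the data; running the two versions from several seeds and comparing the final log-likelihoods is one way to reproduce the kind of comparison described.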

8.
In this paper, we present a discrete survival model with covariates and random effects, where the random effects may depend on the observed covariates. The dependence between the covariates and the random effects is modelled through correlation parameters, and these parameters can be identified only for time-varying covariates. For time-varying covariates, however, it is possible to separate regression effects and selection effects in the case of a certain dependence structure between the random effects and the time-varying covariates, which are assumed to be conditionally independent given the initial level of the covariate. The proposed model is equivalent to a model with independent random effects and the initial level of the covariates as further covariates. The model is applied to simulated data that illustrate some identifiability problems and further indicate how the proposed model may approximate retrospectively collected data with incorrectly specified waiting times. The model is fitted by maximum likelihood estimation implemented as iteratively reweighted least squares. © 1998 John Wiley & Sons, Ltd.
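Discrete survival models of this kind are commonly fitted as binary regressions on person-period data, which is also where the "initial level of the covariate" enters as an extra column. The sketch below shows only that data expansion, assuming one time-varying covariate per period; the record layout and the name `person_period` are hypothetical, and the random effects and the IRLS fit are not shown.

```python
def person_period(subjects):
    """Expand discrete survival records into person-period rows.

    subjects: list of dicts with keys
      'time'  : last observed period (1-based),
      'event' : True if the event occurred in period 'time', False if censored,
      'x'     : covariate values for periods 1..time (time-varying).
    Each row is (subject_id, period, y, x_t, x_initial): y = 1 only in the event
    period, and x_initial (the value in period 1) is carried as an extra covariate.
    """
    rows = []
    for sid, s in enumerate(subjects):
        x0 = s["x"][0]
        for t in range(1, s["time"] + 1):
            y = 1 if (s["event"] and t == s["time"]) else 0
            rows.append((sid, t, y, s["x"][t - 1], x0))
    return rows
```

A binary regression (for example a logit or complementary log-log link) of `y` on period effects, `x_t`, and `x_initial` then gives the discrete hazard.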

9.
To study the effect of methadone treatment in reducing multiple drug uses while controlling for their joint dependency and non-random dropout, we propose a bivariate binary model with a separate informative dropout (ID) model. In the model, the logits of the probabilities of each type of drug use and of the dropout indicator, as well as the log of the odds ratio of the two drug uses, are linear in some covariates and outcomes. The model allows evaluation of the joint probabilities of the bivariate outcomes. To account for heterogeneity in drug use across patients, the model is further extended to incorporate mixture and random effects. Parameter estimation is conducted using a Bayesian approach and is demonstrated using methadone treatment data. A simulation experiment is conducted to evaluate the effect of including ID modeling on the parameter estimates in the outcome models.
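Given the two marginal logits and the log odds ratio, the joint cell probabilities follow from solving $\psi = p_{11}(1-p_1-p_2+p_{11}) / \{(p_1-p_{11})(p_2-p_{11})\}$ for $p_{11}$, a quadratic whose admissible root is the standard Plackett/Dale formula. The sketch below assumes that this is how "joint probabilities of bivariate outcomes" are obtained; the function names are hypothetical.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def joint_from_marginals(p1, p2, odds_ratio):
    """Return (p11, p10, p01, p00) for two binary outcomes with given marginals and odds ratio."""
    if abs(odds_ratio - 1.0) < 1e-12:
        p11 = p1 * p2  # odds ratio 1 means independence
    else:
        a = 1.0 + (odds_ratio - 1.0) * (p1 + p2)
        disc = a * a - 4.0 * odds_ratio * (odds_ratio - 1.0) * p1 * p2
        # The root with the minus sign is the one lying inside the valid probability range.
        p11 = (a - math.sqrt(disc)) / (2.0 * (odds_ratio - 1.0))
    return p11, p1 - p11, p2 - p11, 1.0 - p1 - p2 + p11
```

For example, linear predictors of 0.2 and -0.5 for the two drug-use logits and 1.0 for the log odds ratio would be passed as `joint_from_marginals(inv_logit(0.2), inv_logit(-0.5), math.exp(1.0))`.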

10.
Degradation data have been widely used to estimate product reliability. Because of technological advances, time-varying usage and environmental variables, called dynamic covariates, can now be easily recorded in addition to traditional degradation measurements. The use of dynamic covariates is appealing because they have the potential to explain more of the variability in degradation paths. We propose a class of general path models that incorporate dynamic covariates for modeling degradation paths. Physically motivated nonlinear functions are used to describe the degradation paths, and random effects are used to describe unit-to-unit variability. The covariate effects are modeled by shape-restricted splines. Estimating the unknown model parameters is challenging because of the nonlinear relationships, random effects, and shape-restricted splines involved. We develop an efficient procedure for parameter estimation. The performance of the proposed method is evaluated by simulations. An outdoor coating weathering dataset is used to illustrate the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.

11.
We present a unified semiparametric Bayesian approach based on Markov random field priors for analyzing the dependence of multicategorical response variables on time, space and further covariates. The general model extends dynamic, or state space, models for categorical time series and longitudinal data by including spatial effects as well as nonlinear effects of metrical covariates in flexible semiparametric form. Trend and seasonal components, different types of covariates and spatial effects are all treated within the same general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference is fully Bayesian and uses MCMC techniques for posterior analysis. The approach in this paper is based on latent semiparametric utility models and is particularly useful for probit models. The methods are illustrated by applications to unemployment data and a forest damage survey.

12.
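To analyze censored data, this paper considers the varying-coefficient partially linear model, which allows covariates to have nonlinear effects on the response variable. Fitting the relationship between the response and the covariates through a linear structure is very important and useful. For censored data, common statistical methods cannot be applied directly to this model. The paper first proposes a class of data transformations that yield unbiased conditional expectations. Then, using the profile least squares method, it derives profile least squares estimators of the parametric and nonparametric components of the model and establishes their asymptotic normality. Finally, numerical examples illustrate the effectiveness of the proposed method.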
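The abstract does not say which transformation class is used. One well-known member of such classes is the Koul-Susarla-Van Ryzin synthetic response $\delta Y / G(Y)$, where $Y=\min(T,C)$, $\delta = 1\{T \le C\}$, and $G(t)=P(C>t)$ is the censoring survival function; when censoring is independent of the response and covariates, its conditional expectation equals that of the uncensored response $T$. The sketch below assumes $G$ is known (in practice it would be estimated, for example by Kaplan-Meier), and `ksv_transform` is a hypothetical name.

```python
import math
import random


def ksv_transform(y, delta, censor_survival):
    """Koul-Susarla-Van Ryzin synthetic response delta * Y / G(Y).

    y: observed times min(T, C); delta: 1 if uncensored (T <= C), else 0;
    censor_survival: function t -> G(t) = P(C > t), assumed known and positive here.
    """
    return [d * yi / censor_survival(yi) for yi, d in zip(y, delta)]


# Small check: T ~ Exp(1) has mean 1; the naive mean of min(T, C) is biased downward.
rng = random.Random(0)
n = 20000
t = [rng.expovariate(1.0) for _ in range(n)]
c = [rng.expovariate(0.2) for _ in range(n)]
y = [min(a, b) for a, b in zip(t, c)]
delta = [1 if a <= b else 0 for a, b in zip(t, c)]
ystar = ksv_transform(y, delta, lambda s: math.exp(-0.2 * s))
print(sum(y) / n, sum(ystar) / n)
```

Uncensored observations are inflated by $1/G(Y)$ to compensate for the censored ones, which are set to zero; the transformed responses can then be used in a least-squares fit as if uncensored.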

13.
In CUB models, the uncertainty of choice is explicitly modelled as a Combination of discrete Uniform and shifted Binomial random variables. The basic concept, modelling the response as a mixture of a deliberate choice of a response category and an uncertainty component represented by a uniform distribution on the response categories, is extended to a much wider class of models. In particular, the deliberate choice can be determined by classical ordinal response models such as the cumulative and adjacent categories models. The traditional and flexible models are then obtained as special cases when the uncertainty component is irrelevant. It is shown that the effect of explanatory variables is underestimated if the uncertainty component is neglected in a cumulative-type mixture model. Visualization tools for the effects of variables are proposed, and the modelling strategies are evaluated using real data sets. It is demonstrated that the extended class of models frequently yields a better fit than classical ordinal response models without an uncertainty component.
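A minimal sketch of the cumulative-type mixture, assuming a cumulative logit model $P(R \le r \mid x) = F(\theta_r - \eta)$ with logistic $F$, increasing thresholds $\theta_r$, and linear predictor $\eta$ for the deliberate choice, mixed with weight $1-\pi$ on a uniform over the $k$ categories. The names are hypothetical. Because the uniform part does not depend on $\eta$, any change in the expected category caused by the covariates is scaled by $\pi$, which gives one intuition for why ignoring the uncertainty component leads to attenuated covariate effects.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_logit_probs(thresholds, eta):
    """P(R = r), r = 1..k, under P(R <= r) = F(theta_r - eta); k = len(thresholds) + 1."""
    cdf = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[r] - cdf[r - 1] for r in range(1, len(cdf))]


def uncertainty_mixture_probs(thresholds, eta, pi):
    """Mixture of a deliberate (cumulative logit) choice and a uniform uncertainty component."""
    k = len(thresholds) + 1
    return [pi * p + (1.0 - pi) / k for p in cumulative_logit_probs(thresholds, eta)]


def expected_category(probs):
    """Mean response category, with categories numbered 1..k."""
    return sum((r + 1) * q for r, q in enumerate(probs))
```

Since the mixture's expected category equals $\pi\,E_M[R] + (1-\pi)(k+1)/2$, moving $\eta$ changes it by exactly $\pi$ times the change under the pure cumulative model.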

14.
The paper introduces a methodology for visualizing, on a reduced-dimension subspace, the classification structure and the geometric characteristics induced by an estimated Gaussian mixture model for discriminant analysis. In particular, we consider the case of mixtures of mixture models with varying parametrization, which allow for parsimonious models. The approach extends existing work on dimension reduction for model-based clustering based on Gaussian mixtures. Information on the dimension reduction subspace is provided by the variation in class locations and, depending on the estimated mixture model, by the variation in class dispersions. Projections along the estimated directions provide summary plots that help to visualize the structure of the classes and their characteristics. A suitable modification of the method allows us to recover the most discriminant directions, i.e., those that show maximal separation among classes. The approach is illustrated using simulated and real data.

15.
We propose a two-component graphical chain model, the discrete regression distribution, where a set of discrete random variables is modeled as a response to a set of categorical and continuous covariates. The proposed model is useful for modeling a set of discrete variables measured at multiple sites along with a set of continuous and/or discrete covariates. The proposed model allows for joint examination of the dependence structure of the discrete response and observed covariates and also accommodates site-to-site variability. We develop the graphical model properties and theoretical justifications of this model. Our model has several advantages over the traditional logistic normal model used to analyze similar compositional data, including site-specific random effect terms and the incorporation of discrete and continuous covariates.

16.
The case-cohort design is widely used in large epidemiological studies and prevention trials for cost reduction. In such a design, covariates are assembled only for a subcohort, which is a random subset of the entire cohort, and for any additional cases outside the subcohort. In this paper, we discuss case-cohort analysis with a class of general additive-multiplicative hazard models, which includes the commonly used Cox model and additive hazard model as special cases. Two sampling schemes for the subcohort are discussed: Bernoulli sampling with arbitrary selection probabilities, and stratified simple random sampling with fixed subcohort sizes. In each setting, an estimating function is constructed to estimate the regression parameters. The resulting estimator is shown to be consistent and asymptotically normally distributed. The limiting variance-covariance matrix can be consistently estimated from the case-cohort data. A simulation study is conducted to assess the finite-sample performance of the proposed method, and a real example is provided.
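The first sampling scheme (Bernoulli sampling with arbitrary selection probabilities, plus all cases) can be sketched as below. The inverse-probability weights (1 for cases, 1/p for non-case subcohort members) follow one common case-cohort convention and are an assumption here; the paper's estimating function may weight observations differently. The name `case_cohort_sample` is hypothetical.

```python
import random


def case_cohort_sample(is_case, select_prob, seed=0):
    """Bernoulli subcohort sampling plus every case outside the subcohort.

    is_case: list of bools (event observed) for the full cohort.
    select_prob: per-subject subcohort selection probabilities in (0, 1].
    Returns (indices whose covariates must be assembled, weights).
    """
    rng = random.Random(seed)
    in_subcohort = [rng.random() < p for p in select_prob]
    sampled, weights = [], []
    for i, (case, sub) in enumerate(zip(is_case, in_subcohort)):
        if case:
            # Every case is included, whether or not it fell into the subcohort.
            sampled.append(i)
            weights.append(1.0)
        elif sub:
            # Non-case subcohort members stand in for all non-cases in the cohort.
            sampled.append(i)
            weights.append(1.0 / select_prob[i])
    return sampled, weights
```

Only the returned indices need covariate collection, which is the source of the design's cost savings.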

17.
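Recently, the additive hazards (AH) model has been widely applied to survival data, and its covariates may be assumed to be either time-independent or time-dependent. Building on the mixture cure model, the bounded cumulative hazard cure model, and the "improper" proportional hazards model, this paper extends these multiplicative hazards models to additive hazards models that accommodate survival data with a cured fraction. Identifiability and parameter estimation for the "improper" AH model are also discussed.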

18.
Mathematical programming (MP) discriminant analysis models can be used to develop classification models for assigning observations of unknown class membership to one of a number of specified classes using values of a set of features associated with each observation. Since most MP discriminant analysis models generate linear discriminant functions, these MP models are generally used to develop linear classification models. Nonlinear classifiers may, however, have better classification performance than linear classifiers. In this paper, a mixed integer programming model is developed to generate nonlinear discriminant functions composed of monotone piecewise-linear marginal utility functions for each feature and the cut-off value for class membership. It is also shown that this model can be extended for feature selection. The performance of this new MP model for two-group discriminant analysis is compared with statistical discriminant analysis and other MP discriminant analysis models using a real problem and a number of simulated problem sets.
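Once the mixed integer program has produced the utility functions and the cut-off, classifying a new observation is a matter of summing the marginal utilities. The sketch below shows only that evaluation step for two groups; the breakpoints, utility values, cut-off, and function names are hypothetical placeholders for quantities the MIP would determine, and the optimization itself is not shown.

```python
from bisect import bisect_right


def piecewise_linear(x, breakpoints, values):
    """Interpolate through (breakpoints[i], values[i]); constant beyond the end points.

    The function is monotone whenever `values` is monotone in the breakpoints.
    """
    if x <= breakpoints[0]:
        return values[0]
    if x >= breakpoints[-1]:
        return values[-1]
    i = bisect_right(breakpoints, x) - 1
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def classify(features, utilities, cutoff):
    """Two-group rule: group 1 if the total marginal utility exceeds the cut-off, else group 2."""
    score = sum(piecewise_linear(x, bp, vals) for x, (bp, vals) in zip(features, utilities))
    return 1 if score > cutoff else 2
```

Because each feature's contribution is a separate bent line rather than a single coefficient times the feature, the resulting decision boundary is nonlinear in the original features while the score remains additive.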

19.
The main challenge in working with gene expression microarrays is that the sample size is small compared to the large number of variables (genes). In many studies, the main focus is on finding a small subset of genes that are most important for differentiating between types of cancer, to enable simpler and cheaper diagnostic arrays. In this paper, a sparse Bayesian variable selection method in the probit model is proposed for gene selection and classification. We assign a sparse prior to the regression parameters and perform variable selection by indexing the covariates of the model with a binary vector. The correlation prior assigned to the binary vector in this paper is able to distinguish between models of the same size. The performance of the proposed method is demonstrated with one simulated dataset and two well-known real datasets, and the results show that our method is comparable with other existing methods in variable selection and classification.

20.
The empirical part of this article is based on a study on car insurance data to explore how global and local geographical effects on frequency and size of claims can be assessed with appropriate statistical methodology. Because these spatial effects have to be modeled and estimated simultaneously with linear and possibly nonlinear effects of further covariates such as age of policy holder, age of car or bonus-malus score, generalized linear models cannot be applied. Also, compared to previous analyses, the geographical information is given by the exact location of the residence of policy holders. Therefore, we employ a new class of geoadditive models, where the spatial component is modeled based on stationary Gaussian random fields, common in geostatistics (Kriging). Statistical inference is carried out by an empirical Bayes or penalized likelihood approach using mixed model technology. The results confirm that the methodological concept provides useful tools for exploratory analyses of the data at hand and in similar situations.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417