Similar Articles
20 similar articles found.
1.
Generalized linear and additive models are very efficient regression tools, but many parameters have to be estimated if categorical predictors with many categories are included. The method proposed here focuses on the main effects of categorical predictors, using tree-type methods to obtain clusters of categories. When a predictor has many categories, one wants to know in particular which categories have to be distinguished with respect to their effect on the response. The tree-structured approach detects clusters of categories that share the same effect while letting other predictors, in particular metric predictors, have a linear or additive effect on the response. A fitting algorithm is proposed and various stopping criteria are evaluated; the preferred stopping criterion is based on p-values from a conditional inference procedure. In addition, the stability of the clusters and the relevance of the predictors are investigated by bootstrap methods. Several applications show the usefulness of the tree-structured approach, and small simulation studies demonstrate that the fitting procedure works well.
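As a rough sketch of the kind of predictor such a model fits (our notation, not necessarily the authors'), the many-category factor enters through fitted clusters of categories that share one coefficient, alongside the usual linear or additive terms:

```latex
\eta_i = \beta_0 + \sum_{c=1}^{C} \beta_c \,\mathbb{I}\{x_i \in S_c\} + f(z_i)
```

where S_1, ..., S_C are the clusters of categories found by the tree and f is a linear or additive effect of a metric predictor z_i.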

2.
Multiblock component methods are applied to data sets for which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In the following, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they provide suboptimal results when the observations actually come from different populations. A strategy to alleviate this problem, presented in this article, is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach yields two new methods that provide clusters with their own sets of regression coefficients. The combination of clustering and regression improves the overall quality of the prediction and facilitates the interpretation. In addition, the minimization of a well-defined criterion by means of a sequential algorithm ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
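For intuition, here is a minimal sketch of the generic clusterwise-regression idea the article builds on: alternate between per-cluster least-squares fits and reassigning observations to the cluster that fits them best. This is plain clusterwise OLS, not the authors' multiblock algorithm, and every name in it is ours; each sweep cannot increase the total residual sum of squares, which is the source of the monotone convergence mentioned above.

```python
import numpy as np

def clusterwise_ols(X, Y, K, n_iter=50, seed=0):
    """Alternate (1) per-cluster OLS refits and (2) reassignment of each
    observation to the cluster whose fit gives it the smallest squared
    residual; the total RSS decreases monotonically."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(K, size=X.shape[0])        # random initial partition
    B = np.zeros((K, X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        for k in range(K):                           # step 1: per-cluster OLS
            idx = labels == k
            if idx.sum() >= X.shape[1]:              # skip degenerate clusters
                B[k], *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        # step 2: reassign by squared residual under each cluster's fit
        rss = np.stack([((Y - X @ B[k]) ** 2).sum(axis=1) for k in range(K)])
        new_labels = rss.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break                                    # partition has stabilized
        labels = new_labels
    return labels, B

# toy usage: two latent regression regimes mixed in one sample
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
mask = rng.random(200) < 0.5
Y = np.where(mask[:, None], X @ [[1.0], [0.0], [-1.0]], X @ [[-2.0], [1.0], [0.5]])
Y = Y + 0.1 * rng.normal(size=Y.shape)
labels, B = clusterwise_ols(X, Y, K=2)
```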

3.
This paper considers generalized linear models in a data-rich environment in which a large number of potentially useful explanatory variables are available. In particular, it deals with the case in which the sample size and the number of explanatory variables are of similar magnitude. We adopt the idea that the information in the explanatory variables that is relevant to the dependent variable can be represented by a small number of common factors, and we investigate the issue of selecting the number of common factors while taking into account the effect of estimated regressors. We develop an information criterion under model mis-specification of both the distributional and structural assumptions and show that the proposed criterion is a natural extension of the Akaike information criterion (AIC). Simulations and empirical data analysis demonstrate that the proposed criterion outperforms both the AIC and the Bayesian information criterion.
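A hedged sketch of the workflow being studied, with plain AIC standing in for the corrected criterion the paper derives (plain AIC ignores the fact that the factors are estimated, which is exactly the gap the paper addresses); all names and simulated data here are ours:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, p = 200, 150                          # n and p of similar magnitude
F = rng.normal(size=(n, 2))              # two true common factors
X = F @ rng.normal(size=(2, p)) + rng.normal(size=(n, p))
y = rng.poisson(np.exp(0.5 * F[:, 0] - 0.3 * F[:, 1]))

aic = {}
for r in range(1, 9):                    # candidate numbers of factors
    F_hat = PCA(n_components=r).fit_transform(X)   # estimated regressors
    glm = sm.GLM(y, sm.add_constant(F_hat), family=sm.families.Poisson())
    aic[r] = glm.fit().aic
print("selected number of factors:", min(aic, key=aic.get))
```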

4.
This article presents a Markov chain Monte Carlo algorithm for both variable and covariance selection in the context of logistic mixed effects models. The algorithm allows us to sample solely from standard densities, with no additional tuning. We apply a stochastic search variable selection approach to select explanatory variables as well as to determine the structure of the random effects covariance matrix. Prior determination of the explanatory variables and random effects is not a prerequisite, because the final structure is chosen in a data-driven manner in the course of the modeling procedure. To illustrate the method, we give two bank data examples.
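As a hedged illustration of the stochastic-search idea, in generic spike-and-slab notation (not necessarily the exact prior used in the article), each coefficient is governed by a binary inclusion indicator:

```latex
\beta_j \mid \delta_j \sim (1-\delta_j)\,\mathcal{N}(0,\tau_j^2) + \delta_j\,\mathcal{N}(0, c\,\tau_j^2),
\qquad \delta_j \sim \mathrm{Bernoulli}(w_j)
```

with a small spike variance \tau_j^2 (coefficient effectively excluded) and an inflated slab variance c\,\tau_j^2 (coefficient active); the indicators \delta_j have standard full conditionals, which is what makes Gibbs sampling from standard densities possible.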

5.
In CUB models the uncertainty of choice is explicitly modelled as a Combination of discrete Uniform and shifted Binomial random variables. The basic concept, modelling the response as a mixture of a deliberate choice of a response category and an uncertainty component represented by a uniform distribution over the response categories, is extended to a much wider class of models. In particular, the deliberate choice can be determined by classical ordinal response models such as the cumulative and adjacent categories models. One then obtains the traditional and flexible models as special cases when the uncertainty component is irrelevant. It is shown that the effect of explanatory variables is underestimated if the uncertainty component is neglected in a cumulative-type mixture model. Visualization tools for the effects of variables are proposed, and the modelling strategies are evaluated using real data sets. It is demonstrated that the extended class of models frequently yields a better fit than classical ordinal response models without an uncertainty component.
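For reference, the standard CUB specification for a rating R on m categories mixes a shifted binomial (the deliberate choice) with a discrete uniform (the uncertainty):

```latex
P(R = r) = \pi \binom{m-1}{r-1} (1-\xi)^{r-1} \xi^{m-r} + (1-\pi)\,\frac{1}{m},
\qquad r = 1, \dots, m
```

where 1-\pi weights the uncertainty component and \xi governs the location of the deliberate choice; the extension described above replaces the shifted binomial with, for example, a cumulative or adjacent categories model.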

6.
This paper presents an extension of the standard regression tree method to clustered data. Previous work extending tree methods to accommodate correlated data is mainly based on the multivariate repeated-measures approach. We propose a “mixed effects regression tree” method in which the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The proposed method can handle unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. We implemented the proposed method using a standard tree algorithm within the framework of the expectation-maximization (EM) algorithm. The simulation results show that the proposed regression tree method provides substantial improvements over standard trees when the random effects are non-negligible. A real data example is used to illustrate the method.
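A minimal sketch of how a tree can be embedded in an EM-style loop, assuming a single random intercept per cluster (a deliberate simplification; the article's method is more general, and all names here are ours):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mixed_effects_tree(X, y, cluster, n_iter=20, sigma2=1.0, tau2=1.0):
    """Alternate between refitting a tree on the response with the current
    random intercepts removed (M-step) and updating each cluster's random
    intercept as a BLUP-type shrunken mean of its residuals (E-step)."""
    clusters = np.unique(cluster)
    b = dict.fromkeys(clusters, 0.0)              # random intercepts
    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10)
    for _ in range(n_iter):
        y_fixed = y - np.array([b[c] for c in cluster])
        tree.fit(X, y_fixed)                      # M-step: tree on adjusted y
        resid = y - tree.predict(X)
        for c in clusters:                        # E-step: shrinkage update
            r = resid[cluster == c]
            b[c] = tau2 * r.sum() / (len(r) * tau2 + sigma2)
        # (a full EM would also re-estimate sigma2 and tau2 here)
    return tree, b
```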

7.
This article presents a likelihood-based boosting approach for fitting binary and ordinal mixed models. In contrast to common procedures, this approach can be used in high-dimensional settings where a large number of potentially influential explanatory variables are available. Constructed as a componentwise boosting method, it is able to perform variable selection, with the complexity of the resulting estimator determined by information criteria. The method is investigated in simulation studies for both cumulative and sequential models and is illustrated using real datasets. Supplementary materials for the article are available online.
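To show the componentwise mechanism in its simplest form, here is a generic componentwise boosting sketch with L2 loss standing in for the likelihood-based version described above (entirely our construction, assuming roughly centered predictors): at each step every single predictor is fit to the current residual, and only the best one is updated, so variables that are never selected keep a zero coefficient.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2 boosting: per step, pick the one predictor whose
    univariate fit to the residual reduces the loss most, and take a
    damped step on its coefficient only."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_steps):
        slopes = X.T @ resid / (X ** 2).sum(axis=0)   # per-variable OLS slopes
        sse = ((resid[:, None] - X * slopes) ** 2).sum(axis=0)
        j = sse.argmin()                              # best single predictor
        beta[j] += nu * slopes[j]                     # damped update
        resid = y - intercept - X @ beta
    return intercept, beta                            # zeros = never selected
```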

8.
We introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure. The first stage entails a series of recursive binary splits that reduce the heterogeneity of the data within the resulting subsamples. During the second stage (pruning), consideration is given to whether adjacent nodes can be aggregated. Finally, during the third stage (joining), similar clusters are joined together, even if they did not originally share the same parent. Consistency results are obtained, and the procedure is applied to simulated and real data sets.

9.
10.
Considering absolute log returns as a proxy for stochastic volatility, the influence of explanatory variables on the absolute log returns of ultra-high-frequency data is analysed. The irregular time structure and time dependency of the data are captured by a continuous-time ARMA(p,q) process. In particular, we propose a mixed effects model class for the absolute log returns: explanatory variable information is used to model the fixed effects, whereas the error is decomposed into a non-negative Lévy-driven continuous-time ARMA(p,q) process and a market microstructure noise component. The parameters are estimated in a state space approach. In a small simulation study, the performance of the estimators is investigated. We apply our model to IBM trade data and quantify the influence of bid-ask spread and duration on a daily basis. To verify the correlation in irregularly spaced data we use the variogram, known from spatial statistics.
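In our notation (a hedged reading of the decomposition just described), for trade times t_i the model has the form

```latex
\lvert r_{t_i} \rvert = x_{t_i}^{\top} \beta + Y_{t_i} + \varepsilon_{t_i}
```

where x_{t_i}^{\top}\beta carries the explanatory variables (e.g. bid-ask spread and duration), Y is the non-negative Lévy-driven CARMA(p,q) process capturing serial dependence at irregular times, and \varepsilon is the market microstructure noise.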

11.
Market baskets arise from consumers' shopping trips and include items from multiple categories that are frequently chosen interdependently. Explanatory models of multicategory choice behavior explicitly allow for such category purchase dependencies. They typically estimate own- and cross-category effects of marketing-mix variables on purchase incidences for a predefined set of product categories. Because of analytical restrictions, however, multicategory choice models can only handle a small number of categories. Hence, for large retail assortments, the issue emerges of how to compose shopping baskets from a meaningful selection of categories. Traditionally, this is resolved by managerial intuition. In this article, we combine multicategory choice models with a data-driven approach to basket selection. The proposed procedure also accounts for customer heterogeneity and thus can serve as a viable tool for designing target marketing programs. A data compression step first derives a set of basket prototypes that are representative of classes of market baskets with internally more distinct (complementary) cross-category interdependencies; these prototypes drive the segmentation of households. In a second step, segment-specific cross-category effects are estimated for suitably selected categories using a multivariate logistic modeling framework. In an empirical illustration, significant differences in cross-effects and price elasticities are shown both across segments and relative to the aggregate model.

12.
An additive hazards model with random effects is proposed for modelling correlated failure time data when the focus is on comparing failure times within clusters and on estimating the correlation between failure times from the same cluster, as well as the marginal regression parameters. A key feature of our model is that, when marginalized over the random effect variable, it retains the structure of an additive hazards model. We develop estimating equations for inferring the regression parameters. The proposed estimators are shown to be consistent and asymptotically normal under appropriate regularity conditions. Furthermore, an estimator of the baseline hazard function is proposed and its asymptotic properties are established. We also propose a class of diagnostic methods for assessing the overall fitting adequacy of the additive hazards model with random effects. We conduct simulation studies to evaluate the finite-sample behavior of the proposed estimators in various scenarios. An analysis of the Diabetic Retinopathy Study is provided as an illustration of the proposed method.
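A hedged sketch of such a specification, in our notation (the article's exact parameterization may differ): for subject j in cluster i,

```latex
\lambda_{ij}(t \mid b_i) = \lambda_0(t) + \beta^{\top} Z_{ij} + b_i
```

with b_i a mean-zero cluster-level random effect. Taking the expectation over b_i gives \lambda_0(t) + \beta^{\top} Z_{ij} again, which is the marginalization property noted above: the marginal model is still of additive hazards form.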

13.
The stochastic behaviour of the lifetimes of a two-component system is often primarily influenced by the system structure and by the covariates shared by the components. Any meaningful attempt to model the lifetimes must take into consideration the factors affecting their stochastic behaviour. In particular, for a load share system, we describe a reliability model incorporating both the load share dependence and the effect of observed and unobserved covariates. The model includes a bivariate Weibull to characterize load share, a positive stable distribution to describe frailty, and the effects of observed covariates. We investigate various reliability properties of this model using cross-ratio functions and conditional survivor functions. We implement maximum likelihood estimation of the model parameters and discuss model adequacy and selection. We illustrate our approach using a simulation study. For a real data situation, we demonstrate the superiority of the proposed model, which incorporates both load share and frailty effects, over competing models that incorporate just one of these effects. An attractive and computationally simple cross-validation technique is introduced to reconfirm the claim. We conclude with a summary and discussion.

14.
Definitive screening designs (DSDs) are a class of experimental designs that allow the estimation of linear, quadratic, and interaction effects with little experimental effort if there is effect sparsity. The number of experimental runs is twice the number of factors of interest plus one (e.g., 13 runs for six factors). Many industrial experiments involve nonnormal responses, and generalized linear models (GLMs) are a useful alternative for analyzing such data. The analysis of GLMs is, however, based on asymptotic theory, which is questionable for, say, a DSD with only 13 experimental runs. So far, the analysis of DSDs has assumed a normal response. In this work, we present a five-step strategy that uses tools from the Bayesian approach to analyze this kind of experiment when the response is nonnormal. We consider the cases of binomial, gamma, and Poisson responses without having to resort to asymptotic approximations. We use posterior odds that effects are active, together with posterior probability intervals for the effects, to evaluate their significance. We also combine the results of the Bayesian procedure with the lasso estimation procedure to enhance the scope of the method.

15.
A simple but efficient method is proposed for selecting variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those of the nonsignificant ones; on this basis, a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.

16.
In this paper we investigate the adequacy of the own funds a company requires in order to remain healthy and avoid insolvency. Two methods are applied: quantile regression and mixed effects models. Quantile regression can provide a more complete statistical analysis of the stochastic relationships among random variables than least squares estimation. The estimated mixed effects line can be considered an internal industry equation (norm) that explains a systematic relation between a dependent variable (such as own funds) and independent variables (financial characteristics such as assets, provisions, etc.). The two methods are illustrated on two data sets.
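A minimal sketch of the quantile-regression side of this comparison, with made-up column names (own_funds, assets, provisions) standing in for the financial characteristics used in the paper; the lower quantiles are the solvency-relevant part of the distribution:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"assets": rng.lognormal(size=300),
                   "provisions": rng.lognormal(size=300)})
df["own_funds"] = (0.3 * df["assets"] + 0.1 * df["provisions"]
                   + rng.normal(scale=0.2, size=300))

for q in (0.1, 0.5, 0.9):   # lower tail, median, upper tail
    fit = smf.quantreg("own_funds ~ assets + provisions", df).fit(q=q)
    print(q, fit.params.round(3).to_dict())
```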

17.
On the correlation between variables with trend components: numerical experiments
When both variables entering a correlation contain pronounced trend components, the correlation between the original variables may be distorted (exaggerated or attenuated). Numerical experiments on this problem show that trends of opposite sign superimposed on the two variables reduce their correlation coefficient (positive correlations shrink, negative correlations are exaggerated), whereas trends of the same sign increase it (positive correlations are exaggerated, negative correlations shrink). The experiments also show that the influence of trends on the correlation is exchangeable: as long as the magnitudes of the trends are unchanged, swapping which variable each trend is superimposed on has the same effect on the correlation coefficient. The study further indicates that the effect on the correlation is larger when the two variables share the same trend. Examples are given.
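A tiny numerical experiment in the spirit of the paper (our own construction): superimpose linear trends on two positively correlated noise series and compare the sample correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
t = np.linspace(0.0, 1.0, n)
e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
x, y = e[:, 0], e[:, 1]

print("no trends:        ", np.corrcoef(x, y)[0, 1])
print("same-sign trends: ", np.corrcoef(x + 3 * t, y + 3 * t)[0, 1])  # inflated
print("opposite trends:  ", np.corrcoef(x + 3 * t, y - 3 * t)[0, 1])  # attenuated
```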

18.
Two-component Poisson mixture regression is typically used to model heterogeneous count outcomes that arise from two underlying sub-populations. Furthermore, a random component can be incorporated into the linear predictor to account for the clustered data structure. However, when random effects are included in both components of the mixture model, the two random effects are often assumed to be independent for simplicity. A two-component Poisson mixture regression model with bivariate random effects is proposed to deal with the correlated situation. A restricted maximum quasi-likelihood estimation procedure is provided to obtain the parameter estimates of the model. A simulation study shows that both the fixed-effect and variance-component estimates perform well under different conditions. An application to childhood gastroenteritis data demonstrates the usefulness of the proposed methodology and suggests that neglecting the inherent correlation between random effects may lead to incorrect inferences concerning the count outcomes.
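A hedged sketch of such a model in our notation (the paper's exact formulation may differ): for a count y_{ij} from subject j in cluster i,

```latex
y_{ij} \sim p\,\mathrm{Poisson}(\mu_{1ij}) + (1-p)\,\mathrm{Poisson}(\mu_{2ij}),
\qquad \log \mu_{kij} = x_{ij}^{\top}\beta_k + b_{ki},
\qquad (b_{1i}, b_{2i})^{\top} \sim \mathcal{N}(0, \Sigma)
```

where the off-diagonal element of \Sigma carries exactly the between-component correlation that the independence simplification assumes away.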

19.
Adjusting for an intermediate variable is a common analytic strategy for estimating a direct effect. Even if the total effect is unconfounded, the direct effect is not identified when unmeasured variables affect both the intermediate and outcome variables. This paper focuses on the application of the principal stratification approach to estimating the direct effect of a randomized treatment. The approach evaluates the direct effect of treatment as the difference between the expectations of potential outcomes within latent subgroups of subjects for which the intermediate variable would be constant regardless of the randomized treatment assignment. To derive an estimator of the direct effect in cases in which the treatment and intermediate variables are dichotomous, we assume that the total effects are consistent between two standard populations. This assumption implies that the total effects are equal between two subpopulations with the same treatment assignment and different intermediate behavior, or between two subpopulations with different treatment assignments and the same intermediate behavior. We show that the direct effect corresponds to the standard intention-to-treat effect under this assumption.
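In standard principal stratification notation (our sketch of the quantity described above): with potential intermediate values S(0), S(1) and potential outcomes Y(0), Y(1), the direct effect within the stratum whose intermediate would not respond to treatment is

```latex
\mathrm{DE}(s) = \mathbb{E}\left[\, Y(1) - Y(0) \;\middle|\; S(0) = S(1) = s \,\right]
```

i.e., a treatment-outcome contrast inside a latent subgroup for which the intermediate variable is held constant by definition rather than by conditioning.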

20.
Increasingly, fuzzy partitions are being used in multivariate classification problems as an alternative to the crisp classification procedures in common use. One such fuzzy partition, the grade of membership model, partitions individuals into fuzzy sets using multivariate categorical data. Although the statistical methods used to estimate fuzzy membership for this model are based on maximum likelihood, the large-sample properties of the estimation procedure are problematic for two reasons. First, the number of incidental parameters increases with the size of the sample. Second, estimated parameters fall on the boundary of the parameter space with non-zero probability. This paper examines the consistency of the likelihood approach when estimating the components of a particular probability model that gives rise to a fuzzy partition, and the results of the consistency proof are used to determine the large-sample distribution of the estimates.

Common methods of classifying individuals based on multivariate observations attempt to place each individual into crisply defined sets. The fuzzy partition allows for individual-to-individual heterogeneity, beyond simple measurement error, by defining a set of pure-type characteristics and determining each individual's distance from these pure types. Both the profiles of the pure types and the heterogeneity of the individuals must be estimated from data, and these estimates empirically define the fuzzy partition. In the current paper, the data are assumed to be categorical. Because of the large number of parameters to be estimated and the limitations of categorical data, one may be concerned about whether the fuzzy partition can be estimated consistently. This paper shows that if heterogeneity is measured with respect to a fixed number of moments of the grade of membership scores of each individual, the estimated fuzzy partition is consistent.
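For reference, a hedged sketch of the standard grade of membership specification (our notation): the probability that individual i gives response l on categorical item j is

```latex
P(x_{ij} = l) = \sum_{k=1}^{K} g_{ik}\,\lambda_{kjl},
\qquad g_{ik} \ge 0,\quad \sum_{k=1}^{K} g_{ik} = 1
```

where the \lambda_{kjl} are the pure-type response profiles and the g_{ik} are the individual membership scores; it is the estimation of these quantities that the consistency results concern.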
