1.

Statistical inference using a weighted difference-based series approach for partially linear regression models

Chunrong Ai Jinhong You Yong Zhou 《Journal of multivariate analysis》2011,102(3):601-618

Partially linear regression models with fixed effects are useful tools for making econometric analyses and normalizing microarray data. Baltagi and Li (2002) [7] proposed a computationally friendly difference-based series estimation (DSE) for them. We show that the DSE is not asymptotically efficient in most cases and further propose a weighted difference-based series estimation (WDSE). The weights in it do not involve any unknown parameters. The asymptotic properties of the resulting estimators are established for both balanced and unbalanced cases, and it is shown that they achieve the semiparametric efficiency bound. Additionally, we propose a variable selection procedure for identifying significant covariates in the parametric part of the semiparametric fixed-effects regression model. The method is based on a combination of the nonconcave penalization (Fan and Li, 2001 [13]) and weighted difference-based series estimation techniques. The resulting estimators have the oracle property; that is, they can correctly identify the true model as if the true model (the subset of variables with nonvanishing coefficients) were known in advance. Simulation studies are conducted and an application is given to demonstrate the finite sample performance of the proposed procedures.
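The differencing idea behind a difference-based series estimator can be illustrated with a minimal sketch. This is the plain (unweighted) version on simulated data, not the WDSE of the abstract; the sample sizes, the coefficient value, the function sin(2πz), and the polynomial basis are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, beta = 200, 4, 1.5             # hypothetical panel size and true coefficient
alpha = rng.normal(size=(n, 1))      # individual fixed effects
x = alpha + rng.normal(size=(n, T))  # regressor correlated with the fixed effect
z = rng.uniform(size=(n, T))         # argument of the nonparametric part
y = beta * x + np.sin(2 * np.pi * z) + alpha + 0.1 * rng.normal(size=(n, T))

def basis(z, degree=5):
    # Series terms z, z^2, ..., z^degree; a constant term would cancel after differencing.
    return np.stack([z ** k for k in range(1, degree + 1)], axis=-1)

# First differences over time remove the fixed effect alpha_i from every equation.
dy = np.diff(y, axis=1).ravel()
dx = np.diff(x, axis=1).ravel()
dp = np.diff(basis(z), axis=1).reshape(-1, 5)

# Ordinary least squares on the differenced data: the first coefficient estimates beta.
design = np.column_stack([dx, dp])
coef = np.linalg.lstsq(design, dy, rcond=None)[0]
beta_hat = coef[0]
print(beta_hat)
```

Because the differenced errors are serially correlated, ordinary least squares on the differences does not use the error structure optimally, which is consistent with the abstract's motivation for introducing weights.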

2.

Identification of graphical models for nonignorable nonresponse of binary outcomes in longitudinal studies

Wen-Qing Ma Zhi Geng Yong-Hua Hu 《Journal of multivariate analysis》2003,87(1):24-45

In this paper, we use directed acyclic graphs (DAGs) with temporal structure to describe models of nonignorable nonresponse mechanisms for binary outcomes in longitudinal studies, and we discuss identification of these models under the assumption that the sequence of variables has first-order Markov dependence, that is, the future variables are independent of the past variables conditional on the present variables. We give a stepwise approach for checking identifiability of DAG models. For an unidentifiable model, we propose adding completely observed variables such that this model becomes identifiable.

3.

Conditional information criteria for selecting variables in linear mixed models

Muni S. Srivastava 《Journal of multivariate analysis》2010,101(9):1970-1980

In this paper, we consider the problem of selecting the variables of the fixed effects in the linear mixed models where the random effects are present and the observation vectors have been obtained from many clusters. As the variable selection procedure, here we use the Akaike Information Criterion, AIC. In the context of the mixed linear models, two kinds of AIC have been proposed: marginal AIC and conditional AIC. In this paper, we derive three versions of conditional AIC depending upon different estimators of the regression coefficients and the random effects. Through the simulation studies, it is shown that the proposed conditional AIC’s are superior to the marginal and conditional AIC’s proposed in the literature in the sense of selecting the true model. Finally, the results are extended to the case when the random effects in all the clusters are of the same dimension but have a common unknown covariance matrix.

4.

Linear mixed models and penalized least squares (total citations: 1; self-citations: 0; citations by others: 1)

Douglas M Bates Saikat DebRoy 《Journal of multivariate analysis》2004,91(1):1-17

Linear mixed-effects models are an important class of statistical models that are used directly in many fields of applications and also are used as iterative steps in fitting other types of mixed-effects models, such as generalized linear mixed models. The parameters in these models are typically estimated by maximum likelihood or restricted maximum likelihood. In general, there is no closed-form solution for these estimates and they must be determined by iterative algorithms such as EM iterations or general nonlinear optimization. Many of the intermediate calculations for such iterations have been expressed as generalized least squares problems. We show that an alternative representation as a penalized least squares problem has many advantageous computational properties including the ability to evaluate explicitly a profiled log-likelihood or log-restricted likelihood, the gradient and Hessian of this profiled objective, and an ECME update to refine this objective.
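The link between mixed models and penalized least squares can be sketched for the simplest case: a random-intercept model in which the variance ratio is treated as known. This is not the algorithm of the paper, only the standard mixed-model-equations identity; the function name `pls_mixed_model`, the simulated data, and the value of `lam` are assumptions made for illustration.

```python
import numpy as np

def pls_mixed_model(X, Z, y, lam):
    """Minimize ||y - X b - Z u||^2 + lam * ||u||^2 over (b, u).

    lam plays the role of sigma^2 / sigma_u^2 and is assumed known here.
    """
    p, q = X.shape[1], Z.shape[1]
    W = np.hstack([X, Z])
    penalty = np.zeros((p + q, p + q))
    penalty[p:, p:] = lam * np.eye(q)   # only the random effects are penalized
    coef = np.linalg.solve(W.T @ W + penalty, W.T @ y)
    return coef[:p], coef[p:]

rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 4)     # 5 clusters with 4 observations each
Z = np.eye(5)[groups]                   # random-intercept design matrix
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(size=5) + 0.5 * rng.normal(size=20)

lam = 0.25                              # hypothetical variance ratio
beta_hat, u_hat = pls_mixed_model(X, Z, y, lam)
print(beta_hat, u_hat)
```

For a fixed variance ratio, the fixed-effects part of this solution coincides with the generalized least squares estimate under the marginal covariance proportional to I + ZZᵀ/lam, which is why the two formulations can be used interchangeably inside an iterative fit.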

5.

Conditional and unconditional methods for selecting variables in linear mixed models

Tatsuya Kubokawa 《Journal of multivariate analysis》2011,102(3):641-660

In the problem of selecting the explanatory variables in the linear mixed model, we address the derivation of the (unconditional or marginal) Akaike information criterion (AIC) and the conditional AIC (cAIC). The covariance matrices of the random effects and the error terms include unknown parameters like variance components, and the selection procedures proposed in the literature are limited to the cases where the parameters are known or partly unknown. In this paper, AIC and cAIC are extended to the situation where the parameters are completely unknown and they are estimated by the general consistent estimators including the maximum likelihood (ML), the restricted maximum likelihood (REML) and other unbiased estimators. We derive, related to AIC and cAIC, the marginal and the conditional prediction error criteria which select superior models in light of minimizing the prediction errors relative to quadratic loss functions. Finally, numerical performances of the proposed selection procedures are investigated through simulation studies.

6.

Selecting mixed-effects models based on a generalized information criterion

Wenji Pu Xu-Feng Niu 《Journal of multivariate analysis》2006,97(3):733-758

The generalized information criterion (GIC) proposed by Rao and Wu [A strongly consistent procedure for model selection in a regression problem, Biometrika 76 (1989) 369-374] is a generalization of Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). In this paper, we extend the GIC to select linear mixed-effects models that are widely applied in analyzing longitudinal data. The procedure for selecting fixed effects and random effects based on the extended GIC is provided. The asymptotic behavior of the extended GIC method for selecting fixed effects is studied. We prove that, under mild conditions, the selection procedure is asymptotically loss efficient regardless of the existence of a true model and consistent if a true model exists. A simulation study is carried out to empirically evaluate the performance of the extended GIC procedure. The results from the simulation show that if the signal-to-noise ratio is moderate or high, the percentages of choosing the correct fixed effects by the GIC procedure are close to one for finite samples, while the procedure performs relatively poorly when it is used to select random effects.
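One common way to see how a generalized criterion contains AIC and BIC is the penalized log-likelihood form below. The notation is an assumption made for illustration (maximized likelihood \hat{L}_k of a candidate model with k parameters, sample size n, penalty weight \lambda_n) and is not necessarily the exact form used by Rao and Wu or in the paper.

```latex
\mathrm{GIC}_{\lambda_n}(k) = -2 \log \hat{L}_k + \lambda_n \, k,
\qquad
\lambda_n = 2 \;\Rightarrow\; \mathrm{AIC},
\qquad
\lambda_n = \log n \;\Rightarrow\; \mathrm{BIC}.
```

The candidate model minimizing the criterion is selected; a larger \lambda_n penalizes model size more heavily and so favors smaller models.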

7.

Minimax estimation for singular linear multivariate models with mixed uncertainty

Alexei R. Pankov 《Journal of multivariate analysis》2007,98(1):145-176

The problem of minimax estimation is examined for the linear multivariate statistically indeterminate observation model with mixed uncertainty. The a priori information on the distributions of model parameters is formulated in terms of second-order moment characteristics. It is shown that in the regular case the minimax estimate is defined explicitly via the solution of the dual optimization problem. For singular models, the method of dual optimization is developed by means of using the Tikhonov regularization techniques. Several particular cases which are widely used in practice are also considered.

8.

On sparse estimation for semiparametric linear transformation models

Hao Helen Zhang Wenbin Lu 《Journal of multivariate analysis》2010,101(7):1594-1606

Semiparametric linear transformation models have received much attention due to their high flexibility in modeling survival data. A useful estimating equation procedure was recently proposed by Chen et al. (2002) [21] for linear transformation models to jointly estimate parametric and nonparametric terms. They showed that this procedure can yield a consistent and robust estimator. However, the problem of variable selection for linear transformation models has been less studied, partially because a convenient loss function is not readily available in this context. In this paper, we propose a simple yet powerful approach to achieve both sparse and consistent estimation for linear transformation models. The main idea is to derive a profiled score from the estimating equation of Chen et al. [21], construct a loss function based on the profiled score and its variance, and then minimize the loss subject to some shrinkage penalty. Under regularity conditions, we have shown that the resulting estimator is consistent for both model estimation and variable selection. Furthermore, the estimated parametric terms are asymptotically normal and can achieve a higher efficiency than that yielded from the estimating equations. For computation, we suggest a one-step approximation algorithm which can take advantage of the LARS and build the entire solution path efficiently. Performance of the new procedure is illustrated through numerous simulations and real examples, including a microarray dataset.

9.

Estimates of MM type for the multivariate linear model

Nadia L. Kudraszow Ricardo A. Maronna 《Journal of multivariate analysis》2011,102(9):1280-1292

We propose a class of robust estimates for multivariate linear models. Based on the approach of MM-estimation (Yohai 1987, [24]), we estimate the regression coefficients and the covariance matrix of the errors simultaneously. These estimates have both a high breakdown point and high asymptotic efficiency under Gaussian errors. We prove consistency and asymptotic normality assuming errors with an elliptical distribution. We describe an iterative algorithm for the numerical calculation of these estimates. The advantages of the proposed estimates over their competitors are demonstrated through both simulated and real data.

10.

Statistical inference for panel data semiparametric partially linear regression models with heteroscedastic errors

Jinhong You Yong Zhou 《Journal of multivariate analysis》2010,101(5):1079-1101

We consider a panel data semiparametric partially linear regression model with an unknown parameter vector for the linear parametric component, an unknown nonparametric function for the nonlinear component, and a one-way error component structure which allows unequal error variances (referred to as heteroscedasticity). We develop procedures to detect heteroscedasticity and one-way error component structure, and propose a weighted semiparametric least squares estimator (WSLSE) of the parametric component in the presence of heteroscedasticity and/or one-way error component structure. This WSLSE is asymptotically more efficient than the usual semiparametric least squares estimator considered in the literature. The asymptotic properties of the WSLSE are derived. The nonparametric component of the model is estimated by the local polynomial method. Some simulations are conducted to demonstrate the finite sample performances of the proposed testing and estimation procedures. An example of application on a set of panel data of medical expenditures in Australia is also illustrated.

11.

Fisher information for generalised linear mixed models

M.P. Wand 《Journal of multivariate analysis》2007,98(7):1412-1416

The Fisher information for the canonical link exponential family generalised linear mixed model is derived. The contribution from the fixed effects parameters is shown to have a particularly simple form.

12.

Bayesian analysis of multivariate t linear mixed models using a combination of IBF and Gibbs samplers

Wan-Lun Wang Tsai-Hung Fan 《Journal of multivariate analysis》2012,105(1):300-310

The multivariate linear mixed model (MLMM) has become the most widely used tool for analyzing multi-outcome longitudinal data. Although it offers great flexibility for modeling the between- and within-subject correlation among multi-outcome repeated measures, the underlying normality assumption is vulnerable to potential atypical observations. We present a fully Bayesian approach to the multivariate t linear mixed model (MtLMM), which is a robust extension of MLMM with the random effects and errors jointly distributed as a multivariate t distribution. Owing to the introduction of too many hidden variables in the model, the conventional Markov chain Monte Carlo (MCMC) method may converge painfully slowly and thus fails to provide valid inference. To alleviate this problem, a computationally efficient inverse Bayes formulas (IBF) sampler coupled with the Gibbs scheme, called the IBF-Gibbs sampler, is developed and shown to be effective in drawing samples from the target distributions. The issues related to model determination and Bayesian predictive inference for future values are also investigated. The proposed methodologies are illustrated with a real example from an AIDS clinical trial and a careful simulation study.

13.

Theory and inference for regression models with missing responses and covariates

Qingxia Chen Ming-Hui Chen 《Journal of multivariate analysis》2008,99(6):1302-1331

In this paper, we carry out an in-depth theoretical investigation for inference with missing response and covariate data for general regression models. We assume that the missing data are missing at random (MAR) or missing completely at random (MCAR) throughout. Previous theoretical investigations in the literature have focused only on missing covariates or missing responses, but not both. Here, we consider theoretical properties of the estimates under three different estimation settings: complete case (CC) analysis, a complete response (CR) analysis that involves an analysis of those subjects with only completely observed responses, and the all case (AC) analysis, which is an analysis based on all of the cases. Under each scenario, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. We carry out a theoretical investigation of the three estimation methods in the normal linear model and analytically characterize the loss of information for each method, as well as derive and compare the asymptotic variances for each method assuming the missing data are MAR or MCAR. In addition, a theoretical investigation of bias for the CC method is also carried out. A simulation study and real dataset are given to illustrate the methodology.

14.

A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes

Bhramar Mukherjee Ivy Liu 《Journal of multivariate analysis》2009,100(3):459-472

Outcome-dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome-dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the “cases” based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to such multiple-category outcome data, ignoring the retrospective nature of the sampling design. We first provide a set of necessary and sufficient conditions for the link functions that will allow for equivalence of prospective and retrospective inference for the parameters of interest. We show that for categorical outcomes, prospective-retrospective equivalence does not hold beyond the generalized multinomial logit link. We then derive an approximate expression for the bias incurred when link functions outside this class are used. Most popular models for ordinal response fall outside the multiplicative intercept class and one should be cautious while performing a naive prospective analysis of such data as the bias could be substantial. We illustrate the extent of bias through a real data example, based on the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial by the National Cancer Institute. The simulations based on the real study illustrate that the bias approximations work well in practice.

15.

High-Dimensional Mixed Graphical Models

Jie Cheng Tianxi Li Elizaveta Levina Ji Zhu 《Journal of computational and graphical statistics》2017,26(2):367-378

While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for datasets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation dataset (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables for the main presentation, we also show that the proposed methodology can be easily extended to general discrete variables.

16.

A generalized Mahalanobis distance for mixed data (total citations: 1; self-citations: 0; citations by others: 1)

A.R. de Leon K.C. Carrière 《Journal of multivariate analysis》2005,92(1):174-185

A distance for mixed nominal, ordinal and continuous data is developed by applying the Kullback-Leibler divergence to the general mixed-data model, an extension of the general location model that allows for ordinal variables to be incorporated in the model. The distance obtained can be considered as a generalization of the Mahalanobis distance to data with a mixture of nominal, ordinal and continuous variables. Moreover, it includes as special cases previous Mahalanobis-type distances developed by Bedrick et al. (Biometrics 56 (2000) 394) and Bar-Hen and Daudin (J. Multivariate Anal. 53 (1995) 332). Asymptotic results regarding the maximum likelihood estimator of the distance are discussed. The results of a simulation study on the level and power of the tests are reported and a real-data example illustrates the method.
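For orientation, the classical Mahalanobis distance for purely continuous data, which the distance described above generalizes, can be sketched as follows. The function name and the example inputs are illustrative assumptions; the mixed-data generalization itself is not implemented here.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Classical Mahalanobis distance sqrt((x - mu)^T cov^{-1} (x - mu))."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve a linear system rather than forming the explicit inverse of cov.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# With an identity covariance the distance reduces to the Euclidean distance.
d_identity = mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2))

# A larger variance along the first axis shrinks distances in that direction.
d_scaled = mahalanobis([2.0, 0.0], [0.0, 0.0], np.diag([4.0, 1.0]))
print(d_identity, d_scaled)
```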

17.

Strong consistency of least-squares estimation in linear regression models with vague concepts

Volker Krätschmer 《Journal of multivariate analysis》2006,97(3):633-654

Linear regression models with vague concepts extend the classical single equation linear regression models by admitting observations in the form of fuzzy subsets instead of real numbers. They have recently been introduced [cf. Krätschmer, Induktive statistik auf basis unscharfer meßkonzepte am beispiel linearer regressionsmodelle, Unpublished Habilitation Monograph, Faculty of Law and Economics of the University of Saarland, Saarbrücken, 2001] to improve the empirical meaningfulness of the relationship between the involved items by a more sensitive attention to the problems of data measurement, in particular the fundamental problem of adequacy. The parameters of such models are still real numbers, and a method of estimation can be applied which directly extends the ordinary least-squares method. This paper deals with some first asymptotic properties of estimators obtained by the method. Firstly, strong consistency will be shown, and secondly, the convergence rate will be investigated. The latter result will be the starting point for a future study which will calculate the limit distributions of the estimators.

18.

The evaluation of variance component estimation software: generating benchmark problems by exact and approximate methods

Jörg Wensch Monika Wensch-Dorendorf Hermann H. Swalve 《Computational Statistics》2013,28(4):1725-1748

The prediction of breeding values depends on the reliable estimation of variance components. This complex task leads to nonlinear minimization problems that have to be solved by numerical algorithms. In order to evaluate the reliability of these algorithms, benchmark problems have to be constructed where the exact solution is known a priori. We develop techniques to construct such benchmark problems for mixed models including fixed and random effects, ANOVA, ML and REML predictors, balanced and unbalanced data for 1-way classification. Besides the construction of artificial data that produce the desired variance components, we describe a projection method to construct benchmark data from simulated data. We discuss the cases where exact expressions for the projection can be given and where a numerical approximation procedure has to be used.

19.

Corrected version of AIC for selecting multivariate normal linear regression models in a general nonnormal case

Hirokazu Yanagihara 《Journal of multivariate analysis》2006,97(5):1070-1089

This paper deals with the bias reduction of the Akaike information criterion (AIC) for selecting variables in multivariate normal linear regression models when the true distribution of the observations is an unknown nonnormal distribution. We propose a corrected version of AIC which is partially constructed by the jackknife method and is adjusted to the exact unbiased estimator of the risk when the candidate model includes the true model. It is pointed out that the influence of nonnormality on the bias of our criterion is smaller than that on AIC and TIC. We verify that our criterion is better than the AIC, TIC and EIC by conducting numerical experiments.

20.

Bayesian modeling of several covariance matrices and some results on propriety of the posterior for linear regression with correlated and/or heterogeneous errors

Michael J. Daniels 《Journal of multivariate analysis》2006,97(5):1185-1207

We explore simultaneous modeling of several covariance matrices across groups using the spectral (eigenvalue) decomposition and modified Cholesky decomposition. We introduce several models for covariance matrices under different assumptions about the mean structure. We consider ‘dependence’ matrices, which tend to have many parameters, as constant across groups and/or parsimoniously modeled via a regression formulation. For ‘variances’, we consider both unrestricted across groups and more parsimoniously modeled via log-linear models. In all these models, we explore the propriety of the posterior when improper priors are used on the mean and ‘variance’ parameters (and in some cases, on components of the ‘dependence’ matrices). The models examined include several common Bayesian regression models, whose propriety has not been previously explored, as special cases. We propose a simple approach to weaken the assumption of constant dependence matrices in an automated fashion and describe how to compute Bayes factors to test the hypothesis of constant ‘dependence’ across groups. The models are applied to data from two longitudinal clinical studies.