Similar Documents
20 similar documents found.
1.
To fit longitudinal data and other correlated data, this paper proposes the varying-coefficient mixed-effects model (VCMM). The model uses a varying-coefficient linear part to represent the effects of the covariates on the response, and random effects to describe the within-group correlation of the longitudinal data; it therefore allows very flexible functional relationships between the covariates and the response. Smoothing splines are used to estimate the coefficient functions in the mean part, while restricted maximum likelihood is used to estimate the smoothing parameters and the variance components simultaneously; a computational method for the proposed estimators is also given. Extensive simulation studies show that, for varying-coefficient mixed-effects models with a variety of covariance structures, the proposed method estimates the coefficient functions and variance components very effectively.

2.
The non-parametric estimation of average causal effects in observational studies often relies on controlling for confounding covariates through smoothing regression methods such as kernel, spline or local polynomial regression. Such regression methods are tuned via smoothing parameters, which regulate the amount of degrees of freedom used in the fit. In this paper we propose data-driven methods for selecting smoothing parameters when the targeted parameter is an average causal effect. For this purpose, we propose to estimate the exact expression of the mean squared error of the estimators. Asymptotic approximations indicate that the smoothing parameters minimizing this mean squared error converge to zero faster than the optimal smoothing parameter for the estimation of the regression functions. In a simulation study we show that the proposed data-driven methods for selecting the smoothing parameters yield lower empirical mean squared error than other available methods such as cross-validation.

3.
In this article, we consider the problem of estimating the eigenvalues and eigenfunctions of the covariance kernel (i.e., the functional principal components) from sparse and irregularly observed longitudinal data. We exploit the smoothness of the eigenfunctions to reduce dimensionality by restricting them to a lower-dimensional space of smooth functions. We then approach this problem through a restricted maximum likelihood method. The estimation scheme is based on a Newton–Raphson procedure on the Stiefel manifold, using the fact that the basis coefficient matrix for representing the eigenfunctions has orthonormal columns. We also address the selection of the number of basis functions, as well as the dimension of the covariance kernel, by a second-order approximation to the leave-one-curve-out cross-validation score that is computationally very efficient. The effectiveness of our procedure is demonstrated by simulation studies and an application to a CD4+ counts dataset. In the simulation studies, our method performs well on both estimation and model selection. It also outperforms two existing approaches: one based on local polynomial smoothing, and another using an EM algorithm. Supplementary materials including technical details, the R package fpca, and the data analyzed in this article are available online.

4.
Lin and Zhang (J. Roy. Statist. Soc. Ser. B 61 (1999) 381) proposed the generalized additive mixed model (GAMM) as a framework for analysis of correlated data, where normally distributed random effects are used to account for correlation in the data, and proposed to use double penalized quasi-likelihood (DPQL) to estimate the nonparametric functions in the model and marginal likelihood to estimate the smoothing parameters and variance components simultaneously. However, the normal distributional assumption for the random effects may not be realistic in many applications, and it is unclear how violation of this assumption affects ensuing inferences for GAMMs. For a particular class of GAMMs, we propose a conditional estimation procedure built on a conditional likelihood for the response given a sufficient statistic for the random effect, treating the random effect as a nuisance parameter, which thus should be robust to its distribution. In extensive simulation studies, we assess performance of this estimator under a range of conditions and use it as a basis for comparison to DPQL to evaluate the impact of violation of the normality assumption. The procedure is illustrated with application to data from the Multicenter AIDS Cohort Study (MACS).

5.
This article presents and compares two approaches of principal component (PC) analysis for two-dimensional functional data on a possibly irregular domain. The first approach applies the singular value decomposition of the data matrix obtained from a fine discretization of the two-dimensional functions. When the functions are only observed at discrete points that are possibly sparse and may differ from function to function, this approach incorporates an initial smoothing step prior to the singular value decomposition. The second approach employs a mixed effects model that specifies the PC functions as bivariate splines on triangulations and the PC scores as random effects. We apply the thin-plate penalty for regularizing the function estimation and develop an effective expectation–maximization algorithm for calculating the penalized likelihood estimates of the parameters. The mixed effects model-based approach integrates scatterplot smoothing and functional PC analysis in a unified framework and is shown in a simulation study to be more efficient than the two-step approach that separately performs smoothing and PC analysis. The proposed methods are applied to analyze the temperature variation in Texas using 100 years of temperature data recorded by Texas weather stations. Supplementary materials for this article are available online.
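As a rough illustration of the first (SVD-based) approach, the sketch below performs principal component analysis of discretized functional data in Python with NumPy. For simplicity it uses one-dimensional curves on a regular grid (the same linear algebra applies once a surface is vectorized over its grid); all names and settings are illustrative, and the initial smoothing step for sparse observations and the bivariate-spline mixed-effects approach are not reproduced.

# Sketch of SVD-based functional PCA on a fine, common discretization grid.
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_grid = 50, 200
t = np.linspace(0, 1, n_grid)

# simulate curves as random combinations of two smooth components plus noise
phi1, phi2 = np.sin(2 * np.pi * t), np.cos(4 * np.pi * t)
scores_true = rng.normal(size=(n_subj, 2)) * np.array([2.0, 1.0])
data = scores_true @ np.vstack([phi1, phi2]) + 0.2 * rng.normal(size=(n_subj, n_grid))

# center and take the singular value decomposition of the data matrix
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

dt = t[1] - t[0]
eigenfunctions = Vt / np.sqrt(dt)            # rows have unit L2 norm as functions
eigenvalues = s ** 2 * dt / (n_subj - 1)     # eigenvalues of the covariance operator
pc_scores = centered @ Vt.T * np.sqrt(dt)    # principal component scores

print("leading eigenvalues:", np.round(eigenvalues[:3], 3))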

6.
A partially linear model is one in which the response is linear in one or more covariates but nonlinear in the remaining covariates. Penalized least squares is one of the important methods for estimating the parametric and nonparametric parts of a partially linear model, and for this estimator generalized cross-validation provides a way to determine the smoothing parameter. However, the optimality of generalized cross-validation for choosing the smoothing parameter in partially linear models had not been established. This paper proves the optimality of generalized cross-validation for selecting the smoothing parameter when a partially linear model is estimated by penalized least squares. Simulations confirm that the proposed generalized cross-validation choice of the smoothing parameter works well; the simulation section also compares generalized cross-validation with least-squares cross-validation.

7.
This paper is concerned with model selection in spline-based generalized linear mixed models. Exploiting the fact that, under estimation by regularization, the smoothing parameters can be expressed as reciprocal ratios of the random-effect variances, we propose a computationally efficient model selection procedure. Applications to some real data sets reveal that the proposed method selects reasonable models and is very fast to implement.
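The variance-ratio identity that such procedures exploit can be checked numerically. The sketch below is a minimal Gaussian-response illustration with made-up data (not the paper's procedure): it verifies that the penalized least-squares solution with smoothing parameter lam coincides with the mixed-model (GLS/BLUP) solution when lam is the ratio of the error variance to the random-effect variance.

# Numerical check that penalized least squares with lam = sigma2_eps / sigma2_b
# reproduces the mixed-model GLS/BLUP solution (Gaussian, identity-link case).
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 100, 2, 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design
Z = rng.normal(size=(n, q))                              # random-effects / spline basis
sigma2_eps, sigma2_b = 0.5, 2.0
y = X @ np.array([1.0, -0.5]) + Z @ rng.normal(scale=np.sqrt(sigma2_b), size=q) \
    + rng.normal(scale=np.sqrt(sigma2_eps), size=n)

# (1) penalized least squares with smoothing parameter lam = sigma2_eps / sigma2_b
lam = sigma2_eps / sigma2_b
C = np.hstack([X, Z])
P = np.diag(np.concatenate([np.zeros(p), lam * np.ones(q)]))
theta_pen = np.linalg.solve(C.T @ C + P, C.T @ y)

# (2) mixed-model solution: GLS for the fixed effects, BLUP for the random effects
V = sigma2_eps * np.eye(n) + sigma2_b * Z @ Z.T
Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
b = sigma2_b * Z.T @ Vinv @ (y - X @ beta)

print("max difference:", np.max(np.abs(theta_pen - np.concatenate([beta, b]))))  # ~ 0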

8.
This article describes an appropriate way of implementing the generalized cross-validation method and some other least-squares-based smoothing parameter selection methods in penalized likelihood regression problems, and explains the rationales behind it. Simulations of limited scale are conducted to back up the semitheoretical analysis.
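A minimal sketch of the generalized cross-validation score for a generic linear smoother is given below; it is not the article's implementation, and the polynomial basis, penalty matrix and grid of smoothing parameters are only placeholders.

# GCV(lam) = (RSS / n) / (1 - tr(A(lam)) / n)^2 for a ridge-type linear smoother.
import numpy as np

rng = np.random.default_rng(2)
n = 120
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

B = np.vander(x, 8, increasing=True)     # simple basis (a spline basis in practice)
D = np.eye(B.shape[1]); D[0, 0] = 0.0    # do not penalize the intercept

def gcv(lam):
    A = B @ np.linalg.solve(B.T @ B + lam * D, B.T)   # influence (hat) matrix
    resid = y - A @ y
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2

lams = 10.0 ** np.arange(-8, 2, 0.25)
scores = [gcv(l) for l in lams]
print("GCV-selected lambda:", lams[int(np.argmin(scores))])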

9.
Automatic model selection for partially linear models
We propose and study a unified procedure for variable selection in partially linear models. A new type of double-penalized least squares is formulated, using the smoothing spline to estimate the nonparametric part and applying a shrinkage penalty on parametric components to achieve model parsimony. Theoretically we show that, with proper choices of the smoothing and regularization parameters, the proposed procedure can be as efficient as the oracle estimator [J. Fan, R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association 96 (2001) 1348–1360]. We also study the asymptotic properties of the estimator when the number of parametric effects diverges with the sample size. Frequentist and Bayesian estimates of the covariance and confidence intervals are derived for the estimators. One great advantage of this procedure is its linear mixed model (LMM) representation, which greatly facilitates its implementation by using standard statistical software. Furthermore, the LMM framework enables one to treat the smoothing parameter as a variance component and hence conveniently estimate it together with other regression coefficients. Extensive numerical studies are conducted to demonstrate the effective performance of the proposed procedure.
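The sketch below illustrates only the general idea of combining a smoother for the nonparametric part with a shrinkage penalty on the parametric coefficients in y = X beta + f(t) + eps. It is a simplified two-step stand-in (Speckman-type partialling-out followed by a lasso), not the article's joint double-penalized least squares or its LMM implementation, and it assumes scikit-learn is available; all data and tuning values are made up.

# Two-step stand-in: partial out f(t) with a kernel smoother, then lasso on X.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 6
t = np.sort(rng.uniform(0, 1, n))
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, 0.0, -1.0, 0.0, 0.0, 0.8])
y = X @ beta_true + np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)

# local-constant smoother matrix S in the nonparametric direction t
h = 0.08
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)

# regress the residualized response on the residualized covariates with a lasso
y_tilde = y - S @ y
X_tilde = X - S @ X
beta_hat = Lasso(alpha=0.05, fit_intercept=False).fit(X_tilde, y_tilde).coef_
f_hat = S @ (y - X @ beta_hat)                    # back-fit the smooth part
print("shrunken parametric estimates:", np.round(beta_hat, 2))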

10.
Penalized splines, or P-splines, are regression splines fit by least squares with a roughness penalty. P-splines have much in common with smoothing splines, but the type of penalty used with a P-spline is somewhat more general than for a smoothing spline. Also, the number and location of the knots of a P-spline are not fixed as with a smoothing spline. Generally, the knots of a P-spline are at fixed quantiles of the independent variable and the only tuning parameters to choose are the number of knots and the penalty parameter. In this article, the effects of the number of knots on the performance of P-splines are studied. Two algorithms are proposed for the automatic selection of the number of knots. The myopic algorithm stops when no improvement in the generalized cross-validation statistic (GCV) is noticed with the last increase in the number of knots. The full search examines all candidates in a fixed sequence of possible numbers of knots and chooses the candidate that minimizes GCV. The myopic algorithm works well in many cases but can stop prematurely. The full-search algorithm worked well in all examples examined. A Demmler–Reinsch type diagonalization for computing univariate and additive P-splines is described. The Demmler–Reinsch basis is not effective for smoothing splines because smoothing splines have too many knots. For P-splines, however, the Demmler–Reinsch basis is very useful for super-fast generalized cross-validation.
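A minimal sketch of the myopic knot search with GCV is given below, using a truncated-line basis with knots at sample quantiles. It does not use the Demmler–Reinsch diagonalization described in the article, and the candidate knot numbers and penalty grid are purely illustrative.

# P-spline fit (truncated-line basis, ridge penalty on knot terms) with a
# "myopic" search: add knots until GCV stops improving.
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(3 * np.pi * x) + 0.3 * rng.normal(size=n)
lams = 10.0 ** np.arange(-4, 4, 0.5)

def gcv_for_knots(num_knots):
    knots = np.quantile(x, np.linspace(0, 1, num_knots + 2)[1:-1])
    B = np.column_stack([np.ones(n), x] + [np.maximum(x - k, 0.0) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * num_knots)       # penalize only the knot terms
    best = np.inf
    for lam in lams:
        A = B @ np.linalg.solve(B.T @ B + lam * D, B.T)
        rss = np.sum((y - A @ y) ** 2)
        best = min(best, (rss / n) / (1 - np.trace(A) / n) ** 2)
    return best

prev = np.inf
for m in [5, 10, 20, 40]:                             # myopic search over knot counts
    score = gcv_for_knots(m)
    print(m, "knots -> GCV", round(score, 4))
    if score >= prev:
        break
    prev = score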

11.
We propose a two-stage model selection procedure for linear mixed-effects models. The procedure consists of two steps: first, the penalized restricted log-likelihood is used to select the random effects, by adopting a Newton-type algorithm; next, the penalized log-likelihood is used to select the fixed effects via pathwise coordinate optimization to improve computational efficiency. We prove that our procedure has the oracle properties. Both simulation studies and a real data example are carried out to examine the finite sample performance of the proposed fixed and random effects selection procedure. Supplementary materials including R code used in this article and proofs for the theorems are available online.

12.
In this paper we present a discrete survival model with covariates and random effects, where the random effects may depend on the observed covariates. The dependence between the covariates and the random effects is modelled through correlation parameters, and these parameters can only be identified for time-varying covariates. For time-varying covariates, however, it is possible to separate regression effects and selection effects in the case of a certain dependence structure between the random effects and the time-varying covariates, which are assumed to be conditionally independent given the initial level of the covariate. The proposed model is equivalent to a model with independent random effects and the initial level of the covariates as further covariates. The model is applied to simulated data that illustrate some identifiability problems and further indicate how the proposed model may serve as an approximation to retrospectively collected data with incorrect specification of the waiting times. The model is fitted by maximum likelihood estimation, implemented as iteratively reweighted least squares. © 1998 John Wiley & Sons, Ltd.

13.
This work takes advantage of semiparametric modelling, which in many situations significantly improves the estimation accuracy of the purely nonparametric approach. Here, for semiparametric estimation of the probability mass function (pmf) of count data and of an unknown count regression function (crf), a binomial kernel is used and bandwidth selection is investigated by developing Bayesian approaches. For the latter, Bayesian local and global bandwidth approaches are used to establish data-driven selection procedures in the semiparametric framework. Under conjugate beta prior distributions for the smoothing parameter and the squared error loss function, the Bayes estimate of the pmf is obtained in closed form; this is not available for the crf, which is computed by Markov chain Monte Carlo. Simulation studies demonstrate that both proposed methods perform better than the classical cross-validation procedures; in particular, smoothing quality and execution times are improved. All applications are to real data sets.

14.
In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and ability to shrink the estimates of the regression coefficients toward 0 make these techniques appropriate for fitting prediction models in the case of high-dimensional data, e.g. gene expressions. Their prediction performance, however, depends heavily on specific tuning parameters, in particular on the number of boosting iterations to perform. This crucial parameter is usually selected via cross-validation. The cross-validation procedure may in turn depend heavily on a completely random component, namely the considered fold partition. We empirically study how much this randomness affects the results of the boosting techniques, in terms of selected predictors and prediction ability of the related models. We use four publicly available data sets related to four different diseases. In these studies, the goal is to predict survival end-points when a large number of continuous candidate predictors are available. We focus on two well-known boosting approaches implemented in the R packages CoxBoost and mboost, assuming the validity of the proportional hazards assumption and the linearity of the effects of the predictors. We show that the variability in selected predictors and prediction ability of the model is reduced by averaging over several repetitions of cross-validation in the selection of the tuning parameters.
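The effect of fold-partition randomness can be illustrated with a much simpler learner. The sketch below uses ridge regression as a stand-in for the boosting procedures (CoxBoost and mboost are not used here), and compares the tuning parameter chosen from single K-fold splits with the choice obtained after averaging cross-validation curves over repeated splits; all data and settings are made up.

# Repeated K-fold cross-validation for a ridge tuning parameter.
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 30
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=n)
lams = 10.0 ** np.linspace(-2, 3, 25)

def cv_curve(seed, k=5):
    idx = np.random.default_rng(seed).permutation(n)
    errs = np.zeros(len(lams))
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        for j, lam in enumerate(lams):
            beta = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(p),
                                   X[train].T @ y[train])
            errs[j] += np.mean((y[test] - X[test] @ beta) ** 2)
    return errs / k

curves = np.array([cv_curve(seed) for seed in range(20)])
single_choices = lams[np.argmin(curves, axis=1)]          # one choice per random split
averaged_choice = lams[int(np.argmin(curves.mean(axis=0)))]
print("range of single-split choices :", round(single_choices.min(), 2), "to",
      round(single_choices.max(), 2))
print("choice after averaging 20 runs:", round(averaged_choice, 2))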

15.
This paper develops a robust and efficient estimation procedure for quantile partially linear additive models with longitudinal data, where the nonparametric components are approximated by B-spline basis functions. The proposed approach can incorporate the correlation structure between repeated measures to improve estimation efficiency. Moreover, the new method is empirically shown to be much more efficient and robust than the popular generalized estimating equations method for non-normal correlated random errors. However, the proposed estimating functions are non-smooth and non-convex. In order to reduce the computational burden, we apply the induced smoothing method for fast and accurate computation of the parameter estimates and their asymptotic covariance. Under some regularity conditions, we establish the asymptotic normality of the estimators for the parametric components and the convergence rate of the estimators for the nonparametric functions. Furthermore, a variable selection procedure based on smooth-threshold estimating equations is developed to simultaneously identify non-zero parametric and nonparametric components. Finally, simulation studies are conducted to evaluate the finite sample performance of the proposed method, and a real data example is analyzed to illustrate its application.

16.
This article proposes a local smoothing procedure for detecting jump location curves of regression surfaces. This procedure simplifies the computation of some existing jump detectors in the statistical literature. It also generalizes the Sobel edge detector from the image processing literature so that more observations can be used to smooth away random noise in the data. The problem of evaluating the performance of jump detectors is discussed, and a new measure of jump detection performance is suggested.
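A minimal sketch of the classical Sobel gradient applied to a noisy regression surface on a regular grid is given below; the article's detector generalizes this to larger local neighbourhoods, which is not reproduced here, and the threshold is an arbitrary illustrative choice.

# Sobel-type jump (edge) detection on a noisy surface observed on a grid.
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(6)
m = 100
u = np.linspace(0, 1, m)
U, V = np.meshgrid(u, u)
surface = np.sin(2 * np.pi * U) + (V > 0.5).astype(float)   # jump along v = 0.5
z = surface + 0.2 * rng.normal(size=(m, m))                  # noisy observations

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
gx = convolve2d(z, sobel_x, mode="same", boundary="symm")    # horizontal gradient
gy = convolve2d(z, sobel_x.T, mode="same", boundary="symm")  # vertical gradient
grad = np.hypot(gx, gy)

threshold = np.quantile(grad, 0.95)           # flag the largest local gradients
jump_candidates = grad > threshold
print("flagged pixels:", int(jump_candidates.sum()))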

17.
An essential problem in nonparametric smoothing of noisy data is a proper choice of the bandwidth or window width, which depends on a smoothing parameter k. One way to choose k based on the data is leave-one-out cross-validation. The choice of the cross-validation criterion is as important as the choice of the smoother. In particular, when outliers are present, robust cross-validation criteria are needed. So far little is known about the behaviour of robust cross-validated smoothers in the presence of discontinuities in the regression function. We combine different smoothing procedures based on local constant fits with each of several cross-validation criteria. These combinations are compared in a simulation study under a broad variety of data situations with outliers and abrupt jumps. There is not a single overall best cross-validation criterion, but we find Boente cross-validation to perform well in the case of large percentages of outliers and the Tukey criterion in data situations with jumps, even if the data are contaminated with outliers.
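The sketch below illustrates the general issue with a k-nearest-neighbour local constant (running median) smoother, comparing ordinary squared-error leave-one-out cross-validation with a simple robust criterion (median absolute prediction error); the Boente and Tukey criteria studied in the article are not reproduced, and the data are simulated with both a jump and gross outliers.

# Leave-one-out CV for a k-NN running-median smoother: L2 vs. robust criterion.
import numpy as np

rng = np.random.default_rng(7)
n = 150
x = np.sort(rng.uniform(0, 1, n))
y = np.where(x < 0.5, 0.0, 2.0) + 0.2 * rng.normal(size=n)   # jump at x = 0.5
outliers = rng.choice(n, size=10, replace=False)
y[outliers] += rng.choice([-4.0, 4.0], size=10)               # add gross outliers

def loo_residuals(k):
    res = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i]); d[i] = np.inf                    # leave observation i out
        nn = np.argsort(d)[:k]
        res[i] = y[i] - np.median(y[nn])                       # local constant (median) fit
    return res

ks = np.arange(3, 41, 2)
sq = [np.mean(loo_residuals(k) ** 2) for k in ks]              # classical L2 criterion
rob = [np.median(np.abs(loo_residuals(k))) for k in ks]        # robust criterion
print("k chosen by squared-error CV:", ks[int(np.argmin(sq))])
print("k chosen by robust CV:       ", ks[int(np.argmin(rob))])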

18.
Wiener processes with random effects for degradation data
This article studies maximum likelihood inference for a class of Wiener processes with random effects for degradation data. Degradation data are a special case of functional data with a monotone trend. The setting is one in which n independent subjects, each following a Wiener process with random drift and diffusion parameters, are observed at possibly different times. Unit-to-unit variability is incorporated into the model by these random effects. An EM algorithm is used to obtain the maximum likelihood estimators of the unknown parameters. Asymptotic properties such as consistency and convergence rate are established. A bootstrap method is used for assessing the uncertainties of the estimators. Simulations are used to validate the method. The model is fitted to bridge beam data and the corresponding goodness-of-fit tests are carried out. Failure time distributions in terms of degradation level passages are calculated and illustrated.
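A small simulation sketch of the model is given below: each unit follows a Wiener process with its own random drift, observed at common discrete times, and only naive closed-form per-unit estimates are shown. The EM algorithm, bootstrap and failure-time calculations from the article are not implemented, and all parameter values are made up.

# Simulate degradation paths (Wiener processes with random drift) and compute
# naive per-unit estimates of drift and diffusion.
import numpy as np

rng = np.random.default_rng(8)
n_units, n_times = 30, 15
times = np.linspace(0.5, 7.5, n_times)
dt = np.diff(np.concatenate([[0.0], times]))

mu_drift, sd_drift, sigma = 1.0, 0.3, 0.5
drifts = rng.normal(mu_drift, sd_drift, size=n_units)          # unit-level random effects
increments = rng.normal(drifts[:, None] * dt, sigma * np.sqrt(dt))
paths = np.cumsum(increments, axis=1)                           # degradation paths

drift_hat = paths[:, -1] / times[-1]                            # terminal level / terminal time
sigma2_hat = np.mean((increments - drift_hat[:, None] * dt) ** 2 / dt, axis=1)
print("mean of per-unit drift estimates:", round(drift_hat.mean(), 3))
print("s.d. of per-unit drift estimates:", round(drift_hat.std(ddof=1), 3))
print("mean diffusion variance estimate:", round(sigma2_hat.mean(), 3))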

19.
In this paper we consider the problem of estimating an unknown joint distribution which is defined over mixed discrete and continuous variables. A nonparametric kernel approach is proposed with smoothing parameters obtained from the cross-validated minimization of the estimator's integrated squared error. We derive the rate of convergence of the cross-validated smoothing parameters to their ‘benchmark’ optimal values, and we also establish the asymptotic normality of the resulting nonparametric kernel density estimator. Monte Carlo simulations illustrate that the proposed estimator performs substantially better than the conventional nonparametric frequency estimator in a range of settings. The simulations also demonstrate that the proposed approach does not suffer from known limitations of the likelihood cross-validation method which breaks down with commonly used kernels when the continuous variables are drawn from fat-tailed distributions. An empirical application demonstrates that the proposed method can yield superior predictions relative to commonly used parametric models.
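A minimal sketch of the cross-validated smoothing parameter selection is given below for one continuous variable (Gaussian kernel) and one binary discrete variable (Aitchison–Aitken kernel), minimizing the standard least-squares (integrated squared error) cross-validation criterion over a small grid; the general mixed-variable case and the asymptotic theory of the article are not reproduced, and the data and grids are illustrative.

# Least-squares cross-validation for a mixed continuous/discrete kernel density:
# LSCV(h, lam) = (1/n^2) sum_ij Kbar*Lbar - (2/(n(n-1))) sum_{i!=j} K*L,
# where Kbar, Lbar are the convolution kernels.
import numpy as np

rng = np.random.default_rng(9)
n = 200
d = rng.integers(0, 2, size=n)                        # binary discrete variable
x = rng.normal(loc=1.5 * d, scale=1.0, size=n)         # continuous variable

diff = x[:, None] - x[None, :]
same = (d[:, None] == d[None, :])
off_diag = ~np.eye(n, dtype=bool)

def lscv(h, lam):
    # Gaussian kernel (variance h^2) and its convolution (variance 2 h^2)
    K = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2 * np.pi))
    Kbar = np.exp(-0.25 * (diff / h) ** 2) / (h * np.sqrt(4 * np.pi))
    # Aitchison-Aitken kernel for a binary variable and its convolution
    L = np.where(same, 1.0 - lam, lam)
    Lbar = np.where(same, (1.0 - lam) ** 2 + lam ** 2, 2.0 * lam * (1.0 - lam))
    term1 = np.sum(Kbar * Lbar) / n ** 2                       # integral of fhat^2
    term2 = 2.0 * np.sum((K * L)[off_diag]) / (n * (n - 1))    # leave-one-out term
    return term1 - term2

hs = np.linspace(0.05, 1.0, 20)
lams = np.linspace(0.0, 0.5, 11)
scores = np.array([[lscv(h, l) for l in lams] for h in hs])
ih, il = np.unravel_index(np.argmin(scores), scores.shape)
print("cross-validated h =", round(hs[ih], 3), ", lambda =", round(lams[il], 3))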

20.
In the problem of selecting the explanatory variables in the linear mixed model, we address the derivation of the (unconditional or marginal) Akaike information criterion (AIC) and the conditional AIC (cAIC). The covariance matrices of the random effects and the error terms include unknown parameters such as variance components, and the selection procedures proposed in the literature are limited to cases where these parameters are known or partly unknown. In this paper, AIC and cAIC are extended to the situation where the parameters are completely unknown and are estimated by general consistent estimators, including maximum likelihood (ML), restricted maximum likelihood (REML) and other unbiased estimators. Related to AIC and cAIC, we derive marginal and conditional prediction error criteria that select superior models in terms of minimizing prediction errors under quadratic loss functions. Finally, the numerical performance of the proposed selection procedures is investigated through simulation studies.
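A minimal sketch of the marginal AIC for a random-intercept linear mixed model with fully unknown variance components is given below: the variance components are estimated by numerically maximizing the marginal (ML) likelihood, and AIC is formed as -2 log-likelihood plus twice the number of fixed-effect and variance parameters. The conditional AIC and the corrections derived in the article are not implemented, and all data are simulated for illustration.

# Marginal AIC for a random-intercept linear mixed model, variance components
# estimated by maximizing the (profile) marginal likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
g, m = 20, 5                                     # 20 groups of 5 observations
n = g * m
groups = np.repeat(np.arange(g), m)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
b = rng.normal(scale=1.0, size=g)                # true random intercepts
y = X @ np.array([2.0, 1.0]) + b[groups] + rng.normal(scale=0.7, size=n)
Z = (groups[:, None] == np.arange(g)[None, :]).astype(float)

def neg_marginal_loglik(log_params):
    s2_eps, s2_b = np.exp(log_params)
    V = s2_eps * np.eye(n) + s2_b * Z @ Z.T
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS fixed effects
    r = y - X @ beta
    sign, logdet = np.linalg.slogdet(V)
    return 0.5 * (logdet + r @ Vinv @ r + n * np.log(2 * np.pi))

fit = minimize(neg_marginal_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
loglik = -fit.fun
p_fixed, p_var = X.shape[1], 2
marginal_aic = -2 * loglik + 2 * (p_fixed + p_var)
print("ML variance components:", np.round(np.exp(fit.x), 3))
print("marginal AIC:", round(marginal_aic, 2))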
