Similar Documents
1.
Gaussian model selection (total citations: 1; self-citations: 0; citations by others: 1)
Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view on this subject. The advantage and importance of model selection come from the fact that it provides a suitable approach to many different types of problems, ranging from model selection per se (deciding which member of a family of parametric models best suits the data at hand), which includes, for instance, variable selection in regression models, to nonparametric estimation, where it provides a very powerful tool that allows adaptation under quite general circumstances. Our approach to model selection also provides a natural connection between the parametric and nonparametric points of view and copes naturally with the fact that a model is not necessarily true. The method is based on the penalization of a least squares criterion, which can be viewed as a generalization of Mallows' $C_p$. Much of our effort goes into properly choosing the list of models and the penalty function for various estimation problems, such as classical variable selection or adaptive estimation over various types of $\ell_p$-bodies. Received February 1, 1999 / final version received January 10, 2001 / published online April 3, 2001.
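As a minimal sketch of penalized least squares model selection, consider the Gaussian sequence model $y_i = \theta_i + \sigma\varepsilon_i$ with nested models spanned by the first $D$ coordinates and the penalty $K\sigma^2 D$; $K = 2$ gives a Mallows' $C_p$-type criterion. The function name `select_dimension`, the known $\sigma$, and the nested model list are illustrative assumptions, not the paper's construction, which covers more general model collections and penalties.

```python
def select_dimension(y, sigma, K=2.0):
    """Pick D minimizing crit(D) = sum_{i > D} y_i^2 + K * sigma^2 * D.

    Models are nested: model D keeps the first D coordinates of y, so its
    least squares fit leaves the remaining coordinates as residuals.
    K = 2 corresponds to a Mallows' Cp-type penalty (an assumption here).
    """
    best_D, best_crit = 0, float("inf")
    for D in range(len(y) + 1):
        rss = sum(v * v for v in y[D:])  # residual sum of squares of model D
        crit = rss + K * sigma ** 2 * D  # penalized criterion
        if crit < best_crit:
            best_D, best_crit = D, crit
    return best_D, best_crit


D, crit = select_dimension([5.0, 4.0, 3.0, 0.1, -0.2, 0.05], sigma=1.0)
print(D, round(crit, 4))  # 3 6.0525
```

The selected dimension trades residual sum of squares against the penalty: with these data, dropping the third coordinate would increase the RSS by 9 while saving only $2\sigma^2 = 2$ in penalty.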

2.
Semiparametric partially linear varying coefficient models (SPLVCM) are frequently used in statistical modeling. When high-dimensional covariates enter both the parametric and nonparametric parts of an SPLVCM, sparse modeling is often considered in practice. In this paper, we propose a new estimation and variable selection procedure based on modal regression, in which the nonparametric functions are approximated by a $B$-spline basis. The main merit of the proposed variable selection procedure is that it achieves both robustness and efficiency by introducing an additional tuning parameter, the bandwidth $h$. Its oracle property is established for both the parametric and nonparametric parts. Moreover, we give a data-driven bandwidth selection method and propose an EM-type algorithm for the proposed method. A Monte Carlo simulation study and a real data example are used to examine the finite sample performance of the proposed method; both confirm that the new method works very well.
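EM-type algorithms for modal regression typically alternate two steps. A minimal sketch for a simple linear model $y = b_0 + b_1x$ with a Gaussian kernel: the E-step weights each point by the (normalized) kernel of its current residual, and the M-step refits by weighted least squares. The fixed bandwidth $h = 1$, the function names, and the absence of the $B$-spline and variable-selection parts are assumptions of this illustration, not the paper's algorithm.

```python
import math


def weighted_ls(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def modal_linear_regression(x, y, h, n_iter=50):
    """EM-type iteration for modal linear regression with a Gaussian kernel."""
    b0, b1 = weighted_ls(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(n_iter):
        # E-step: weight each point by the kernel of its current residual.
        w = [math.exp(-((yi - b0 - b1 * xi) ** 2) / (2 * h * h)) for xi, yi in zip(x, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        # M-step: weighted least squares with those weights.
        b0, b1 = weighted_ls(x, y, w)
    return b0, b1


x = [float(i) for i in range(11)]
y = [1.0 + 2.0 * xi for xi in x]
y[10] = 100.0  # one gross outlier (the line gives 21)
print(weighted_ls(x, y, [1.0] * len(x)))  # OLS is pulled toward the outlier
print(modal_linear_regression(x, y, h=1.0))  # close to (1.0, 2.0)
```

Points far from the current fit get essentially zero weight, which is the source of the robustness; a larger $h$ flattens the weights and moves the fit toward least squares, which is the efficiency side of the trade-off.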

3.
Semiparametric models with a diverging number of predictors arise in many contemporary scientific areas. Variable selection for these models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. In this paper, we propose a variable selection procedure that combines basis function approximation with the SCAD penalty. The proposed procedure simultaneously selects significant variables in the parametric and nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency and sparsity of this procedure.
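The SCAD penalty of Fan and Li, with the conventional choice $a = 3.7$, is $p_\lambda(\theta) = \lambda|\theta|$ for $|\theta| \le \lambda$, $-(\theta^2 - 2a\lambda|\theta| + \lambda^2)/\{2(a-1)\}$ for $\lambda < |\theta| \le a\lambda$, and $(a+1)\lambda^2/2$ otherwise. Under an orthonormal design (an assumption made only for this sketch), minimizing $\frac12(z-\theta)^2 + p_\lambda(|\theta|)$ has the closed-form thresholding rule below, which sets small coefficients to zero and leaves large ones unshrunk. The basis function approximation step of the paper is not included.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|); continuous, flat beyond a * lam."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5 * (z - theta)**2 + p_lam(|theta|) (orthonormal design)."""
    t = abs(z)
    sign = 1.0 if z >= 0 else -1.0
    if t <= 2 * lam:
        return sign * max(t - lam, 0.0)  # soft-thresholding zone
    if t <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # transition zone
    return z  # large coefficients are left unshrunk


print([scad_threshold(v, 1.0) for v in (0.5, 1.5, 3.0, 5.0)])
```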

4.
Semiparametric linear transformation models have received much attention due to their high flexibility in modeling survival data. A useful estimating equation procedure was recently proposed by Chen et al. (2002) [21] for linear transformation models to jointly estimate parametric and nonparametric terms. They showed that this procedure yields a consistent and robust estimator. However, variable selection for linear transformation models has been less studied, partly because a convenient loss function is not readily available in this context. In this paper, we propose a simple yet powerful approach to achieve both sparse and consistent estimation for linear transformation models. The main idea is to derive a profiled score from the estimating equation of Chen et al. [21], construct a loss function based on the profiled score and its variance, and then minimize the loss subject to a shrinkage penalty. Under regularity conditions, we show that the resulting estimator is consistent for both model estimation and variable selection. Furthermore, the estimated parametric terms are asymptotically normal and can achieve higher efficiency than the estimator obtained from the estimating equations. For computation, we suggest a one-step approximation algorithm that takes advantage of LARS to build the entire solution path efficiently. The performance of the new procedure is illustrated through numerous simulations and real examples, including one microarray dataset.
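The loss-plus-penalty step can be sketched under the assumption that the profiled-score loss is approximated by a quadratic form $\frac12(\beta - b)^\top A(\beta - b)$, with an unpenalized estimate $b$ and a positive-definite matrix $A$ (both hypothetical inputs here), plus a lasso penalty. The sketch solves it by cyclic coordinate descent, which is swapped in for the LARS-based path algorithm the abstract mentions; `lasso_quadratic` is an illustrative name.

```python
def soft_threshold(c, lam):
    """Soft-thresholding operator S(c, lam) = sign(c) * max(|c| - lam, 0)."""
    if c > lam:
        return c - lam
    if c < -lam:
        return c + lam
    return 0.0


def lasso_quadratic(b, A, lam, n_sweeps=200):
    """Minimize 0.5 * (beta - b)' A (beta - b) + lam * sum |beta_j|
    by cyclic coordinate descent (A symmetric positive definite)."""
    p = len(b)
    beta = list(b)  # start from the unpenalized estimate
    for _ in range(n_sweeps):
        for j in range(p):
            # Coordinatewise stationarity: A_jj * beta_j = c_j - lam * sign(beta_j).
            c = A[j][j] * b[j] - sum(A[j][k] * (beta[k] - b[k]) for k in range(p) if k != j)
            beta[j] = soft_threshold(c, lam) / A[j][j]
    return beta


print(lasso_quadratic([1.0, -0.2], [[2.0, 0.5], [0.5, 1.0]], lam=0.5))  # approximately [0.7, 0.0]
```

With an identity $A$ the update reduces to plain soft-thresholding of $b$; the off-diagonal entries of $A$ are what couple the coordinates.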

5.
Abstract

An essential feature of longitudinal data is the existence of autocorrelation among the observations from the same unit or subject. Two-stage random-effects linear models are commonly used to analyze longitudinal data. These models are not flexible enough, however, for exploring the underlying data structures and, especially, for describing time trends. Semiparametric models have been proposed recently to accommodate general time trends, but they do not provide a convenient way to explore interactions between time and other covariates, although such interactions exist in many applications. Moreover, semiparametric models require specifying the design matrix of the covariates (time excluded). We propose nonparametric models to resolve these issues. To fit them, we use multivariate adaptive regression splines (MARS) to estimate the mean curve and then apply an EM-like iterative procedure to estimate the covariance. After giving a general algorithm for model building, we show how to design a fast algorithm. We use both simulated and published data to illustrate the proposed method.
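MARS builds its fits from reflected pairs of hinge functions $(x - t)_+$ and $(t - x)_+$ and from products of such terms, which is what lets it capture interactions between time and other covariates. A minimal sketch of constructing these basis functions follows; the knots are fixed by hand here, whereas MARS chooses them by a forward stepwise search followed by backward pruning, which is not implemented. Function names are illustrative.

```python
def hinge_pair(x, knot):
    """MARS reflected pair of hinge functions at a knot: ((x - t)_+, (t - x)_+)."""
    return max(0.0, x - knot), max(0.0, knot - x)


def hinge_design(xs, knots):
    """Rows of [1, (x - t1)_+, (t1 - x)_+, (x - t2)_+, ...] for each x."""
    rows = []
    for x in xs:
        row = [1.0]  # intercept
        for t in knots:
            row.extend(hinge_pair(x, t))
        rows.append(row)
    return rows


def interaction_term(time, z, time_knot, z_knot):
    """A degree-2 MARS-style product term linking time and a covariate z."""
    return max(0.0, time - time_knot) * max(0.0, z - z_knot)


print(hinge_design([0.0, 2.0], [1.0]))  # [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
print(interaction_term(3.0, 5.0, 1.0, 4.0))  # 2.0
```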

6.
In this paper we discuss variable selection in a class of single-index models in which the error term is not assumed to be additive. Following the idea of sufficient dimension reduction, we first propose a unified method to recover the direction and then reformulate it in a least squares framework. Unlike many existing results for nonparametric smoothing methods for density functions, the bandwidth selection in our proposed kernel function has essentially no impact on its root-$n$ consistency or asymptotic normality. To select the important predictors, we suggest the adaptive lasso, which is computationally efficient. Under some regularity conditions, the adaptive lasso enjoys the oracle property in a general class of single-index models. In addition, the resulting estimator is shown to be asymptotically normal, which enables us to construct a confidence region for the estimated direction. The asymptotic results are supported by comprehensive simulations and illustrated by an analysis of air pollution data.
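For intuition about the adaptive lasso, the orthonormal-design case has a closed form: with initial estimates $z_j$ and weights $w_j = 1/|z_j|^\gamma$, the solution is $\operatorname{sign}(z_j)(|z_j| - \lambda w_j)_+$, so large initial estimates are shrunk little and small ones are set to zero. This sketch assumes an orthonormal design and uses the initial estimates themselves as the $z_j$; it does not reproduce the paper's single-index setting.

```python
def adaptive_lasso_orthonormal(z, lam, gamma=1.0):
    """Adaptive lasso under an orthonormal design, using z as the initial estimate.

    Minimizes 0.5 * (z_j - b_j)**2 + lam * w_j * |b_j| coordinatewise with
    w_j = 1 / |z_j|**gamma, i.e. soft-thresholding at lam * w_j.
    """
    out = []
    for zj in z:
        if zj == 0.0:
            out.append(0.0)  # infinite weight: the coefficient stays at zero
            continue
        w = 1.0 / abs(zj) ** gamma  # adaptive weight from the initial estimate
        shrunk = max(abs(zj) - lam * w, 0.0)
        out.append(shrunk if zj > 0 else -shrunk)
    return out


print(adaptive_lasso_orthonormal([3.0, 0.5, -2.0], lam=1.0))  # roughly [2.667, 0.0, -1.5]
```

The plain lasso with the same $\lambda$ would give $[2, 0, -1]$; the adaptive weights reduce the bias on the large coefficients, which is the mechanism behind the oracle property.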

7.
Semilinear in-slide models (SLIMs) have been shown to be effective for normalizing microarray data [J. Fan, P. Tam, G. Vande Woude, Y. Ren, Normalization and analysis of cDNA microarrays using within-array replications applied to neuroblastoma cell response to a cytokine, Proceedings of the National Academy of Sciences (2004) 1135-1140]. Using a backfitting method, [J. Fan, H. Peng, T. Huang, Semilinear high-dimensional model for normalization of microarray data: a theoretical analysis and partial consistency, Journal of the American Statistical Association, 471 (2005) 781-798] proposed a profile least squares (PLS) estimator for the parametric and nonparametric components, but the general asymptotic properties of their estimator were not developed. In this paper, we consider a new approach, two-stage estimation, which enables us to establish asymptotic normality for both the parametric and nonparametric component estimators. We further propose a plug-in bandwidth selector based on the asymptotic normality of the nonparametric component estimator. The proposed method also covers the aggregated SLIM case, in which we show explicitly that taking the aggregated information into account improves both the parametric and nonparametric component estimators obtained by the two-stage approach. Simulation studies are conducted to illustrate the finite sample performance of the proposed procedures.

8.
This paper develops a robust and efficient estimation procedure for quantile partially linear additive models with longitudinal data, where the nonparametric components are approximated by B-spline basis functions. The proposed approach can incorporate the correlation structure between repeated measures to improve estimation efficiency. Moreover, the new method is shown empirically to be much more efficient and robust than the popular generalized estimating equations method for non-normal correlated random errors. However, the proposed estimating functions are non-smooth and non-convex. To reduce the computational burden, we apply the induced smoothing method for fast and accurate computation of the parameter estimates and their asymptotic covariance. Under some regularity conditions, we establish the asymptotic normality of the estimators of the parametric components and the convergence rate of the estimators of the nonparametric functions. Furthermore, a variable selection procedure based on smooth-threshold estimating equations is developed to simultaneously identify the nonzero parametric and nonparametric components. Finally, simulation studies are conducted to evaluate the finite sample performance of the proposed method, and a real data example is analyzed to illustrate its application.
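Induced smoothing replaces the indicator in a non-smooth estimating function by a normal distribution function. As a minimal one-sample sketch, the $\tau$-quantile estimating equation $\sum_i\{\tau - I(y_i \le q)\} = 0$ becomes $\sum_i\{\tau - \Phi((q - y_i)/h)\} = 0$, which is smooth and strictly decreasing in $q$ and can be solved by bisection. The fixed smoothing scale `h` is an assumption: in induced smoothing the scale is usually tied to the estimator's covariance, and the paper's longitudinal, partially linear setting is not reproduced.

```python
import math


def normal_cdf(u):
    """Standard normal distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def smoothed_quantile(y, tau, h, tol=1e-10):
    """Solve U_h(q) = sum_i [tau - Phi((q - y_i) / h)] = 0 by bisection.

    The indicator I(y_i <= q) of the quantile estimating equation is
    replaced by Phi((q - y_i) / h), so U_h is smooth and strictly decreasing.
    """
    def U(q):
        return sum(tau - normal_cdf((q - yi) / h) for yi in y)

    lo, hi = min(y) - 10 * h, max(y) + 10 * h  # U(lo) > 0 > U(hi) for 0 < tau < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if U(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(smoothed_quantile([float(v) for v in range(1, 10)], tau=0.5, h=0.5))  # about 5.0
```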

9.
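Most variable selection methods for linear mixed-effects models select the fixed effects and the random effects in separate stages, which is cumbersome and prone to model bias. Moreover, most nonparametric and semiparametric linear mixed-effects models address only the smoothness of the nonparametric part or the selection of fixed effects, and do not address the selection of nonparametric variables or random effects. In this paper, B-spline functions are used to approximate the nonparametric functions, which turns the semiparametric linear mixed-effects model into a linear mixed-effects model with approximation error. A modified Cholesky decomposition is applied to the covariance matrix of the random effects to reparameterize the linear mixed-effects model, and two types of penalties, a group ALASSO penalty and an ALASSO penalty, are then imposed on the model's likelihood function. The method achieves joint variable selection of nonparametric variables, fixed effects, and random effects, and the resulting estimators are consistent and sparse and have the oracle property. Finally, a numerical simulation shows that the proposed method performs well in terms of both variable selection accuracy and parameter estimation precision.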
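One common form of the modified Cholesky parameterization writes the random-effects covariance as $\Psi = D\Gamma\Gamma^\top D$, where $D = \operatorname{diag}(d_1,\dots,d_q)$ with $d_k \ge 0$ and $\Gamma$ is unit lower triangular. Whether the paper uses exactly this form is an assumption; the sketch only shows why such a form suits random-effects selection: setting $d_k = 0$ zeroes the $k$-th row and column of $\Psi$, removing that random effect, so penalizing $d_k$ performs selection. The function name is hypothetical.

```python
def covariance_from_cholesky(d, gamma_lower):
    """Psi = D Gamma Gamma' D with D = diag(d), d_k >= 0, Gamma unit lower triangular.

    gamma_lower[k] lists the below-diagonal entries of row k of Gamma,
    i.e. gamma_lower[k][col] = Gamma[k][col] for col < k.
    """
    q = len(d)
    # L = D Gamma: row k of Gamma scaled by d_k.
    L = [[0.0] * q for _ in range(q)]
    for k in range(q):
        for col in range(k + 1):
            gamma_k_col = 1.0 if col == k else gamma_lower[k][col]
            L[k][col] = d[k] * gamma_k_col
    # Psi = L L'
    return [[sum(L[i][m] * L[j][m] for m in range(q)) for j in range(q)] for i in range(q)]


psi = covariance_from_cholesky([1.0, 0.0, 2.0], [[], [0.5], [0.3, 0.2]])
for row in psi:
    print(row)  # the middle row (and column) is zero: random effect 2 is dropped
```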

10.

In this article, we propose two classes of semiparametric mixture regression models with a single index for model-based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low-dimensional predictors, the new semiparametric models can easily incorporate high-dimensional predictors into the nonparametric components. The proposed models are very general, and many recently proposed semiparametric/nonparametric mixture regression models are special cases of them. Backfitting estimates and the corresponding modified EM algorithms are proposed to achieve optimal convergence rates for both the parametric and nonparametric parts. We establish identifiability results for the two proposed models and investigate the asymptotic properties of the proposed estimation procedures. Simulation studies are conducted to demonstrate the finite sample performance of the proposed models. Two real data applications of the new models reveal some interesting findings.


11.
Varying-coefficient single-index models (VCSIMs) have been applied in many fields because they combine the advantages of single-index models and varying-coefficient models. In this paper, an estimation method based on B-spline approximation is proposed, and two computational approaches can be used with it. The first computes the parametric and nonparametric parts simultaneously by Newton-Raphson iteration. The second computes the two parts separately by a profile method. We recommend the second approach when a large number of parameters is involved; otherwise the first is more convenient. Two simulated examples illustrate the performance of the proposed estimation methodology and computational procedures.

12.
Many problems in genomics are related to variable selection where high-dimensional genomic data are treated as covariates. Such genomic covariates often have certain structures and can be represented as vertices of an undirected graph. Biological processes also vary as functions of some biological state, such as time. High-dimensional variable selection where the covariates are graph-structured and the underlying model is nonparametric presents an important but largely unaddressed statistical challenge. Motivated by the problem of regression-based motif discovery, we consider variable selection for high-dimensional nonparametric varying-coefficient models and introduce a sparse structured shrinkage (SSS) estimator based on basis function expansions and a novel smoothed penalty function. We present an efficient algorithm for computing the SSS estimator. Results on model selection consistency and estimation bounds are derived. Moreover, finite-sample performance is studied via simulations, with particular attention to the effects of high dimensionality and of the structural information in the covariates. We apply our method to the motif-finding problem using a yeast cell-cycle gene expression dataset and word counts in genes' promoter sequences. Our results demonstrate that the proposed method can yield better variable selection and prediction for high-dimensional regression when the underlying model is nonparametric and the covariates are structured. Supplemental materials for the article are available online.
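The basis function expansions used in this and several other abstracts replace each unknown coefficient function by a finite sum $\beta(u) \approx \sum_l \gamma_l B_l(u)$, which turns selection of a nonparametric component into (grouped) selection of the coefficients $\gamma_l$. A minimal sketch of evaluating a B-spline basis by the Cox-de Boor recursion follows; the cubic degree and the clamped knot vector are illustrative assumptions, and the SSS penalty itself is not implemented.

```python
def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline of degree k on knot vector t (Cox-de Boor)."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:  # skip 0/0 terms created by repeated knots
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right


def bspline_row(x, t, k=3):
    """All len(t) - k - 1 basis functions evaluated at x (one design-matrix row)."""
    return [bspline_basis(i, k, t, x) for i in range(len(t) - k - 1)]


knots = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]  # clamped cubic knots on [0, 3]
row = bspline_row(1.5, knots)
print(len(row), sum(row))  # 6 basis functions summing to 1 (up to rounding)
```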

14.

In this paper, we investigate the quantile varying coefficient model for longitudinal data, where the unknown nonparametric functions are approximated by polynomial splines and the estimators are obtained by minimizing the quadratic inference function. The theoretical properties of the resulting estimators are established, and they achieve the optimal convergence rate for the nonparametric functions. Since the objective function is non-smooth, we propose an estimation procedure based on induced smoothing and prove that the smoothed estimator is asymptotically equivalent to the original estimator. Moreover, we propose a variable selection procedure based on regularization, which simultaneously estimates and selects important nonparametric components and has the asymptotic oracle property. Extensive simulations and a real data analysis show the usefulness of the proposed method.


15.
This paper focuses on variable selection for semiparametric varying coefficient partially linear models when the covariates in both the parametric and nonparametric components are measured with error. A bias-corrected variable selection procedure is proposed by combining basis function approximations with shrinkage estimation. With appropriate selection of the tuning parameters, the consistency of the variable selection procedure and the oracle property of the regularized estimators are established. A simulation study and a real data application are undertaken to evaluate the finite sample performance of the proposed method.
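With covariates measured with error, $W = X + U$, naive least squares is attenuated toward zero because $\operatorname{Var}(W) = \operatorname{Var}(X) + \sigma_u^2$. The simplest form of bias correction subtracts the known error variance from the denominator. The sketch below shows only this single-covariate correction under classical additive error with known $\sigma_u^2$ (an assumption); it is not the paper's procedure, which combines the correction with basis function approximation and shrinkage.

```python
def corrected_slope(w, y, sigma_u2):
    """Naive and measurement-error-corrected slopes for y on W = X + U.

    Var(W) = Var(X) + sigma_u2 under classical additive error, so the
    corrected slope divides Cov(W, Y) by Var(W) - sigma_u2 (sigma_u2 known).
    """
    n = len(w)
    wbar, ybar = sum(w) / n, sum(y) / n
    sww = sum((wi - wbar) ** 2 for wi in w)
    swy = sum((wi - wbar) * (yi - ybar) for wi, yi in zip(w, y))
    naive = swy / sww
    corrected = swy / (sww - n * sigma_u2)  # requires sww > n * sigma_u2
    return naive, corrected


print(corrected_slope([-2.0, -1.0, 0.0, 1.0, 2.0], [-4.0, -2.0, 0.0, 2.0, 4.0], 0.4))  # (2.0, 2.5)
```

When the estimated $\operatorname{Var}(W)$ is close to $\sigma_u^2$ the correction becomes unstable, which is one reason penalized, bias-corrected procedures need careful tuning.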

16.
This paper is a survey of recent results on adaptive robust nonparametric methods for the continuous-time regression model with semimartingale noise with jumps. The noise is modeled by Lévy processes, Ornstein–Uhlenbeck processes, and semi-Markov processes. We present the general model selection method and the sharp oracle inequality methods, which provide robust efficient estimation in the adaptive setting. Moreover, we present recent results on improved model selection methods for nonparametric estimation problems.

17.
This paper reports a robust kernel estimator for fixed-design nonparametric regression models. A Stahel-Donoho kernel estimator is introduced, in which the weight functions depend both on the depths of the data and on the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the otherwise incomputable depths of the errors; as a result, the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has desirable asymptotic properties such as asymptotic normality and convergence in mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure, so its efficiency can be compared with that of the original estimator. Simulations show that the new method can smooth the regression estimate and achieve a desirable balance between robustness and efficiency.
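A minimal one-dimensional sketch of the depth-weighted kernel idea: each observation receives a robustness weight from a Stahel-Donoho-type outlyingness $|y_i - \operatorname{med}|/\operatorname{MAD}$ computed in a small neighborhood (a stand-in for the paper's local approximation of the error depths), multiplied by a Gaussian kernel weight in the design distance. The window half-width `k`, the hard cut-off `c = 3`, and the function names are hypothetical choices, not the estimator analyzed in the paper.

```python
import math
import statistics


def local_outlyingness(y, i, k):
    """Outlyingness |y_i - med| / MAD of y_i within the window y[i-k .. i+k]."""
    window = y[max(0, i - k): i + k + 1]
    med = statistics.median(window)
    mad = statistics.median([abs(v - med) for v in window])
    if mad == 0.0:  # degenerate window: only exact agreement with the median is inlying
        return 0.0 if y[i] == med else math.inf
    return abs(y[i] - med) / mad


def robust_kernel_estimate(x, y, x0, h, k=2, c=3.0):
    """Nadaraya-Watson estimate at x0 with extra outlyingness-based weights."""
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        robust_w = 1.0 if local_outlyingness(y, i, k) <= c else 0.0  # hard cut-off
        kern_w = math.exp(-0.5 * ((xi - x0) / h) ** 2)  # Gaussian kernel in design distance
        num += robust_w * kern_w * yi
        den += robust_w * kern_w
    return num / den


x = [float(i) for i in range(11)]
y = list(x)
y[5] = 50.0  # a single spike; the trend value is 5
print(robust_kernel_estimate(x, y, x0=5.0, h=1.0))  # about 5.0
print(robust_kernel_estimate(x, y, x0=5.0, h=1.0, c=math.inf))  # pulled upward by the spike
```

With `c=math.inf` every robustness weight is 1 and the estimate reduces to the ordinary Nadaraya-Watson smoother, which the spike pulls far above 5.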

18.
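This paper studies the theory and methods of statistical inference for a class of semiparametric varying coefficient partially linear models under several kinds of complex data. First, for complex data such as longitudinal data and data with measurement errors, we study empirical likelihood inference for the model and propose a grouped empirical likelihood method and a bias-corrected empirical likelihood method, respectively. These methods effectively handle the difficulty that the within-group correlation of longitudinal data poses for constructing the empirical likelihood ratio function. Second, for complex data such as data with measurement errors and missing data, we study variable selection for the model and propose a "bias-corrected" variable selection method and an imputation-based variable selection method, respectively. These methods select the important variables in the parametric and nonparametric components simultaneously, and they carry out variable selection and regression coefficient estimation at the same time. With appropriately chosen penalty parameters, we prove that the variable selection methods identify the true model consistently and that the resulting regularized estimators have the oracle property.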
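The empirical likelihood methods in this abstract build on Owen's empirical likelihood ratio. As a minimal sketch, here is the ratio for a univariate mean, where $-2\log R(\mu) = 2\sum_i \log\{1 + \lambda(x_i - \mu)\}$ and $\lambda$ solves $\sum_i (x_i-\mu)/\{1+\lambda(x_i-\mu)\} = 0$. The grouped and bias-corrected versions proposed in the paper are not implemented; the function name and the bisection tolerances are assumptions.

```python
import math


def el_log_ratio(x, mu, tol=1e-12, n_bisect=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean."""
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu is not inside the convex hull of the data
    # lambda must keep every 1 + lambda * z_i positive.
    lo, hi = -1.0 / max(z) + tol, -1.0 / min(z) - tol

    def g(lam):  # estimating equation for lambda; strictly decreasing on (lo, hi)
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(el_log_ratio(data, 3.0), el_log_ratio(data, 4.5))
```

Under standard conditions $-2\log R(\mu_0)$ is asymptotically $\chi^2_1$ at the true mean, which is what makes the ratio usable for confidence intervals without estimating a variance.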

19.

This paper considers estimation and inference in semiparametric quantile regression models when the response variable is subject to random censoring. The paper considers both independent and dependent censoring and proposes three iterative estimators based on inverse probability weighting, where the weights are estimated from the censoring distribution using the Kaplan–Meier estimator, a fully parametric estimator, and the conditional Kaplan–Meier estimator, respectively. The paper proposes a computationally simple resampling technique that can be used to approximate the finite sample distribution of the parametric estimator. The paper also considers inference for both the parametric and nonparametric components of the quantile regression model. Monte Carlo simulations show that the proposed estimators and test statistics have good finite sample properties. Finally, a real data application illustrates the usefulness of the proposed methods.
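Inverse probability weighting with a Kaplan–Meier estimate of the censoring distribution can be sketched as follows: an uncensored observation $T_i$ gets weight $\delta_i/\hat G(T_i-)$, where $\hat G$ is the Kaplan–Meier estimator computed with the censoring indicators $1-\delta_i$ as events, and censored observations get weight 0. A weighted quantile of the observed times then illustrates the simplest one-sample use. The function names, the no-ties assumption, and the one-sample setting are illustrative; the paper's estimators are iterative and regression-based.

```python
def censoring_survival_left_limit(times, delta, t):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censored points
    (delta == 0) as the events. Assumes no tied observation times."""
    g = 1.0
    for tj, dj in zip(times, delta):
        if dj == 0 and tj < t:  # a censoring time strictly before t
            at_risk = sum(1 for tk in times if tk >= tj)
            g *= 1.0 - 1.0 / at_risk
    return g


def ipw_weights(times, delta):
    """Weight 1 / G(T_i-) for uncensored points, 0 for censored ones."""
    return [1.0 / censoring_survival_left_limit(times, delta, t) if d == 1 else 0.0
            for t, d in zip(times, delta)]


def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight reaches tau * total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= tau * total:
            return v
    return pairs[-1][0]


times, delta = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]
w = ipw_weights(times, delta)
print(w)  # approximately [1.0, 0.0, 1.5, 1.5]
print(weighted_quantile(times, w, 0.5))  # 3.0
```

The censored observation's mass is redistributed to the later uncensored ones, so the weights still sum to the sample size here.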


20.
Based on the double penalized estimation method, a new variable selection procedure is proposed for partially linear models with longitudinal data. The proposed procedure avoids the effect of the nonparametric estimator on variable selection for the parametric components. Under some regularity conditions, the rate of convergence and asymptotic normality of the resulting estimators are established. In addition, to improve the efficiency of the regression coefficient estimates, estimation of the working covariance matrix is incorporated into the proposed iterative algorithm. Simulation studies demonstrate that the proposed method performs well.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.), ICP registration 京ICP备09084417号