Similar Literature
20 similar documents found (search time: 46 ms)
1.

We study the asymptotic properties of a new version of the Sparse Group Lasso estimator (SGL), called adaptive SGL. This new version includes two distinct regularization parameters, one for the Lasso penalty and one for the Group Lasso penalty, and we consider the adaptive version of this regularization, where both penalties are weighted by preliminary random coefficients. The asymptotic properties are established in a general framework, where the data are dependent and the loss function is convex. We prove that this estimator satisfies the oracle property: the sparsity-based estimator recovers the true underlying sparse model and is asymptotically normally distributed. We also study its asymptotic properties in a double-asymptotic framework, where the number of parameters diverges with the sample size. We show by simulations and on real data that the adaptive SGL outperforms other oracle-like methods in terms of estimation precision and variable selection.
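The adaptive SGL penalty described above combines a weighted Lasso term and a weighted Group Lasso term with two distinct regularization parameters. As a minimal sketch (the exact weighting scheme and scaling are assumptions for illustration, not taken from the paper), the penalty can be evaluated as:

```python
import numpy as np

def adaptive_sgl_penalty(beta, groups, lam1, lam2, w, v):
    """Illustrative adaptive sparse-group-lasso penalty.

    lam1, lam2 : the two distinct regularization parameters
    w, v       : preliminary adaptive weights for coefficients / groups
    """
    lasso_part = lam1 * np.sum(w * np.abs(beta))
    group_part = lam2 * sum(
        v[g] * np.linalg.norm(beta[idx]) for g, idx in enumerate(groups)
    )
    return lasso_part + group_part

beta = np.array([1.0, 0.0, -2.0, 0.5])
groups = [np.array([0, 1]), np.array([2, 3])]
penalty = adaptive_sgl_penalty(beta, groups, 0.1, 0.2, np.ones(4), np.ones(2))
```

In the adaptive version, the weights `w` and `v` would come from preliminary estimates (e.g. inverse absolute values of an initial fit) rather than being ones.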


2.
The varying-coefficient model is flexible and powerful for modeling dynamic changes in regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high-dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to that of the underlying model, regardless of the large number of unimportant variables. To improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property, in the sense that it correctly selects important variables with probability converging to one under suitable conditions; by comparison, the group Lasso does not have the oracle selection property. Both approaches are evaluated using simulation and demonstrated on a data example.

3.
This article proposes a Bayesian approach to the sparse group selection problem in the regression model. In this problem, the variables are partitioned into groups. It is assumed that only a small number of groups are active for explaining the response variable, and further that within each active group only a small number of variables are active. We adopt a Bayesian hierarchical formulation in which each candidate group is associated with a binary variable indicating whether the group is active, and within each group each candidate variable is likewise associated with a binary indicator. The sparse group selection problem can thus be solved by sampling from the posterior distribution of the two layers of indicator variables. We adopt a group-wise Gibbs sampler for posterior sampling. We demonstrate the proposed method through simulation studies as well as real examples. The simulation results show that the proposed method outperforms the sparse group Lasso both in selecting the active groups and in identifying the active variables within the selected groups. Supplementary materials for this article are available online.
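The two layers of binary indicators can be illustrated directly. The following sketch (with made-up groups and values, not the paper's sampler) shows how a coefficient contributes only when both its group indicator and its within-group indicator equal one:

```python
import numpy as np

groups = [np.array([0, 1, 2]), np.array([3, 4])]
beta = np.array([1.2, -0.7, 0.4, 2.0, -1.1])
g = np.array([1, 0])             # group-level indicators: group 1 is inactive
d = np.array([1, 0, 1, 1, 1])    # variable-level indicators within groups

active = np.zeros(beta.size, dtype=bool)
for k, idx in enumerate(groups):
    # a variable contributes only if its group and the variable itself are active
    active[idx] = (g[k] == 1) & (d[idx] == 1)

effective = np.where(active, beta, 0.0)   # sparse-within-sparse-group pattern
```

A group-wise Gibbs sampler would update `g` and `d` from their conditional posteriors; here they are fixed purely to show the induced sparsity pattern.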

4.
We describe adaptive Markov chain Monte Carlo (MCMC) methods for sampling posterior distributions arising from Bayesian variable selection problems. Point-mass mixture priors are commonly used in Bayesian variable selection problems in regression. However, for generalized linear and nonlinear models where the conditional densities cannot be obtained directly, the resulting mixture posterior may be difficult to sample using standard MCMC methods because of multimodality. We introduce an adaptive MCMC scheme that automatically tunes the parameters of a family of mixture proposal distributions during simulation. The resulting chain adapts to sample efficiently from multimodal target distributions. For variable selection problems, point-mass components are included in the mixture, and the associated weights adapt to approximate marginal posterior variable inclusion probabilities, while the remaining components approximate the posterior over nonzero values. The resulting sampler transitions efficiently between models, performing parameter estimation and variable selection simultaneously. Ergodicity and convergence are guaranteed by limiting the adaptation, based on recent theoretical results. The algorithm is demonstrated on a logistic regression model, a sparse kernel regression, and a random field model from statistical biophysics; in each case the adaptive algorithm dramatically outperforms traditional Metropolis-Hastings algorithms. Supplementary materials for this article are available online.

5.
In high-dimensional data settings where p ≫ n, many penalized regularization approaches have been studied for simultaneous variable selection and estimation. However, in the presence of covariates with weak effects, many existing variable selection methods, including the Lasso and its generalizations, cannot distinguish covariates with weak contributions from those with none. Thus, prediction based only on a subset model of selected covariates can be inefficient. In this paper, we propose a post-selection shrinkage estimation strategy to improve the prediction performance of a selected subset model. Such a post-selection shrinkage estimator (PSE) is data adaptive and is constructed by shrinking a post-selection weighted ridge estimator in the direction of a selected candidate subset. Under an asymptotic distributional quadratic risk criterion, its prediction performance is explored analytically. We show that the proposed PSE performs better than the post-selection weighted ridge estimator. More importantly, it significantly improves the prediction performance of any candidate subset model selected by most existing Lasso-type variable selection methods. The relative performance of the post-selection PSE is demonstrated by both simulation studies and real-data analysis. Copyright © 2016 John Wiley & Sons, Ltd.

6.
After a model has been selected by a variable selection method, how to assess the significance of the coefficients of the retained variables is one of the frontier problems in statistics. Starting from the selection results of the adaptive Lasso, and allowing for the diversity of error distributions encountered in practice, this paper constructs a conditional test statistic for the coefficients of the variables retained in the model based on the selection event, and gives a proof of the uniform convergence of this statistic. Simulation studies show that the proposed method can further refine the variable selection results under a variety of error distributions and has strong practical value. The method is applied to an empirical analysis of the CEPS student data, where ten variables, including students' cognitive ability, are ultimately selected as the main factors affecting middle-school students' academic performance, providing a useful reference for related research.

7.
In this paper we study the asymptotic properties of the adaptive Lasso estimate in high-dimensional sparse linear regression models with heteroscedastic errors. It is demonstrated that model selection properties and asymptotic normality of the selected parameters remain valid but with a suboptimal asymptotic variance. A weighted adaptive Lasso estimate is introduced and investigated. In particular, it is shown that the new estimate performs consistent model selection and that linear combinations of the estimates corresponding to the non-vanishing components are asymptotically normally distributed with a smaller variance than those obtained by the “classical” adaptive Lasso. The results are illustrated in a data example and by means of a small simulation study.

8.
We study the properties of the Lasso in the high-dimensional partially linear model where the number of variables in the linear part can be greater than the sample size. We use truncated series expansion based on polynomial splines to approximate the nonparametric component in this model. Under a sparsity assumption on the regression coefficients of the linear component and some regularity conditions, we derive the oracle inequalities for the prediction risk and the estimation error. We also provide sufficient conditions under which the Lasso estimator is selection consistent for the variables in the linear part of the model. In addition, we derive the rate of convergence of the estimator of the nonparametric function. We conduct simulation studies to evaluate the finite sample performance of variable selection and nonparametric function estimation.

9.
The Lasso is a commonly used variable selection method in machine learning, suitable for regression problems with sparsity. When the sample size is huge or massive data are stored on different machines, distributed computing is one of the important ways to reduce computation time and improve efficiency. Based on an equivalent optimization formulation of the Lasso model, this paper applies the ADMM algorithm to this model with separable optimization variables, constructs a distributed algorithm suitable for Lasso variable selection, and proves...
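The ADMM splitting for the Lasso alternates a ridge-like update, soft thresholding, and a dual update. A minimal single-machine numpy sketch (the standard textbook ADMM for the Lasso, not the paper's distributed variant):

```python
import numpy as np

def soft(a, k):
    # soft-thresholding operator S_k(a)
    return np.sign(a) * np.maximum(np.abs(a) - k, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    # minimize 0.5*||Ax - b||^2 + lam*||z||_1  subject to  x = z
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    M = np.linalg.inv(AtA + rho * np.eye(n))   # cached for the x-update
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))          # ridge-like primal update
        z = soft(x + u, lam / rho)             # sparsity-inducing update
        u = u + x - z                          # dual (scaled) update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.5]
b = A @ beta_true + 0.01 * rng.standard_normal(100)
est = admm_lasso(A, b, lam=1.0)   # est is exactly sparse via soft thresholding
```

In the distributed setting, the x-update splits across machines holding row blocks of `A`, with only the z- and u-updates requiring communication.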

10.
In this paper, we consider the problem of estimating the high-dimensional precision matrix of a Gaussian graphical model. Taking advantage of the connection between multivariate linear regression and the entries of the precision matrix, we propose the Bayesian Lasso combined with neighborhood regression estimation for the Gaussian graphical model. This method obtains parameter estimation and model selection simultaneously. Moreover, the proposed method can provide symmetric confidence intervals for all entries of the precision matrix.
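The regression-to-precision-matrix connection mentioned above can be sketched with plain least squares standing in for the Bayesian Lasso: regressing X_j on the remaining variables gives Omega_jj = 1/sigma_j^2 and Omega_jk = -beta_k/sigma_j^2 (the symmetrization by averaging is an illustrative choice, not the paper's procedure):

```python
import numpy as np

def neighborhood_precision(X):
    # OLS neighborhood regression stand-in for the Bayesian-Lasso version
    n, p = X.shape
    Omega = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        Xj, Xr = X[:, j], X[:, idx]
        beta = np.linalg.lstsq(Xr, Xj, rcond=None)[0]
        resid = Xj - Xr @ beta
        s2 = resid @ resid / n                  # conditional variance of X_j
        Omega[j, j] = 1.0 / s2
        Omega[j, idx] = -beta / s2
    return 0.5 * (Omega + Omega.T)              # symmetrize by averaging

rng = np.random.default_rng(0)
Omega_true = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Omega_true), size=4000)
est = neighborhood_precision(X)   # close to Omega_true for large n
```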

11.
Considering parameter estimation and variable selection in logistic regression, we propose the Smooth LASSO and the Spline LASSO. When the variables are continuous, the Smooth LASSO can select locally constant coefficients within each group. In some cases, however, the coefficients may differ and change smoothly, and using the Spline LASSO to estimate the parameters is more appropriate. In this article, we establish the theoretical reliability of the model, and we solve it using a coordinate descent algorithm. Simulations show that the model performs effectively in both feature selection and prediction accuracy.
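Coordinate descent for an l1-penalized least squares criterion cycles soft-threshold updates over the coordinates. A generic sketch (a plain linear-model Lasso for illustration, not the Smooth or Spline variants themselves):

```python
import numpy as np

def cd_lasso(X, y, lam, iters=100):
    # cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                # running residual
    for _ in range(iters):
        for j in range(p):
            r = r + X[:, j] * b[j]              # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            # exact univariate minimizer: soft threshold, then rescale
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r = r - X[:, j] * b[j]
    return b

b = cd_lasso(np.eye(4), np.array([3.0, -1.0, 0.2, 0.0]), lam=0.1)
```

With the identity design the coordinates decouple, so each coefficient is simply the soft-thresholded response, which makes the update easy to verify by hand.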

12.
Bayesian l0‐regularized least squares is a variable selection technique for high‐dimensional predictors. The challenge is optimizing a nonconvex objective function via search over model space consisting of all possible predictor combinations. Spike‐and‐slab (aka Bernoulli‐Gaussian) priors are the gold standard for Bayesian variable selection, with a caveat of computational speed and scalability. Single best replacement (SBR) provides a fast scalable alternative. We provide a link between Bayesian regularization and proximal updating, which provides an equivalence between finding a posterior mode and a posterior mean with a different regularization prior. This allows us to use SBR to find the spike‐and‐slab estimator. To illustrate our methodology, we provide simulation evidence and a real data example on the statistical properties and computational efficiency of SBR versus direct posterior sampling using spike‐and‐slab priors. Finally, we conclude with directions for future research.
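The proximal-updating link is concrete: the l1 penalty's proximal map is soft thresholding, while the l0 penalty's is hard thresholding. A small sketch of the two maps (these are the standard textbook definitions, not code from the paper):

```python
import numpy as np

def prox_l1(v, lam):
    # argmin_x 0.5*(x - v)^2 + lam*|x|  =>  soft thresholding
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_l0(v, lam):
    # argmin_x 0.5*(x - v)^2 + lam*1{x != 0}  =>  hard thresholding:
    # keep v when the cost of zeroing it, 0.5*v^2, exceeds lam
    return np.where(v ** 2 > 2.0 * lam, v, 0.0)

v = np.array([2.0, 0.5, -1.5, 0.1])
soft_out = prox_l1(v, 1.0)   # [1.0, 0.0, -0.5, 0.0]
hard_out = prox_l0(v, 1.0)   # [2.0, 0.0, -1.5, 0.0]
```

Note the qualitative difference the abstract exploits: hard thresholding keeps surviving coordinates unshrunk, while soft thresholding biases them toward zero.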

13.
The threshold autoregressive model with generalized autoregressive conditionally heteroskedastic (GARCH) specification is a popular nonlinear model that captures the well-known asymmetric phenomena in financial market data. The switching mechanism of hysteretic autoregressive GARCH models differs from that of the threshold autoregressive GARCH model in that regime switching may be delayed when the hysteresis variable lies in a hysteresis zone. This paper conducts a Bayesian model comparison among competing models by designing an adaptive Markov chain Monte Carlo sampling scheme. We illustrate the performance of three criteria for comparing models with fat-tailed and/or skewed errors: deviance information criteria, Bayesian predictive information, and an asymptotic version of Bayesian predictive information. A simulation study highlights the properties of the three Bayesian criteria, their accuracy, and their favorable performance as model selection tools. We demonstrate the proposed method in an empirical study of 12 international stock markets, providing strong evidence for models with skewed fat-tailed innovations. Copyright © 2016 John Wiley & Sons, Ltd.

14.
Considering parameter estimation and variable selection in logistic regression, we propose the Smooth LASSO and the Spline LASSO. When the variables are continuous, the Smooth LASSO can select locally constant coefficients within each group. In some cases, however, the coefficients may differ and change smoothly, and using the Spline LASSO to estimate the parameters is more appropriate. In this article, we establish the theoretical reliability of the model, and we solve it using a coordinate descent algorithm. Simulations show that the model performs effectively in both feature selection and prediction accuracy.

15.
The adaptive lasso is a model selection method shown to be both consistent in variable selection and asymptotically normal in coefficient estimation. The actual variable selection performance of the adaptive lasso depends on the weight used. It turns out that the weight assignment using the OLS estimate (OLS-adaptive lasso) can result in very poor performance when collinearity of the model matrix is a concern. To achieve better variable selection results, we take into account the standard errors of the OLS estimate for weight calculation, and propose two different versions of the adaptive lasso denoted by SEA-lasso and NSEA-lasso. We show through numerical studies that when the predictors are highly correlated, SEA-lasso and NSEA-lasso can outperform OLS-adaptive lasso under a variety of linear regression settings while maintaining the same theoretical properties of the adaptive lasso.
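The standard-error-adjusted weighting can be sketched as follows. The exact formula in the paper may differ; here the weight is taken, as an assumption, to be the reciprocal absolute t-statistic, i.e. se_j / |beta_hat_j|, so that imprecisely estimated coefficients are penalized more heavily:

```python
import numpy as np

def sea_weights(X, y):
    # OLS fit with coefficient standard errors; assumed weight form:
    # w_j = se_j / |beta_j|  (reciprocal absolute t-statistic)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # unbiased error variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))     # coefficient standard errors
    return se / np.abs(beta)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = 5.0 * X[:, 0] + rng.standard_normal(200)    # only predictor 0 is relevant
w = sea_weights(X, y)                           # w[0] is the smallest weight
```

The relevant predictor receives a small weight (light penalty) while noise predictors receive large weights, which is the mechanism behind the improved selection.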

16.
The quantile varying-coefficient model is a robust nonparametric modeling method. When analyzing data with varying-coefficient models, a natural question is how to simultaneously select the important variables and identify, among them, those with constant effects. Based on the quantile method, this paper studies a robust and efficient estimation and variable selection procedure. Using local smoothing and an adaptive group variable selection method, and imposing a double penalty on the quantile loss function, we obtain penalized estimators. With the tuning parameters suitably chosen by a BIC criterion, the proposed variable selection method enjoys the oracle property; the usefulness of the new method is illustrated through simulation studies and an analysis of a body-fat data example. Numerical results show that, without requiring any knowledge of the variables or the error distribution, the proposed method can identify unimportant variables and at the same time distinguish the constant-effect variables.
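The quantile ("check") loss underlying the double-penalized criterion is rho_tau(u) = u(tau - 1{u < 0}); a one-line sketch:

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0}); tau = 0.5 gives half the absolute loss
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

median_loss = check_loss(-2.0, 0.5)      # symmetric at tau = 0.5
lower_tail = check_loss(-1.0, 0.25)      # negative errors weighted by 1 - tau
```

Minimizing this loss over a constant yields the tau-th sample quantile, which is why it replaces squared error in quantile regression.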

17.
Using data from January 2008 to September 2014, a Bayesian adaptive Lasso quantile regression (BALQR) model is employed to analyze the macroeconomic factors affecting credit risk in the real estate industry. The results show that, at every quantile, the GDP growth rate has the largest influence on the credit risk of China's real estate industry, followed by the CPI growth rate and the growth rate of the consumer confidence index, with the former acting negatively and the latter two positively, while the state of the capital market has essentially no significant effect on real estate credit risk. However, the degree of influence of the various macroeconomic factors differs across quantiles, i.e., across levels of credit risk in the real estate industry.

18.
The Lasso is a popular model selection and estimation procedure for linear models that enjoys nice theoretical properties. In this paper, we study the Lasso estimator for fitting autoregressive time series models. We adopt a double asymptotic framework where the maximal lag may increase with the sample size. We derive theoretical results establishing various types of consistency. In particular, we derive conditions under which the Lasso estimator for the autoregressive coefficients is model selection consistent, estimation consistent and prediction consistent. Simulation study results are reported.
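Fitting the Lasso to an AR(p) model amounts to running it on a lagged design matrix. A sketch of the matrix construction (the helper name is ours, for illustration):

```python
import numpy as np

def lagged_design(y, p):
    # row t holds the predictors (y_{t-1}, ..., y_{t-p}) for target y_t,
    # for t = p, ..., n-1
    n = len(y)
    X = np.column_stack([y[p - 1 - k : n - 1 - k] for k in range(p)])
    return X, y[p:]

y = np.arange(10.0)                  # toy series 0, 1, ..., 9
X, target = lagged_design(y, 3)
# X[0] = (y_2, y_1, y_0) predicts target[0] = y_3
```

Any Lasso solver can then be applied to `(X, target)`; under the double asymptotic framework, the number of columns p grows with the sample size.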

19.
We discuss Bayesian modelling of the delay between the dates of diagnosis and settlement of claims in Critical Illness Insurance using a Burr distribution. The data are supplied by the UK Continuous Mortality Investigation and relate to claims settled in the years 1999-2005. Some dates of diagnosis and settlement are unrecorded; these are included in the analysis as missing values using their posterior predictive distribution and MCMC methodology. The possible factors affecting the delay (age, sex, smoker status, policy type, benefit amount, etc.) are investigated under a Bayesian approach. A three-parameter Burr generalised-linear-type model is fitted, in which the covariates are linked to the mean of the distribution. Variable selection using Bayesian methodology, with different prior distribution setups for the parameters, is also applied to obtain the best model. In particular, Gibbs variable selection methods are considered, and the results are confirmed using exact marginal likelihood findings and related Laplace approximations. For comparison purposes, a lognormal model is also considered.

20.
The growth curve model is a typical multivariate linear model that occupies an important position in modern statistics. Based on the growth curve model after the Potthoff-Roy transformation, this paper first gives the penalized least squares estimator of the parameter matrix using the adaptive LASSO as the penalty function, thereby achieving variable selection. Second, based on local asymptotic quadratic estimation, a unified approximate expression is given for the penalized least squares estimator of the growth curve model. Next, the penalized least squares estimator of the model after the Potthoff-Roy transformation is discussed, and the adaptive LASSO is shown to possess the oracle property. Finally, several variable selection methods are compared by simulation. The results show that the adaptive LASSO performs well and that, all things considered, the Potthoff-Roy transformation is superior to the vectorization transformation.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号