Similar literature
1.
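Variable selection via regularization has been an active research topic in recent years. In practical applications, explanatory variables often come in groups, and one usually wants to select both the important groups and the important covariates within them, that is, bi-level variable selection. Based on two nonconvex penalty functions, SCAD and MCP, we propose the sparse Group SCAD and sparse Group MCP estimators, which use a block coordinate descent algorithm to achieve sparsity both within and between groups. Numerical simulations show that the two proposed methods outperform Group Lasso and sparse Group Lasso in both prediction and variable selection. The methods are also applied effectively to a real data set on newborn birth weight.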
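As a rough illustration of the two nonconvex penalties named in this abstract, the sketch below evaluates the standard scalar MCP and SCAD penalties next to the Lasso penalty. It is not the paper's sparse Group SCAD or sparse Group MCP estimator: the function names and the default values `gamma=3.0` and `a=3.7` are assumptions chosen for illustration, and how the paper combines the penalties across and within groups is not stated here.

```python
def lasso_penalty(t, lam):
    """L1 penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP) at a scalar coefficient t (gamma > 1)."""
    x = abs(t)
    if x <= gamma * lam:
        # Concave quadratic part: the penalization rate decreases with |t|.
        return lam * x - x * x / (2.0 * gamma)
    # Flat part: large coefficients receive a constant penalty.
    return 0.5 * gamma * lam * lam


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty at a scalar coefficient t (a > 2)."""
    x = abs(t)
    if x <= lam:
        # Same as the Lasso near zero.
        return lam * x
    if x <= a * lam:
        # Quadratic transition between the linear and the flat region.
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    # Flat part: large coefficients receive a constant penalty.
    return 0.5 * (a + 1.0) * lam * lam
```

Both nonconvex penalties level off for large coefficients, whereas the Lasso penalty keeps growing, which is the usual motivation for preferring them when large effects should not be shrunk heavily.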

2.
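Variable selection is usually needed when the true underlying model has a sparse representation, and identifying the important predictors can improve the predictive performance of the fitted model. This problem has been studied extensively in the literature; in particular, Zhang and Lu [1] developed a variable selection method for right-censored data based on the proportional hazards model. This paper studies variable selection for the additive hazards model with current status data. Inspired by [1], we propose an adaptive Lasso method to solve...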

3.
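Traditional variable selection methods for functional regression models overlook the case of sparse functional data. We propose a variable selection method for functional regression models with sparse functional data: functional principal component analysis is carried out on the sparse functional covariates via conditional expectation, and the estimated orthogonal eigenfunctions are used as basis functions to expand the model. This approach handles the selection of sparse functional variables effectively. As an empirical study, we take annual precipitation, monthly mean temperature, sunshine duration, humidity, and maximum and minimum temperature data from 34 meteorological stations across China for 2002 to 2011, and compare the variable selection results of the functional regression model on the original and Bootstrap samples under dense and sparse designs. The results show that the new method selects variables well.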

4.
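For ultrahigh-dimensional additive quantile regression models with missing data, this paper proposes an effective variable screening method. Specifically, the idea of canonical correlation analysis is brought into the maximum correlation coefficient under optimal transformations: the marginal contributions of the important variables are ranked by the maximum correlation coefficient between the optimally transformed covariates and model residuals, and variables are screened accordingly. Then, on the basis of the screening, a sparse-smooth penalty is used for further variable selection. The proposed screening method has three advantages: (1) the maximum correlation based on optimal transformations reflects more fully the nonlinear dependence of the response on the covariates; (2) using residuals during the iterations captures information about the model and thereby improves screening accuracy; (3) screening is separated from model estimation, which avoids regressing on redundant covariates. Under suitable conditions, we prove the sure independence screening property of the screening method, as well as the sparsity and consistency of the estimator under the sparse-smooth penalty. Monte Carlo simulations illustrate the performance of the proposed method, and a mouse gene data set demonstrates its effectiveness.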

5.
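We study sparse regularization with a Log-type penalty function, give a new nonconvex strategy for variable selection and compressed sensing, and propose an efficient and fast iterative thresholding algorithm. The effectiveness of the proposed Log-type sparse regularization model is verified on variable selection problems and sparse signal reconstruction.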

6.
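Sparse regularization plays an increasingly important role in parameter reconstruction. Compared with traditional regularization methods, sparse regularization reconstructs sparse variables better. Because the sparse regularization term is nondifferentiable, existing classical algorithms need to be modified. This paper constructs a homotopy perturbation sparse regularization method to overcome the nondifferentiability of standard sparse regularization, and applies it to reconstructing implied volatility based on the Black-Scholes option pricing model and to reconstructing policy parameters based on the Todaro model. Numerical experiments show that the proposed method is convergent and stable.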

7.
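For generalized linear sparse models with group structure, this paper introduces the Bregman divergence as a general loss function for parameter estimation and variable selection, so that the method is not restricted to a specific model or a specific loss function. We compare the characteristics of eight penalty functions, namely Ridge, SCAD, Lasso, adaptive Lasso, group Lasso, hierarchical Lasso, adaptive hierarchical Lasso and sparse group Lasso, and, once each is introduced into the model, the parameter estimation and...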

8.
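Because high-dimensional data are sparse, data processing methods in high-dimensional spaces differ markedly from those in low-dimensional spaces, and a sound variable selection method is a prerequisite for solving high-dimensional data problems. We examine theoretically the oracle property of the MCP method for the parameters of the logistic model and prove that the MCP estimator has good theoretical properties. In a prediction model for search engine advertising conversion rates, we compare the predictive performance of several variable selection methods. The results show that the MCP method achieves the highest accuracy on high-dimensional sparse data. The method identifies a number of feature variables that significantly affect the advertising conversion rate, providing a theoretical basis for advertisers when they set their advertising strategies.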

9.
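Jointly accounting for main gene effects and gene-gene interaction effects in plant breeding is one of the central issues in genomic selection research. Most existing studies ignore gene interaction effects, mainly because including them greatly increases the number of candidate genes and makes existing statistical modelling methods unstable. This paper brings gene effects and gene-gene interaction effects into the model together and proposes a three-step model-building procedure to simplify computation and improve prediction accuracy. In the first step, without assuming a specific model, distance correlation screening removes genes that are clearly unrelated to the response. In the second step, a Bayesian method screens the remaining genes for possible candidates. In the third step, based on the selected genes, single-gene and interaction effects are considered together, and a penalized method is used to select the model and estimate the parameters. Simulations show that, compared with existing one-step model selection methods, the proposed method is computationally simple, robust, fast, and more accurate in prediction. Finally, the method is applied to rapeseed data, and the empirical analysis shows that it significantly improves the prediction accuracy of the flowering-time trait.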

10.
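Lasso is a commonly used variable selection method in machine learning, suited to regression problems with sparsity. When the sample size is huge or massive data are stored on different machines, distributed computing is one of the important ways to reduce computation time and improve efficiency. Starting from an equivalent optimization formulation of the Lasso model, this paper applies the ADMM algorithm to this model, whose optimization variables are separable, constructs a distributed algorithm suited to Lasso variable selection, and proves...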
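To make the ADMM idea in this abstract concrete, here is a minimal single-machine sketch of the textbook ADMM iteration for the Lasso, written in plain Python. It is an assumption-laden illustration, not the distributed algorithm of the paper (whose details are truncated above): the splitting x = z, the names `lasso_admm`, `soft_threshold` and `solve_linear`, and the defaults `rho=1.0` and `n_iter=200` are all chosen here for illustration.

```python
def soft_threshold(v, k):
    """Soft-thresholding operator S_k(v) = sign(v) * max(|v| - k, 0)."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
    """Minimize 0.5*||A x - b||^2 + lam*||z||_1 subject to x = z."""
    m, p = len(A), len(A[0])
    # Left-hand side of the x-update: A^T A + rho * I (positive definite).
    lhs = [[sum(A[k][i] * A[k][j] for k in range(m)) + (rho if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(p)]
    z = [0.0] * p  # sparse copy of the coefficients
    u = [0.0] * p  # scaled dual variable
    for _ in range(n_iter):
        # x-update: ridge-type linear system.
        x = solve_linear(lhs, [Atb[i] + rho * (z[i] - u[i]) for i in range(p)])
        # z-update: coordinate-wise soft thresholding.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(p)]
        # Dual update on the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(p)]
    return z
```

With an identity design matrix the Lasso solution is simply the soft-thresholded response, which gives an easy sanity check: `lasso_admm([[1, 0], [0, 1]], [3.0, -0.5], 1.0)` should be close to `[2.0, 0.0]`. A distributed variant would typically split the rows of `A` across machines, but that step is not shown here.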

11.
This article proposes a Bayesian approach for the sparse group selection problem in the regression model. In this problem, the variables are partitioned into different groups. It is assumed that only a small number of groups are active for explaining the response variable, and it is further assumed that within each active group only a small number of variables are active. We adopt a Bayesian hierarchical formulation, where each candidate group is associated with a binary variable indicating whether the group is active or not. Within each group, each candidate variable is also associated with a binary indicator. Thus, the sparse group selection problem can be solved by sampling from the posterior distribution of the two layers of indicator variables. We adopt a group-wise Gibbs sampler for posterior sampling. We demonstrate the proposed method by simulation studies as well as real examples. The simulation results show that the proposed method performs better than the sparse group Lasso in terms of selecting the active groups as well as identifying the active variables within the selected groups. Supplementary materials for this article are available online.

12.
To alleviate the staircase effect and edge blurring in image denoising, we propose a two-step model based on the duality strategy. This strategy follows from the observation that the dual variable of the restored image can be viewed as the normal vector. So we first obtain the dual variable and then reconstruct the image by fitting the dual variable. Following the augmented Lagrangian strategy, we propose a projection gradient method for solving this two-step model. We also give some convergence analyses of the proposed projection gradient method. Several numerical experiments compare our proposed model with the ROF model and the LLT model.

13.
Optimal subset selection among a general family of threshold autoregressive moving-average (TARMA) models is considered. The usual complexity of model/order selection is increased by capturing the uncertainty of unknown threshold levels and an unknown delay lag. The Monte Carlo method of Bayesian model averaging provides a possible way to overcome such model uncertainty. Incorporating the idea of Bayesian model averaging, a modified stochastic search variable selection method is adapted to consider subset selection in TARMA models, by adding latent indicator variables for all potential model lags as part of the proposed Markov chain Monte Carlo sampling scheme. Metropolis–Hastings methods are employed to deal with the well-known difficulty of including moving-average terms in the model and a novel proposal mechanism is designed for this purpose. Bayesian comparison of two hyper-parameter settings is carried out via a simulation study. The results demonstrate that the modified method has favourable performance under reasonable sample size and appropriate settings of the necessary hyper-parameters. Finally, the application to four real datasets illustrates that the proposed method can provide promising and parsimonious models from more than 16 million possible subsets.

14.
An exhaustive search as required for traditional variable selection methods is impractical in high dimensional statistical modeling. Thus, to conduct variable selection, various forms of penalized estimators with good statistical and computational properties have been proposed during the past two decades. The attractive properties of these shrinkage and selection estimators, however, depend critically on the size of regularization which controls model complexity. In this paper, we consider the problem of consistent tuning parameter selection in high dimensional sparse linear regression where the dimension of the predictor vector is larger than the size of the sample. First, we propose a family of high dimensional Bayesian Information Criteria (HBIC), and then investigate the selection consistency, extending the results of the extended Bayesian Information Criterion (EBIC) in Chen and Chen (2008) to ultra-high dimensional situations. Second, we develop a two-step procedure, the SIS+AENET, to conduct variable selection in p>n situations. The consistency of tuning parameter selection is established under fairly mild technical conditions. Simulation studies are presented to confirm theoretical findings, and an empirical example is given to illustrate its use on internet advertising data.

15.
In this article, we introduce the Bayesian change point and variable selection algorithm that uses dynamic programming recursions to draw direct samples from a very high-dimensional space in a computationally efficient manner, and apply this algorithm to a geoscience problem that concerns the Earth's history of glaciation. Strong evidence exists for at least two changes in the behavior of the Earth's glaciers over the last five million years. Around 2.7 Ma, the extent of glacial cover on the Earth increased, but the frequency of glacial melting events remained constant at 41 kyr. A more dramatic change occurred around 1 Ma. For over three decades, the “Mid-Pleistocene Transition” has been described in the geoscience literature not only by a further increase in the magnitude of glacial cover, but also as the dividing point between the 41 kyr and the 100 kyr glacial worlds. Given such striking changes in the glacial record, it is clear that a model whose parameters can change through time is essential for the analysis of these data. The Bayesian change point algorithm provides a probabilistic solution to a data segmentation problem, while the exact Bayesian inference in regression procedure performs variable selection within each regime delineated by the change points. Together, they can model a time series in which the predictor variables as well as the parameters of the model are allowed to change with time. Our algorithm allows one to simultaneously perform variable selection and change point analysis in a computationally efficient manner. Supplementary materials including MATLAB code for the Bayesian change point and variable selection algorithm and the datasets described in this article are available online or by contacting the first author.

16.
Using instrumental variable techniques and the partial group smoothly clipped absolute deviation (SCAD) penalty method, we propose a variable selection procedure for a class of partially varying coefficient models with endogenous variables. The proposed variable selection method can eliminate the influence of the endogenous variables. With appropriate selection of the tuning parameters, we establish the oracle property of this variable selection procedure. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

17.
When the data are heavy-tailed or contain outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Using Bayesian inference, we study the Bayesian variable selection problem for median linear models. A Bayesian estimation method is proposed, based on Bayesian model selection theory with a spike-and-slab prior on the regression coefficients, and an efficient Gibbs sampling procedure for the posterior is also given. Extensive numerical simulations and an analysis of the Boston house price data illustrate the effectiveness of the proposed method.

18.
The accelerated failure time model offers a valuable complement to the traditional Cox proportional hazards model due to its direct and meaningful interpretation. We propose a variable selection method in the context of the accelerated failure time model for survival data, which can simultaneously complete variable selection and parameter estimation. Meanwhile, the proposed method can deal with potential outliers in survival times as well as heteroscedastic model errors, which are frequently encountered in practice. Specifically, utilizing a general nonconvex penalty, we propose the adaptive penalized weighted least absolute deviation estimator for the accelerated failure time model. Under some regularity conditions, we show that the proposed method yields a consistent estimator and possesses the oracle property. In addition, we propose a new algorithm to compute the estimate in high-dimensional settings, and evaluate the practical utility of the proposed method through extensive simulation studies and two real examples.

19.
When the data are heavy-tailed or contain outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Using Bayesian inference, we study the Bayesian variable selection problem for median linear models. A Bayesian estimation method is proposed, based on Bayesian model selection theory with a spike-and-slab prior on the regression coefficients, and an efficient Gibbs sampling procedure for the posterior is also given. Extensive numerical simulations and an analysis of the Boston house price data illustrate the effectiveness of the proposed method.

20.
Bayesian approaches to prediction and the assessment of predictive uncertainty in generalized linear models are often based on averaging predictions over different models, and this requires methods for accounting for model uncertainty. When there are linear dependencies among potential predictor variables in a generalized linear model, existing Markov chain Monte Carlo algorithms for sampling from the posterior distribution on the model and parameter space in Bayesian variable selection problems may not work well. This article describes a sampling algorithm based on the Swendsen-Wang algorithm for the Ising model, which works well when the predictors are far from orthogonal. In problems of variable selection for generalized linear models we can index different models by a binary parameter vector, where each binary variable indicates whether or not a given predictor variable is included in the model. The posterior distribution on the model is a distribution on this collection of binary strings, and by thinking of this posterior distribution as a binary spatial field we apply a sampling scheme inspired by the Swendsen-Wang algorithm for the Ising model in order to sample from the model posterior distribution. The algorithm we describe extends a similar algorithm for variable selection problems in linear models. The benefits of the algorithm are demonstrated for both real and simulated data.
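The indexing of models by a binary parameter vector described in this last abstract can be sketched in a few lines. This is only an illustration of the indexing itself, not of the Swendsen-Wang-style sampler; the function name `enumerate_models` and the predictor names are hypothetical.

```python
from itertools import product


def enumerate_models(predictors):
    """Yield (gamma, subset) for every inclusion vector gamma in {0, 1}^p."""
    for gamma in product((0, 1), repeat=len(predictors)):
        # gamma[j] == 1 means predictor j is included in the model.
        subset = [name for name, g in zip(predictors, gamma) if g == 1]
        yield gamma, subset


# With p = 3 candidate predictors there are 2**3 = 8 candidate models.
models = list(enumerate_models(["x1", "x2", "x3"]))
```

Because the number of binary strings doubles with every added predictor, full enumeration is only practical for small p, which is one reason posterior sampling over the indicator vector is used instead.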

