Similar Articles
20 similar articles found (search time: 312 ms)
1.
Regularization-based variable selection has been a focus of recent research. In applications, explanatory variables often come in groups, and one usually wants to select both the important groups and the important covariates within them, i.e., bi-level variable selection. Based on the two nonconvex penalties SCAD and MCP, we propose sparse Group SCAD and sparse Group MCP estimators, computed by a blockwise coordinate descent algorithm, which achieve simultaneous within-group and between-group sparsity. Simulations show that both proposed methods outperform the Group Lasso and the sparse Group Lasso in prediction and variable selection, and the algorithm is applied effectively to a real birth-weight data set.
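The bi-level sparsity mechanism in the abstract above can be illustrated with the simpler convex sparse-group-lasso proximal step (the paper's SCAD/MCP thresholdings are nonconvex refinements of the same idea; names here are illustrative):

```python
import math

def soft(x, t):
    """Soft-threshold a scalar: shrink toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def prox_sparse_group(beta, lam1, lam2):
    """Proximal operator of lam1*||b||_1 + lam2*||b||_2 for one group:
    first soft-threshold each coordinate (within-group sparsity), then
    shrink the whole group's norm; a group whose thresholded norm is
    <= lam2 is zeroed entirely (between-group sparsity)."""
    s = [soft(b, lam1) for b in beta]
    norm = math.sqrt(sum(v * v for v in s))
    if norm <= lam2:
        return [0.0] * len(beta)
    scale = 1.0 - lam2 / norm
    return [scale * v for v in s]
```

A blockwise coordinate descent algorithm applies a step of this kind to one group at a time, holding the other groups fixed.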

2.
《数理统计与管理》2019,(6):1014-1025
As a classical and effective graphical model for large data sets, the Bayesian network has been widely studied for its causal and probabilistic semantics. To address the difficulty of learning a Bayesian network from high-dimensional data, this paper proposes a structure learning algorithm for high-dimensional data, the LTB algorithm, which combines Lasso, Tabu Search, and BIC. First, Lasso reduces the dimension of the covariates, screening out those closely related to the target variable to serve as the vertices of the network. Then, Tabu Search is chosen as the metaheuristic and BIC as the scoring criterion, and the two are combined to construct a globally optimal network structure. An empirical study of the factors influencing the SSE Composite Index shows that the LTB algorithm can recover both the causal relationships between the index and its influencing factors and, via conditional probabilities, the ways those factors combine.

3.
In personal credit scoring, many dummy variables must be created as explanatory variables. Group Lasso can remove or retain related dummy variables as whole groups. Using real personal-credit data, we apply Group Lasso for variable selection in a logistic model and compare it with the full model and with models built by forward and backward selection; the Group Lasso model is best in both variable interpretability and prediction accuracy.

4.
Variable selection is one of the important problems in statistics, and regularization approaches to it have been a focus of recent research. We adopt an iterative smoothing L_{1/2} algorithm that adds a sparsification threshold on the parameters, so that regression coefficients with small absolute values are shrunk exactly to zero, thereby performing variable selection. Compared with the Lasso (least absolute shrinkage and selection operator), the adaptive Lasso, and L_{1/2} regularization, simulations show that the algorithm has comparably good variable selection and prediction ability; finally, it is applied to the real prostate data set.
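The iterative thresholding idea in the abstract above can be illustrated in the simpler L1 case (ISTA, iterative soft-thresholding; the L_{1/2} algorithm replaces the soft-threshold with a half-thresholding rule). A minimal sketch on a tiny hand-built design:

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; exact zeros give variable selection."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def ista(X, y, lam, step=0.1, iters=500):
    """Proximal gradient for the lasso: gradient step on the squared
    loss, then coordinatewise soft-thresholding."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        b = [soft_threshold(b[j] - step * g[j], step * lam) for j in range(p)]
    return b

# y depends only on the first column; the second coefficient is zeroed.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [2.0, 0.0, 2.0, 2.0]
b = ista(X, y, lam=0.1)
```

Here the columns are orthogonal with X'X/n = 0.75 I, so the limit is the soft-thresholded least-squares solution (1.5 - 0.1)/0.75 for the first coefficient and exactly 0 for the second.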

5.
An improved adaptive Lasso method with an application to the stock market
《数理统计与管理》2019,(4):750-760
In finance, the adaptive Lasso is widely used for variable selection and parameter estimation in stock price prediction models. However, the adaptive Lasso was proposed for non-time-series models and ignores the specific structure of time series, for example the tendency for later lags to have weaker predictive power for the future, which can make estimation and prediction inaccurate. The penalty parameters for a time series model should therefore depend on the lag order: later lags should receive larger penalties. To fully account for this structure while retaining the advantages of the adaptive Lasso, this paper proposes a modified adaptive Lasso (MA Lasso) for the AR(p) model, which multiplies the adaptive Lasso penalty by a function that is monotone nondecreasing in the lag order. A further advantage of this design is that, for particular choices of the penalty parameters, the Lasso and the adaptive Lasso are special cases of MA Lasso. For the choice of the other key parameter p in the AR(p) model, a modified BIC criterion is proposed. Finally, MA Lasso is applied to the CSI 100 index; the empirical analysis shows that, compared with the Lasso and the adaptive Lasso, MA Lasso selects the most parsimonious model with the best prediction, i.e., it selects the fewest predictors while attaining the smallest prediction error.
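A minimal sketch of the two ingredients described above: the lagged design matrix of an AR(p) model, and lag-dependent adaptive penalties. The choice g(j) = j is purely illustrative; the paper only requires a monotone nondecreasing function of the lag:

```python
def ar_design(series, p):
    """Lag design for AR(p): row t is (y_{t-1}, ..., y_{t-p})."""
    X = [[series[t - j] for j in range(1, p + 1)]
         for t in range(p, len(series))]
    y = series[p:]
    return X, y

def ma_lasso_weights(ols_coefs, lam, gamma=1.0):
    """MA-Lasso-style penalties: the adaptive weight 1/|b_j|^gamma is
    multiplied by a nondecreasing function of the lag (here g(j) = j,
    a hypothetical choice); later lags are penalized more heavily."""
    return [lam * (j + 1) / (abs(b) ** gamma + 1e-8)
            for j, b in enumerate(ols_coefs)]
```

With equal preliminary coefficient estimates, the resulting penalties increase strictly with the lag, which is exactly the structure the abstract motivates.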

6.
In an estimation-of-distribution algorithm based on Markov networks, a fitness function model of the solutions is used to represent the probability distribution of the variables. We establish a method for assessing the validity of the fitness function model via a correlation coefficient test; the method can also be used to determine the initial population size. Under a two-dimensional correlated-variable structure, the method is applied to the optimization of a multivariate biological dynamic model. The results show that the method can select an appropriate population size, guarantee the validity of the fitness function model, and improve the optimization performance of the algorithm.

7.
For generalized linear sparse models with group structure, this paper introduces the Bregman divergence as a general loss function for parameter estimation and variable selection, so that the method is not restricted to a particular model or loss function. We compare the properties of eight penalties, namely Ridge, SCAD, Lasso, adaptive Lasso, group Lasso, hierarchical Lasso, adaptive hierarchical Lasso, and sparse group Lasso, and the parameter estimation and ...

8.
After a model has been selected by a variable selection method, how to assess the significance of the coefficients of the retained variables is a frontier problem in statistics. Starting from the selection results of the adaptive Lasso, and allowing for the diversity of error distributions encountered in practice, this paper constructs a conditional test statistic for the coefficients of the retained variables based on the selection event, and proves its uniform convergence. Simulations show that the proposed method can further improve variable selection results under a variety of error distributions and has strong practical value. The method is applied to CEPS student data, where ten variables, including students' cognitive ability, are selected as the main factors influencing middle school students' achievement, providing a useful reference for related research.

9.
This paper studies variable selection and estimation in quantile regression models for grouped data. To fully reflect the grouping information, the regression coefficients of each group are assumed to decompose into a common part and a group-specific part. For variable screening, we propose a Lasso estimator of the decomposed coefficients and, further, an adaptive Lasso estimator. The corresponding optimization problem is simplified by transforming the observation matrix. We prove the oracle property of the adaptive Lasso estimator and demonstrate its finite-sample performance through simulations. Finally, the method is applied to screening pathogenic genes of invasive breast carcinoma to show its practical performance.

10.
Design of a fuzzy classification system based on an ant colony algorithm
We propose a design method for fuzzy classification systems based on the max-min ant colony algorithm. The method proceeds in two stages: feature variable selection and model parameter optimization. First, the ant colony algorithm selects a set of highly discriminative feature variables, improving the interpretability of the model. Once the model structure is fixed, the ant colony algorithm extracts information from the training samples to optimize the model parameters, constructing a fuzzy model with few variables and rules while preserving accuracy, thus achieving a trade-off between accuracy and interpretability. Finally, the method is applied to the Iris and Wine classification problems and compared with other methods; the simulation results demonstrate its effectiveness.

11.
The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. We study the problem of variable selection and estimation in this model in the sparse, high-dimensional case. We develop a concave group selection approach for this problem using basis function expansion and study its theoretical and empirical properties. We also apply the group Lasso for variable selection and estimation in this model and study its properties. Under appropriate conditions, we show that the group least absolute shrinkage and selection operator (Lasso) selects a model whose dimension is comparable to the underlying model, regardless of the large number of unimportant variables. In order to improve the selection results, we show that the group minimax concave penalty (MCP) has the oracle selection property in the sense that it correctly selects important variables with probability converging to one under suitable conditions. By comparison, the group Lasso does not have the oracle selection property. In the simulations, we apply both the group Lasso and the group MCP; the two approaches are evaluated in simulation studies and demonstrated on a data example.
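The oracle-selection contrast described above rests on the shape of the MCP. A minimal scalar sketch of the penalty and its closed-form solution for an orthonormal design ("firm thresholding"): small inputs are zeroed, moderate ones are linearly shrunk, and large ones are left untouched, unlike the lasso, which shrinks every nonzero coefficient:

```python
import math

def mcp_penalty(t, lam, gamma):
    """MCP: rho(t) = lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam,
    then constant at gamma*lam^2/2 (the penalty flattens out)."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2 * gamma)
    return gamma * lam * lam / 2

def mcp_threshold(z, lam, gamma):
    """Univariate MCP minimizer for unit design (requires gamma > 1):
    zero below lam, linear interpolation up to gamma*lam, identity
    beyond -- large signals are estimated without bias."""
    a = abs(z)
    if a <= lam:
        return 0.0
    if a <= gamma * lam:
        return math.copysign((a - lam) / (1.0 - 1.0 / gamma), z)
    return z
```

The unbiasedness of large coefficients under this rule is what underlies the oracle selection property that the group Lasso lacks.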

12.
We study the properties of the Lasso in the high-dimensional partially linear model where the number of variables in the linear part can be greater than the sample size. We use truncated series expansion based on polynomial splines to approximate the nonparametric component in this model. Under a sparsity assumption on the regression coefficients of the linear component and some regularity conditions, we derive the oracle inequalities for the prediction risk and the estimation error. We also provide sufficient conditions under which the Lasso estimator is selection consistent for the variables in the linear part of the model. In addition, we derive the rate of convergence of the estimator of the nonparametric function. We conduct simulation studies to evaluate the finite sample performance of variable selection and nonparametric function estimation.
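The spline approximation step described above can be sketched as follows: a truncated power basis turns the nonparametric component into extra design columns, after which the Lasso operates on the linear covariates alongside them (basis choice here is one common option, not necessarily the paper's):

```python
def truncated_power_basis(x, degree, knots):
    """One basis row for a scalar x: the polynomials 1, x, ..., x^d
    plus one truncated power (x - k)_+^d per knot. The nonparametric
    function g(x) is approximated by a linear combination of these."""
    row = [x ** d for d in range(degree + 1)]
    row += [max(x - k, 0.0) ** degree for k in knots]
    return row
```

Stacking one such row per observation, concatenated with the high-dimensional linear covariates, gives the design matrix on which the penalized least-squares problem is solved.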

13.
In this paper we study the asymptotic properties of the adaptive Lasso estimate in high-dimensional sparse linear regression models with heteroscedastic errors. It is demonstrated that model selection properties and asymptotic normality of the selected parameters remain valid but with a suboptimal asymptotic variance. A weighted adaptive Lasso estimate is introduced and investigated. In particular, it is shown that the new estimate performs consistent model selection and that linear combinations of the estimates corresponding to the non-vanishing components are asymptotically normally distributed with a smaller variance than those obtained by the “classical” adaptive Lasso. The results are illustrated in a data example and by means of a small simulation study.

14.

We study the asymptotic properties of a new version of the Sparse Group Lasso estimator (SGL), called adaptive SGL. This new version includes two distinct regularization parameters, one for the Lasso penalty and one for the Group Lasso penalty, and we consider the adaptive version of this regularization, where both penalties are weighted by preliminary random coefficients. The asymptotic properties are established in a general framework, where the data are dependent and the loss function is convex. We prove that this estimator satisfies the oracle property: the sparsity-based estimator recovers the true underlying sparse model and is asymptotically normally distributed. We also study its asymptotic properties in a double-asymptotic framework, where the number of parameters diverges with the sample size. We show by simulations and on real data that the adaptive SGL outperforms other oracle-like methods in terms of estimation precision and variable selection.


15.
The Lasso is a popular model selection and estimation procedure for linear models that enjoys nice theoretical properties. In this paper, we study the Lasso estimator for fitting autoregressive time series models. We adopt a double asymptotic framework where the maximal lag may increase with the sample size. We derive theoretical results establishing various types of consistency. In particular, we derive conditions under which the Lasso estimator for the autoregressive coefficients is model selection consistent, estimation consistent and prediction consistent. Simulation study results are reported.

16.
We propose the Bayesian adaptive Lasso (BaLasso) for variable selection and coefficient estimation in linear regression. The BaLasso is adaptive to the signal level by adopting different shrinkage for different coefficients. Furthermore, we provide a model selection machinery for the BaLasso by assessing the posterior conditional mode estimates, motivated by the hierarchical Bayesian interpretation of the Lasso. Our formulation also permits prediction using a model averaging strategy. We discuss other variants of this new approach and provide a unified framework for variable selection using flexible penalties. Empirical evidence of the attractiveness of the method is demonstrated via extensive simulation studies and data analysis.

17.
Considering parameter estimation and variable selection in logistic regression, we propose the Smooth LASSO and the Spline LASSO. When the variables are continuous, the Smooth LASSO can select locally constant coefficients within each group. In some cases, however, the coefficients may differ and change smoothly, and the Spline LASSO is then more appropriate for estimation. We establish the reliability of the models theoretically, and solve them with a coordinate descent algorithm. Simulations show that the models are effective in both feature selection and prediction accuracy.

18.
The quantile varying-coefficient model is a robust nonparametric modeling method. When analyzing data with varying-coefficient models, a natural question is how to simultaneously select the important variables and, among them, identify those with constant effects. Based on the quantile method, this paper studies robust and efficient estimation and variable selection procedures. Using local smoothing and adaptive group variable selection, and imposing a double penalty on the quantile loss function, we obtain penalized estimators. With the tuning parameters suitably chosen by a BIC criterion, the proposed variable selection method enjoys the oracle property, and its usefulness is illustrated through simulation studies and an analysis of body-fat data. The numerical results show that, without any prior information about the variables or the error distribution, the proposed method can identify unimportant variables and also distinguish the variables with constant effects.
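The quantile loss that the double penalty above is attached to is the standard check function; a minimal sketch (tau = 0.5 recovers median regression, which is what gives the method its robustness to the error distribution):

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0}):
    residuals below zero are weighted by (1 - tau), those above by tau,
    so minimizing it targets the tau-th conditional quantile."""
    return u * (tau - (1.0 if u < 0 else 0.0))
```

At tau = 0.5 the loss is symmetric (half the absolute error); at tau = 0.25 negative residuals are penalized three times as heavily as positive ones.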


20.
We describe adaptive Markov chain Monte Carlo (MCMC) methods for sampling posterior distributions arising from Bayesian variable selection problems. Point-mass mixture priors are commonly used in Bayesian variable selection problems in regression. However, for generalized linear and nonlinear models where the conditional densities cannot be obtained directly, the resulting mixture posterior may be difficult to sample using standard MCMC methods due to multimodality. We introduce an adaptive MCMC scheme that automatically tunes the parameters of a family of mixture proposal distributions during simulation. The resulting chain adapts to sample efficiently from multimodal target distributions. For variable selection problems point-mass components are included in the mixture, and the associated weights adapt to approximate marginal posterior variable inclusion probabilities, while the remaining components approximate the posterior over nonzero values. The resulting sampler transitions efficiently between models, performing parameter estimation and variable selection simultaneously. Ergodicity and convergence are guaranteed by limiting the adaptation based on recent theoretical results. The algorithm is demonstrated on a logistic regression model, a sparse kernel regression, and a random field model from statistical biophysics; in each case the adaptive algorithm dramatically outperforms traditional MH algorithms. Supplementary materials for this article are available online.
