Found 20 similar documents (search time: 109 ms)
1.
With the arrival of the big-data era, high-dimensional data are frequently collected in many research fields such as economics, finance, and biomedicine. One characteristic of high-dimensional data is that the number of variables p grows with the sample size n and often exceeds it; at the same time, outliers arise easily in high-dimensional data. How to overcome the influence of outliers on high-dimensional statistical inference, and thereby obtain more accurate models, is therefore one of the current hot topics in statistics. This paper surveys robust variable selection methods for high-dimensional linear models. Specifically, we first introduce three measures of robustness: the influence function, the breakdown point, and the maximum bias. We then focus on robust variable selection methods, covering the cases where outliers occur in the response, where outliers occur in both the response and the covariates, and methods that combine a high breakdown point with high efficiency. Next we describe the associated algorithms and compare the different variable selection methods through simulations and real-data examples. Finally, we briefly discuss open problems and possible future directions for robust and efficient variable selection in high dimensions.
2.
3.
4.
The generalized estimating equation (GEE) approach is a standard method for regression problems with longitudinal data in which the response is discrete or non-negative. This paper studies variable selection for high-dimensional GEE. Under weaker conditions, we show that even if the assumed working correlation (or covariance) structure is misspecified, model selection is consistent as long as the mean function is correctly specified, and we establish the oracle property of the variable selection procedure. These results improve on those of WANG (2011) and WANG, ZHOU and QU (2012).
5.
Variable selection for high-dimensional regression is currently a hot and difficult problem in statistics. We propose a dependence criterion based on the conditional distribution function and, building on it, three variable selection methods. Compared with existing approaches, the proposed methods do not rely on a statistical model and apply to both linear models and nonparametric additive models. Simulation results show that the methods perform satisfactorily even when the covariates are moderately correlated.
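The abstract does not spell out the proposed criterion, so the following is only an illustrative stand-in for the idea of ranking covariates by how much the conditional distribution of the response changes with each covariate: split the sample at the covariate's median and score the covariate by the sup distance between the two empirical CDFs of the response. The names `cdf_split_dependence` and `screen` are hypothetical.

```python
import numpy as np

def cdf_split_dependence(x, y):
    """Crude conditional-distribution dependence score: split the sample at the
    median of x and take the sup distance between the two empirical CDFs of y."""
    lo = np.sort(y[x <= np.median(x)])
    hi = np.sort(y[x > np.median(x)])
    grid = np.sort(y)
    F_lo = np.searchsorted(lo, grid, side="right") / len(lo)
    F_hi = np.searchsorted(hi, grid, side="right") / len(hi)
    return float(np.max(np.abs(F_lo - F_hi)))

def screen(X, y, top_k):
    """Keep the indices of the top_k covariates under the score."""
    scores = [cdf_split_dependence(X[:, j], y) for j in range(X.shape[1])]
    return list(np.argsort(scores)[::-1][:top_k])

rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.standard_normal((n, p))
# only X0 (linear) and X1 (nonlinear) affect the response
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.3 * rng.standard_normal(n)
kept = screen(X, y, top_k=5)
```

Because the score compares whole distributions rather than means, it can in principle pick up nonlinear effects such as the sin term, which is in the spirit of a model-free criterion.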
6.
High-dimensional linear regression estimation is an important statistical problem studied by many researchers. When the error distribution is unknown, how to incorporate efficiency into high-dimensional estimation remains an open and challenging problem. Least squares loses efficiency under non-Gaussian error densities, while maximum likelihood cannot be applied directly because the error density is unknown. Based on penalized estimating equations, this paper proposes a new sparse semiparametric efficient estimation method for high-dimensional linear regression. We prove that, in ultrahigh-dimensional regression with unknown error density, the new estimator is asymptotically as efficient as the oracle maximum likelihood estimator, and is therefore more efficient than the conventional penalized least squares estimator under non-Gaussian errors. In addition, several commonly used high-dimensional regression estimators are shown to be special cases of our method. Simulations and real-data results confirm the effectiveness of the proposed method.
7.
This paper studies variable selection and estimation for quantile regression models with grouped data. To fully exploit the grouping information, each group's regression coefficients are assumed to decompose into a common component and a group-specific component. For variable screening, we propose a Lasso estimator of the decomposed coefficients and, further, an adaptive Lasso estimator. The corresponding optimization problem is simplified by transforming the design matrix. We prove the oracle property of the adaptive Lasso estimator and demonstrate its finite-sample performance through simulation studies. Finally, the method is applied to screening genes associated with invasive breast carcinoma to illustrate its practical performance.
8.
9.
We study empirical likelihood inference in high-dimensional linear models. When the dimension of the covariates grows with the sample size, conventional empirical likelihood inference fails. Under suitable regularity conditions, we derive the asymptotic distribution theory for a corrected empirical likelihood ratio statistic.
10.
11.
We consider approaches for improving the efficiency of algorithms for fitting nonconvex penalized regression models such as the smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP) in high dimensions. In particular, we develop rules for discarding variables during cyclic coordinate descent. This dimension reduction leads to an improvement in the speed of these algorithms for high-dimensional problems. The rules we propose here eliminate a substantial fraction of the variables from the coordinate descent algorithm. Violations are quite rare, especially in the locally convex region of the solution path, and furthermore, may be easily corrected by checking the Karush–Kuhn–Tucker (KKT) conditions. We extend these rules to generalized linear models, as well as to other nonconvex penalties such as the ℓ2-stabilized Mnet penalty, group MCP, and group SCAD. We explore three variants of the coordinate descent algorithm that incorporate these rules and study the efficiency of these algorithms in fitting models to both simulated and real data from a genome-wide association study. Supplementary materials for this article are available online.
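For the convex lasso penalty, the simplest member of this family, the discard-then-verify idea can be sketched as follows: discard variables by the sequential strong rule, run coordinate descent on the survivors, then check the KKT conditions on the discarded set and re-admit any violators. The strong rule here stands in for the article's actual rules, and all function names are illustrative.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, beta, active, n_sweep=200):
    """Cyclic coordinate descent over the active set only.
    Assumes columns standardized so that x_j' x_j / n = 1."""
    n = len(y)
    r = y - X @ beta
    for _ in range(n_sweep):
        for j in np.where(active)[0]:
            old = beta[j]
            z = X[:, j] @ r / n + old
            beta[j] = soft(z, lam)
            if beta[j] != old:
                r -= X[:, j] * (beta[j] - old)
    return beta, r

def lasso_path_with_screening(X, y, lams):
    """Sequential strong rule: discard j when |x_j' r| / n < 2*lam_k - lam_{k-1},
    fit on the survivors, then verify KKT and re-fit if anything was missed."""
    n, p = X.shape
    beta, path, lam_prev = np.zeros(p), [], lams[0]
    for lam in lams:
        grad = np.abs(X.T @ (y - X @ beta)) / n
        active = grad >= 2 * lam - lam_prev          # strong-rule survivors
        while True:
            beta, r = lasso_cd(X, y, lam, beta, active)
            viol = (~active) & (np.abs(X.T @ r) / n > lam)
            if not viol.any():
                break
            active |= viol                            # rare: re-admit violators
        path.append(beta.copy())
        lam_prev = lam
    return np.array(path)

rng = np.random.default_rng(1)
n, p = 100, 30
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / np.sqrt(((X - X.mean(0)) ** 2).mean(0))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.standard_normal(n)
y = y - y.mean()
lam_max = np.max(np.abs(X.T @ y)) / n
lams = np.geomspace(lam_max, 0.1 * lam_max, 20)
path = lasso_path_with_screening(X, y, lams)
```

The KKT back-check is what makes the screening safe: discarding is a heuristic, but any mistake is caught and corrected before moving to the next penalty value.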
12.
In this article, for Lasso penalized linear regression models in high-dimensional settings, we propose a modified cross-validation (CV) method for selecting the penalty parameter. The methodology is extended to other penalties, such as Elastic Net. We conduct extensive simulation studies and real data analysis to compare the performance of the modified CV method with other methods. It is shown that the popular K-fold CV method includes many noise variables in the selected model, while the modified CV works well in a wide range of coefficient and correlation settings. Supplementary materials containing the computer code are available online.
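The abstract does not describe the modification itself, so the sketch below only shows the plain K-fold CV baseline that such methods modify, here with a ridge penalty so each per-fold fit has a closed form. All names are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv_error(X, y, lam, k=5, seed=0):
    """Plain K-fold CV estimate of prediction error for one penalty value."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = ridge_fit(X[train], y[train], lam)
        err += np.sum((y[fold] - X[fold] @ beta) ** 2)
    return err / n

rng = np.random.default_rng(1)
n, p = 120, 10
X = rng.standard_normal((n, p))
y = X @ np.r_[np.ones(3), np.zeros(p - 3)] + 0.5 * rng.standard_normal(n)
lams = np.geomspace(1e-3, 1e3, 25)
cv = [kfold_cv_error(X, y, lam) for lam in lams]
best = lams[int(np.argmin(cv))]
```

The article's point is precisely that this vanilla curve tends to pick a penalty that is too small for variable selection, admitting noise variables; the modified CV is designed to correct that tendency.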
13.
Optimized ridge regression algorithm for grey system models
Reference [1] pointed out that the common approach of estimating the parameters of the grey differential equation by ordinary least squares has difficulty producing reasonable parameter values because the system of equations is ill-conditioned; reference [2] pointed out that solving for the time response function of the grey system model from the initial value introduces a systematic error because of the error in the initial value. To overcome these two shortcomings of grey models, this paper designs an optimized ridge regression algorithm for solving grey system models; computations on a widely cited example demonstrate the advantages of the algorithm.
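The paper's optimized algorithm is not reproduced in the abstract; the sketch below only shows standard GM(1,1) estimation with the plain least-squares solve replaced by a ridge-regularized one, which is the basic idea behind a ridge approach to the ill-conditioned system. `gm11_ridge` is a hypothetical name.

```python
import numpy as np

def gm11_ridge(x0, k_ridge=0.0):
    """GM(1,1): estimate (a, b) in x0(k) + a*z1(k) = b, replacing the plain
    least-squares solve with the ridge solve (B'B + k I)^{-1} B'Y."""
    x1 = np.cumsum(x0)                      # accumulated generating sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])           # background values
    B = np.column_stack([-z1, np.ones(len(z1))])
    Y = x0[1:]
    a, b = np.linalg.solve(B.T @ B + k_ridge * np.eye(2), B.T @ Y)
    def predict(m):                         # fitted/forecast series of length m
        k = np.arange(m)
        x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
        return np.r_[x1_hat[0], np.diff(x1_hat)]
    return a, b, predict

# exponential-growth data, the case GM(1,1) is designed for
x0 = 10.0 * 1.08 ** np.arange(8)
a, b, predict = gm11_ridge(x0, k_ridge=1e-4)
fit = predict(len(x0))
```

With a small ridge constant the solve is numerically stable even when B'B is near-singular, while the fit on well-behaved data is essentially unchanged.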
14.
We propose and study a new iterative coordinate descent algorithm (QICD) for solving nonconvex penalized quantile regression in high dimension. By permitting different subsets of covariates to be relevant for modeling the response variable at different quantiles, nonconvex penalized quantile regression provides a flexible approach for modeling high-dimensional data with heterogeneity. Although its theory has been investigated recently, its computation remains highly challenging when p is large due to the nonsmoothness of the quantile loss function and the nonconvexity of the penalty function. Existing coordinate descent algorithms for penalized least-squares regression cannot be directly applied. We establish the convergence property of the proposed algorithm under some regularity conditions for a general class of nonconvex penalty functions including popular choices such as SCAD (smoothly clipped absolute deviation) and MCP (minimax concave penalty). Our Monte Carlo study confirms that QICD substantially improves the computational speed in the p ≫ n setting. We illustrate the application by analyzing a microarray dataset.
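The computational core of a QICD-style algorithm is that each one-dimensional update of the piecewise-linear quantile objective can be solved exactly as a weighted quantile of breakpoints. A minimal sketch for the unpenalized case (the paper's penalty terms are omitted, and all names are illustrative):

```python
import numpy as np

def weighted_quantile(vals, w, q):
    """Smallest value whose cumulative weight reaches q * total weight."""
    order = np.argsort(vals)
    cw = np.cumsum(w[order])
    return vals[order][np.searchsorted(cw, q * cw[-1])]

def check_loss(r, tau):
    return float(np.sum(r * (tau - (r < 0))))

def qr_cd(X, y, tau=0.5, n_sweep=50):
    """Coordinate descent for unpenalized quantile regression.  Each coordinate
    update minimizes the 1-d piecewise-linear objective exactly: the minimizer
    is a weighted quantile of the breakpoints u_i / x_ij."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweep):
        for j in range(p):
            xj = X[:, j]
            keep = np.abs(xj) > 1e-12
            u = y[keep] - X[keep] @ beta + xj[keep] * beta[j]   # partial residual
            w = np.abs(xj[keep])
            t = u / xj[keep]
            # a negative x_ij flips the effective quantile level of its breakpoint
            taus = np.where(xj[keep] > 0, tau, 1.0 - tau)
            beta[j] = weighted_quantile(t, w, np.sum(w * taus) / np.sum(w))
    return beta

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([0.5, 1.5, -1.0])
y = X @ beta_true + 0.05 * rng.standard_normal(n)
beta_hat = qr_cd(X, y, tau=0.5)
```

Because every update is an exact coordinate-wise minimization, the objective never increases, which is the property the paper's convergence analysis builds on.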
15.
16.
This paper points out that the common stepwise regression procedure can be used to compute the locally optimal regression subsets under five criteria commonly used in regression analysis, and simulation results show that in most cases these locally optimal regression subsets coincide. This provides scientific support for the practical importance of stepwise regression. Finally, the authors carry out computations on several well-known numerical examples, with very satisfactory results.
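As a toy illustration of comparing criteria along a stepwise path (the paper's five criteria are not listed in the abstract; AIC and BIC are used here as stand-ins, and all names are illustrative):

```python
import numpy as np

def resid_ss(X, y, S):
    """RSS of the least-squares fit of y on an intercept plus columns S."""
    Xs = np.column_stack([np.ones(len(y))] + [X[:, j] for j in S])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.sum((y - Xs @ beta) ** 2))

def forward_stepwise(X, y):
    """Greedy forward selection; returns the entry order and the RSS path."""
    p = X.shape[1]
    S, rss_path = [], [resid_ss(X, y, [])]
    while len(S) < p:
        rss_j, j = min((resid_ss(X, y, S + [j]), j)
                       for j in range(p) if j not in S)
        S.append(j)
        rss_path.append(rss_j)
    return S, np.array(rss_path)

rng = np.random.default_rng(3)
n, p = 100, 6
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.standard_normal(n)

order, rss_path = forward_stepwise(X, y)
k = np.arange(p + 1)
aic = n * np.log(rss_path / n) + 2 * k
bic = n * np.log(rss_path / n) + np.log(n) * k
size = {"AIC": int(np.argmin(aic)), "BIC": int(np.argmin(bic))}
```

When the signal is clear, the subsets chosen under different criteria often agree, which is the phenomenon the paper's simulations document.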
17.
This article discusses parameter estimation in linear models with measurement errors. When such a model suffers from multicollinearity, we propose an almost unbiased ridge estimator based on the idea of almost unbiased estimation, and analyze its properties. The study shows that the almost unbiased ridge estimator not only overcomes multicollinearity but also has a relatively small mean squared error.
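One common form of the almost unbiased ridge estimator in the literature is beta_AUR(k) = (I - k^2 (X'X + kI)^{-2}) beta_OLS; whether this matches the paper's exact construction for the measurement-error model is an assumption. A sketch on a design with orthonormal columns, where AUR provably shrinks less than ordinary ridge:

```python
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def almost_unbiased_ridge(X, y, k):
    """beta_AUR(k) = (I - k^2 (X'X + kI)^{-2}) beta_OLS  (one common form)."""
    p = X.shape[1]
    A = np.linalg.inv(X.T @ X + k * np.eye(p))
    return (np.eye(p) - k ** 2 * (A @ A)) @ ols(X, y)

rng = np.random.default_rng(4)
n, p, k = 60, 4, 2.0
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns
y = Q @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(n)
b_ols, b_r, b_aur = ols(Q, y), ridge(Q, y, k), almost_unbiased_ridge(Q, y, k)
```

On an orthonormal design the three estimators differ only by scalar factors, 1/(1+k) for ridge and (1+2k)/(1+k)^2 for AUR, so AUR always sits between ridge and OLS: less bias than ridge, more stability than OLS.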
18.
19.
Li-Xiao Duan & Guo-Feng Zhang 《高等学校计算数学学报(英文版)》2021,14(3):714-737
The variants of the randomized Kaczmarz and randomized Gauss-Seidel algorithms are two effective stochastic iterative methods for solving ridge regression problems. For solving ordinary least squares regression problems, the greedy randomized Gauss-Seidel (GRGS) algorithm always performs better than the randomized Gauss-Seidel (RGS) algorithm when the system is overdetermined. In this paper, inspired by the greedy modification technique of the GRGS algorithm, we extend the variant of the randomized Gauss-Seidel algorithm, obtaining a variant of the greedy randomized Gauss-Seidel (VGRGS) algorithm for solving ridge regression problems. In addition, we propose a relaxed VGRGS algorithm and establish the corresponding convergence theorem. Numerical experiments show that our algorithms outperform the VRK-type and VRGS algorithms when $m > n$.
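The VGRGS/VRGS variants themselves are not specified in the abstract; the sketch below only shows the underlying coordinate (Gauss-Seidel) iteration for the ridge objective min ||Ax-b||^2 + lam ||x||^2, in both a randomized and a greedy form. The selection rules and names are illustrative.

```python
import numpy as np

def ridge_gs(A, b, lam, n_iter=2000, greedy=False, seed=0):
    """Coordinate (Gauss-Seidel) iteration for min ||Ax-b||^2 + lam ||x||^2.
    greedy=False: sample column j with prob ∝ ||A_j||^2 + lam (RGS-style);
    greedy=True:  pick the column giving the largest decrease (GRGS-style,
    at the extra cost of a full gradient evaluation per step)."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    norms = np.sum(A * A, axis=0) + lam
    probs = norms / norms.sum()
    x = np.zeros(n)
    r = b.copy()                        # maintains r = b - A x
    for _ in range(n_iter):
        g = A.T @ r - lam * x           # negative half-gradient
        j = int(np.argmax(g * g / norms)) if greedy else rng.choice(n, p=probs)
        step = g[j] / norms[j]          # exact 1-d minimization in coordinate j
        x[j] += step
        r -= step * A[:, j]
    return x

rng = np.random.default_rng(5)
m, n, lam = 200, 20, 0.5                # overdetermined case, m > n
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
x_grgs = ridge_gs(A, b, lam, greedy=True)
x_rgs = ridge_gs(A, b, lam, greedy=False)
```

Both iterations converge to the closed-form ridge solution; the greedy rule typically needs fewer iterations per digit of accuracy, which is the trade-off the GRGS-inspired variants exploit.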
20.
This paper studies estimation in partially linear regression models subject to additional stochastic linear restrictions. Based on the profile least squares method and the mixed estimation method, we propose a profile mixed estimator of the parametric component under stochastic restrictions and study its properties. To overcome multicollinearity, we construct a profile mixed ridge estimator of the parametric component and derive the bias and variance of the estimator.
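The profile least squares step can be sketched with a Nadaraya-Watson smoother: partial out the nonparametric component by smoothing both X and y on t, then regress the smoothed-out residuals, optionally with a ridge term in the spirit of the abstract's ridge variant. This is a generic sketch under those assumptions, not the paper's exact estimator; the stochastic-restriction (mixed-estimation) part is omitted.

```python
import numpy as np

def nw_matrix(t, h):
    """Row-stochastic Nadaraya-Watson smoothing matrix with Gaussian kernel."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def profile_ls(X, y, t, h=0.1, ridge_k=0.0):
    """Profile least squares for y = X beta + g(t) + e: smooth X and y on t,
    then (optionally ridge-) regress the de-smoothed parts."""
    S = nw_matrix(t, h)
    Xt, yt = X - S @ X, y - S @ y
    p = X.shape[1]
    beta = np.linalg.solve(Xt.T @ Xt + ridge_k * np.eye(p), Xt.T @ yt)
    g_hat = S @ (y - X @ beta)          # back out the nonparametric part
    return beta, g_hat

rng = np.random.default_rng(6)
n = 400
t = rng.uniform(0, 1, n)
X = rng.standard_normal((n, 2))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)
beta_hat, g_hat = profile_ls(X, y, t, h=0.05)
```

Replacing the plain solve by the ridge solve (ridge_k > 0) is exactly where a profile ridge estimator departs from profile least squares when the profiled design Xt is ill-conditioned.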