1.
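This paper considers variable selection for linear errors-in-variables (EV) models with longitudinal data. Building on the quadratic inference function method and shrinkage methods, a new bias-corrected variable selection procedure is proposed. With appropriately chosen tuning parameters, we prove the consistency and asymptotic normality of the resulting estimators. Finally, simulation studies are used to examine the finite-sample properties of the proposed procedure.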
2.
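Variable selection in high-dimensional regression analysis is currently both a popular and a difficult topic in statistical research. A correlation measure based on the conditional distribution function is proposed, and three variable selection methods are built on it. Unlike existing methods, the proposed methods do not depend on a statistical model and apply to both linear models and nonparametric additive models. Numerical simulations show that the methods perform satisfactorily even when the covariates are correlated to some degree.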
3.
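In multiple linear regression, variable selection depends closely on the model and is closely tied to influential data. From the viewpoint of model perturbation, this paper studies the relationship between variable selection and the data. Using concepts from differential geometry, it proposes three measures (the rate of change, the acceleration, and the curvature of a curve) to assess the influence of the data on variable selection and thereby diagnose influential data. The numerical examples given in the paper show that the proposed influence measures are effective for diagnosing the influence of data on variable selection.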
4.
5.
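Using auxiliary information as instrumental variables together with the smooth-threshold estimating equations (SEE) method, an instrumental-variable-type variable selection method is proposed for generalized linear models whose covariates are subject to measurement error. The method estimates the nonzero regression coefficients while removing insignificant covariates from the model, thereby achieving variable selection. In addition, the selection procedure does not require solving any convex optimization problem, so it is highly adaptable and relatively easy to compute in practice. Theory shows that the method is consistent and that the estimators of the nonzero regression coefficients attain the optimal parametric convergence rate. Numerical simulations show that the proposed method effectively removes the effect of measurement error on estimation accuracy and has good finite-sample properties.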
6.
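A review of variable selection methods for high-dimensional data. Cited by: 2 (self-citations: 0, citations by others: 2)
Variable selection is an indispensable part of statistics. This paper summarizes the variable selection methods developed over the past twenty-odd years, focusing on methods for high-dimensional and ultrahigh-dimensional data. Finally, a real-data example is used to compare the different variable selection methods.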
7.
8.
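Bias-variance analysis is a method used in model selection to weigh how well a model explains the existing sample against how accurately it estimates unseen samples, with the aim of making the test error of the selected model as small as possible. Effective variable screening in classification or regression can yield a more accurate model, but it also introduces some error. The concept of "selection error" is proposed to characterize the error caused by a particular variable selection method in classification problems that involve variable selection. The error of the classification problem is decomposed into bias, variance, and selection error, and the effect of each component on the total error is examined.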
9.
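The high dimensionality and high correlation of current credit-risk data for listed companies seriously impair the accuracy of credit-risk models. To address this, this paper combines existing algorithms with the characteristics of credit-risk models to design a new nonparametric variable selection method. Screening the credit-risk variables of listed companies with this method removes noise variables and linearly correlated variables from the data set. The paper also designs an algorithm for finding the optimal solution when the number of variables is large. Taking the logistic model as an example, an empirical analysis of the credit risk of listed companies is carried out. The results show that, compared with previous variable selection methods, the proposed method effectively reduces the data dimension, removes correlation among variables, and improves both the reliability and the predictive accuracy of the model.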
10.
11.
Peter F. Thall, Kathy E. Russell, Richard M. Simon 《Journal of computational and graphical statistics》2013,22(4):416-434
A new algorithm, backward elimination via repeated data splitting (BERDS), is proposed for variable selection in regression. Initially, the data are partitioned into two sets {E, V}, and an exhaustive backward elimination (BE) is performed in E. For each p value cutoff α used in BE, the corresponding fitted model from E is validated in V by computing the sum of squared deviations of observed from predicted values. This is repeated m times, and the α minimizing the sum of the m sums of squares is used as the cutoff in a final BE on the entire data set. BERDS is a modification of the algorithm BECV proposed by Thall, Simon, and Grier (1992). An extensive simulation study shows that, compared to BECV, BERDS has a smaller model error and higher probabilities of excluding noise variables, of selecting each of several uncorrelated true predictors, and of selecting exactly one of two or three highly correlated true predictors. BERDS is also superior to standard BE with cutoffs .05 or .10, and this superiority increases with the number of noise variables in the data and the degree of correlation among true predictors. An application is provided for illustration.
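The splitting-and-validation loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`ols`, `backward_eliminate`, `berds`), the split fraction `frac`, the absence of an intercept (data assumed centered), and the normal approximation used for the p values are all choices made for this sketch.

```python
import math
import numpy as np

def ols(X, y):
    """Least-squares fit; returns coefficients and approximate two-sided p values."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / max(n - k, 1)
    cov = sigma2 * np.linalg.pinv(X.T @ X)
    se = np.sqrt(np.maximum(np.diag(cov), 1e-300))
    t = beta / se
    # Assumption of this sketch: normal approximation instead of the t distribution.
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return beta, p

def backward_eliminate(X, y, alpha):
    """Drop the least significant column until every p value is <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p = ols(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            break
        keep.pop(worst)
    return keep

def berds(X, y, alphas, m=20, frac=0.5, seed=0):
    """Choose the cutoff by repeated E/V splits, then run a final BE on all data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_e = int(frac * n)
    total = np.zeros(len(alphas))
    for _ in range(m):
        perm = rng.permutation(n)
        e, v = perm[:n_e], perm[n_e:]          # estimation set E, validation set V
        for j, a in enumerate(alphas):
            keep = backward_eliminate(X[e], y[e], a)
            if keep:
                beta, _ = ols(X[e][:, keep], y[e])
                pred = X[v][:, keep] @ beta
            else:
                pred = np.zeros(len(v))        # empty model predicts 0 (centered data)
            total[j] += np.sum((y[v] - pred) ** 2)
    best = alphas[int(np.argmin(total))]
    return best, backward_eliminate(X, y, best)
```

Calling `berds(X, y, [0.01, 0.05, 0.10, 0.20])` returns the chosen cutoff and the indices of the columns retained by the final elimination on the full data.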
12.
Il Do Ha, Jianxin Pan, Seungyoung Oh, Youngjo Lee 《Journal of computational and graphical statistics》2013,22(4):1044-1060
Variable selection methods using a penalized likelihood have been widely studied in various statistical models. However, in semiparametric frailty models, these methods have been relatively less studied because the marginal likelihood function involves analytically intractable integrals, particularly when modeling multicomponent or correlated frailties. In this article, we propose a simple but unified procedure via a penalized h-likelihood (HL) for variable selection of fixed effects in a general class of semiparametric frailty models, in which random effects may be shared, nested, or correlated. We consider three penalty functions (least absolute shrinkage and selection operator [LASSO], smoothly clipped absolute deviation [SCAD], and HL) in our variable selection procedure. We show that the proposed method can be easily implemented via a slight modification to existing HL estimation approaches. Simulation studies also show that the procedure using the SCAD or HL penalty performs well. The usefulness of the new method is also illustrated using three practical datasets. Supplementary materials for the article are available online.
13.
《Journal of computational and graphical statistics》2013,22(4):782-798
Variable selection is an important aspect of high-dimensional statistical modeling, particularly in regression and classification. In the regularization framework, various penalty functions are used to perform variable selection by putting relatively large penalties on small coefficients. The L1 penalty is a popular choice because of its convexity, but it produces biased estimates for the large coefficients. The L0 penalty is attractive for variable selection because it directly penalizes the number of nonzero coefficients. However, the optimization involved is discontinuous and nonconvex, and therefore it is very challenging to implement. Moreover, its solution may not be stable. In this article, we propose a new penalty that combines the L0 and L1 penalties. We implement this new penalty by developing a global optimization algorithm using mixed integer programming (MIP). We compare this combined penalty with several other penalties via simulated examples as well as real applications. The results show that the new penalty outperforms both the L0 and L1 penalties in terms of variable selection while maintaining good prediction accuracy.
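The abstract does not give the exact form of the combined penalty, so the following is only one natural way to write a least-squares problem with both L0 and L1 terms as a mixed integer program. The tuning parameters \lambda_0 and \lambda_1, the binary indicators z_j, and the bound M (a "big-M" constant assumed large enough to cover every |\beta_j|) are notation introduced here for illustration.

```latex
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p}
  \; \|y - X\beta\|_2^2
  + \lambda_0 \sum_{j=1}^{p} z_j
  + \lambda_1 \sum_{j=1}^{p} |\beta_j|
\quad \text{subject to} \quad
  |\beta_j| \le M z_j, \qquad j = 1, \dots, p.
```

With \lambda_0 > 0, an optimal solution sets z_j = 1 only when \beta_j \neq 0, so \sum_j z_j equals the number of nonzero coefficients and plays the role of the L0 penalty.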
14.
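Systematic error and estimation error in portfolio selection are important determinants of out-of-sample performance, and the trade-off between them is affected by the number of assets N. Allowing N to vary, this paper applies bootstrapping and out-of-sample rolling methods to test the performance and tail risk of the equal-weight portfolio, the minimum-variance portfolio, and their error-corrected strategies, and discusses the results separately for different market states. The findings are: (1) the difference in out-of-sample Sharpe ratio between the minimum-variance portfolio and the equal-weight strategy has an inverted U-shaped relationship with N; (2) the tail risk of the minimum-variance portfolio falls rapidly as N grows, and overall it is lower than that of the equal-weight strategy; (3) the turnover of the minimum-variance portfolio is positively correlated with N, so blindly increasing the number of assets in portfolio selection causes deadweight loss. The results suggest that investors should choose the number of assets rationally and make full use of the diversification benefits of the minimum-variance portfolio.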
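For readers unfamiliar with the two strategies being compared, the weights can be sketched as below. This assumes the textbook global minimum-variance portfolio with no short-sale or other constraints, which the abstract does not specify; the function names are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights: proportional to inv(cov) @ 1, summing to 1."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)   # solves cov @ w = 1 without forming the inverse
    return w / w.sum()

def equal_weights(n):
    """Equal-weight (1/N) portfolio."""
    return np.full(n, 1.0 / n)

def portfolio_variance(w, cov):
    """Variance of the portfolio return for weights w."""
    return float(w @ cov @ w)
```

In practice `cov` is estimated from a finite sample, and that estimation error is what the abstract weighs against the systematic error of the 1/N rule as N changes.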
15.
《Journal of computational and graphical statistics》2013,22(4):988-1006
Many modern treatments of high-dimensional datasets involve reducing the initial collection of features to a much smaller set, from which a predictive model may be built. However, strong relationships between the remaining variables can limit the parsimony or even the predictive performance of such a model. We propose a semi-automatic approach using generalized correlation to detect and quantify these relationships, and explore ways to represent this information graphically. The method can detect both symmetric and asymmetric relationships, as well as nonlinear patterns. Its utility is demonstrated on a range of real and simulated datasets. Supplemental material for performing the real-data analyses in this article is available online.
16.
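This paper studies how to carry out parameter estimation and variable selection simultaneously in varying-coefficient partially linear models when the response is subject to missingness and some of the covariates are measured with error. Imputation is used to handle the missing data, and corrected profile least-squares estimation is combined with the SCAD penalty to estimate the parameters and select variables. The resulting estimator is shown to be asymptotically normal and to have the oracle property. Numerical simulations further examine its finite-sample properties.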
17.
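This paper studies estimation for partially linear varying-coefficient models with longitudinal data in which the nonparametric part is subject to measurement error. B-spline functions are used to approximate the varying-coefficient functions, and a bias-corrected quadratic inference function is constructed to obtain estimators of the unknown parameters and the varying-coefficient functions. Consistency of the varying-coefficient function estimators and asymptotic normality of the parameter estimators are proved. Numerical simulations and a real-data analysis show that the proposed estimation method is effective in finite samples.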