20 related references found.
1.
Considering a parameter estimation and variable selection problem in logistic regression, we propose the Smooth LASSO and the Spline LASSO. When the covariates are continuous, the Smooth LASSO selects locally constant coefficients within each group. In some cases, however, the coefficients may differ and change smoothly, and the Spline LASSO is then the more appropriate estimator. We establish the theoretical reliability of the model and solve it with a coordinate descent algorithm. Simulations show that the model is effective in both feature selection and prediction accuracy.
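The Smooth and Spline LASSO penalties are specific to this paper, but the coordinate descent machinery used to fit them is standard. Below is a minimal, hedged sketch of cyclic coordinate descent with soft-thresholding for an ordinary Lasso linear regression (all function and variable names are illustrative, not from the paper); the grouped smooth/spline penalties would replace the plain soft-threshold update.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator used in each coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b 0.5/n * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n           # per-coordinate curvature
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]       # remove j-th contribution
            rho = X[:, j] @ residual / n        # partial correlation with residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]       # add updated contribution back
    return beta

# Toy usage: a sparse ground truth recovered from noisy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
beta_true = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + 0.1 * rng.standard_normal(200)
print(lasso_coordinate_descent(X, y, lam=0.1).round(2))
```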
2.
3.
Longitudinal data are often analyzed with normal mixed-effects models. However, violation of the normality assumption can lead to invalid inference. Compared with traditional mean regression, quantile regression gives a complete description of the conditional distribution of the response and yields robust estimates under non-normal error distributions. This paper considers quantile regression estimation and variable selection for longitudinal mixed-effects models with right-censored responses. First, an inverse censoring probability weighting method is used to obtain parameter estimates. Second, variable selection is carried out by combining inverse censoring probability weighting with the LASSO penalty. Monte Carlo simulations show that the proposed method outperforms estimators that simply discard the censored observations. Finally, an AIDS data set is analyzed to illustrate the practical performance of the proposed method.
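As a rough illustration of the weighting-plus-penalty idea (not the authors' longitudinal estimator), the sketch below fits an L1-penalized median regression with inverse-censoring-probability weights using scikit-learn's QuantileRegressor. The weights `w` and censoring indicators `delta` are placeholders assumed to have been computed beforehand, for example from a Kaplan-Meier estimate of the censoring distribution.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(1)
n, p = 300, 8
X = rng.standard_normal((n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(n)

# Hypothetical inputs: delta[i] = 1 if y[i] is observed (not censored), and
# w[i] = delta[i] / G_hat(y[i]) are precomputed IPCW weights, where G_hat is
# an estimate of the censoring survival function (placeholder values here).
delta = rng.binomial(1, 0.8, size=n)
w = delta / 0.8

# L1-penalized median regression (quantile = 0.5); alpha controls sparsity.
model = QuantileRegressor(quantile=0.5, alpha=0.05, solver="highs")
model.fit(X, y, sample_weight=w)
print(model.coef_.round(2))
```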
4.
Compared with traditional media marketing, search engine advertising has been highly successful because of its precise targeting and low cost. Existing click-through-rate (CTR) models for search engine advertising, however, cannot effectively handle large data volumes and high-dimensional features, which substantially reduces prediction accuracy. This paper builds a CTR prediction model based on LASSO variable selection that addresses the high dimensionality and sparsity of advertising data. The model is validated on bidding data from a company; the results show that the key factors affecting CTR are trademark information in the ad keywords, regional information, and cost per click. These findings provide a theoretical basis for firms designing search engine advertising strategies.
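A minimal sketch of the general approach (L1-penalized logistic regression on one-hot encoded ad features); the feature names and data below are illustrative placeholders, not the company data used in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical ad-log schema: categorical keyword/region features plus cost per click.
ads = pd.DataFrame({
    "has_trademark": ["yes", "no", "yes", "no"] * 50,
    "region": ["north", "south", "east", "west"] * 50,
    "cost_per_click": np.random.default_rng(2).uniform(0.1, 2.0, 200),
    "clicked": np.random.default_rng(3).binomial(1, 0.3, 200),
})

prep = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["has_trademark", "region"])],
    remainder="passthrough",  # keep the numeric cost_per_click column
)

# liblinear supports the L1 penalty, which zeroes out uninformative features.
ctr_model = Pipeline([
    ("prep", prep),
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
])
ctr_model.fit(ads.drop(columns="clicked"), ads["clicked"])
print(ctr_model.predict_proba(ads.drop(columns="clicked"))[:3, 1])
```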
5.
To avoid overfitting, the adaptive LASSO variable selection method is introduced into the binary-choice quantile regression model. A Gibbs sampling algorithm is constructed within a Bayesian framework, and the constraint ‖β‖=1, which does not affect prediction, is imposed during sampling to improve the stability of the draws. Numerical simulations show that the improved model has better parameter estimation efficiency, variable selection performance, and classification ability.
6.
When the data have heavy tails or contain outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Using Bayesian inference, we study the Bayesian variable selection problem for median linear models. A Bayesian estimation method is proposed by combining Bayesian model selection theory with a spike-and-slab prior on the regression coefficients, and an efficient posterior Gibbs sampling procedure is given. Extensive numerical simulations and an analysis of the Boston house price data illustrate the effectiveness of the proposed method.
8.
9.
Jeff Goldsmith, Lei Huang, Ciprian M. Crainiceanu. Journal of Computational and Graphical Statistics, 2013, 22(1): 46-64
We develop scalar-on-image regression models when images are registered multidimensional manifolds. We propose a fast and scalable Bayes’ inferential procedure to estimate the image coefficient. The central idea is the combination of an Ising prior distribution, which controls a latent binary indicator map, and an intrinsic Gaussian Markov random field, which controls the smoothness of the nonzero coefficients. The model is fit using a single-site Gibbs sampler, which allows fitting within minutes for hundreds of subjects with predictor images containing thousands of locations. The code is simple and is provided in the online Appendix (see the “Supplementary Materials” section). We apply this method to a neuroimaging study where cognitive outcomes are regressed on measures of white-matter microstructure at every voxel of the corpus callosum for hundreds of subjects.
10.
11.
Yihong Zhao, Huaihou Chen, R. Todd Ogden. Journal of Computational and Graphical Statistics, 2013, 22(3): 655-675
One useful approach for fitting linear models with scalar outcomes and functional predictors involves transforming the functional data to wavelet domain and converting the data-fitting problem to a variable selection problem. Applying the LASSO procedure in this situation has been shown to be efficient and powerful. In this article, we explore two potential directions for improvements to this method: techniques for prescreening and methods for weighting the LASSO-type penalty. We consider several strategies for each of these directions which have never been investigated, either numerically or theoretically, in a functional linear regression context. We compare the finite-sample performance of the proposed methods through both simulations and real-data applications with both 1D signals and 2D image predictors. We also discuss asymptotic aspects. We show that applying these procedures can lead to improved estimation and prediction as well as better stability. Supplementary materials for this article are available online.
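A minimal, hedged sketch of the baseline pipeline described above (discrete wavelet transform of each functional predictor followed by LASSO on the coefficients), using PyWavelets and scikit-learn. The prescreening and penalty-weighting refinements studied in the article are not included, and all names are illustrative.

```python
import numpy as np
import pywt
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, grid = 150, 128                                          # n curves on a 128-point grid
curves = rng.standard_normal((n, grid)).cumsum(axis=1)      # toy 1D functional predictors
y = curves[:, 30] - 0.5 * curves[:, 90] + rng.standard_normal(n)

def wavelet_features(curve, wavelet="db4", level=4):
    """Flatten the multilevel DWT coefficients of one curve into a feature vector."""
    coeffs = pywt.wavedec(curve, wavelet, level=level)
    return np.concatenate(coeffs)

W = np.vstack([wavelet_features(c) for c in curves])

# LASSO in the wavelet domain turns the functional regression into variable selection.
fit = LassoCV(cv=5).fit(W, y)
print("nonzero wavelet coefficients:", np.sum(fit.coef_ != 0))
```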
12.
Research on water quality prediction based on logistic regression
Water quality grade assessment is an important part of environmental system evaluation. Because the water quality grade is a categorical variable that cannot be analyzed with traditional regression methods, a prediction model for water quality grade is built based on logistic regression. Using water quality monitoring data from the Yangtze River basin, logistic regression is applied to the water quality data to build the model and predict the quality grade. The results show that logistic regression yields good fitting and prediction performance for water quality analysis.
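A minimal sketch of the modeling idea, assuming the water-quality grades are coded as multiple classes and the predictors are monitoring indicators; the indicator names and simulated data are illustrative, not taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 400
# Illustrative monitoring indicators (e.g. dissolved oxygen, COD, ammonia nitrogen).
X = rng.uniform(0, 10, size=(n, 3))
# Toy grade labels 0..4 standing in for water-quality classes I-V.
grade = np.digitize(X[:, 1] + 0.5 * X[:, 2], bins=[4, 7, 10, 13])

model = LogisticRegression(max_iter=1000)    # multinomial logistic regression
model.fit(X, grade)
print(model.predict(X[:5]))                  # predicted water-quality grades
```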
13.
In penalized variable selection, the choice of the tuning parameter is a key issue; unfortunately, in most of the literature the selection method is vague, often based on experience and lacking systematic theory. Based on a panel data model with random effects, this paper proposes the penalized cross-validation criterion (PCV) for selecting the adaptive LASSO tuning parameter in quantile regression, and compares it with other tuning parameter selection criteria. Simulations at different quantiles show that when the residuals come from peaked or heavy-tailed distributions, the criterion estimates the model parameters better, especially at high and low quantiles. At other quantiles, PCV performs slightly worse than the Schwarz information criterion but clearly better than the Akaike information criterion and cross-validation. In terms of variable selection accuracy, the criterion is also more effective than the Schwarz and Akaike information criteria. Finally, panel data on several macroeconomic indicators across Chinese regions are modeled to demonstrate the performance of the penalized cross-validation criterion, yielding the regression relationships among macroeconomic indicators at different quantiles.
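The PCV criterion itself is defined in the paper; as a generic point of comparison, the sketch below tunes the penalty of an L1 quantile regression by ordinary cross-validation with the pinball (check) loss using scikit-learn. The penalty grid, quantile level, and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import mean_pinball_loss
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
X = rng.standard_normal((200, 6))
y = X[:, 0] + 2 * X[:, 1] + rng.standard_t(df=3, size=200)   # heavy-tailed errors

tau = 0.9                                                     # a high quantile
alphas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_loss = {}
for alpha in alphas:
    losses = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        fit = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs")
        fit.fit(X[train], y[train])
        losses.append(mean_pinball_loss(y[test], fit.predict(X[test]), alpha=tau))
    cv_loss[alpha] = np.mean(losses)

best = min(cv_loss, key=cv_loss.get)
print("selected penalty:", best)
```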
14.
This paper studies adaptive LASSO (least absolute shrinkage and selection operator) variable selection and coefficient estimation for measurement error models. We first give adaptive LASSO estimators for linear and partially linear models when the covariates are measured with error, study their asymptotic properties under regularity conditions, and show that, with a suitably chosen tuning parameter, the adaptive LASSO estimators have the oracle property. We then discuss the computational algorithm and the choice of the penalty and smoothing parameters. Finally, simulations and a real data analysis examine the performance of the adaptive LASSO variable selection method; the results show that both variable selection and parameter estimation perform well.
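For orientation only, here is a minimal sketch of the generic two-step adaptive LASSO (an initial estimate provides coefficient-specific weights, which are implemented by rescaling the design matrix); it ignores the measurement-error correction that is the focus of the paper, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(7)
n, p = 300, 10
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.standard_normal(n)

# Step 1: an initial (unpenalized) estimate gives the adaptive weights 1/|beta_init|.
beta_init = LinearRegression(fit_intercept=False).fit(X, y).coef_
weights = 1.0 / (np.abs(beta_init) + 1e-8)

# Step 2: adaptive LASSO = ordinary LASSO on the rescaled design X_j / w_j,
# then transform the coefficients back to the original scale.
X_scaled = X / weights
fit = LassoCV(cv=5, fit_intercept=False).fit(X_scaled, y)
beta_adaptive = fit.coef_ / weights
print(beta_adaptive.round(2))
```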
15.
Lasso is a widely used variable selection method in machine learning, suitable for regression problems with sparsity. When the sample size is huge or massive data are stored on different machines, distributed computation is an important way to reduce computing time and improve efficiency. Starting from an equivalent optimization formulation of the Lasso model, this paper applies the ADMM algorithm to this model with separable optimization variables, constructs a distributed algorithm for Lasso variable selection, and proves...
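A minimal sketch of consensus ADMM for the Lasso with the data split across blocks (simulating machines); the block split, penalty value, and names are illustrative, and this is not necessarily the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def distributed_lasso_admm(blocks, lam, rho=1.0, n_iter=100):
    """Consensus ADMM: each block i holds (X_i, y_i) and a local estimate x_i;
    a global variable z collects them and carries the L1 penalty."""
    p = blocks[0][0].shape[1]
    N = len(blocks)
    x = [np.zeros(p) for _ in range(N)]
    u = [np.zeros(p) for _ in range(N)]          # scaled dual variables
    z = np.zeros(p)
    # Pre-factorize each block's local system (X_i^T X_i + rho I).
    chol = [np.linalg.cholesky(Xi.T @ Xi + rho * np.eye(p)) for Xi, _ in blocks]
    for _ in range(n_iter):
        for i, (Xi, yi) in enumerate(blocks):    # local updates (parallelizable)
            rhs = Xi.T @ yi + rho * (z - u[i])
            x[i] = np.linalg.solve(chol[i].T, np.linalg.solve(chol[i], rhs))
        # Global update: soft-threshold the average of local estimates.
        z = soft_threshold(np.mean([x[i] + u[i] for i in range(N)], axis=0),
                           lam / (rho * N))
        for i in range(N):
            u[i] += x[i] - z                     # dual ascent
    return z

# Toy usage: split one data set across 4 "machines".
rng = np.random.default_rng(8)
X = rng.standard_normal((400, 12))
beta_true = np.concatenate([[2.0, -1.0, 0.5], np.zeros(9)])
y = X @ beta_true + 0.1 * rng.standard_normal(400)
blocks = [(X[i::4], y[i::4]) for i in range(4)]
print(distributed_lasso_admm(blocks, lam=5.0).round(2))
```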
16.
The high dimensionality and high correlation of current listed-company credit risk data severely affect the accuracy of credit risk models. Combining existing algorithms with the characteristics of credit risk models, this paper designs a new nonparametric variable selection method. Screening the credit-risk-related variables of listed companies with this method removes noise variables and linearly correlated variables from the data set. An algorithm for finding the optimal solution under high variable dimensionality is also designed for the method. Taking the logistic model as an example, an empirical analysis of listed-company credit risk shows that, compared with previous variable selection methods, this method effectively reduces the data dimension, removes correlations among variables, and improves the reliability and prediction accuracy of the model.
17.
We develop an approach to tuning of penalized regression variable selection methods by calculating the sparsest estimator contained in a confidence region of a specified level. Because confidence intervals/regions are generally understood, tuning penalized regression methods in this way is intuitive and more easily understood by scientists and practitioners. More importantly, our work shows that tuning to a fixed confidence level often performs better than tuning via the common methods based on Akaike information criterion (AIC), Bayesian information criterion (BIC), or cross-validation (CV) over a wide range of sample sizes and levels of sparsity. Additionally, we prove that by tuning with a sequence of confidence levels converging to one, asymptotic selection consistency is obtained, and with a simple two-stage procedure, an oracle property is achieved. The confidence-region-based tuning parameter is easily calculated using output from existing penalized regression computer packages. Our work also shows how to map any penalty parameter to a corresponding confidence coefficient. This mapping facilitates comparisons of tuning parameter selection methods such as AIC, BIC, and CV, and reveals that the resulting tuning parameters correspond to confidence levels that are extremely low, and can vary greatly across datasets. Supplemental materials for the article are available online.
18.
Peter F. Thall, Richard Simon, David A. Grier. Journal of Computational and Graphical Statistics, 2013, 22(1): 41-61
Test-based variable selection algorithms in regression are often based on sequential comparison of test statistics to cutoff values. A predetermined α level is typically used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise selection involves comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi-square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of the type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p-value α* that minimizes a cross-validation estimate of squared prediction error, then selecting the model by running backward elimination on the entire data set using α* as the nominal p-value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.
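A compact, hedged sketch of the BECV idea as described above (backward elimination at a p-value cutoff, with the cutoff chosen by cross-validated squared prediction error), using statsmodels for the t-tests; the candidate grid of cutoffs and all names are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import KFold

def backward_eliminate(X, y, alpha):
    """Drop the least significant predictor until all p-values are below alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = np.asarray(fit.pvalues)[1:]      # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break
        cols.pop(worst)
    return cols

def becv(X, y, alphas, n_splits=5):
    """Choose alpha* by cross-validated prediction error, then refit on all data."""
    cv_err = {}
    for a in alphas:
        errs = []
        for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
            cols = backward_eliminate(X[tr], y[tr], a)
            fit = sm.OLS(y[tr], sm.add_constant(X[tr][:, cols], has_constant="add")).fit()
            pred = fit.predict(sm.add_constant(X[te][:, cols], has_constant="add"))
            errs.append(np.mean((y[te] - pred) ** 2))
        cv_err[a] = np.mean(errs)
    best = min(cv_err, key=cv_err.get)
    return best, backward_eliminate(X, y, best)

rng = np.random.default_rng(9)
X = rng.standard_normal((200, 8))
y = 2 * X[:, 0] - X[:, 3] + rng.standard_normal(200)
print(becv(X, y, alphas=[0.01, 0.05, 0.10, 0.20]))
```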
19.
Markov chain Monte Carlo (MCMC) is nowadays a standard approach to numerical computation of integrals of the posterior density π of the parameter vector η. Unfortunately, Bayesian inference using MCMC is computationally intractable when the posterior density π is expensive to evaluate. In many such problems, it is possible to identify a minimal subvector β of η responsible for the expensive computation in the evaluation of π. We propose two approaches, DOSKA and INDA, that approximate π by interpolation in ways that exploit this computational structure to mitigate the curse of dimensionality. DOSKA interpolates π directly while INDA interpolates π indirectly by interpolating functions, for example, a regression function, upon which π depends. Our primary contribution is derivation of a Gaussian processes interpolant that provably improves over some of the existing approaches by reducing the effective dimension of the interpolation problem from dim(η) to dim(β). This allows a dramatic reduction of the number of expensive evaluations necessary to construct an accurate approximation of π when dim(η) is high but dim(β) is low. We illustrate the proposed approaches in a case study for a spatio-temporal linear model for air pollution data in the greater Boston area. Supplemental materials include proofs, details, and software implementation of the proposed procedures.
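DOSKA and INDA are specific to the paper; as a generic illustration of the underlying idea (replacing an expensive log-posterior with a cheap Gaussian process interpolant before running MCMC), the sketch below uses scikit-learn's GaussianProcessRegressor and a simple random-walk Metropolis step. The target function and all names are illustrative stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_log_posterior(beta):
    """Stand-in for a log-posterior that is costly to evaluate (here just a Gaussian)."""
    return -0.5 * np.sum((beta - 1.0) ** 2)

rng = np.random.default_rng(10)
d = 2                                            # dimension of the expensive subvector

# Design points: a small number of expensive evaluations of the log-posterior.
design = rng.uniform(-3, 4, size=(40, d))
log_post = np.array([expensive_log_posterior(b) for b in design])

# Gaussian process interpolant of the log-posterior.
surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
surrogate.fit(design, log_post)

# Random-walk Metropolis using the cheap surrogate instead of the expensive target.
beta = np.zeros(d)
samples = []
for _ in range(2000):
    prop = beta + 0.5 * rng.standard_normal(d)
    log_ratio = (surrogate.predict(prop[None]) - surrogate.predict(beta[None]))[0]
    if np.log(rng.uniform()) < log_ratio:
        beta = prop
    samples.append(beta)
print(np.mean(samples, axis=0).round(2))         # should sit near the toy mode (1, 1)
```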
20.
This paper considers the semiparametric regression model y_i = x_i^T β + g(t_i) + e_i (i = 1, 2, ..., n). The parametric component β̂ is first estimated by a difference-based method, the nonparametric component ĝ(t) is then defined using spline functions, and finally the properties of ĝ(t) are discussed.
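A minimal sketch of this two-step idea under simple assumptions (data sorted by t, first-order differencing to remove g, then a smoothing spline on the partial residuals); the variable names and smoothing choice are illustrative and this is not the paper's exact estimator.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(11)
n = 300
t = np.sort(rng.uniform(0, 1, n))
X = rng.standard_normal((n, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(n)

# Step 1: first-order differencing (with data ordered by t) nearly cancels g(t_i),
# so least squares on the differenced data estimates beta.
dX, dy = np.diff(X, axis=0), np.diff(y)
beta_hat, *_ = np.linalg.lstsq(dX, dy, rcond=None)

# Step 2: smooth the partial residuals y - X beta_hat against t with a spline
# to estimate g.
g_hat = UnivariateSpline(t, y - X @ beta_hat, s=n * 0.05)
print(beta_hat.round(2), g_hat(np.array([0.25, 0.5, 0.75])).round(2))
```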