Similar literature
1.
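After a variable selection method has chosen a model, how to assess the significance of the coefficients in that model is one of the frontier problems in statistics. Starting from the variables selected by the adaptive Lasso, and allowing for the variety of error distributions met in practice, this article constructs a conditional test statistic for the coefficients of the retained variables based on the selection event, and proves the uniform convergence of the statistic. Simulation studies show that the proposed method further refines the variable selection result under a range of error distributions, which gives it considerable practical value. The method is applied to CEPS student data, where 10 variables, including students' cognitive ability, are finally selected as the main factors affecting secondary school students' academic performance, providing a useful reference for related research.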

2.
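The high dimensionality and strong correlation of current credit risk data for listed companies seriously undermine the accuracy of credit risk models. To address this, the paper combines existing algorithms with the characteristics of credit risk models to design a new nonparametric variable selection method. Screening the credit-risk-related variables of listed companies with this method removes noise variables and linearly correlated variables from the data set. The paper also designs an algorithm for finding the optimal solution when the number of variables is large. Taking the Logistic model as an example, an empirical analysis of listed companies' credit risk shows that, compared with earlier variable selection methods, the proposed method effectively reduces data dimensionality, removes correlation among variables, and improves the model's reliability and predictive accuracy.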

3.
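Starting from the model selected by the elastic net, the article constructs the exact distribution of the coefficients conditional on model selection, and uses this distribution for inference to obtain p-values for testing coefficient significance and confidence intervals for the model coefficients. The method can be used to further adjust the model chosen by the conventional elastic net. Simulation studies illustrate the applicability of the proposed method to variable selection, for example its strong ability to identify noise variables. In the empirical analysis, the elastic net based on the variable selection event is used to screen the factors affecting workers' wage income in China. The analysis shows that, among the explanatory variables chosen by the conventional elastic net, frequency of religious activity, length of service, physical health, and individual height are not the most important determinants of labor income. These variables can be dropped as circumstances warrant, reducing research cost and improving analytical efficiency, which gives the method some reference value in practice.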

4.
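Variable selection directly determines the effectiveness of a spatial econometric model and the results of empirical research. To solve the variable selection problem for the spatial autoregressive (SAR) model, this paper maximizes the Kullback-Leibler information, applies the AIC criterion to SAR model building, derives the Spatial AIC statistic, and proposes the Spatial AIC criterion. Statistical theory is then used to prove the asymptotic optimality of the Spatial AIC criterion for selecting SAR model variables. Monte Carlo simulation is used to compare the finite-sample and large-sample properties of the Spatial AIC criterion, the classical AIC criterion, and the Lasso when applied to SAR variable selection. Finally, using spatially correlated return data for the CSI 300 constituent stocks, spatial autocorrelation models of the financial determinants of stock returns are built with the Spatial AIC criterion and with the Lasso, and their relative effectiveness is compared empirically. All three sets of results show that the Spatial AIC criterion handles variable selection for the SAR model better.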
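As a point of reference for the classical AIC criterion that this abstract extends, the following minimal sketch ranks candidate Gaussian linear models by `n * log(RSS / n) + 2k`. It is not the Spatial AIC of the paper, whose form is not given here; the function name `gaussian_aic` and the residual sums of squares in `candidates` are hypothetical values chosen only for illustration.

```python
import math


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """Classical AIC for a Gaussian linear model, up to an additive constant.

    rss: residual sum of squares, n: sample size,
    k: number of estimated regression coefficients.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    return n * math.log(rss / n) + 2 * k


# Hypothetical fits: model name -> (RSS, number of coefficients).
candidates = {
    "x1": (120.0, 2),
    "x1+x2": (80.0, 3),
    "x1+x2+x3": (79.5, 4),
}
n = 50

scores = {name: gaussian_aic(rss, n, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)  # smallest AIC wins
print(best, round(scores[best], 2))
```

In this made-up example the third variable lowers the RSS only slightly, so the penalty of 2 per extra coefficient outweighs the gain in fit.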

5.
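To determine how the fuel consumption of a certain aircraft type relates to other factors, stepwise regression analysis was carried out in the SPSS statistical software on actual fuel consumption data from an air station over the past three years. First, stepwise regression was used to pick out the variables affecting fuel consumption in each of three phases, namely the takeoff roll, flight, and the landing roll, and a regression model was built for each phase. Next, statistical diagnostics were applied to the regression models to check their validity and to detect possible outliers. Finally, conclusions were drawn from the regression analysis and the diagnostics.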

6.
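Variable selection helps simplify a model and improve the accuracy of estimation and prediction, yet few studies address variable selection for panel semiparametric spatial autoregressive models. Building on ALASSO, this paper proposes the SSAR-ALASSO method, whose core lies in the choice of penalty function and the construction of the objective function. SSAR-ALASSO differs from ALASSO in the correspondence between variables and parameters, the choice of penalty function, the admissible range of certain special parameters, and the models to which it applies. Simulation results show that SSAR-ALASSO performs well in both the accuracy of variable selection and the precision of parameter estimation, and that its performance improves as the sample size grows. In an empirical study of the factors influencing carbon emissions, SSAR-ALASSO is used for variable selection in the STIRPAT model. The results show that per capita wealth, technology level, industrial structure, ownership structure, and industrial agglomeration significantly affect carbon emissions, whereas urbanization, economic openness, energy prices, and environmental policy have no significant effect.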

7.
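Selecting regression variables to obtain the optimal, or a near-optimal, regression equation is one of the topics that has drawn attention and close study in multiple regression. Many methods now exist for selecting variables and building an optimal or near-optimal equation in multiple regression, chiefly forward selection, backward elimination, stepwise regression, reverse stepwise regression, all-possible-regressions, and optimal regression, of which stepwise regression is the most popular. Beale, Mantel, and Hocking discussed forward selection, backward elimination, stepwise regression, and all-possible-regressions, and held that stepwise regression has some problems, the chief one being that it may miss the optimal equation. Here, optimal means having the smallest residual sum of squares among equations with the same number of variables. We believe the reason for the omission is that stepwise regression uses a fixed F limit (denoted F_α in this paper) as the threshold for selecting variables. At each step, the variable outside the equation whose regression contribution ...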

8.
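Variable selection in high-dimensional regression analysis is currently both a hot topic and a hard problem in statistical research. This paper proposes a correlation measure based on the conditional distribution function and, building on it, three variable selection methods. Unlike existing methods, the proposed methods do not depend on a statistical model and apply to both linear models and nonparametric additive models. Numerical simulations show that the methods perform satisfactorily even when the covariates are correlated to some degree.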

9.
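Credit risk is the most important and most complex of the risks that commercial banks currently face. The New Basel Accord requires qualified banks in every country to measure and control credit risk through the internal ratings-based approach, in which a bank builds a mathematical model from the historical customer data it has collected, estimates each customer's probability of default, and scores the customer accordingly. Because the explanatory variables in credit scoring models are high-dimensional and of varied types, and the numbers of good and bad customers are unbalanced, this article models the customer's probability of default with a generalized semiparametric additive model and applies the Group LASSO to the model for variable selection and estimation. The empirical study shows that, compared with the commonly used linear logistic regression model, the proposed model and method offer considerable advantages in discriminatory power, predictive ability, interpretability, and computational efficiency.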

10.
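This paper studies Bayesian statistical inference for the Poisson inverse Gaussian regression model. MCMC methods, including Gibbs sampling, the Metropolis-Hastings algorithm, and the Multiple-Try Metropolis algorithm, are used to compute joint Bayesian estimates of the model's unknown parameters and latent variables, and two goodness-of-fit statistics are introduced to assess the adequacy of the proposed Poisson inverse Gaussian regression model. Several simulation studies and one empirical analysis illustrate the feasibility of the method.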
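To make the Metropolis-Hastings step concrete, here is a minimal random-walk sketch for a much simpler target than the one in the abstract: the log-rate of i.i.d. Poisson counts under a wide normal prior. The model, the prior standard deviation, the step size, and the `counts` data are all assumptions made for illustration; the paper's Poisson inverse Gaussian sampler with latent variables is not reproduced here.

```python
import math
import random


def log_post(theta, counts, prior_sd=10.0):
    """Log posterior of theta = log(rate), up to an additive constant."""
    rate = math.exp(theta)
    log_lik = sum(y * theta - rate for y in counts)  # drops the -log(y!) terms
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior


def metropolis_hastings(counts, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_post(theta, counts)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_post(proposal, counts)
        # Symmetric proposal, so the acceptance probability is
        # min(1, posterior(proposal) / posterior(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)  # a rejected proposal repeats the current state
    return draws


counts = [3, 5, 4, 6, 2, 4, 5, 3]  # hypothetical data
draws = metropolis_hastings(counts)
kept = draws[1000:]  # discard burn-in
posterior_mean_rate = sum(math.exp(t) for t in kept) / len(kept)
print(round(posterior_mean_rate, 2))
```

Working on the log scale keeps the rate positive without a constrained proposal, which is one common design choice among several.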

11.
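Motivated by the needs of engineering practice, this paper proposes a model and algorithm for multivariate within-group regression analysis. Drawing on the idea of stepwise regression, the model converts the eliminated independent variables into dependent variables, which both makes full use of the information in the data and yields a complete result. The model overcomes the difficulty of applying conventional multiple regression models when the set of dependent variables is uncertain, and has broad application prospects.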

12.
This work deals with log-symmetric regression models, which are particularly useful when the response variable is continuous, strictly positive, and asymmetrically distributed, with the possibility of modeling atypical observations by means of robust estimation. In these regression models, the distribution of the random errors is a member of the log-symmetric family, which is composed of the log-contaminated-normal, log-hyperbolic, log-normal, log-power-exponential, log-slash and log-Student-t distributions, among others. One way to select the best family member in log-symmetric regression models is to use information criteria. In this paper, we formulate log-symmetric regression models and conduct a Monte Carlo simulation study to investigate the accuracy of popular information criteria, such as the Akaike, Bayesian, and Hannan-Quinn criteria and their respective corrected versions, in choosing adequate log-symmetric regression models. As a business application, a movie data set assembled by the authors is analyzed to compare and obtain the best possible log-symmetric regression model for box office. The results provide relevant information for model selection criteria in log-symmetric regressions and for the movie industry. Economic implications of our study are discussed after the numerical illustrations.
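For reference, the three criteria named in this abstract are usually written as follows, where \ell(\hat\theta) is the maximized log-likelihood, k the number of estimated parameters, and n the sample size. The last line is the common small-sample correction of AIC; the corrected versions studied in the paper may be defined differently, so treat that line as an assumption.

```latex
\begin{align*}
\mathrm{AIC}  &= -2\,\ell(\hat\theta) + 2k, \\
\mathrm{BIC}  &= -2\,\ell(\hat\theta) + k \log n, \\
\mathrm{HQIC} &= -2\,\ell(\hat\theta) + 2k \log(\log n), \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}.
\end{align*}
```

In each case the model with the smallest value is preferred, and the criteria differ only in how heavily they penalize k.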

13.
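This paper shows that the commonly used stepwise regression method can compute the locally optimal regression subsets under five criteria commonly used in regression analysis, and simulation results show that in most cases these locally optimal subsets coincide. This provides a scientific basis for the practical importance of stepwise regression. Finally, the author applies the method to several well-known numerical examples, with very satisfactory results.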

14.
We consider the use of B-spline nonparametric regression models estimated by the maximum penalized likelihood method for extracting information from data with complex nonlinear structure. Crucial points in B-spline smoothing are the choices of a smoothing parameter and the number of basis functions, for which several selectors have been proposed based on cross-validation and the Akaike information criterion (AIC). It should be noted, however, that AIC is a criterion for evaluating models estimated by the maximum likelihood method, and that it was derived under the assumption that the true distribution belongs to the specified parametric model. In this paper we derive information criteria for evaluating B-spline nonparametric regression models estimated by the maximum penalized likelihood method in the context of generalized linear models under model misspecification. We use Monte Carlo experiments and real data examples to examine the properties of our criteria, including various selectors proposed previously.

15.
This article deals with regression function estimation when the regression function is smooth at all but a finite number of points. An important question is: How can one produce discontinuous output without knowledge of the location of discontinuity points? Unlike most commonly used smoothers, which tend to blur discontinuities in the data, we need a smoother that can detect such discontinuities. In this article, linear splines are used to estimate discontinuous regression functions. A knot-merging procedure is introduced for the estimation of regression functions near discontinuity points. The basic idea is to use multiple knots for spline estimates. We use an automatic procedure involving the least squares method, stepwise knot addition, stepwise basis deletion, knot-merging, and the Bayes information criterion to select the final model. The proposed method can produce discontinuous outputs. Numerical examples using both simulated and real data are given to illustrate the performance of the proposed method.
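The role of multiple knots can be illustrated with a small sketch. In a degree-1 spline, a knot of multiplicity two adds a step basis function next to the usual hinge, which lets the fitted function jump at that point. The function `linear_spline` and its coefficient values are hypothetical and only show this one idea; the article's knot-merging and model selection procedure is not reproduced.

```python
def linear_spline(x, intercept, slope, knot, slope_change, jump):
    """Piecewise-linear function with one knot of multiplicity two.

    slope_change multiplies the hinge (x - knot)_+ and changes the slope;
    jump multiplies the step 1[x >= knot], the extra basis function that a
    doubled knot contributes, and creates a discontinuity at the knot.
    """
    hinge = max(x - knot, 0.0)
    step = 1.0 if x >= knot else 0.0
    return intercept + slope * x + slope_change * hinge + jump * step


# Hypothetical coefficients: slope 1 before the knot at x = 2,
# slope 0.5 after it, and an upward jump of 3 at the knot.
coefficients = dict(intercept=0.0, slope=1.0, knot=2.0, slope_change=-0.5, jump=3.0)

for x in (1.0, 1.999, 2.0, 4.0):
    print(x, linear_spline(x, **coefficients))
```

With `jump` set to zero the same function is an ordinary continuous linear spline, so a fitting procedure can treat the step term as one more basis function to keep or delete.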

16.
This paper proposes fast stepwise procedures for variable selection using the AIC and BIC criteria, referred to as "FSP" for short. FSP are similar to the well-known stepwise regression procedures in their computing steps, but they have two advantages. One is that FSP are guaranteed to converge in a finite number of computing steps, and at a faster rate. The other is that FSP can be used with a large number of candidate variables. We also show some asymptotic properties of FSP and some simulation results.
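For orientation, the sketch below shows ordinary greedy forward selection scored by AIC or BIC, which is the kind of stepwise baseline the abstract compares against. It is not the FSP procedure itself, whose details are not given here. The helper names, the simulated data, and the Gaussian score `n * log(RSS / n) + penalty * k` are assumptions for illustration.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of OLS of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(design, y))


def forward_select(data, y, penalty=2.0):
    """Greedy forward selection; penalty=2 gives AIC, penalty=log(n) gives BIC."""
    n = len(y)

    def score(names):
        fit = rss([data[v] for v in names], y)
        return n * math.log(fit / n) + penalty * (len(names) + 1)

    chosen, best = [], score([])
    while True:
        trials = [(score(chosen + [v]), v) for v in data if v not in chosen]
        if not trials:
            break
        trial_score, name = min(trials)
        if trial_score >= best:  # no candidate improves the criterion
            break
        chosen.append(name)
        best = trial_score
    return chosen


# Simulated data (hypothetical): y depends on x1 and x2 only, x3 is noise.
rng = random.Random(1)
n_obs = 40
data = {name: [rng.gauss(0, 1) for _ in range(n_obs)] for name in ("x1", "x2", "x3")}
y = [2 * a - 3 * b + rng.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]

selected = forward_select(data, y)                       # AIC
selected_bic = forward_select(data, y, math.log(n_obs))  # BIC
print(selected, selected_bic)
```

Each pass refits every remaining candidate, so the cost grows quickly with the number of candidate variables, which is the kind of burden a faster procedure would aim to reduce.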

17.
Multivariate adaptive regression splines (MARS) has become a popular data mining (DM) tool due to its flexible model building strategy for high dimensional data. Compared with other well-known methods, it performs better in many areas such as finance, informatics, technology and science. Many studies have been conducted on improving its performance. For this purpose, an alternative backward stepwise algorithm has been proposed through the Conic-MARS (CMARS) method, which uses a penalized residual sum of squares for MARS as a Tikhonov regularization problem. Additionally, by modifying the forward step of MARS via a mapping approach, a time efficient procedure has been introduced by S-FMARS. Inspired by the advantages of MARS, CMARS and S-FMARS, two hybrid methods are proposed in this study, aiming to produce time efficient DM tools without degrading their performance, especially for large datasets. The resulting methods, called SMARS and SCMARS, are tested in terms of several performance criteria such as accuracy, complexity, stability and robustness via simulated and real life datasets. As a DM application, the hybrid methods are also applied to an important field of finance for predicting interest rates offered by a Turkish bank to its customers. The results show that the proposed hybrid methods, being the most time efficient with competitive performance, can be considered as powerful choices particularly for large datasets.

18.
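Statistical analysis of uniform designs with replicated trials   Total citations: 5, self-citations: 0, citations by others: 5
This paper first briefly introduces the theory and formulas of regression analysis with replicated trials, then recomputes an example and raises some questions. Finally, the author offers comments on the handling of such data and on the SAS statistical software package, criticizing in particular the conventional practice of replacing the experimental results with their means and the popular use of stepwise regression to select variables when the number of variables exceeds the sample size.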

19.
With advanced capability in data collection, applications of linear regression analysis now often involve a large number of predictors. Variable selection thus has become an increasingly important issue in building a linear regression model. For a given selection criterion, variable selection is essentially an optimization problem that seeks the optimal solution over 2^m possible linear regression models, where m is the total number of candidate predictors. When m is large, exhaustive search becomes practically impossible. Simple suboptimal procedures such as forward addition, backward elimination, and backward-forward stepwise procedures are fast but can easily be trapped in a local solution. In this article we propose a relatively simple algorithm for selecting explanatory variables in a linear regression for a given variable selection criterion. Although the algorithm is still a suboptimal algorithm, it has been shown to perform well in an extensive empirical study. The main idea of the procedure is to partition the candidate predictors into a small number of groups. Working with various combinations of the groups and iterating the search through random regrouping, the search space is substantially reduced, hence increasing the probability of finding the global optimum. By identifying and collecting “important” variables throughout the iterations, the algorithm finds increasingly better models until convergence. The proposed algorithm performs well in simulation studies with 60 to 300 predictors. As a by-product of the proposed procedure, we are able to study the behavior of variable selection criteria when the number of predictors is large. Such a study has not been possible with traditional search algorithms.

This article has supplementary material online.
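The size of the search space mentioned in this abstract is easy to check: each of the m candidate predictors is either in or out of the model, giving 2^m subsets. The short sketch below enumerates them for a tiny case and counts them for the range of m used in the simulations; the helper names `n_models` and `all_subsets` are hypothetical.

```python
from itertools import combinations


def n_models(m: int) -> int:
    """Number of candidate linear models with m available predictors."""
    return 2 ** m


def all_subsets(predictors):
    """Yield every subset of the predictors, from empty to full."""
    for size in range(len(predictors) + 1):
        yield from combinations(predictors, size)


subsets = list(all_subsets(["x1", "x2", "x3"]))
print(len(subsets), subsets)

# Exhaustive enumeration is only practical for small m.
for m in (10, 20, 60, 300):
    print(m, n_models(m))
```

At m = 60 the count already exceeds 10^18, which is why exhaustive search is ruled out for the problem sizes the abstract considers.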

20.
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data-driven criterion, we also obtain the asymptotic optimality of the AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
