Similar Literature
20 similar documents found.
1.
A Comparative Study of Modeling Methods for Polynomial Regression   (Total citations: 18; self-citations: 0; citations by others: 18)
In practice, when regression models are used to explain the relationships among causally related variables, the independent variables often stand in power (polynomial) relationships to one another. In such cases a polynomial regression model is a reasonable choice. Because the independent variables in a polynomial regression model are strongly correlated with each other, estimating the regression coefficients by ordinary least squares incurs substantial error. To improve the predictive accuracy and reliability of polynomial regression models, this paper proposes modeling with principal component analysis and partial least squares regression, and uses simulated data to compare the two approaches.
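As a rough illustration of the comparison described above, the following Python sketch fits ordinary least squares and principal-component regression to simulated data whose polynomial terms are strongly correlated. The data-generating model, seed, and choice of two components are illustrative assumptions, not the paper's simulation design, and the partial least squares variant is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y depends on a polynomial in x, so the columns of the
# design matrix (x, x^2, x^3) are strongly correlated with each other.
n = 200
x = rng.uniform(1.0, 3.0, n)
X = np.column_stack([x, x**2, x**3])
y = 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.1, n)

# Ordinary least squares on the raw (collinear) polynomial terms.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# Principal-component regression: regress on the leading components,
# which removes the near-collinearity before fitting.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                                  # keep the two dominant components
scores = Xc @ Vt[:k].T                 # component scores
gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
beta_pcr = Vt[:k].T @ gamma            # back-transform to original variables

rmse = {}
for name, b in [("OLS", beta_ols), ("PCR", beta_pcr)]:
    resid = yc - Xc @ b
    rmse[name] = float(np.sqrt(np.mean(resid**2)))
    print(name, "in-sample RMSE:", round(rmse[name], 4))
```

Both fits achieve a small residual error here; the point of the comparison in the paper is that the principal-component coefficients remain stable under collinearity while the OLS coefficients do not.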

2.
In linear regression modeling, the selection of regressors is a widely studied problem with a large literature and strong theoretical and practical significance. The consistency of the selected subset is a central issue: a selection method is desirable if the subset it chooses is consistent as the sample size tends to infinity and its prediction mean squared error is small. The BIC criterion can select a consistent subset, but its computational cost becomes prohibitive when the number of regressors is large; the adaptive lasso is computationally efficient and can also find a consistent subset. This paper proposes an even simpler selection method that requires only two ordinary linear regressions: first, regress on the full set of regressors to obtain coefficient estimates for all of them; then use these estimates to select a subset; finally, run one more ordinary linear regression on the selected subset to obtain the final fit. Consider a standard linear regression model, let A denote the index set of the nonzero regression coefficients, let Â denote the subset of regressors selected by the proposed method, and let β̂ denote the estimated coefficients (coefficients of unselected regressors are set to zero). We prove that, under suitable conditions, an asymptotic result holds for the components of β̂ indexed by A, in which the limiting quantities involve the error variance σ² and a matrix and constant determined by the limit of the design matrix. Simulations show that the method has good small- and moderate-sample properties.
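The two-regression procedure described above can be sketched as follows. The sparse data-generating model and the hard threshold used for selection are illustrative assumptions; the paper's actual selection rule based on the full-model estimates is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse linear model: only the first two regression coefficients are nonzero.
n, p = 500, 8
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Regression 1: ordinary least squares on the full set of regressors.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Select the subset using the full-model coefficient estimates
# (the threshold value is an illustrative choice).
tau = 0.5
selected = np.flatnonzero(np.abs(beta_full) > tau)

# Regression 2: ordinary least squares on the selected subset only;
# coefficients of unselected regressors are set to zero.
beta_sub, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
beta_hat = np.zeros(p)
beta_hat[selected] = beta_sub

print("selected columns:", selected.tolist())
```

With this sample size the two nonzero coefficients are recovered and the spurious regressors are dropped, which is the small-sample behavior the simulations in the paper report.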

3.
The aggregation of financial and economic time series occurs in a number of ways; temporal aggregation and systematic sampling are the most common. In this paper, we investigate the time-interval effect in multiple regression models whose variables are additive or systematically sampled. When one variable is additive and the other is systematically sampled, the correlation coefficient changes with the selected time interval: the squared correlation coefficient decreases monotonically as the differencing interval increases, approaching zero in the limit. When both random variables are additive or both are systematically sampled, the correlation coefficient is invariant with the time interval and equal to its one-period value. We find that the partial regression and correlation coefficients between two additive or two systematically sampled variables approach their one-period values as n increases, whereas if one of the variables is systematically sampled they approach zero in the limit. The time interval for association analyses between variables therefore cannot be selected arbitrarily, or the statistical results are likely to be affected.
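The monotone decay of the squared correlation under mixed aggregation can be seen in a small simulation. Two white-noise series with correlation rho are an illustrative special case (the paper's setting is more general); one series is temporally aggregated and the other systematically sampled.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two one-period series whose contemporaneous correlation is rho = 0.8.
T, rho = 200_000, 0.8
x = rng.normal(size=T)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=T)

def squared_corr(m):
    # x is temporally aggregated (summed over intervals of length m) while
    # y is systematically sampled (last observation of each interval).
    xa = x.reshape(-1, m).sum(axis=1)
    ys = y.reshape(-1, m)[:, -1]
    return float(np.corrcoef(xa, ys)[0, 1] ** 2)

r2 = {m: squared_corr(m) for m in (1, 4, 16)}
for m, v in r2.items():
    print(f"interval {m:2d}: squared correlation {v:.3f}")  # ~ rho**2 / m
```

For this white-noise case the squared correlation falls like rho²/m as the interval m grows, consistent with the monotone decrease toward zero stated in the abstract.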

4.
Measuring the contribution of inputs such as loans and deposits to net interest income, and thereby evaluating the input-output efficiency of commercial banks, matters both for banks' capital management and for regulators' supervision of bank capital. Too many raw input variables, and high correlation among them, distort the estimation and testing of the evaluation model. The contributions of this paper are twofold. First, two mutually uncorrelated principal components are extracted that retain more than 95% of the information in the six raw input variables. An SFA model built on these principal components overcomes the effects of excessive and highly correlated variables on parameter estimation and testing, resolving the insignificant and wrongly signed coefficients caused by the highly correlated raw inputs. Second, using principal component regression, the expressions relating the principal components to the input variables are substituted back into the principal-component SFA model to determine the weights of the input variables, yielding an input-output model that shows how the six inputs affect net interest income. The empirical results show: (1) the coefficients of the principal-component SFA model are statistically significant, and technical efficiency increases over time; (2) the output elasticities of interest expense, loan balance, total assets, total deposits, fixed assets, and number of employees are 0.287, 0.272, 0.254, 0.086, 0.072, and 0.053 respectively, so the main drivers of net interest income are interest expense, loan balance, and total assets, while total deposits, fixed assets, and number of employees have little effect; (3) the scale coefficient of the 18 commercial banks is 1.025, so net interest income exhibits economies of scale.

5.
Calibration is the most widely used weighting adjustment method, yet the traditional design-effect model for weighting adjustment accounts only for the loss of precision caused by unequal weights and ignores the gain in precision from using auxiliary information, which makes it deficient for computing design effects. Extending Spencer's model, this paper introduces the generalized regression (GREG) estimator, which reflects the relationship between the auxiliary and survey variables, and builds a general model of the calibration-weighted design effect. Numerical results show that the calibration-weighted design-effect model outperforms the traditional weighting-adjustment design-effect model; in particular, when the survey variable is highly correlated with the auxiliary variables, the calibration-weighted model accurately estimates the combined efficiency of the unequal-probability sampling design and the calibration adjustment.
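The precision gain from auxiliary information that the abstract refers to comes from the GREG estimator. The following sketch applies a GREG adjustment under simple random sampling with an equal-weight design; the population model, sample size, and seed are illustrative assumptions, and the design-effect calculation itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)

# Finite population where the survey variable y is highly correlated with
# the auxiliary variable x, whose population mean is assumed known.
N = 10_000
x = rng.uniform(1.0, 5.0, N)
y = 3.0 * x + rng.normal(0.0, 1.0, N)
X_bar = x.mean()                       # known auxiliary population mean

# Simple random sample of size n.
n = 200
idx = rng.choice(N, size=n, replace=False)
xs, ys = x[idx], y[idx]

# Generalized regression (GREG) estimator of the population mean of y:
# adjust the sample mean by the regression fit on the auxiliary variable.
b = np.cov(xs, ys, ddof=1)[0, 1] / np.var(xs, ddof=1)
greg = ys.mean() + b * (X_bar - xs.mean())

print(f"sample mean {ys.mean():.3f}, GREG {greg:.3f}, true mean {y.mean():.3f}")
```

Because x and y are highly correlated, the GREG estimate tracks the true population mean much more tightly than the unadjusted sample mean, which is exactly the efficiency gain the calibration-weighted design effect is built to capture.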

6.
In regression models with stochastic design, the observations have primarily been treated as a simple random sample from a bivariate distribution. It is of great practical significance to generalize this setting to stochastic processes. In this paper, estimation and hypothesis-testing problems in a stochastic volatility model are considered, where the volatility depends on a nonlinear function of the state variable of another stochastic process and the correlation coefficient satisfies |ρ| ≠ 1. The methods are applied to estimate the volatility of stock returns from the Shanghai stock exchange. Copyright © 2009 John Wiley & Sons, Ltd.

7.
We consider a class of estimation problems for generalized varying-coefficient models when some of the covariates are endogenous. Combining basis-function approximation with auxiliary instrumental-variable information, we propose an instrumental-variable-based estimation procedure and establish asymptotic properties such as consistency and the rate of convergence. The proposed method effectively removes the impact of covariate endogeneity on estimation accuracy and has good finite-sample performance.

8.
The quantile varying-coefficient model is a robust nonparametric modeling approach. When analyzing data with varying-coefficient models, a natural question is how to simultaneously select the important variables and identify which of them have constant effects. Based on the quantile method, this paper studies robust and efficient estimation and variable-selection procedures. Using local smoothing and adaptive group variable selection, and imposing a double penalty on the quantile loss function, we obtain penalized estimators. With the tuning parameters suitably chosen by a BIC criterion, the proposed variable-selection method enjoys the oracle property; simulation studies and an analysis of body-fat data illustrate its usefulness. The numerical results show that, without requiring any knowledge of the variables or the error distribution, the proposed method identifies the unimportant variables and also distinguishes the constant-effect ones.
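The robustness of quantile-based procedures like the one above rests on the check (pinball) loss. As a minimal illustration of that building block only (not of the varying-coefficient model or the double-penalty estimator), the sketch below verifies numerically that a quantile minimizes the mean check loss over constant fits; the distribution, quantile level, and grid are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Check (pinball) loss underlying quantile methods:
# rho_tau(u) = tau*u for u >= 0, and (tau - 1)*u for u < 0.
def check_loss(u, tau):
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

data = rng.normal(loc=10.0, scale=2.0, size=20_000)
tau = 0.9

# The tau-th quantile minimizes the mean check loss over constant fits,
# which is what makes quantile estimation robust to the error distribution.
grid = np.linspace(data.min(), data.max(), 801)
losses = [float(check_loss(data - c, tau).mean()) for c in grid]
best = float(grid[int(np.argmin(losses))])

print("check-loss minimizer:", round(best, 2),
      "empirical 0.9-quantile:", round(float(np.quantile(data, tau)), 2))
```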

9.
Summary  The paper considers the problem of optimum stratification on an auxiliary variable x when information on x is also used to estimate the population mean by ratio or regression methods of estimation. Assuming the form of the regression of the estimation variable y on the auxiliary variable x, as well as the form of the conditional variance function V(y|x), the problem of determining optimum strata boundaries (OSB) is shown to be a particular case of optimum stratification on the auxiliary variable for the stratified simple random sampling estimate. A numerical investigation has also been made to study the gain in efficiency that can be brought about by stratifying the population.

10.
The accurate estimation of rare event probabilities is a crucial problem in engineering for characterizing the reliability of complex systems. Several methods, such as Importance Sampling and Importance Splitting, have been proposed to estimate such events more accurately (i.e., with lower variance) than the crude Monte Carlo method. However, these methods assume that the probability distributions of the input variables are exactly defined (e.g., mean and covariance matrix perfectly known when the input variables follow Gaussian laws) and cannot determine the impact of a change in the input distribution parameters on the probability of interest. The problem considered in this paper is the propagation of input distribution parameter uncertainty, defined by intervals, to the rare event probability. This problem induces intricate optimization and numerous probability estimations in order to determine the upper and lower bounds of the probability estimate. The calculation of these bounds is often numerically intractable for rare event probabilities (say 10⁻⁵) because of the high computational cost required. A new methodology is proposed to solve this problem with a reduced simulation budget, using adaptive Importance Sampling. To this end, a method for estimating the optimal Importance Sampling auxiliary distribution is proposed, based on preceding Importance Sampling estimations. Furthermore, a Kriging-based adaptive Importance Sampling is used to minimize the number of evaluations of the computationally expensive simulation code. To determine the bounds of the probability estimate, an evolutionary algorithm is employed; it was selected to handle noisy problems, since the Importance Sampling probability estimate is a random variable. The efficiency of the proposed approach, in terms of the accuracy of the results and the computational cost, is assessed on academic and engineering test cases.
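To make concrete why Importance Sampling beats crude Monte Carlo for such small probabilities, here is a minimal fixed-proposal sketch on a Gaussian tail event. The target event, shifted proposal, and budget are illustrative assumptions; the paper's method is adaptive, Kriging-assisted, and propagates interval uncertainty on the distribution parameters, none of which appears here.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# Rare event: p = P(X > c) for X ~ N(0, 1) with c = 5, so p ≈ 2.9e-7,
# far below what crude Monte Carlo can resolve with a modest budget.
c, N = 5.0, 100_000

# Importance Sampling with the shifted proposal N(c, 1): the event is no
# longer rare under the proposal, and each sample is reweighted by the
# density ratio N(0,1)/N(c,1).
z = rng.normal(loc=c, scale=1.0, size=N)
log_w = -0.5 * z**2 + 0.5 * (z - c) ** 2     # log of the density ratio
p_hat = float(np.mean((z > c) * np.exp(log_w)))

p_exact = 0.5 * math.erfc(c / math.sqrt(2.0))
print(f"IS estimate {p_hat:.3e}, exact {p_exact:.3e}")
```

A crude Monte Carlo run of the same size would almost surely return zero hits; the reweighted estimator recovers the probability to within a few percent.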

11.
This paper studies empirical likelihood inference for a class of semiparametric instrumental variable models. We focus on the case where some covariates are endogenous and some auxiliary instrumental variables are available. An instrumental-variable-based empirical likelihood method is proposed, and the proposed empirical log-likelihood ratio is shown to be asymptotically chi-squared. Confidence intervals for the regression coefficients are then constructed. Simulation studies assess the finite-sample performance of the proposed empirical likelihood procedure.

12.
Common approaches to monotonic regression focus on the case of a unidimensional covariate and a continuous response variable. Here a general approach is proposed that allows for additive structures in which one or more variables have a monotone influence on the response variable, and for response variables from an exponential family, including binary and Poisson-distributed responses. Flexibility of the smooth estimate is gained by expanding the unknown function in monotonic basis functions. For the estimation of coefficients and the selection of basis functions, a likelihood-based boosting algorithm is proposed that is simple to implement. Stopping criteria and inference are based on AIC-type measures. The method is applied to several datasets.

13.
Both the Walsh transform and a modified Pearson correlation coefficient can be used to infer the structure of a Boolean network from time-series data. Unlike the correlation coefficient, the Walsh transform can also represent higher-order correlations. These correlations of several combined input variables with one output variable give additional information about the dependencies between variables, but they are also more sensitive to noise, and the computational complexity increases exponentially with the order. We first show that the order-1 Walsh transform and the modified Pearson correlation coefficient are equivalent for the reconstruction of Boolean functions. Second, we investigate under which conditions (noise, number of samples, function classes) higher-order correlations can improve the reconstruction process. We present the merits, as well as the limitations, of higher-order correlations for the inference of Boolean networks.
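The order-1 equivalence can be checked directly on a full truth table. In the sketch below the Boolean function and the ±1 coding are illustrative choices (not the paper's modified coefficient or its noisy time-series setting); the order-1 Walsh coefficient of each input is compared with the plain Pearson correlation, which coincides here because both ±1 series have zero mean over the truth table.

```python
import itertools
import math

# Example Boolean function in three inputs: f = x1 XOR (x2 AND x3).
def f(bits):
    x1, x2, x3 = bits
    return x1 ^ (x2 & x3)

pm1 = lambda b: 2 * b - 1          # recode {0,1} as {-1,+1}

inputs = list(itertools.product([0, 1], repeat=3))
fvals = [pm1(f(b)) for b in inputs]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

results = []
for i in range(3):
    xi = [pm1(b[i]) for b in inputs]
    # Order-1 Walsh coefficient: average of f(x) * x_i over all inputs.
    walsh1 = sum(u * v for u, v in zip(fvals, xi)) / len(inputs)
    results.append((walsh1, pearson(fvals, xi)))
    print(f"input {i}: Walsh order-1 = {results[-1][0]:+.2f}, "
          f"Pearson = {results[-1][1]:+.2f}")
```

For this function only x1 carries a first-order signal; the AND interaction of x2 and x3 is invisible at order 1, which is precisely the kind of dependency that the higher-order correlations discussed above can capture.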

14.
The censored regression model is an important model with wide application in econometrics, yet its variable-selection problem has received little attention in the current literature. This paper proposes a LASSO-type variable selection and estimation method, the diversified penalized $L_1$ constraint method, abbreviated DPLC. We also derive the large-sample asymptotic properties of the estimators of the nonzero regression coefficients. Finally, extensive simulation studies show that the DPLC method matches ordinary best-subset selection in both variable selection and estimation.

15.
Regression estimation of the mean of a primary survey variable, together with estimation of the regression equation, is considered in a finite population with transformed auxiliary variables. Large-sample properties of the estimators are developed, and the effects of estimating the auxiliary variates on the estimators are investigated.

16.
We extend the least angle regression algorithm using the information geometry of dually flat spaces. The extended algorithm estimates parameters in generalized linear regression and can also be used to select explanatory variables; it exploits the fact that the model manifold of an exponential family is a dually flat space. In estimating parameters, curves corresponding to bisectors in Euclidean space play an important role. The original least angle regression algorithm estimates parameters and selects explanatory variables in linear regression, and it is efficient in the sense that the number of iterations equals the number of explanatory variables. We extend the algorithm while preserving this efficiency, although the extended algorithm differs significantly from the original: it removes one explanatory variable in each iteration, whereas the original adds one explanatory variable in each iteration. We show results of the extended algorithm for two types of datasets and illustrate its behavior; in particular, the parameter estimates become smaller and smaller and vanish in turn.

17.
The elastic net (henceforth the supervised enet) is a popular and computationally efficient approach for simultaneously selecting variables, decorrelating the data, and shrinking the coefficient vector in the linear regression setting. Semisupervised regression, currently unconnected to the supervised enet, trains the estimator on data with missing (unlabeled) response values alongside labeled data. In this article, we propose the joint trained elastic net (jt-enet), which incorporates the benefits of semisupervised regression into the supervised enet. Both the variable selection and decorrelation components of the supervised enet rely inherently on the pairwise correlation structure of the feature data. When the number of variables is high, the feature data are relatively easy to obtain, and the response is expensive to generate, one would naturally want to use any existing unlabeled observations to define these correlations more accurately. The supervised enet, however, cannot incorporate this information and uses only the labeled data. The jt-enet allows the unlabeled data to influence the variable selection, decorrelation, and shrinkage of the linear estimator. In addition, we investigate the impact of unlabeled data on the risk and bias of the proposed estimator. The jt-enet is demonstrated on two applications with encouraging results. Online supplementary material is available for this article.

18.
The mixed geographically and temporally weighted regression (MGTWR) model is widely used as an effective way to handle both global stationarity and local non-stationarity in spatial data. Its parameter estimation, however, assumes that the constant-coefficient variables are known and free of spatiotemporal effects, a strong premise that makes the coefficient estimates highly unstable. To estimate the parameters when the constant-coefficient variables do exhibit spatiotemporal effects, this paper proposes a variable selection method that removes interaction effects among the indicators, and gives the corresponding algorithm. A comparison of different estimation methods on actual housing-price data for Urumqi shows that the variable selection step improves the performance and fit of the MGTWR model, stabilizes the estimates of the constant regression coefficients, and improves on the original parameter estimation procedure.

19.
We consider problems whose mathematical model is determined by a Markov chain terminating with probability one, in which we must estimate linear functionals of the solution to an integral equation of the second kind with the corresponding substochastic kernel and free term [1]. To construct weighted modifications of numerical statistical models, we augment the coordinates of the phase space with auxiliary variables whose random values functionally define the transitions of the initial chain. After implementing each auxiliary random variable, we multiply the weight by the ratio of the corresponding densities of the initial and numerically modeled distributions. We solve the problem of minimizing the variances of the estimators of linear functionals by choosing the modeled distribution of the first auxiliary random variable.

20.
The high dimensionality and high correlation of current listed-company credit risk data severely undermine the accuracy of credit risk models. Drawing on existing algorithms and the characteristics of credit risk models, this paper designs a new nonparametric variable selection method. Screening the credit-risk-related variables of listed companies with this method removes the noise variables and linearly correlated variables contained in the data set. We also design an algorithm for finding the optimal solution under high variable dimensionality. Taking the Logistic model as an example, an empirical analysis of listed companies' credit risk shows that, compared with previous variable selection methods, the proposed method effectively reduces the data dimension, eliminates correlation among variables, and at the same time improves the reliability and prediction accuracy of the model.


Copyright © 北京勤云科技发展有限公司 (京ICP备09084417号)