Similar Documents
20 similar documents found (search time: 31 ms)
1.
Technology evaluation has become a critical part of technology investment, and accurate evaluation can direct more funds to companies with innovative technology. However, existing evaluation processes have a weakness in that they consider only the applicants accepted at the application stage. We analyse the effectiveness of a technology evaluation model that encompasses both accepted and rejected applicants and compare its performance with the original accept-only model. We also analyse a reject inference technique, the bivariate probit model, to see whether it offers an advantage over the accept-only model. The results show that the accept-only model suffers from sample selection bias and that the reject inference technique improves on it. However, reject inference does not completely resolve the problem of sample selection bias.

2.
We consider variable selection for high-dimensional longitudinal linear regression models when the response variable is subject to monotone missingness. Based on inverse-probability-weighted generalized estimating equations, we propose an automatic variable selection method that does not rely on existing penalty functions, avoids the non-convex optimization problems associated with them, automatically eliminates zero regression coefficients, and simultaneously estimates the nonzero coefficients. Under certain regularity conditions, the method is shown to possess the oracle property. Finally, simulation studies examine its finite-sample performance.

3.
In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for varying-coefficient models with missing response at random. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the optimal convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

4.
In this paper, we present a variable selection procedure by combining basis function approximations with penalized estimating equations for semiparametric varying-coefficient partially linear models with missing response at random. The proposed procedure simultaneously selects significant variables in parametric components and nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency of the variable selection procedure and the convergence rate of the regularized estimators. A simulation study is undertaken to assess the finite sample performance of the proposed variable selection procedure.

5.
One of the aims of credit scoring models is to predict the probability of repayment of any applicant, yet such models are usually parameterised using a sample of accepted applicants only. This may lead to biased estimates of the parameters. In this paper we examine two issues. First, we compare the classification accuracy of a model based only on accepted applicants relative to one based on a sample of all applicants. We find only a minimal difference, given the cutoff scores for the old model used by the data supplier. Using a simulated model we examine the predictive performance of models estimated from bands of applicants, ranked by predicted creditworthiness. We find that the lower the risk band of the training sample, the less accurate the predictions for all applicants. We also find that the lower the risk band of the training sample, the greater the overestimate of the true performance of the model when tested on a sample of applicants within the same risk band, as a financial institution would do. The overestimation may be very large. Second, we examine the predictive accuracy of a bivariate probit model with selection (BVP). This parameterises the accept-reject model allowing for (unknown) omitted variables to be correlated with those of the original good-bad model. The BVP model may improve accuracy if the loan officer has overridden a scoring rule. We find that a small improvement is sometimes possible when using the BVP model.
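A bivariate probit model with selection of the kind discussed here is typically estimated by maximum likelihood over an accept/reject (selection) equation and a good/bad (outcome) equation with jointly normal errors. The sketch below is a minimal, hedged illustration of that likelihood using scipy; the variable names and the synthetic data are assumptions for demonstration only and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, multivariate_normal

def bvp_selection_loglik(params, W, X, accepted, good):
    """Log-likelihood of a bivariate probit with sample selection.

    Selection:  accept = 1{ W @ gamma + u > 0 }   (observed for everyone)
    Outcome:    good   = 1{ X @ beta  + e > 0 }   (observed only if accepted)
    (u, e) are standard bivariate normal with correlation rho.
    """
    kw, kx = W.shape[1], X.shape[1]
    gamma, beta = params[:kw], params[kw:kw + kx]
    rho = np.tanh(params[-1])                # keep the correlation in (-1, 1)
    zw, zx = W @ gamma, X @ beta

    def phi2(a, b, r):                       # bivariate normal CDF, point by point
        biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])
        return np.array([biv.cdf(p) for p in np.column_stack([a, b])])

    ll = np.zeros(len(accepted))
    rej = accepted == 0                      # P(reject) = Phi(-W @ gamma)
    ll[rej] = norm.logcdf(-zw[rej])
    ag = (accepted == 1) & (good == 1)       # P(accept, good) = Phi2(zw, zx; rho)
    ab = (accepted == 1) & (good == 0)       # P(accept, bad)  = Phi2(zw, -zx; -rho)
    ll[ag] = np.log(phi2(zw[ag], zx[ag], rho))
    ll[ab] = np.log(phi2(zw[ab], -zx[ab], -rho))
    return ll.sum()

# Illustrative synthetic data (not from the paper); small n keeps the run short
rng = np.random.default_rng(0)
n = 300
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X = np.column_stack([np.ones(n), W[:, 1], rng.normal(size=n)])
u, e = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n).T
accepted = (W @ [0.2, 1.0, -0.5] + u > 0).astype(int)
good = ((X @ [0.5, 0.8, -1.0] + e > 0) & (accepted == 1)).astype(int)

start = np.zeros(W.shape[1] + X.shape[1] + 1)
fit = minimize(lambda p: -bvp_selection_loglik(p, W, X, accepted, good),
               start, method="BFGS")
print("estimated correlation rho:", round(float(np.tanh(fit.x[-1])), 3))
```

A nonzero estimated correlation between the two error terms is what signals that the accept-only sample is selectively biased with respect to the good/bad outcome.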

6.

It is well known that variable selection in multiple regression can be unstable and that the model uncertainty can be considerable. The model uncertainty can be quantified and explored by bootstrap resampling, see Sauerbrei et al. (Biom J 57:531-555, 2015). Here approaches are introduced that use the results of bootstrap replications of the variable selection process to obtain more detailed information about the data. Analyses are based on dissimilarities between the results of the analyses of different bootstrap samples. Dissimilarities are computed between the vectors of predictions and between the sets of selected variables. The dissimilarities are used to map the models by multidimensional scaling, to cluster them, and to construct heatplots. Clusters can point to different interpretations of the data that could arise from different selections of variables supported by different bootstrap samples. A new measure of variable selection instability is also defined. The methodology can be applied to various regression models, estimators, and variable selection methods. It is illustrated by three real data examples, using linear regression and a Cox proportional hazards model, and model selection by AIC and BIC.

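As a rough illustration of this kind of analysis, the sketch below bootstraps a lasso-based variable selection (standing in for the AIC/BIC stepwise selection used in the paper), records the selected sets, and maps the bootstrap models by multidimensional scaling of their Jaccard dissimilarities. The data, the choice of lasso, and the simple instability summary at the end are assumptions for illustration, not the paper's own measure.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

B = 50
supports = []
for b in range(B):
    idx = rng.integers(0, n, size=n)             # bootstrap resample
    fit = LassoCV(cv=5).fit(X[idx], y[idx])      # variable selection via lasso
    supports.append(fit.coef_ != 0)
supports = np.array(supports)

# Jaccard dissimilarity between the selected variable sets
def jaccard_dist(a, b):
    union = np.logical_or(a, b).sum()
    return 0.0 if union == 0 else 1.0 - np.logical_and(a, b).sum() / union

D = np.array([[jaccard_dist(a, b) for b in supports] for a in supports])

# Map the bootstrap models into the plane by multidimensional scaling
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

print("selection frequencies:", supports.mean(axis=0).round(2))
print("mean pairwise Jaccard dissimilarity:", round(D[np.triu_indices(B, 1)].mean(), 3))
print("first MDS coordinates:", coords[:3].round(2))
```

Clusters of points in the MDS map correspond to groups of bootstrap samples supporting the same, or very similar, selected models.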

7.
Structural equation models are widely used in sociology, education, medicine, marketing, and the behavioural sciences. Missing data are common in these fields, and structural equation models with missing data have been proposed and extensively studied. Model selection is very important in applications of this class of models; this paper applies a Bayesian-criterion-based statistic, called the L_v measure, to model selection for such models. Finally, a simulation study and a real-data analysis illustrate the effectiveness and use of the L_v measure; the real-data analysis also reports model selection results based on Bayes factors, further demonstrating the effectiveness of the measure.

8.
We describe adaptive Markov chain Monte Carlo (MCMC) methods for sampling posterior distributions arising from Bayesian variable selection problems. Point-mass mixture priors are commonly used in Bayesian variable selection problems in regression. However, for generalized linear and nonlinear models where the conditional densities cannot be obtained directly, the resulting mixture posterior may be difficult to sample using standard MCMC methods due to multimodality. We introduce an adaptive MCMC scheme that automatically tunes the parameters of a family of mixture proposal distributions during simulation. The resulting chain adapts to sample efficiently from multimodal target distributions. For variable selection problems point-mass components are included in the mixture, and the associated weights adapt to approximate marginal posterior variable inclusion probabilities, while the remaining components approximate the posterior over nonzero values. The resulting sampler transitions efficiently between models, performing parameter estimation and variable selection simultaneously. Ergodicity and convergence are guaranteed by limiting the adaptation based on recent theoretical results. The algorithm is demonstrated on a logistic regression model, a sparse kernel regression, and a random field model from statistical biophysics; in each case the adaptive algorithm dramatically outperforms traditional Metropolis-Hastings algorithms. Supplementary materials for this article are available online.

9.
Markowitz's mean-variance model has been widely applied and extended in portfolio optimization, but most extensions consider only stochastic portfolios or fuzzy portfolios, ignoring the fact that practical problems involve both random and fuzzy information. This paper first defines the variance of a random fuzzy variable to measure portfolio risk and proposes a minimum-variance random fuzzy portfolio model with threshold constraints; based on random fuzzy theory, the model is transformed into a convex quadratic program with linear equality and inequality constraints. To improve the effectiveness of this model, the portfolio weights are shrunk toward the equally weighted portfolio, with maximization of the investor's expected utility as the shrinkage objective, yielding a hybrid equal-weight/minimum-variance random fuzzy portfolio model whose optimal solution is derived. Finally, a rolling window of real data is used to compare the Sharpe ratios of the two models and verify their effectiveness.
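For reference, the deterministic core of such a model, a minimum-variance portfolio with simple lower/upper threshold bounds on the weights, is a small quadratic program. The sketch below solves it with scipy's SLSQP solver on an illustrative covariance matrix; the random fuzzy variance, the utility-based shrinkage objective, and the mixing weight alpha are not taken from the paper, so the final mix shown is purely an assumption for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative covariance matrix of four assets (not from the paper)
Sigma = np.array([[0.040, 0.006, 0.004, 0.002],
                  [0.006, 0.025, 0.005, 0.003],
                  [0.004, 0.005, 0.030, 0.004],
                  [0.002, 0.003, 0.004, 0.020]])
n = Sigma.shape[0]
lower, upper = 0.05, 0.60            # threshold bounds on each weight

def variance(w):
    return w @ Sigma @ w             # portfolio variance w' Sigma w

res = minimize(variance,
               x0=np.full(n, 1.0 / n),
               method="SLSQP",
               bounds=[(lower, upper)] * n,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])

w_minvar = res.x
w_equal = np.full(n, 1.0 / n)
alpha = 0.5                          # illustrative shrinkage toward equal weights
w_mixed = alpha * w_equal + (1 - alpha) * w_minvar
print("min-variance weights:   ", w_minvar.round(3))
print("equal/min-variance mix: ", w_mixed.round(3))
```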

10.
We study how to perform simultaneous parameter estimation and variable selection for varying-coefficient partially linear models when the response is subject to missingness and some covariates are measured with error. Imputation is used to handle the missing data, and a corrected profile least-squares estimator combined with the SCAD penalty is used for estimation and variable selection. The resulting estimators are shown to be asymptotically normal and to possess the oracle property. Numerical simulations further examine their finite-sample properties.

11.
Unbiased Recursive Partitioning: A Conditional Inference Framework
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by the two approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored, and multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node-positive breast cancer survival, and mammography experience are re-analysed.

12.
We develop a methodology to efficiently implement the reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms of Green, applicable for example to model selection inference in a Bayesian framework, which builds on the "dragging fast variables" ideas of Neal. We call such algorithms annealed importance sampling reversible jump (aisRJ). The proposed procedures can be thought of as being exact approximations of idealized RJ algorithms which in a model selection problem would sample the model labels only, but cannot be implemented. Central to the methodology is the idea of bridging different models with fictitious intermediate models, whose role is to introduce smooth intermodel transitions and, as we shall see, improve performance. Efficiency of the resulting algorithms is demonstrated on two standard model selection problems and we show that despite the additional computational effort incurred, the approach can be highly competitive computationally. Supplementary materials for the article are available online.

13.
In this paper, we propose a new approach to dealing with the non-zero slacks in data envelopment analysis (DEA) assessments that is based on restricting the multipliers in the dual multiplier formulation of the DEA model used. It guarantees strictly positive weights, which ensures reference points on the Pareto-efficient frontier and, consequently, zero slacks. We follow a two-step procedure which, after specifying some weight bounds, results in an "Assurance Region"-type model that is used in the assessment of efficiency. The specification of these bounds is based on a selection criterion among the optimal solutions for the multipliers of the unbounded DEA models, which tries to avoid the extreme dissimilarity between weights that is often found in DEA applications. The models developed do not have infeasibility problems, and the choice of weights is not affected by problems with alternate optima. To use our multiplier bound approach we need no a priori information about substitutions between inputs and outputs, nor is the existence of full-dimensional efficient facets on the frontier required, as is the case with other existing approaches that address this problem.
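A minimal sketch of the ingredient this approach builds on, the multiplier (dual) formulation of the input-oriented CCR model with strictly positive lower bounds on the weights, is shown below using scipy's linear programming routine. The data and the bound value eps are illustrative assumptions; the paper's actual bound-selection step is not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: columns are DMUs, rows are inputs / outputs
X = np.array([[2.0, 4.0, 3.0, 5.0],      # inputs  (m x n)
              [3.0, 1.0, 2.0, 4.0]])
Y = np.array([[1.0, 2.0, 1.5, 2.5]])     # outputs (s x n)
m, n = X.shape
s = Y.shape[0]
eps = 1e-3                               # strictly positive lower bound on weights

def ccr_multiplier(o):
    """Input-oriented CCR efficiency of DMU o, weights bounded below by eps."""
    # Variables z = [u (s output weights), v (m input weights)]; maximize u' y_o
    c = np.concatenate([-Y[:, o], np.zeros(m)])
    # For every DMU j: u' y_j - v' x_j <= 0
    A_ub = np.hstack([Y.T, -X.T])
    b_ub = np.zeros(n)
    # Normalization: v' x_o = 1
    A_eq = np.concatenate([np.zeros(s), X[:, o]]).reshape(1, -1)
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(eps, None)] * (s + m), method="highs")
    return -res.fun

for o in range(n):
    print(f"DMU {o}: efficiency = {ccr_multiplier(o):.3f}")
```

With all weights bounded away from zero, any efficient projection lies on the Pareto-efficient part of the frontier, which is what eliminates the non-zero slacks.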

14.
Statistical estimation with model selection
The purpose of this paper is to explain the interest and importance of (approximate) models and model selection in statistics. Starting from the very elementary example of histograms, we present a general notion of finite-dimensional model for statistical estimation and explain what type of risk bounds can be expected from the use of one such model. We then give the performance of suitable model selection procedures from a family of such models. We illustrate our point of view with two main examples: the choice of a partition for designing a histogram from an n-sample and the problem of variable selection in the context of Gaussian regression.

15.
An exhaustive search, as required by traditional variable selection methods, is impractical in high-dimensional statistical modeling. Thus, to conduct variable selection, various forms of penalized estimators with good statistical and computational properties have been proposed during the past two decades. The attractive properties of these shrinkage and selection estimators, however, depend critically on the amount of regularization, which controls model complexity. In this paper, we consider the problem of consistent tuning parameter selection in high-dimensional sparse linear regression where the dimension of the predictor vector is larger than the sample size. First, we propose a family of high-dimensional Bayesian Information Criteria (HBIC) and investigate their selection consistency, extending the results on the extended Bayesian Information Criterion (EBIC) in Chen and Chen (2008) to ultra-high-dimensional situations. Second, we develop a two-step procedure, SIS+AENET, to conduct variable selection in p>n situations. The consistency of tuning parameter selection is established under fairly mild technical conditions. Simulation studies are presented to confirm the theoretical findings, and an empirical example illustrates the use of the method on internet advertising data.
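As a rough illustration of criterion-based tuning parameter selection, the sketch below fits a lasso over a grid of regularization values and picks the value minimizing a BIC-type criterion whose penalty is inflated by log p. The exact form of the HBIC constant C_n in the paper is not reproduced here, so the C_n = log(log n) factor below is only an assumption for illustration, and the SIS+AENET two-step procedure is not implemented.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 100, 300                          # p > n sparse regression
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(size=n)

lambdas = np.logspace(-2, 0.5, 40)
best = None
for lam in lambdas:
    fit = Lasso(alpha=lam, max_iter=20000).fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    k = np.count_nonzero(fit.coef_)
    # BIC-type criterion with a log(p) inflation of the penalty
    # (the exact constant C_n differs across HBIC/EBIC variants)
    C_n = np.log(np.log(n))
    hbic = n * np.log(rss / n) + k * C_n * np.log(p)
    if best is None or hbic < best[0]:
        best = (hbic, lam, np.flatnonzero(fit.coef_))

print("selected lambda:   ", round(best[1], 4))
print("selected variables:", best[2])
```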

16.
Partially linear regression models with fixed effects are useful tools for econometric analysis and for normalizing microarray data. Baltagi and Li (2002) [7] proposed a computationally friendly difference-based series estimation (DSE) for them. We show that the DSE is not asymptotically efficient in most cases and propose a weighted difference-based series estimation (WDSE). The weights do not involve any unknown parameters. The asymptotic properties of the resulting estimators are established for both balanced and unbalanced cases, and it is shown that they achieve the semiparametric efficiency bound. Additionally, we propose a variable selection procedure for identifying significant covariates in the parametric part of the semiparametric fixed-effects regression model. The method is based on a combination of the nonconcave penalization (Fan and Li, 2001 [13]) and weighted difference-based series estimation techniques. The resulting estimators have the oracle property; that is, they can correctly identify the true model as if the true model (the subset of variables with nonvanishing coefficients) were known in advance. Simulation studies are conducted, and an application is given to demonstrate the finite sample performance of the proposed procedures.

17.
If a credit scoring model is built using only applicants who have previously been accepted for credit, such non-random sample selection may bias the estimated model parameters, and accordingly the model's predictions of repayment performance may not be optimal. Previous empirical research suggests that omission of rejected applicants has a detrimental impact on model estimation and prediction. This paper explores the extent to which, given the previous cutoff score applied to decide on accepted applicants, the number of included variables influences the efficacy of a commonly used reject inference technique, reweighting. The analysis benefits from the availability of a rare sample in which virtually no applicant was denied credit. The general indication is that the efficacy of reject inference is little influenced by either model leanness or the interaction between model leanness and the rejection rate that determined the sample. However, there remains some hint that very lean models may benefit from reject inference when modelling is conducted on data characterized by a very high rate of applicant rejection.
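One common form of the reweighting idea can be sketched roughly as follows: an accept/reject model supplies acceptance probabilities, and the accepted cases are then reweighted by the inverse of those probabilities when the good/bad model is refit. The data, model choices, and coefficient values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5000
X = rng.normal(size=(n, 4))

# Illustrative "true" processes: acceptance and repayment both depend on X
p_accept = 1 / (1 + np.exp(-(0.5 + X[:, 0] - 0.8 * X[:, 1])))
accepted = rng.uniform(size=n) < p_accept
p_good = 1 / (1 + np.exp(-(1.0 + 0.7 * X[:, 0] + 0.5 * X[:, 2])))
good = (rng.uniform(size=n) < p_good).astype(int)   # observed only if accepted

# Step 1: accept/reject model fitted on all applicants
ar_model = LogisticRegression().fit(X, accepted.astype(int))
p_hat = ar_model.predict_proba(X)[:, 1]

# Step 2: good/bad model fitted on accepted applicants only,
# reweighting each accepted case by 1 / P(accept)
Xa, ya, wa = X[accepted], good[accepted], 1.0 / p_hat[accepted]
gb_accept_only = LogisticRegression().fit(Xa, ya)
gb_reweighted = LogisticRegression().fit(Xa, ya, sample_weight=wa)

print("accept-only coefficients:", gb_accept_only.coef_.round(2))
print("reweighted coefficients: ", gb_reweighted.coef_.round(2))
```

Comparing the two coefficient vectors gives a quick sense of how much the reweighting shifts the good/bad model away from its accept-only fit.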

18.
This paper deals with the nonparametric estimation of additive models in the presence of missing data in the response variable, specifically additive models estimated by the backfitting algorithm with local polynomial smoothers [1]. Three estimators are presented: one based on the available data and two based on complete samples obtained by imputation techniques. We also develop a data-driven local bandwidth selector based on a wild bootstrap approximation of the mean squared error of the estimators. The performance of the estimators and the local bootstrap bandwidth selection method is explored through simulation experiments.

19.
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles are considered for each possible predictor: a variable can be a relevant classification predictor or not, and an irrelevant classification variable can either be linearly dependent on a subset of the relevant predictors or be independent of them. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed and optimized through two embedded forward stepwise variable selection algorithms, one for classification and one for linear regression. The identifiability of the model and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the interest of this variable selection methodology. In particular, it is shown that this well-grounded variable selection model can be of great interest for improving the classification performance of quadratic discriminant analysis in a high-dimensional context.

20.
This article suggests a method for variable and transformation selection based on posterior probabilities. Our approach allows for consideration of all possible combinations of untransformed and transformed predictors along with transformed and untransformed versions of the response. To transform the predictors in the model, we use a change-point model, or "change-point transformation," which can yield more interpretable models and transformations than the standard Box-Tidwell approach. We also address the problem of model uncertainty in the selection of models. By averaging over models, we account for the uncertainty inherent in inference based on a single model chosen from the set of models under consideration. We use a Markov chain Monte Carlo model composition (MC3) method, which allows us to average over linear regression models when the space of models under consideration is very large and which considers the selection of variables and transformations at the same time. In an example, we show that model averaging improves predictive performance, compared with any single model that might reasonably be selected, both in terms of overall predictive score and of the coverage of prediction intervals. Software to apply the proposed methodology is available via StatLib.
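A stripped-down MC3-style sampler over linear regression variable subsets can be sketched as below. It uses a BIC approximation to the marginal likelihood and a single add/drop proposal per step; the change-point transformations, the handling of the response transformation, and the predictive scoring from the article are not included, and all data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 150, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

def bic(model):
    """BIC of the OLS fit using the variables flagged in `model`."""
    Xm = np.column_stack([np.ones(n), X[:, model]])
    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    rss = np.sum((y - Xm @ beta) ** 2)
    return n * np.log(rss / n) + Xm.shape[1] * np.log(n)

# MC3-style walk over models: propose flipping one variable in/out and accept
# with probability min(1, exp(-(BIC_new - BIC_old) / 2)), a BIC approximation
# to the posterior odds of the two models.
model = np.zeros(p, dtype=bool)
current_bic = bic(model)
visits = {}
n_iter = 5000
for it in range(n_iter):
    proposal = model.copy()
    j = rng.integers(p)
    proposal[j] = ~proposal[j]               # add or drop one variable
    prop_bic = bic(proposal)
    accept_prob = np.exp(min(0.0, -(prop_bic - current_bic) / 2))
    if rng.uniform() < accept_prob:
        model, current_bic = proposal, prop_bic
    key = tuple(np.flatnonzero(model))
    visits[key] = visits.get(key, 0) + 1

top = sorted(visits.items(), key=lambda kv: -kv[1])[:3]
for key, count in top:
    print(f"model {key}: visited {count / n_iter:.2%} of iterations")
```

The visit frequencies approximate posterior model probabilities, and model-averaged predictions are obtained by weighting each visited model's fitted values by those frequencies.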

