Similar Documents

10 similar documents found.
1.
Akaike's information criterion (AIC), derived from the asymptotics of the maximum likelihood estimator, is widely used in model selection. However, it has a finite-sample bias that produces overfitting in linear regression. To deal with this problem, Ishiguro, Sakamoto, and Kitagawa proposed a bootstrap-based extension to AIC which they called EIC. This article compares the model-selection performance of AIC, EIC, a bootstrap-smoothed likelihood cross-validation criterion (BCV), and its modification (632CV) in small-sample linear regression, logistic regression, and Cox regression. Simulation results show that EIC largely overcomes AIC's overfitting problem and that BCV may be better than EIC. Hence, the three methods based on bootstrapping the likelihood establish themselves as important alternatives to AIC in model selection with small samples.
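To illustrate the flavour of the bootstrap bias correction behind EIC, here is a minimal sketch, not the authors' exact algorithm: the Gaussian linear-regression setup, the function names, and the simple case-resampling bias estimate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_loglik(y, X, beta, sigma2):
    # Log-likelihood of a Gaussian linear regression model.
    resid = y - X @ beta
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2

def fit_mle(y, X):
    # Maximum likelihood estimates of (beta, sigma2) via least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid / len(y)

def eic(y, X, n_boot=200):
    # EIC-flavoured criterion: -2 * loglik + 2 * bootstrap bias estimate,
    # where the bias is the average "optimism" of refitted bootstrap models.
    n = len(y)
    beta, sigma2 = fit_mle(y, X)
    loglik = gaussian_loglik(y, X, beta, sigma2)
    bias = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # resample cases
        bb, ss = fit_mle(y[idx], X[idx])
        bias += (gaussian_loglik(y[idx], X[idx], bb, ss)
                 - gaussian_loglik(y, X, bb, ss))
    return -2 * loglik + 2 * bias / n_boot

# Toy comparison on a small sample (n = 20): the true model has 2 predictors.
n = 20
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_full[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=n)
print("EIC, small model:", eic(y, X_full[:, :2]))
print("EIC, large model:", eic(y, X_full))
```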

2.
We consider the use of B-spline nonparametric regression models estimated by the maximum penalized likelihood method for extracting information from data with complex nonlinear structure. Crucial points in B-spline smoothing are the choices of the smoothing parameter and the number of basis functions, for which several selectors have been proposed based on cross-validation and the Akaike information criterion (AIC). Note, however, that AIC is a criterion for evaluating models estimated by the maximum likelihood method, and it was derived under the assumption that the true distribution belongs to the specified parametric model. In this paper we derive information criteria for evaluating B-spline nonparametric regression models estimated by the maximum penalized likelihood method in the context of generalized linear models under model misspecification. We use Monte Carlo experiments and real data examples to examine the properties of our criteria alongside various selectors proposed previously.
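As a rough sketch of choosing a smoothing parameter for a penalized B-spline fit by an AIC-type score: the Gaussian P-spline setup below, with a second-difference penalty and the trace of the hat matrix as effective degrees of freedom, is an illustrative assumption, not the criteria derived in the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)

def bspline_basis(x, n_basis, degree=3):
    # Cubic B-spline design matrix with equally spaced knots over the data range.
    inner = np.linspace(x.min(), x.max(), n_basis - degree + 1)
    knots = np.r_[[inner[0]] * degree, inner, [inner[-1]] * degree]
    return np.column_stack([BSpline(knots, np.eye(n_basis)[j], degree)(x)
                            for j in range(n_basis)])

def pspline_aic(x, y, lam, n_basis=20):
    # Penalized least squares with a second-difference penalty; AIC uses the
    # trace of the hat matrix as effective degrees of freedom.
    B = bspline_basis(x, n_basis)
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    A = B.T @ B + lam * D.T @ D
    coef = np.linalg.solve(A, B.T @ y)
    edf = np.trace(B @ np.linalg.solve(A, B.T))
    n = len(y)
    sigma2 = np.sum((y - B @ coef) ** 2) / n
    return n * np.log(2 * np.pi * sigma2) + n + 2 * (edf + 1)

x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * np.pi * x) + rng.normal(0, 0.3, 100)
best_aic, best_lam = min((pspline_aic(x, y, lam), lam)
                         for lam in 10.0 ** np.arange(-4, 4))
print("AIC-selected smoothing parameter:", best_lam)
```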

3.
We derive an information criterion to select a parametric model of the complete-data distribution when only incomplete or partially observed data are available. Compared with AIC, our new criterion has an additional penalty term for missing data, which is expressed by the Fisher information matrices of the complete data and the incomplete data. We prove that our criterion is an asymptotically unbiased estimator of the complete-data divergence, namely the expected Kullback–Leibler divergence between the true distribution and the estimated distribution for complete data, whereas AIC is that for the incomplete data. The additional penalty term of our criterion for missing data turns out to be only half the value of that in the previously proposed information criteria PDIO and AICcd. The difference in the penalty term is attributed to the fact that our criterion is derived under a weaker assumption. A simulation study under the weaker assumption shows that our criterion is unbiased while the other two criteria are biased. In addition, we review the geometric view of the alternating minimizations of the EM algorithm, which plays an important role in deriving our new criterion.
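Schematically, and hedging heavily because the abstract does not give the formula, the criteria above share an AIC-like form; the particular trace expression for the missing-data term below is only a plausible placeholder built from the complete-data and incomplete-data Fisher information matrices:

```latex
\mathrm{IC}(\hat\theta)
  = -2\,\ell_{\mathrm{obs}}(\hat\theta) + 2k + c\,\Delta(\hat\theta),
\qquad
\Delta(\hat\theta)
  = \operatorname{tr}\!\left\{\bigl(I_{\mathrm{com}}(\hat\theta)
      - I_{\mathrm{obs}}(\hat\theta)\bigr)\, I_{\mathrm{obs}}(\hat\theta)^{-1}\right\}
```

Here k is the number of parameters and ℓ_obs the incomplete-data log-likelihood; per the abstract, c = 1 for the new criterion versus c = 2 for PDIO and AICcd. The exact form of Δ is an assumption for illustration only.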

4.
An Application of Multiple Comparison Techniques to Model Selection
Akaike's information criterion (AIC) is widely used to estimate the best model from a given candidate set of parameterized probabilistic models. In this paper, to account for the sampling error of AIC, a set of good models is constructed rather than a single model being chosen. This set is called a confidence set of models; it includes the minimum-AIC model at an error rate smaller than the specified significance level. The result is given as a P-value for each model, from which the confidence set is immediately obtained. A variant of Gupta's subset selection procedure is devised, in which a standardized difference of AIC is calculated for every pair of models. The critical constants are computed by the Monte Carlo method, using the asymptotic normal approximation of AIC. The proposed method neither requires the full model nor assumes a hierarchical structure of models, and it has higher power than similar existing methods.
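A toy sketch of the general idea, keeping every model whose AIC is within bootstrap sampling error of the minimum: this crude bootstrap P-value is neither Gupta's subset selection procedure nor the paper's Monte Carlo critical constants, and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def aic_linreg(y, X):
    # Gaussian AIC up to additive constants: n*log(RSS/n) + 2 * (k + 1).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n = len(y)
    return n * np.log(rss / n) + 2 * (X.shape[1] + 1)

def confidence_set(y, X, models, alpha=0.05, n_boot=500):
    # Keep each model whose bootstrap P-value for (AIC_m - min AIC) exceeds alpha.
    n = len(y)
    aics = np.array([aic_linreg(y, X[:, m]) for m in models])
    obs_diff = aics - aics.min()
    exceed = np.zeros(len(models))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        ba = np.array([aic_linreg(y[idx], X[idx][:, m]) for m in models])
        exceed += (ba - ba.min()) >= obs_diff
    pvals = exceed / n_boot
    return [(m, p) for m, p in zip(models, pvals) if p > alpha]

n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X[:, :3] @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)
models = [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
for m, pv in confidence_set(y, X, models):
    print("kept model with columns", m, "P-value", pv)
```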

5.
When the uncorrelated Gaussian factors of conventional factor analysis are generalized to mutually independent factors beyond Gaussian, the model becomes what has recently been called independent factor analysis. It is called binary factor analysis (BFA) when the factors are binary and non-Gaussian factor analysis (NFA) when the factors follow real non-Gaussian distributions. A crucial issue in both BFA and NFA is the determination of the number of factors. The statistics literature offers a number of model selection criteria for this purpose, and the Bayesian Ying-Yang (BYY) harmony learning provides a new principle for it. This paper further investigates BYY harmony learning in comparison with typical existing criteria, including Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the Bayesian information criterion (BIC), and the cross-validation (CV) criterion, for selecting the number of factors. The comparison is made via experiments on data sets with different sample sizes, data space dimensions, noise variances, and numbers of hidden factors. Experiments show that for both BFA and NFA, BIC outperforms AIC, CAIC, and CV in most cases, while the BYY criterion is either comparable with or better than BIC. Moreover, selection by these criteria must be carried out at a second stage, based on a set of candidate models obtained at a first stage of parameter learning, whereas BYY harmony learning provides not only a new class of criteria implemented in the same way but also a family of algorithms that perform parameter learning with automated model selection at the first stage. BYY harmony learning is therefore preferable, since it can save computing costs significantly.
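A small sketch of the two-stage selection described above, using scikit-learn's Gaussian maximum-likelihood FactorAnalysis as a stand-in: BFA/NFA and the BYY criterion are not implemented here, and the free-parameter count is a rough assumption (rotation is not corrected for).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)

# Simulate: 5 hidden factors driving 20-dimensional observations.
n, d, k_true = 500, 20, 5
Z = rng.normal(size=(n, k_true))
W = rng.normal(size=(k_true, d))
X = Z @ W + 0.5 * rng.normal(size=(n, d))

def criteria(X, k):
    # Stage 1: fit the candidate model; stage 2: score it by AIC/BIC/CAIC.
    fa = FactorAnalysis(n_components=k, random_state=0).fit(X)
    loglik = fa.score(X) * len(X)          # score() is the mean log-likelihood
    m = X.shape[1] * k + 2 * X.shape[1]    # loadings + noise variances + means
    n = len(X)
    aic = -2 * loglik + 2 * m
    bic = -2 * loglik + m * np.log(n)
    caic = -2 * loglik + m * (np.log(n) + 1)
    return aic, bic, caic

scores = {k: criteria(X, k) for k in range(1, 9)}
for name, i in [("AIC", 0), ("BIC", 1), ("CAIC", 2)]:
    print(name, "selects k =", min(scores, key=lambda k: scores[k][i]))
```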

6.
Logspline density estimation is developed for data that may be right censored, left censored, or interval censored. A fully automatic method, which involves maximum likelihood estimation and may involve stepwise knot deletion and either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), is used to determine the estimate. In solving the maximum likelihood equations, the Newton–Raphson method is augmented by occasional searches in the direction of steepest ascent. A user interface based on S is also described for obtaining estimates of the density function, distribution function, and quantile function and for generating a random sample from the fitted distribution.
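A stripped-down logspline sketch for uncensored data only: censoring, stepwise knot deletion, and the S interface are omitted, and the support bounds, quadrature grid, and basis sizes are illustrative choices, not the paper's method.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = rng.gamma(3.0, 1.0, size=200)          # uncensored toy sample

lo, hi = 0.0, data.max() * 1.2                # support assumed known
grid = np.linspace(lo, hi, 400)               # quadrature grid for normalization

def basis(x, n_basis, degree=3):
    # Cubic B-spline design matrix with equally spaced knots on [lo, hi].
    inner = np.linspace(lo, hi, n_basis - degree + 1)
    knots = np.r_[[lo] * degree, inner, [hi] * degree]
    return np.column_stack([BSpline(knots, np.eye(n_basis)[j], degree)(x)
                            for j in range(n_basis)])

def neg_loglik(c, Bx, Bg):
    # log f(x) = B(x) c - log(integral of exp(B(u) c) du), Riemann-sum integral.
    log_norm = np.log(np.exp(Bg @ c).sum() * (grid[1] - grid[0]))
    return -(Bx @ c - log_norm).sum()

def fit_logspline_aic(n_basis):
    Bx, Bg = basis(data, n_basis), basis(grid, n_basis)
    res = minimize(neg_loglik, np.zeros(n_basis), args=(Bx, Bg), method="BFGS")
    return 2.0 * res.fun + 2.0 * n_basis      # AIC = -2 loglik + 2 * (coefficients)

aics = {k: fit_logspline_aic(k) for k in range(4, 10)}
print("AIC-selected number of basis functions:", min(aics, key=aics.get))
```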

7.
This paper deals with the bias correction of the cross-validation (CV) criterion for estimating the predictive Kullback-Leibler information. A bias-corrected CV criterion is proposed by replacing the ordinary maximum likelihood estimator with the maximizer of the adjusted log-likelihood function. The adjustment is slight and simple, but the improvement in bias is remarkable: the bias of the ordinary CV criterion is O(n⁻¹), whereas that of the bias-corrected CV criterion is O(n⁻²). We verify by numerical experiments that our criterion has smaller bias than AIC, TIC, EIC, and the ordinary CV criterion.
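For reference, a minimal version of the ordinary likelihood-based leave-one-out CV criterion for Gaussian linear regression; the paper's correction would swap the MLE below for the maximizer of an adjusted log-likelihood, whose form the abstract does not give, so only the uncorrected criterion is sketched.

```python
import numpy as np

def loo_cv_criterion(y, X):
    # Ordinary likelihood CV estimate of the predictive Kullback-Leibler
    # discrepancy (up to constants): -2 * sum of log predictive densities,
    # each model refitted without the held-out case.
    n = len(y)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        resid = y[mask] - X[mask] @ beta
        sigma2 = resid @ resid / (n - 1)        # MLE on the n-1 training cases
        mu = X[i] @ beta
        total += (-0.5 * np.log(2 * np.pi * sigma2)
                  - 0.5 * (y[i] - mu) ** 2 / sigma2)
    return -2 * total

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
print("ordinary CV criterion:", loo_cv_criterion(y, X))
```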

8.
This paper considers generalized linear models in a data-rich environment in which a large number of potentially useful explanatory variables are available. In particular, it deals with the case in which the sample size and the number of explanatory variables are of similar magnitude. We adopt the idea that the information in the explanatory variables relevant to the dependent variable can be represented by a small number of common factors, and we investigate the issue of selecting the number of common factors while taking into account the effect of estimated regressors. We develop an information criterion under misspecification of both the distributional and structural assumptions and show that the proposed criterion is a natural extension of the Akaike information criterion (AIC). Simulations and empirical data analysis demonstrate that the proposed criterion outperforms both AIC and the Bayesian information criterion.
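A hedged sketch of the factor-augmented GLM setting: principal-component factors extracted from many regressors, then a logistic fit and a plain AIC comparison over the number of factors. The paper's corrected criterion, which accounts for the estimated regressors, is not reproduced; everything below is an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate a data-rich setting: many regressors driven by a few common factors.
n, p, r_true = 300, 100, 3
F = rng.normal(size=(n, r_true))
X = F @ rng.normal(size=(r_true, p)) + rng.normal(size=(n, p))
probs = 1 / (1 + np.exp(-(F @ np.array([1.5, -1.0, 0.8]))))
y = (rng.uniform(size=n) < probs).astype(float)

def pca_factors(X, r):
    # Principal-component estimates of the first r common factors.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:r].T

def logistic_aic(y, F):
    # Logistic regression fitted by Newton iterations (no safeguards; sketch only).
    Z = np.column_stack([np.ones(len(y)), F])
    beta = np.zeros(Z.shape[1])
    for _ in range(25):
        p_hat = 1 / (1 + np.exp(-(Z @ beta)))
        W = p_hat * (1 - p_hat)
        beta = beta + np.linalg.solve(Z.T * W @ Z, Z.T @ (y - p_hat))
    p_hat = 1 / (1 + np.exp(-(Z @ beta)))
    loglik = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return -2 * loglik + 2 * Z.shape[1]

aics = {r: logistic_aic(y, pca_factors(X, r)) for r in range(1, 7)}
print("plain-AIC-selected number of factors:", min(aics, key=aics.get))
```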

9.
The aim of this paper is to propose new selection criteria for the orders of self-exciting threshold autoregressive (SETAR) models. These criteria use bootstrap methodology: they are based on a weighted mean of the apparent error rate in the sample and the average error rate obtained from bootstrap samples not containing the point being predicted. The new criteria are compared with traditional ones based on the Akaike information criterion (AIC). A simulation study and an example on a real data set conclude the paper.
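A simplified sketch of the weighting idea for order selection, using a plain AR model instead of SETAR and i.i.d. resampling of (response, lag-vector) pairs, which ignores serial dependence; the .632-style weights and all other details are illustrative assumptions, not the paper's criteria.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an AR(2) series (plain AR stands in for SETAR here).
n = 200
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def design(y, p):
    # Regression form of an AR(p) model: predict y[t] from y[t-1], ..., y[t-p].
    Y = y[p:]
    X = np.column_stack([y[p - j - 1: len(y) - j - 1] for j in range(p)])
    return Y, X

def boot632_criterion(y, p, n_boot=100):
    # Weighted mean of the apparent (in-sample) squared error and the error on
    # cases left out of each bootstrap resample.
    Y, X = design(y, p)
    m = len(Y)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    apparent = np.mean((Y - X @ beta) ** 2)
    err, cnt = np.zeros(m), np.zeros(m)
    for _ in range(n_boot):
        idx = rng.integers(0, m, m)               # i.i.d. resample of (Y, X) pairs
        out = np.setdiff1d(np.arange(m), idx)     # cases absent from the resample
        bb, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        err[out] += (Y[out] - X[out] @ bb) ** 2
        cnt[out] += 1
    oob = np.mean(err[cnt > 0] / cnt[cnt > 0])
    return 0.368 * apparent + 0.632 * oob

crits = {p: boot632_criterion(y, p) for p in range(1, 6)}
print("selected AR order:", min(crits, key=crits.get))
```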

10.
A method for feature selection in linear regression based on an extension of Akaike's information criterion is proposed. Using the classical Akaike information criterion (AIC) for feature selection requires an exhaustive search through all subsets of features, which has unreasonably high computational and time cost. A new information criterion is proposed that is a continuous extension of AIC, reducing the feature selection problem to a smooth optimization problem, and an efficient procedure for solving this problem is derived. Experiments show that the proposed method selects features in linear regression efficiently. In the experiments, the proposed procedure is compared with the relevance vector machine, a feature selection method based on a Bayesian approach, and the two procedures are shown to yield similar results. The main distinction of the proposed method is that certain regularization coefficients are identically zero, which makes it possible to avoid the underfitting effect characteristic of the relevance vector machine. A special case (so-called nondiagonal regularization) is considered in which the two methods coincide.
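The RVM-like comparator mentioned above can be sketched with scikit-learn's ARDRegression; the paper's continuous extension of AIC is not reproduced here. Note that ARD shrinks irrelevant coefficients toward zero without making them exactly zero, which is precisely the contrast the abstract draws.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(8)

# 10 candidate features, only the first 3 relevant.
n, p = 100, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.5, n)

ard = ARDRegression().fit(X, y)
# ARD coefficients are small but nonzero for irrelevant features,
# so selection needs an ad hoc threshold (illustrative value below).
selected = np.where(np.abs(ard.coef_) > 1e-3)[0]
print("ARD-selected features:", selected)
```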
