Similar Documents
20 similar documents found.
1.
In this paper we present the intermediate approach to investigating asymptotic power and measuring the efficiency of nonparametric goodness-of-fit tests for testing uniformity. Contrary to the classical Pitman approach, the intermediate approach allows an explicit quantitative comparison of powers and calculation of efficiencies. For standard tests, such as the Cramér-von Mises test, the intermediate approach yields conclusions consistent with the qualitative results obtained under the Pitman approach. In other, more complicated cases the Pitman approach does not give the right picture of power behaviour; an example is the data-driven Neyman test we present in this paper, for which the intermediate approach gives results consistent with finite-sample results. Moreover, in this setting we prove that the data-driven Neyman test is asymptotically the most powerful and efficient under any smooth departure from uniformity. This result shows that, in contrast to classical tests, which are efficient and most powerful under one particular type of departure from uniformity, the new test is adaptive.
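As a concrete illustration, a data-driven Neyman statistic can be built from orthonormal Legendre polynomials on [0, 1], with the dimension selected by a Schwarz (BIC-type) penalty. The sketch below follows this standard construction; the function name and the choice of `k_max` are illustrative, not taken from the paper.

```python
import numpy as np
from numpy.polynomial import legendre as L

def data_driven_neyman(x, k_max=10):
    """Data-driven Neyman smooth test of uniformity for a sample x in [0, 1].

    The dimension is chosen by a Schwarz (BIC-type) rule, as in standard
    data-driven smooth tests; illustrative, not the paper's exact rule.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    stats = []
    for k in range(1, k_max + 1):
        comps = []
        for j in range(1, k + 1):
            coef = np.zeros(j + 1)
            coef[j] = 1.0
            # phi_j(x) = sqrt(2j+1) * P_j(2x - 1) is orthonormal on [0, 1]
            phi = np.sqrt(2 * j + 1) * L.legval(2 * x - 1, coef)
            comps.append(np.sqrt(n) * phi.mean())
        stats.append(float(np.sum(np.square(comps))))
    penalized = [s - k * np.log(n) for k, s in enumerate(stats, start=1)]
    k_hat = 1 + int(np.argmax(penalized))
    return k_hat, stats[k_hat - 1]  # large values indicate departure from uniformity
```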

2.
A new statistical methodology is developed for fitting left-truncated loss data by using a G-component finite mixture model with any combination of Gamma, Lognormal, and Weibull distributions. The EM algorithm, along with the emEM initialization strategy, is employed for model fitting. We propose a new grid map that considers a model selection criterion (AIC or BIC) and risk measures simultaneously, using the entire space of models under consideration. A simulation study validates our proposed approach. The application of the proposed methodology and the use of the new grid maps are illustrated through the analysis of a real data set of left-truncated insurance losses.

3.
With uncorrelated Gaussian factors extended to mutually independent factors beyond Gaussian, conventional factor analysis extends to what has recently been called independent factor analysis. It is typically called binary factor analysis (BFA) when the factors are binary, and non-Gaussian factor analysis (NFA) when the factors follow real non-Gaussian distributions. A crucial issue in both BFA and NFA is determining the number of factors. The statistics literature offers a number of model selection criteria for this purpose, and the Bayesian Ying-Yang (BYY) harmony learning provides a new principle. This paper further investigates BYY harmony learning in comparison with existing typical criteria, including Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the Bayesian information criterion (BIC), and the cross-validation (CV) criterion, for selecting the number of factors. This comparative study is made via experiments on data sets with different sample sizes, data space dimensions, noise variances, and numbers of hidden factors. The experiments show that for both BFA and NFA, BIC outperforms AIC, CAIC, and CV in most cases, while the BYY criterion is either comparable with or better than BIC. Selection by these criteria must be implemented at a second stage, based on a set of candidate models obtained at a first stage of parameter learning. BYY harmony learning, in contrast, provides not only a new class of criteria implemented in a similar way, but also a new family of algorithms that perform parameter learning at the first stage with automated model selection; it is therefore preferable, since computing costs can be saved significantly.
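For reference, the penalized-likelihood criteria compared here have simple closed forms, with $\hat{L}$ the maximized likelihood, $k$ the number of free parameters, and $n$ the sample size:

```latex
\mathrm{AIC}  = -2\ln\hat{L} + 2k, \qquad
\mathrm{CAIC} = -2\ln\hat{L} + k(\ln n + 1), \qquad
\mathrm{BIC}  = -2\ln\hat{L} + k\ln n .
```

CAIC and BIC penalize model dimension with a weight that grows with $n$, whereas AIC's penalty is fixed.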

4.
This paper discusses model selection for finite-dimensional normal regression models. We compare model selection criteria according to prediction errors based on prediction with refitting and prediction without refitting. We provide a new lower bound for prediction without refitting, while a lower bound for prediction with refitting was given by Rissanen. Moreover, we specify a set of sufficient conditions for a model selection criterion to achieve these bounds. The achievability of the two bounds by the following selection rules is then addressed: Rissanen's accumulated prediction error criterion (APE), his stochastic complexity criterion, AIC, BIC, and the FPE criteria. In particular, we provide upper bounds on the overfitting and underfitting probabilities needed for achievability. Finally, we offer a brief discussion of finite-dimensional versus infinite-dimensional model assumptions. Support from the National Science Foundation, grant DMS 8802378, and support from ARO, grant DAAL03-91-G-007, to B. Yu during the revision are gratefully acknowledged.

5.
In software defect prediction with a regression model, many metrics extracted from static code and aggregated (sum, avg, max, min) from methods into classes can be candidate features, and classical feature selection methods such as AIC and BIC must be processed for a given model. As a result, the selected feature sets differ significantly across models, without a reasonable interpretation. The maximal information coefficient (MIC) presented by Reshef et al. is a novel measure of the degree of interdependence between two continuous variables, with an available computing method based on the observations. This paper first uses the MIC between defect counts and each feature to select features, then applies a power transformation to the selected features, and finally builds principal component Poisson and negative binomial regression models. All experiments are conducted on the KC1 data set from the NASA repository at the class level. The block-regularized $m\times 2$ cross-validated sequential $t$-test is employed to test the difference in performance between two models; the performance measures used are FPA, AAE, and ARE. The experimental results show that: 1) the aggregated features sum, avg, and max are selected by MIC, but min is not, a selection significantly different from that of AIC and BIC; 2) the power transformation improves performance for the majority of models; 3) after PCA and factor analysis, two clear factors are obtained in the model, one corresponding to features aggregated via avg and max, and the other to features aggregated via sum, so the model has a reasonable interpretation. In conclusion, the features aggregated with sum, avg, and max are significantly effective for software defect prediction, and the regression model based on the features selected by MIC has some advantages.
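The pipeline described above can be sketched as follows, assuming the minepy package for MIC and treating the number of retained features and components as illustrative placeholders rather than the paper's settings:

```python
import numpy as np
import statsmodels.api as sm
from minepy import MINE                      # MIC implementation (assumed installed)
from sklearn.decomposition import PCA
from sklearn.preprocessing import PowerTransformer

def mic_poisson_pipeline(X, y, n_keep=10, n_pc=2):
    """MIC filter -> power transform -> PCA -> Poisson GLM on defect counts y.

    n_keep and n_pc are illustrative placeholders, not the paper's settings.
    """
    mine = MINE(alpha=0.6, c=15)             # minepy's default parameters
    mic = []
    for j in range(X.shape[1]):
        mine.compute_score(X[:, j], y)
        mic.append(mine.mic())
    keep = np.argsort(mic)[::-1][:n_keep]    # features with the strongest MIC
    Xt = PowerTransformer(method="yeo-johnson").fit_transform(X[:, keep])
    Z = PCA(n_components=n_pc).fit_transform(Xt)
    return sm.GLM(y, sm.add_constant(Z), family=sm.families.Poisson()).fit()
```

The negative binomial variant would swap in `sm.families.NegativeBinomial()` as the GLM family.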

6.
Summary The purpose of the present paper is to propose a simple but practically useful procedure for the analysis of multidimensional contingency tables of survey data. With the procedure we can determine the predictor on which a specific variable has the strongest dependence, as well as the optimal combination of predictors. The procedure is realized very simply by searching for the minimum of the AIC statistic within a set of models proposed in this paper. Its practical utility is demonstrated by the results of some successful applications to the analysis of survey data on the Japanese national character. The difference between the present procedure and the conventional test procedure is briefly discussed.
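The core computation is an AIC for the model "the response depends on this predictor", evaluated from each candidate predictor's contingency table; the best predictor is the one minimizing this criterion. A minimal sketch of that computation, assuming rows index predictor levels and columns index response levels:

```python
import numpy as np

def aic_of_predictor(table):
    """AIC of the model 'response depends on this predictor', from an I x J
    contingency table. Zero cells contribute 0 via the 0*log(0) = 0 convention.
    A sketch of the criterion's structure, not the paper's full procedure."""
    table = np.asarray(table, dtype=float)
    row_tot = table.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        # conditional multinomial log-likelihood: sum n_ij * log(n_ij / n_i.)
        ll = np.where(table > 0, table * np.log(table / row_tot), 0.0).sum()
    I, J = table.shape
    return -2.0 * ll + 2.0 * I * (J - 1)     # I*(J-1) free parameters

# choose the predictor whose table gives the smallest AIC among all candidates
```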

7.
We develop an approach to tuning of penalized regression variable selection methods by calculating the sparsest estimator contained in a confidence region of a specified level. Because confidence intervals/regions are generally understood, tuning penalized regression methods in this way is intuitive and more easily understood by scientists and practitioners. More importantly, our work shows that tuning to a fixed confidence level often performs better than tuning via the common methods based on Akaike information criterion (AIC), Bayesian information criterion (BIC), or cross-validation (CV) over a wide range of sample sizes and levels of sparsity. Additionally, we prove that by tuning with a sequence of confidence levels converging to one, asymptotic selection consistency is obtained, and with a simple two-stage procedure, an oracle property is achieved. The confidence-region-based tuning parameter is easily calculated using output from existing penalized regression computer packages. Our work also shows how to map any penalty parameter to a corresponding confidence coefficient. This mapping facilitates comparisons of tuning parameter selection methods such as AIC, BIC, and CV, and reveals that the resulting tuning parameters correspond to confidence levels that are extremely low, and can vary greatly across datasets. Supplemental materials for the article are available online.

8.
Logspline density estimation is developed for data that may be right censored, left censored, or interval censored. A fully automatic method, which involves the maximum likelihood method and may involve stepwise knot deletion and either the Akaike information criterion (AIC) or Bayesian information criterion (BIC), is used to determine the estimate. In solving the maximum likelihood equations, the Newton–Raphson method is augmented by occasional searches in the direction of steepest ascent. Also, a user interface based on S is described for obtaining estimates of the density function, distribution function, and quantile function and for generating a random sample from the fitted distribution.

9.
The generalized information criterion (GIC) proposed by Rao and Wu [A strongly consistent procedure for model selection in a regression problem, Biometrika 76 (1989) 369-374] is a generalization of Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). In this paper, we extend the GIC to select linear mixed-effects models that are widely applied in analyzing longitudinal data. The procedure for selecting fixed effects and random effects based on the extended GIC is provided. The asymptotic behavior of the extended GIC method for selecting fixed effects is studied. We prove that, under mild conditions, the selection procedure is asymptotically loss efficient regardless of the existence of a true model and consistent if a true model exists. A simulation study is carried out to empirically evaluate the performance of the extended GIC procedure. The results from the simulation show that if the signal-to-noise ratio is moderate or high, the percentages of choosing the correct fixed effects by the GIC procedure are close to one for finite samples, while the procedure performs relatively poorly when it is used to select random effects.
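For orientation, the GIC family replaces the fixed penalty weights of AIC and BIC with a general sequence $\lambda_n$ on model dimension $k$:

```latex
\mathrm{GIC}_{\lambda_n}(k) = -2\ln\hat{L}_k + \lambda_n\, k,
\qquad
\lambda_n = 2 \;\Rightarrow\; \mathrm{AIC},
\qquad
\lambda_n = \ln n \;\Rightarrow\; \mathrm{BIC}.
```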

10.
We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed using a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999, and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute in the sense that it requires no bootstrap method to find its critical values, no partitioning of the sample data, and no inversion of a high-dimensional matrix. We present results on simulation and on the analysis of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart. This work was supported by the 11.5 Natural Scientific Plan (Grant No. 2006BAD09A04) and the Nanjing University Start Fund (Grant No. 020822410110).

11.
In the framework of generalized linear models, the nonrobustness of classical estimators and tests for the parameters is a well-known problem, and alternative methods have been proposed in the literature. These methods are robust and can cope with deviations from the assumed distribution. However, they are based on first-order asymptotic theory, and their accuracy in moderate to small samples is still an open question. In this paper, we propose a test statistic which combines robustness with good accuracy for moderate to small sample sizes. We combine results from Cantoni and Ronchetti [E. Cantoni, E. Ronchetti, Robust inference for generalized linear models, Journal of the American Statistical Association 96 (2001) 1022–1030] and Robinson, Ronchetti and Young [J. Robinson, E. Ronchetti, G.A. Young, Saddlepoint approximations and tests based on multivariate M-estimators, The Annals of Statistics 31 (2003) 1154–1169] to obtain a robust test statistic for hypothesis testing and variable selection which, like the three classical tests, is asymptotically $\chi^2$-distributed, but with a relative error of order $O(n^{-1})$. This leads to reliable inference in the presence of small deviations from the assumed model distribution, and to accurate testing and variable selection, even in moderate to small samples.

12.
Recent empirical results indicate that many financial time series, including stock volatilities, often exhibit long-range dependence. Comparing the volatilities of stock returns is a crucial part of the risk management of stock investing. This paper proposes two test statistics for testing the equality of mean volatilities of stock returns using the analysis of variance (ANOVA) model with long memory errors. They are modified versions of the ordinary F statistic used in ANOVA models with independently and identically distributed errors. One has the form of the ordinary F statistic multiplied by a correction factor that reflects slowly decaying autocorrelations, that is, long-range dependence. The other is a test statistic in which the degrees of freedom of the denominator of the ordinary F statistic are calibrated by the so-called effective sample size. Empirical sizes and powers of the proposed test statistics are examined via Monte Carlo simulation, and an application to German stock returns is presented.
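The effective-sample-size idea can be illustrated with the generic stationary-process formula $n_{\mathrm{eff}} = n / \bigl(1 + 2\sum_{k\ge 1} \rho(k)\bigr)$; the paper's long-memory calibration differs in detail, so the sketch below is an illustrative stand-in rather than the proposed test:

```python
import numpy as np
from scipy import stats

def effective_sample_size(x):
    """Generic ESS: n / (1 + 2 * sum of lagged autocorrelations), with the sum
    truncated at the first nonpositive autocorrelation (a simple heuristic)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    acov = np.correlate(xc, xc, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    s = 0.0
    for k in range(1, n):
        if rho[k] <= 0:
            break
        s += (1 - k / n) * rho[k]
    return n / (1 + 2 * s)

def calibrated_f_test(groups):
    """One-way ANOVA F statistic with denominator df calibrated by the ESS."""
    g = len(groups)
    means = [np.mean(x) for x in groups]
    grand = np.mean(np.concatenate(groups))
    ssb = sum(len(x) * (m - grand) ** 2 for x, m in zip(groups, means))
    ssw = sum(((np.asarray(x) - m) ** 2).sum() for x, m in zip(groups, means))
    n_eff = sum(effective_sample_size(x) for x in groups)
    df1, df2 = g - 1, n_eff - g               # calibrated denominator df
    F = (ssb / df1) / (ssw / df2)
    return F, stats.f.sf(F, df1, df2)
```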

13.
Summary A model selection rule of the form minimize [−2 log(maximized likelihood) + complexity] is considered, which is equivalent to Akaike's minimum AIC rule if the complexity of a model is defined to be twice the number of independently adjusted parameters of the model. Under reasonable assumptions, when applied to a locally asymptotically normal sequence of experiments, the model selection rule is shown to be locally asymptotically admissible with respect to a loss function of the form [inaccuracy + complexity], where the inaccuracy is defined as twice the Kullback-Leibler measure of the discrepancy between the true model and the fitted version of the selected model. This research was supported by NSF Grant No. MCS 80-02732.

14.
We consider the problem of estimating the highest dose that does not lead to a significant increase in risk in the setting of longitudinal data. Based on the idea of AIC (Akaike Information Criterion), a new estimation method is proposed, its strong consistency is proved, and simulation results demonstrate its good performance for small and moderate sample sizes.

15.
Model selection for prediction in empirical research: from stepwise regression to information criteria
This paper first discusses and distinguishes the relationship between significant variables and the significance of variables, and then evaluates the shortcomings of stepwise regression as a model selection method. On this basis, we introduce and comment on various information-criterion-based model selection methods, represented by AIC and BIC. Compared with stepwise regression, information-criterion-based model selection rests on a solid statistical foundation and has clear, desirable statistical properties. Through empirical tests based on the past decade of Chinese stock market data, we show that information criteria often produce econometric models with stronger predictive power than stepwise regression, and therefore deserve attention and wider adoption in future empirical research.
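For a modest number of predictors, the information-criterion alternative to stepwise regression can even be run exhaustively; a minimal sketch using statsmodels (the function and argument names are ours):

```python
import itertools
import numpy as np
import statsmodels.api as sm

def best_subset(y, X, names, criterion="bic"):
    """Exhaustive subset search by AIC/BIC; feasible for a modest number of predictors."""
    best_score, best_vars = np.inf, None
    p = X.shape[1]
    for k in range(1, p + 1):
        for idx in itertools.combinations(range(p), k):
            fit = sm.OLS(y, sm.add_constant(X[:, list(idx)])).fit()
            score = fit.bic if criterion == "bic" else fit.aic
            if score < best_score:
                best_score, best_vars = score, [names[i] for i in idx]
    return best_vars, best_score
```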

16.
This paper proposes a new statistic to test the independence of high-dimensional data. The simulation results suggest that the performance of the test based on our statistic is comparable to that of existing tests, and under some circumstances it may have higher power. The new statistic can therefore be employed in practice as an alternative choice.

17.
This paper develops an FIC (focused information criterion) model selection method and a smoothed FIC model averaging estimation method for linear models with randomly right-censored responses. We prove the asymptotic normality of both the FIC model selection estimator and the smoothed FIC model averaging estimator of the parameter of interest, and study their finite-sample properties by simulation. The simulations show that, in terms of mean squared error and the empirical coverage probability of confidence intervals at a given confidence level, the smoothed FIC model averaging estimator outperforms the model selection estimators based on FIC, AIC (Akaike information criterion), and BIC (Bayesian information criterion), while the FIC model selection estimator also shows some advantage over the AIC and BIC estimators. An analysis of a primary biliary cirrhosis data set illustrates the application of the proposed methods in practical problems.

18.
Let $X_1, \ldots, X_n$ be observations from a multivariate AR(p) model with unknown order p. A resampling procedure is proposed for estimating the order p. Classical criteria, such as AIC and BIC, estimate the order p as the minimizer of the function $\ln\det\hat{\Sigma}_k + kC_n$, where n is the sample size, k is the order of the fitted model, $\hat{\Sigma}_k$ is an estimate of the white noise covariance matrix, and $C_n$ is a sequence of specified constants (for AIC, $C_n = 2m^2/n$; for Hannan and Quinn's modification of BIC, $C_n = 2m^2(\ln\ln n)/n$, where m is the dimension of the data vector). A resampling scheme is proposed to estimate an improved penalty factor $C_n$. Conditional on the data, this procedure produces a consistent estimate of p. Simulation results support the effectiveness of this procedure when compared with some of the traditional order selection criteria. Comments are also made on the use of Yule–Walker as opposed to conditional least squares estimation for order selection.
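In practice, the fixed-penalty criteria that the resampling scheme improves upon are available directly; a minimal sketch with statsmodels (the synthetic series is a placeholder):

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Fixed-penalty order selection (AIC, BIC, HQ, FPE) for a multivariate AR model;
# the paper's resampling scheme would replace these fixed penalties C_n.
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))          # placeholder 2-dimensional series
sel = VAR(data).select_order(maxlags=10)
print(sel.selected_orders)                    # order chosen by each criterion
```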

19.
Regression models with a large number of predictors arise in diverse fields of the social and natural sciences. For proper interpretation, we often wish to identify a smaller subset of the variables that carries the strongest information. With such a large set of candidate predictors, optimizing a variable selection criterion such as AIC, \(C_{P}\), or BIC over all possible subsets is computationally cumbersome in practice. In this paper, we present two efficient optimization algorithms based on a Markov chain Monte Carlo (MCMC) approach for searching for the globally optimal subset. Simulated examples as well as one real data set show that our proposed MCMC algorithms find better solutions than other popular search methods in terms of minimizing a given criterion.
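A generic version of such a search is a Metropolis walk over inclusion indicators that favors low-BIC subsets; the sketch below is our illustration of the idea, not the authors' exact sampler:

```python
import numpy as np
import statsmodels.api as sm

def mcmc_subset_search(y, X, n_iter=5000, temp=1.0, seed=0):
    """Metropolis-style search over variable subsets minimizing BIC."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]

    def bic(mask):
        if not mask.any():
            return sm.OLS(y, np.ones((len(y), 1))).fit().bic  # intercept-only model
        return sm.OLS(y, sm.add_constant(X[:, mask])).fit().bic

    cur = rng.random(p) < 0.5                  # random initial subset
    cur_bic = bic(cur)
    best, best_bic = cur.copy(), cur_bic
    for _ in range(n_iter):
        prop = cur.copy()
        j = rng.integers(p)
        prop[j] = ~prop[j]                     # flip one variable in/out
        prop_bic = bic(prop)
        # accept with Metropolis probability exp(-(delta BIC) / temp)
        if np.log(rng.random()) < -(prop_bic - cur_bic) / temp:
            cur, cur_bic = prop, prop_bic
            if cur_bic < best_bic:
                best, best_bic = cur.copy(), cur_bic
    return best, best_bic
```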

20.
The evaluation of the performance of a design for complex discrete event systems through simulation is usually very time consuming, and optimizing the system performance is even more computationally infeasible. Ordinal optimization (OO) attacks this difficulty in system design by looking at the “order” of performances among designs instead of their “values”, and by providing a probability guarantee of a good enough solution instead of the best for sure. The selection rule, that is, the rule that decides which subset of designs to select as the OO solution, is a key step in applying the OO method. Pairwise elimination and round robin comparison are two examples of selection rules, and many others are frequently used in the ordinal optimization literature. To compare selection rules, we first identify some general facts about them. We then use regression functions to quantify the efficiency of a group of selection rules, including some frequently used ones. A procedure to predict good selection rules is proposed and verified by simulation and by examples, and selection rules that work well most of the time are recommended.
