Risk Estimation for Classification Trees
Abstract: This article studies techniques for reducing the bias of risk estimates, both globally and within the terminal nodes of CART® classification trees. In Section 5.4 of Classification and Regression Trees, Leo Breiman presented an estimator with two free parameters, and an empirical Bayes method was put forth for estimating them. Here we explain why the estimator succeeds in the many examples for which it does. We give numerical evidence from simulations in the two-class case, comparing ordinary resubstitution with seven other methods of estimation. There are 14 sampling distributions: all but one are simulated, and the remaining one concerns E. coli promoter regions. We report results for varying minimum node sizes of the trees, for varying prior probabilities and misclassification costs, and, when relevant, for varying numbers of bootstrap replicates or cross-validations. A variation of Breiman's method that uses repeated cross-validation to estimate global misclassification rates was the most accurate of the eight methods. The exceptions are cases in which the Bayes risk of the Bayes rule is small. For these, the most accurate method is either a local bootstrap .632 estimate or Breiman's method modified to use a bootstrap estimate of the global misclassification rate.
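To make three of the global estimators named above concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are not taken from the article: a single numeric feature, 0/1 labels, unit misclassification costs, no class priors, and a depth-1 tree (a "stump") as a stand-in for a full CART tree. It computes ordinary resubstitution, repeated k-fold cross-validation, and Efron's .632 bootstrap, with the out-of-bag error pooled over all bootstrap replicates. All function names and defaults (`fit_stump`, `repeated_cv`, `bootstrap_632`, `k=5`, `n_boot=50`) are hypothetical illustrations. This is not the article's code, and it implements neither Breiman's two-parameter within-node estimator nor the local (per-node) .632 variant the article evaluates.

```python
import random


def majority(labels, default):
    """Majority 0/1 label; `default` breaks exact ties and covers an empty list."""
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones != zeros:
        return 1 if ones > zeros else 0
    return default


def fit_stump(xs, ys):
    """Fit a depth-1 tree on one numeric feature (hypothetical stand-in for CART).

    Every midpoint between consecutive distinct x values is tried as a split.
    Each side predicts its majority label, and the split with the fewest
    training errors wins (unit costs and no priors are simplifying assumptions).
    """
    overall = majority(ys, 0)
    pts = sorted(set(xs))
    cands = [(a + b) / 2 for a, b in zip(pts, pts[1:])] or [pts[0]]
    best = None
    for c in cands:
        left = [y for x, y in zip(xs, ys) if x <= c]
        right = [y for x, y in zip(xs, ys) if x > c]
        lab_l, lab_r = majority(left, overall), majority(right, overall)
        errs = sum(y != lab_l for y in left) + sum(y != lab_r for y in right)
        if best is None or errs < best[0]:
            best = (errs, c, lab_l, lab_r)
    _, c, lab_l, lab_r = best
    return lambda x: lab_l if x <= c else lab_r


def resubstitution(xs, ys):
    """Train and test on the same data: the optimistically biased baseline."""
    predict = fit_stump(xs, ys)
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(ys)


def repeated_cv(xs, ys, k=5, repeats=10, seed=0):
    """Misclassification rate averaged over `repeats` random k-fold partitions (k >= 2)."""
    rng = random.Random(seed)
    n, errors = len(ys), 0
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            held_out = set(idx[f::k])
            train = [i for i in range(n) if i not in held_out]
            predict = fit_stump([xs[i] for i in train], [ys[i] for i in train])
            errors += sum(predict(xs[i]) != ys[i] for i in held_out)
    return errors / (n * repeats)


def bootstrap_632(xs, ys, n_boot=50, seed=0):
    """Global .632 estimate: 0.368 * resubstitution + 0.632 * eps0.

    eps0 pools, over all replicates, the errors each bootstrap-trained stump
    makes on the points left out of its bootstrap sample. Returns (estimate, eps0).
    """
    rng = random.Random(seed)
    n = len(ys)
    oob_errors = oob_count = 0
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue  # every point was drawn, so nothing is left to test on
        predict = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        oob_errors += sum(predict(xs[i]) != ys[i] for i in oob)
        oob_count += len(oob)
    if oob_count == 0:
        raise ValueError("no out-of-bag points; increase n_boot")
    eps0 = oob_errors / oob_count
    return 0.368 * resubstitution(xs, ys) + 0.632 * eps0, eps0


if __name__ == "__main__":
    rng = random.Random(1)
    ys = [rng.randint(0, 1) for _ in range(100)]
    xs = [rng.gauss(float(y), 1.0) for y in ys]  # class 1 is shifted by one unit
    print("resubstitution:", resubstitution(xs, ys))
    print("repeated 5-fold CV:", repeated_cv(xs, ys))
    print(".632 bootstrap:", bootstrap_632(xs, ys)[0])
```

The weight 0.632 comes from 1 - (1 - 1/n)^n ≈ 1 - e^(-1), the probability that a given observation appears in a bootstrap sample. Mixing the optimistic resubstitution rate with the pessimistic out-of-bag rate is meant to cancel part of each one's bias. Repeating cross-validation over several random partitions reduces the variance that comes from any single choice of folds.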
Keywords: .632 bootstrap, empirical Bayes, Breiman's method