Risk Estimation for Classification Trees |
| |
Abstract: | This article studies techniques for reducing the bias of risk estimates, both globally and within terminal nodes of CART® classification trees. In Section 5.4 of Classification and Regression Trees, Leo Breiman presented an estimator that has two free parameters. An empirical Bayes method was put forth for estimating them. Here we explain why the estimator succeeds in the many examples where it does. We give numerical evidence from simulations in the two-class case, with attention to ordinary resubstitution and seven other methods of estimation. There are 14 sampling distributions: 13 are simulated, and the remaining one concerns E. coli promoter regions. We report on varying the minimum node sizes of the trees; the prior probabilities and misclassification costs; and, when relevant, the numbers of bootstraps or cross-validations. A variation of Breiman's method, in which repeated cross-validation is employed to estimate global rates of misclassification, was the most accurate of the eight methods. The exceptions are cases in which the risk of the Bayes rule is small. In those cases, either a local .632 bootstrap estimate or Breiman's method modified to use a bootstrap estimate of the global misclassification rate is most accurate. |
| |
Keywords: | .632 bootstrap, Empirical Bayes, Breiman's method |