On the Relation between Prediction and Imputation Accuracy under Missing Covariates |
| |
Authors: | Burim Ramosaj, Justus Tulowietzki, Markus Pauly |
| |
Affiliation: | Faculty of Statistics, TU Dortmund University, Joseph-Von-Fraunhofer Str. 2-4, 44227 Dortmund, Germany |
| |
Abstract: | Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research shows an increasing trend towards the use of modern Machine-Learning algorithms for imputation, motivated by their favorable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine-Learning-based methods are used for both imputation and prediction. We see that even a slight decrease in imputation accuracy can seriously affect the prediction accuracy. In addition, we explore imputation performance when statistical inference procedures are used in prediction settings, for example through the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning Repository and an extensive simulation study. |
| |
Keywords: | missing covariates; imputation accuracy; prediction accuracy; prediction intervals; bagging; boosting |