A Note on K-fold Least Squares Cross-Validation in Density Estimation

Daniel B. Rubin, Division of Biostatistics, School of Public Health, University of California, Berkeley
Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley

Abstract

We take another look at the well-known least squares cross-validation method for selecting among candidate density estimators, introduced in Rudemo (1982) and Bowman et al. (1984). To our knowledge, theoretical results, simulation studies, implementations, and data analyses have treated this procedure almost exclusively as a version of leave-one-out cross-validation. We instead consider using the least squares cross-validation loss function as part of a holdout sample or K-fold procedure, provide a finite sample oracle inequality for the integrated mean squared error of this modified selector, and discuss simulations comparing the leave-one-out and K-fold versions. Our conclusion is that K-fold least squares cross-validation, although apparently rarely used or considered, is potentially a general and practical tool in density estimation.
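As an illustration of the K-fold procedure described above, the following is a minimal sketch, not the authors' implementation. It assumes the candidate estimators are one-dimensional Gaussian kernel density estimators indexed by a bandwidth h. For an estimator fitted on a training sample, the least squares cross-validation loss on a validation sample is taken to be the integral of the squared estimate minus twice the average of the estimate over the validation points. The function names (gauss, lscv_loss, kfold_lscv), the bandwidth grid, and the choice K = 5 are hypothetical and chosen only for this example.

```python
import numpy as np


def gauss(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def lscv_loss(train, valid, h):
    """Least squares CV loss of a Gaussian KDE (bandwidth h) fitted on
    `train` and evaluated on `valid`: integral(f^2) - 2 * mean(f(valid))."""
    # For a Gaussian KDE the integral of f^2 has a closed form: the average
    # over all training pairs of a N(0, 2 h^2) density at their difference.
    d_tt = train[:, None] - train[None, :]
    int_f2 = gauss(d_tt, h * np.sqrt(2.0)).mean()
    # KDE fitted on the training sample, evaluated at each validation point.
    d_vt = valid[:, None] - train[None, :]
    f_valid = gauss(d_vt, h).mean(axis=1)
    return int_f2 - 2.0 * f_valid.mean()


def kfold_lscv(x, bandwidths, k=5, seed=0):
    """Return the bandwidth minimizing the K-fold averaged LSCV loss,
    together with the averaged loss for every candidate bandwidth."""
    x = np.asarray(x, dtype=float)
    bandwidths = np.asarray(bandwidths, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    risks = np.empty(len(bandwidths))
    for b, h in enumerate(bandwidths):
        fold_losses = []
        for i in range(k):
            valid = x[folds[i]]
            train = x[np.concatenate([folds[j] for j in range(k) if j != i])]
            fold_losses.append(lscv_loss(train, valid, h))
        risks[b] = np.mean(fold_losses)
    return float(bandwidths[int(np.argmin(risks))]), risks


# Usage: select a bandwidth for a simulated standard normal sample.
rng = np.random.default_rng(1)
sample = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 30)
h_best, risks = kfold_lscv(sample, grid, k=5)
print(h_best)
```

Setting k equal to the sample size in this sketch gives a leave-one-out variant of the same criterion, while a single train/validation split corresponds to the holdout version.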