Abstract
Spatial prediction is an important problem in many scientific disciplines. Super Learner is an ensemble prediction approach related to stacked generalization that uses cross-validation to search for the optimal predictor among all convex combinations of a heterogeneous candidate set. It has been applied to non-spatial data, where theoretical results demonstrate that it will perform asymptotically at least as well as the best candidate under consideration. We review these optimality properties and discuss the assumptions required for them to hold in spatial prediction problems. We present results of a simulation study confirming that Super Learner works well in practice under a variety of sample sizes, sampling designs, and data-generating functions. We also apply Super Learner to a real-world dataset.
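The core idea in the abstract, cross-validating a set of candidate predictors and then choosing the convex combination with the smallest cross-validated risk, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the simulated two-dimensional data, the four candidate learners, squared-error loss, plain random folds (which ignore any spatial dependence), and projected gradient descent as the weight solver.

```python
import numpy as np

# Candidate learners: each takes training (X, y) and returns a predict function.
def fit_mean(X, y):
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_trend(X, y):
    # Linear trend surface in the coordinates, fit by least squares.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_knn(k):
    # Average of the k nearest training responses (Euclidean distance).
    def fit(X, y):
        def predict(Xn):
            d = np.linalg.norm(Xn[:, None, :] - X[None, :, :], axis=2)
            return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
        return predict
    return fit

def cv_predictions(X, y, learners, n_folds, rng):
    # Out-of-fold predictions: column j holds learner j's prediction for
    # each point, made by a fit that never saw that point.
    folds = rng.permutation(len(y)) % n_folds
    Z = np.empty((len(y), len(learners)))
    for v in range(n_folds):
        test = folds == v
        for j, fit in enumerate(learners):
            Z[test, j] = fit(X[~test], y[~test])(X[test])
    return Z

def project_simplex(v):
    # Euclidean projection onto {w : w >= 0, sum(w) = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / j > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def convex_weights(Z, y, n_iter=2000):
    # Minimize mean((Z w - y)^2) over the simplex by projected gradient,
    # starting from the single best candidate so the risk can only decrease.
    risks = ((Z - y[:, None]) ** 2).mean(axis=0)
    w = np.zeros(Z.shape[1])
    w[np.argmin(risks)] = 1.0
    step = len(y) / (2.0 * np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * Z.T @ (Z @ w - y) / len(y)
        w = project_simplex(w - step * grad)
    return w, risks

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(4.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.2, 200)

learners = [fit_mean, fit_trend, fit_knn(5), fit_knn(20)]
Z = cv_predictions(X, y, learners, n_folds=10, rng=rng)
w, risks = convex_weights(Z, y)
ensemble_risk = np.mean((Z @ w - y) ** 2)

# Final predictor: refit every candidate on all data, combine with weights w.
fits = [fit(X, y) for fit in learners]
def ensemble_predict(Xn):
    return sum(wj * f(Xn) for wj, f in zip(w, fits))
```

Because each single candidate corresponds to a vertex of the weight simplex, the cross-validated risk of the weighted combination in this sketch cannot exceed that of the best individual candidate. Whether that cross-validated risk is a trustworthy estimate for spatially dependent data depends on the assumptions the paper discusses, which random fold assignment here does not address.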
Disciplines
Biostatistics
Suggested Citation
Davies, Molly M. and van der Laan, Mark J., "Optimal Spatial Prediction Using Ensemble Machine Learning" (December 2012). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 305.
https://biostats.bepress.com/ucbbiostat/paper305
Comments
Molly M. Davies is a doctoral student in Biostatistics, University of California - Berkeley. Mailing address: Group in Biostatistics, University of California - Berkeley, 101 Haviland Hall, Berkeley, CA 94720. (email: molly_davies@berkeley.edu).
Mark J. van der Laan is a Professor of Statistics and Biostatistics, University of California - Berkeley. Mailing address: Group in Biostatistics, University of California - Berkeley, 108 Haviland Hall, Berkeley, CA 94720. (email: laan@stat.berkeley.edu).
Financial support for this research was provided through NIH grant R01 AI074345. Molly M. Davies would also like to acknowledge the University of California, Berkeley's Mentored Research Fellowship program for its support.