U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits

Authors

Stephanie Sapp, University of California - Berkeley
Mark J. van der Laan, University of California - Berkeley
John Canny, University of California, Berkeley

Abstract

Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be a beneficial tool for small to moderate-sized datasets, and often has better prediction performance than the underlying algorithm fit just once on the full dataset. We also describe how to include Subsemble as a candidate in a SuperLearner library, providing a practical way to evaluate the performance of Subsemble relative to the underlying algorithm fit just once on the full dataset.
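
The three steps named in the abstract (partition, fit per subset, combine through V-fold cross-validation) can be illustrated with a minimal Python sketch. The abstract does not spell out the cross-validation scheme or the combination rule, so everything specific here is an assumption made for illustration: the names `fit_ols` and `subsemble_sketch` are hypothetical, the underlying algorithm is plain least squares, the folds are drawn at random across the whole dataset, and the subset-specific fits are combined by unconstrained least-squares weights learned on held-out predictions.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical underlying algorithm: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Return a prediction function for new feature rows.
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef

def subsemble_sketch(X, y, n_subsets=3, n_folds=5, fit=fit_ols, seed=0):
    """Assumed reading of the abstract, not the authors' implementation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 1: partition the observations into disjoint subsets.
    subset_id = rng.permutation(n) % n_subsets
    # Assumption: V folds drawn independently of the subset partition.
    fold_id = rng.permutation(n) % n_folds

    # Step 2: cross-validated subset-specific predictions.
    # Column j holds predictions from subset j's fit, made only on
    # observations that were held out when that fit was trained.
    Z = np.empty((n, n_subsets))
    for v in range(n_folds):
        held_out = fold_id == v
        for j in range(n_subsets):
            train = (subset_id == j) & ~held_out
            Z[held_out, j] = fit(X[train], y[train])(X[held_out])

    # Step 3: learn how to combine the subset-specific fits.
    # Assumption: unconstrained least-squares weights.
    weights, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Final prediction function: refit on each full subset, then combine.
    subset_fits = [fit(X[subset_id == j], y[subset_id == j])
                   for j in range(n_subsets)]

    def predict(Xnew):
        return np.column_stack([f(Xnew) for f in subset_fits]) @ weights

    return predict, weights
```

Calling `subsemble_sketch(X, y)` with a two-dimensional feature array `X` and a response vector `y` returns a prediction function and the learned combination weights. Comparing its held-out error with that of `fit_ols(X, y)` mirrors, in a very reduced form, the comparison against the underlying algorithm fit once on the full dataset.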

Disciplines

Applied Statistics | Biostatistics

Suggested Citation

Sapp, Stephanie; van der Laan, Mark J.; and Canny, John, "Subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits" (May 2013). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 313.
https://biostats.bepress.com/ucbbiostat/paper313

Included in

Applied Statistics Commons, Biostatistics Commons