Residual-based Tree for Clustered Binary Data


An earlier version was submitted for publication to Statistics in Medicine in 2015.


Tree-based methods are widely used for classification in health sciences research, where data are often clustered. In this paper, we propose a variant of the standard classification and regression tree (CART) paradigm for clustered binary outcomes with covariates observed at both the cluster and individual levels. Using residuals from a null generalized linear mixed model as the response, we build a regression tree that partitions the covariate space into rectangles. This circumvents modeling the correlation structure explicitly while still accounting for the cluster-correlated design, thereby allowing us to adopt the standard CART machinery for tree growing, pruning, and cross-validation. Predictions for each terminal node of the final tree are estimated as the success probability within that node. Our method also extends easily to ensembles of trees and random forests. Using extensive simulations, we compare our residual-based trees with standard classification trees. Finally, the methods are illustrated using data from a study of kidney cancer treatment receipt and a study of surgical mortality after colectomy.
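The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the null generalized linear mixed model is replaced by a crude stand-in (cluster success rates shrunk toward the overall rate, with an arbitrary pseudo-count `shrink`), raw residuals are used, the data are simulated, and the helper names `null_model_residuals`, `grow`, and `predict` are hypothetical. Pruning and cross-validation are omitted.

```python
import math
import random
from collections import defaultdict

def null_model_residuals(y, cluster, shrink=5.0):
    """Raw residuals y - p_hat from a covariate-free model with cluster effects.

    ASSUMPTION: shrunken cluster means stand in for a fitted null GLMM.
    """
    overall = sum(y) / len(y)
    total, count = defaultdict(float), defaultdict(int)
    for yi, c in zip(y, cluster):
        total[c] += yi
        count[c] += 1
    p_hat = {c: (total[c] + shrink * overall) / (count[c] + shrink) for c in count}
    return [yi - p_hat[c] for yi, c in zip(y, cluster)]

def sse(values):
    """Sum of squared deviations from the mean (regression-tree impurity)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def grow(X, r, y, idx, depth, min_leaf=20):
    """Split on the residuals r; each leaf stores the observed success rate of y."""
    best = None
    if depth > 0:
        for j in range(len(X[0])):
            for cut in sorted({X[i][j] for i in idx}):
                left = [i for i in idx if X[i][j] <= cut]
                right = [i for i in idx if X[i][j] > cut]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([r[i] for i in left]) + sse([r[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, cut, left, right)
    if best is None:  # terminal node: success probability within the node
        return {"prob": sum(y[i] for i in idx) / len(idx)}
    _, j, cut, left, right = best
    return {"var": j, "cut": cut,
            "left": grow(X, r, y, left, depth - 1, min_leaf),
            "right": grow(X, r, y, right, depth - 1, min_leaf)}

def predict(tree, x):
    """Walk down the tree and return the terminal node's success probability."""
    while "prob" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["prob"]

# Simulated clustered binary data (illustrative only): 30 clusters of 20 subjects,
# one individual-level covariate x and one cluster-level covariate z.
rng = random.Random(0)
X, y, cluster = [], [], []
for c in range(30):
    u = rng.gauss(0.0, 0.5)   # cluster random intercept
    z = rng.random()          # cluster-level covariate (no true effect here)
    for _ in range(20):
        x = rng.random()      # individual-level covariate
        logit = -1.5 + u + (3.0 if x > 0.5 else 0.0)
        p = 1.0 / (1.0 + math.exp(-logit))
        X.append([x, z])
        y.append(1 if rng.random() < p else 0)
        cluster.append(c)

resid = null_model_residuals(y, cluster)
tree = grow(X, resid, y, list(range(len(y))), depth=2)
print(predict(tree, [0.1, 0.5]), predict(tree, [0.9, 0.5]))
```

The splits are chosen on the residuals, so ordinary least-squares regression-tree code can be reused unchanged, while the terminal nodes report success proportions computed from the original binary outcome. In practice the stand-in would be replaced by a proper mixed-model fit, and the hand-written tree by an established CART implementation with pruning.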


Applied Statistics | Biostatistics
