Published 2004 in Bernoulli, Vol. 10, No. 6, pp. 1011-1037.


Over the last two decades, non-parametric and semi-parametric approaches that adapt well-known techniques, such as regression methods, to the analysis of right-censored data (e.g., right-censored survival data) have become popular in the statistics literature. However, the problem of choosing the best model (predictor) among a set of proposed models (predictors) in the right-censored data setting has not received much attention. In this paper, we develop a new cross-validation-based model selection method to select among predictors of right-censored outcomes such as survival times. The proposed method treats the risk of a given predictor, fitted on the training sample, as a parameter of the full data distribution in a right-censored data model. Then the doubly robust locally efficient estimation method, or an ad hoc inverse probability of censoring weighting method, as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2002), is used to estimate this conditional risk parameter from the validation sample. We prove that, under general conditions, the proposed cross-validated selector is asymptotically equivalent to an oracle benchmark selector based on the true data-generating distribution. The method covers model selection with right-censored data in prediction (univariate and multivariate) as well as in density and hazard estimation problems.
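The ad hoc inverse probability of censoring weighted (IPCW) variant of the selector can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: censoring is independent of the covariate, so the censoring survival function G is estimated once on the full sample with a reverse Kaplan-Meier; the loss is squared error on log survival time; and the two candidate predictors and all function names (`censoring_survival_left`, `fit_constant`, `fit_linear`, `cv_select`) are hypothetical. Each validation observation contributes Δ·L / G(T̃-), so only uncensored subjects carry weight. The doubly robust locally efficient estimator is not implemented here.

```python
import math
import random


def censoring_survival_left(times, events):
    """Reverse Kaplan-Meier: returns t -> G(t-) = P(C >= t), treating
    censoring (event == 0) as the 'failure'. Assumes (for this sketch only)
    that censoring is independent of the covariates."""
    data = sorted(zip(times, events))
    n, i, surv, steps = len(data), 0, 1.0, []
    while i < n:
        t, j, censored = data[i][0], i, 0
        while j < n and data[j][0] == t:
            if data[j][1] == 0:
                censored += 1
            j += 1
        if censored:
            surv *= 1.0 - censored / (n - i)  # n - i subjects still at risk at t
            steps.append((t, surv))
        i = j

    def g_left(t):
        value = 1.0
        for s, v in steps:  # product over censoring times strictly before t
            if s >= t:
                break
            value = v
        return value

    return g_left


def fit_constant(x, y, wt):
    """Hypothetical candidate 1: IPCW-weighted mean of y, ignoring x."""
    total = sum(wt)
    mean = sum(a * b for a, b in zip(wt, y)) / total if total > 0 else 0.0
    return lambda v: mean


def fit_linear(x, y, wt):
    """Hypothetical candidate 2: IPCW-weighted least squares y ~ a + b*x."""
    total = sum(wt)
    if total <= 0:
        return lambda v: 0.0
    mx = sum(a * b for a, b in zip(wt, x)) / total
    my = sum(a * b for a, b in zip(wt, y)) / total
    sxx = sum(a * (b - mx) ** 2 for a, b in zip(wt, x))
    sxy = sum(a * (b - mx) * (c - my) for a, b, c in zip(wt, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def cv_select(x, times, events, candidates, n_folds=5, seed=0):
    """V-fold cross-validation: each candidate is fitted on the training folds
    and its validation risk is estimated by (1/n_val) * sum Delta*L / G(T-)."""
    g_left = censoring_survival_left(times, events)
    wt = [d / g_left(t) for t, d in zip(times, events)]
    y = [math.log(t) for t in times]  # squared-error loss on log survival time
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    fold = {i: k % n_folds for k, i in enumerate(idx)}
    risks = {}
    for name, fit in candidates.items():
        total = 0.0
        for k in range(n_folds):
            tr = [i for i in idx if fold[i] != k]
            va = [i for i in idx if fold[i] == k]
            pred = fit([x[i] for i in tr], [y[i] for i in tr], [wt[i] for i in tr])
            total += sum(wt[i] * (y[i] - pred(x[i])) ** 2 for i in va) / len(va)
        risks[name] = total / n_folds
    return min(risks, key=risks.get), risks


# Simulated example: log T is linear in W, censoring is independent exponential.
rng = random.Random(1)
x, times, events = [], [], []
for _ in range(400):
    w = rng.random()
    t = math.exp(1.0 + 2.0 * w + rng.gauss(0.0, 0.5))  # true survival time
    c = rng.expovariate(1 / 30.0)                      # independent censoring time
    x.append(w)
    times.append(min(t, c))
    events.append(1 if t <= c else 0)

best, risks = cv_select(x, times, events, {"constant": fit_constant, "linear": fit_linear})
print(best, risks)
```

Because the simulated log survival time depends linearly on W, the selector should favor `linear`. Estimating G on the full sample rather than within each fold is a simplification chosen here to keep the sketch short.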


Statistical Methodology | Statistical Models | Statistical Theory | Survival Analysis

Previous Versions: January 28, 2003