We consider the random design nonparametric regression problem when the response variable is subject to a general mode of missingness or censoring. A traditional approach to such problems is imputation, in which the missing or censored responses are replaced by well-chosen values, and the resulting covariate/response data are then plugged into algorithms designed for the uncensored setting. We present a general methodology for imputation with the property of double robustness: the method works well if either a parameter of the full data distribution (the joint distribution of covariates and response) or a parameter of the censoring mechanism is well approximated. These procedures can be used to advantage when something is known about the censoring mechanism (e.g., in survival analysis, when the censoring variable is independent of the survival time and response), whereas methods based on maximizing a likelihood ignore this relevant information. We show how the methodology can be applied to examples where the response variable is missing, corresponds to a counterfactual outcome in a point treatment study, is right censored, or is subject to censoring as in current status data. To deal with identifiability problems (e.g., the conditional mean survival time may not be identifiable from right censored data because of a lack of information regarding the tails of the survival distribution), we show for these examples how the response of interest can be transformed so that nonparametric regression remains a worthwhile endeavor. We remark on how our imputation procedure can be implemented using general tools from efficiency theory and semiparametric estimation. 
We present general results showing that imputation procedures can accurately approximate regression functions when the imputed responses are entered into commonly used nonparametric regression procedures, including least squares estimators, complexity regularized least squares estimators, penalized least squares estimators, locally weighted average estimators, and estimators selected with cross-validation.
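
To make the double robustness property concrete, the following is a minimal sketch for the simplest case mentioned above, a response that is missing at random given the covariate. It uses the standard augmented inverse-probability-weighted form of a doubly robust pseudo-response, which is an assumption chosen for illustration and not necessarily the exact construction developed in the paper. The names `dr_pseudo_response`, `simulate`, `pi_hat`, and `m_hat` are hypothetical, and the simulated data model is invented for the example.

```python
import numpy as np


def dr_pseudo_response(y, observed, pi_hat, m_hat):
    """Doubly robust pseudo-response for a response missing at random.

    Assumed form (standard AIPW, used here only as an illustration):
        Y* = m_hat(X) + observed * (Y - m_hat(X)) / pi_hat(X)
    where pi_hat approximates P(observed | X) and m_hat approximates E[Y | X].
    E[Y* | X] equals E[Y | X] if either pi_hat or m_hat is correct.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    pi_hat = np.asarray(pi_hat, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    # Missing responses may be stored as NaN; replace them with 0 so they
    # never contribute to the correction term.
    y_filled = np.where(observed, y, 0.0)
    residual = np.where(observed, y_filled - m_hat, 0.0)
    return m_hat + residual / pi_hat


def simulate(n, seed=0):
    """Hypothetical data: Y = sin(2*pi*X) + noise, observed with prob. 0.3 + 0.6*X."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    m_true = np.sin(2.0 * np.pi * x)
    y = m_true + rng.normal(scale=0.3, size=n)
    pi_true = 0.3 + 0.6 * x
    observed = rng.uniform(size=n) < pi_true
    y_obs = np.where(observed, y, np.nan)  # unobserved responses are NaN
    return x, y_obs, observed, pi_true, m_true


if __name__ == "__main__":
    x, y_obs, observed, pi_true, m_true = simulate(200_000)
    wrong_m = np.full_like(x, 0.5)   # deliberately misspecified regression
    wrong_pi = np.full_like(x, 0.5)  # deliberately misspecified missingness model

    # The true marginal mean of Y is 0 under this simulation.
    print("correct pi, wrong m :", dr_pseudo_response(y_obs, observed, pi_true, wrong_m).mean())
    print("wrong pi, correct m :", dr_pseudo_response(y_obs, observed, wrong_pi, m_true).mean())
    print("both wrong          :", dr_pseudo_response(y_obs, observed, wrong_pi, wrong_m).mean())
```

In this sketch the pseudo-responses would then be passed, together with the covariates, to any regression procedure designed for fully observed data. The first two printed averages should be close to the true mean of zero, while the third need not be, which is the sense in which the transformation is protected against misspecification of one, but not both, of the nuisance parameters.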


Statistical Methodology | Statistical Theory