In the presence of a missing response, reweighting the complete case subsample by the inverse of the nonmissing probability is both intuitive and easy to implement. However, inverse probability weighting is not efficient in general and is not robust against misspecification of the missing probability model. Calibration was developed by survey statisticians to improve the efficiency of inverse probability weighting estimators when population totals of auxiliary variables are known and the inclusion probability is known by design. In a missing data problem, we can calibrate auxiliary variables in the complete case subsample to the full sample. However, the inclusion probability is unknown in general and needs to be estimated, and it is unclear whether calibration is robust against misspecification of the missing probability model. It is also unclear how efficient calibration is for general missing data problems. This paper answers these two questions and presents two rather unexpected results. First, when the missing data probability is correctly specified and multiple working outcome regression models are posited, calibration enjoys an oracle property: it attains the same semiparametric efficiency bound as if the true outcome model were known in advance. Second, when the missing mechanism is misspecified, calibration can still yield a consistent estimator when any one of the outcome regression models is correctly specified. This is a multiple robustness property, more general than the double robustness considered in the missing data literature. We connect a wide class of calibration estimators, constructed based on generalized empirical likelihood, to many existing estimators in biostatistics, econometrics and survey sampling, and we perform simulation studies to examine the finite sample properties of calibration estimators.
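To make the idea of calibrating the complete case subsample to the full sample concrete, the following is a minimal sketch, not the paper's estimator. It assumes exponential tilting as the calibration criterion (one member of the generalized empirical likelihood family), a single auxiliary variable, and a simulated data set; the function names `calibration_weights` and `_tilt`, the simulation design, and the plain Newton solver without a line search are all illustrative assumptions.

```python
import numpy as np


def _tilt(Z, d, lam):
    """Normalized exponential-tilting weights proportional to d * exp(Z @ lam)."""
    eta = Z @ lam
    w = d * np.exp(eta - eta.max())  # subtract the max for numerical stability
    return w / w.sum()


def calibration_weights(U, observed, base_weights=None, tol=1e-10, max_iter=100):
    """Weights on the complete cases whose weighted auxiliary means match
    the full-sample means. U must not contain a constant column, since the
    normalization of the weights already plays that role."""
    U = np.asarray(U, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    target = U.mean(axis=0)        # full-sample means of the auxiliaries
    Z = U[observed] - target       # centered complete-case auxiliaries
    if base_weights is None:
        d = np.ones(Z.shape[0])    # assumption: uniform base weights
    else:
        d = np.asarray(base_weights, dtype=float)[observed]
    lam = np.zeros(Z.shape[1])
    w = _tilt(Z, d, lam)
    for _ in range(max_iter):
        grad = Z.T @ w             # calibration constraint residual
        if np.max(np.abs(grad)) < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        lam = lam - np.linalg.solve(hess, grad)  # plain Newton step
        w = _tilt(Z, d, lam)
    return w


# Hypothetical simulation: the response is missing at random given x.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))

w = calibration_weights(x[:, None], observed)
complete_case = y[observed].mean()   # ignores the missingness mechanism
calibrated = w @ y[observed]         # calibration estimate of E[Y]
print(complete_case, calibrated, y.mean())
```

In this sketch no missing probability model is fitted at all; passing inverse estimated nonmissing probabilities as `base_weights` would give a calibrated version of inverse probability weighting. Because the simulated outcome is linear in the calibrated auxiliary variable, the calibrated estimate is expected to track the full-sample mean closely, which loosely illustrates why a correctly specified outcome model can protect against a misspecified missingness model.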


Statistical Methodology | Statistical Theory