It has long been recognized that covariate adjustment can increase precision, even when it is not strictly necessary. The phenomenon is particularly emphasized in clinical trials, whether using continuous, categorical, or censored time-to-event outcomes. Adjustment is often straightforward when a discrete covariate partitions the sample into a handful of strata, but becomes more involved when modern studies collect copious amounts of baseline information on each subject.

This dilemma helped motivate locally efficient estimation for coarsened data structures, as surveyed in the books of van der Laan and Robins (2003) and Tsiatis (2006). Here one fits a relatively small working model for the full data distribution, often by maximum likelihood, and plugs the resulting nuisance parameter estimate into an estimating equation for the parameter of interest. The usual advertisement is that the estimator is asymptotically efficient if the working model is correct, and otherwise remains consistent and asymptotically Normal.

However, the working model will almost always be misspecified in practice, and standard likelihood-based fits can then yield poor estimates of the parameter of interest. We propose a new method, empirical efficiency maximization, which targets the element of a working model that minimizes the asymptotic variance of the resulting parameter estimate, whether or not the working model is correctly specified.
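The idea can be illustrated with a minimal sketch for a two-arm randomized trial. This is not the procedure as specified in the paper. It assumes a known randomization probability `pi`, a single baseline covariate, and a linear working adjustment term; the names `simulate_trial` and `eem_linear_adjustment` are hypothetical. Instead of fitting the working model by likelihood, the sketch picks the coefficients that minimize the empirical variance of the augmented estimating function, which here reduces to a least-squares problem.

```python
import random
from statistics import fmean


def simulate_trial(n, effect=1.0, pi=0.5, seed=0):
    """Hypothetical data generator: one covariate w, treatment a, outcome y."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = [1 if rng.random() < pi else 0 for _ in range(n)]
    y = [effect * ai + 2.0 * wi + rng.gauss(0.0, 1.0) for ai, wi in zip(a, w)]
    return y, a, w


def _centered(v):
    m = fmean(v)
    return [x - m for x in v]


def _dot(p, q):
    return sum(x * z for x, z in zip(p, q))


def eem_linear_adjustment(y, a, w, pi=0.5):
    """Choose (b0, b1) minimizing the sample variance of
    u_i - (a_i - pi) * (b0 + b1 * w_i), where u_i is the unadjusted
    inverse-probability-weighted contrast. Assumes pi is known by design."""
    n = len(y)
    u = [ai * yi / pi - (1 - ai) * yi / (1 - pi) for yi, ai in zip(y, a)]
    z0 = [ai - pi for ai in a]                      # (a - pi) * 1
    z1 = [(ai - pi) * wi for ai, wi in zip(a, w)]   # (a - pi) * w

    # Variance minimization = least squares on centered variables (2x2 system).
    uc, c0, c1 = _centered(u), _centered(z0), _centered(z1)
    s00, s01, s11 = _dot(c0, c0), _dot(c0, c1), _dot(c1, c1)
    t0, t1 = _dot(c0, uc), _dot(c1, uc)
    det = s00 * s11 - s01 * s01  # zero only for degenerate inputs
    b0 = (s11 * t0 - s01 * t1) / det
    b1 = (s00 * t1 - s01 * t0) / det

    d = [ui - b0 * p - b1 * q for ui, p, q in zip(u, z0, z1)]
    var = lambda v: _dot(_centered(v), _centered(v)) / (n - 1)
    return {
        "unadjusted": fmean(u),
        "adjusted": fmean(d),
        "var_unadjusted": var(u),
        "var_adjusted": var(d),
        "beta": (b0, b1),
    }


if __name__ == "__main__":
    y, a, w = simulate_trial(2000, seed=1)
    print(eem_linear_adjustment(y, a, w))
```

Because the zero adjustment (`b0 = b1 = 0`) belongs to the working class, the minimized sample variance in this sketch can never exceed that of the unadjusted estimator, regardless of whether the linear form is correct.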

Our procedure is illustrated in three examples. It is shown to be a potentially major improvement over existing covariate adjustment methods for estimating disease prevalence in two-phase epidemiological studies, treatment effects in two-arm randomized trials, and marginal survival curves. Numerical asymptotic efficiency calculations demonstrate gains relative to standard locally efficient estimators.


Clinical Trials | Epidemiology | Statistical Methodology | Statistical Theory | Survival Analysis