A significant revision of this technical report was published as: S. Rose, M.J. van der Laan (2011). "A Targeted Maximum Likelihood Estimator for Two-Stage Designs," Int J Biostat: 7(1): Article 17.


A nested case-control study is conducted within a well-defined cohort arising from a population of interest. This design is often used in epidemiology to reduce the cost of collecting data on the full cohort; however, the case-control sample drawn within the cohort is a biased sample. Methods for analyzing case-control studies have largely focused on logistic regression models, which provide conditional rather than marginal causal estimates of the odds ratio. We previously developed a Case-Control Weighted Targeted Maximum Likelihood Estimation (TMLE) procedure for case-control study designs, which relies on the prevalence probability q0. We propose the use of Case-Control Weighted TMLE in nested case-control samples, with q0 either known or estimated from the full cohort. We show that this procedure is efficient for a reduced data structure, namely the one in which covariate information is not collected or not available for subjects outside the case-control sample, and we recognize that it is not fully efficient for the full data. However, in many common scenarios the full data is not available, so our procedure is maximally efficient for the data at hand. For statistical inference, we view the nested case-control sample as a missing data problem (Robins et al., 1994). Case-Control Weighted TMLE on the reduced data structure is illustrated in simulations for cohorts with and without right censoring, and for effect modification in randomized controlled trials.
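To make the role of q0 concrete, here is a minimal Python sketch of the weighting step only. It assumes the case-control weighting scheme from the authors' earlier case-control work (weight q0 for each case and (1 - q0)/J for each control, where J is the number of controls sampled per case) and, for the nested setting, estimates q0 as the proportion of cases in the full cohort. The function names `estimate_q0` and `case_control_weights` and the example cohort are hypothetical; this is not the authors' implementation, and it omits the TMLE fitting and targeting steps entirely.

```python
def estimate_q0(cohort_outcomes):
    """Estimate the prevalence probability q0 as the share of cases (Y = 1) in the full cohort."""
    if not cohort_outcomes:
        raise ValueError("cohort is empty")
    return sum(cohort_outcomes) / len(cohort_outcomes)


def case_control_weights(sample_outcomes, q0, controls_per_case):
    """Assumed scheme: weight q0 for each case, (1 - q0) / J for each control."""
    if not 0.0 < q0 < 1.0:
        raise ValueError("q0 must lie strictly between 0 and 1")
    if controls_per_case < 1:
        raise ValueError("need at least one control per case")
    control_weight = (1.0 - q0) / controls_per_case
    return [q0 if y == 1 else control_weight for y in sample_outcomes]


if __name__ == "__main__":
    # Hypothetical cohort: 1000 subjects, 50 of whom are cases.
    cohort = [1] * 50 + [0] * 950
    q0 = estimate_q0(cohort)

    # Hypothetical nested sample: every case plus J = 2 controls per case.
    J = 2
    sample = [1] * 50 + [0] * (50 * J)
    w = case_control_weights(sample, q0, J)

    # Under this scheme the weighted share of cases equals q0,
    # i.e. the weights undo the oversampling of cases.
    case_mass = sum(wi for wi, y in zip(w, sample) if y == 1)
    print(q0, round(case_mass / sum(w), 6))
```

The check at the end illustrates why the weights matter: cases make up one third of the sampled subjects, but after weighting they account for a share equal to the cohort prevalence q0.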


Biostatistics | Epidemiology | Statistical Methodology | Statistical Theory