An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. Each of the estimators developed in this context furthermore depends on a high-dimensional nuisance parameter, whose estimation currently also relies on parametric models. Since misspecification of any of these models can lead to severely biased estimates of causal effects, the dependence of current methods on such parametric models is a major limitation. In this article, we introduce estimators that allow both the marginal structural model and the parametric model for the relevant nuisance parameter to be selected data-adaptively. Our methodology is based on the unified loss-based estimation approach recently developed by van der Laan and Dudoit (2003), which, in particular, extends loss-based estimation to missing data problems. We study the practical performance of the proposed estimators in an extensive simulation study and apply them to data from an epidemiologic study to assess the causal effect of forced expiratory volume on mortality in the elderly. All of the estimators presented in this article are publicly available in the R package cvDSA.


Epidemiology | Numerical Analysis and Computation | Statistical Methodology | Statistical Models | Statistical Theory