Mark van der Laan was supported by the NIH research grant (R01 A1074345) titled "Targeted Empirical Super Learning in HIV Research".



Suppose that we observe a population of units that are causally connected through a network. On each unit we observe two things. The first is a set of potential connections that is known to contain the true connections. The second is a longitudinal data structure consisting of a time-dependent exposure or treatment, time-dependent covariates, and a final outcome of interest. The target quantity is the mean outcome for this group of units if the exposures of the units were assigned probabilistically according to a known, user-specified mechanism; such a mechanism is called a stochastic intervention. Causal effects of interest are defined as contrasts of the mean of the unit-specific outcomes under the different stochastic interventions one wishes to evaluate. By varying the network structure, this framework covers a large range of estimation problems. These range from independent units and independent clusters of units to a single cluster in which each unit has a limited number of connections to other units. We present a few motivating classes of examples, propose a structural causal model, define the desired causal quantities, and address the identification of these quantities from the observed data. We then define maximum likelihood estimators that are smoothed or regularized via cross-validation.
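The target quantity and the causal effects described above can be written compactly as follows. The notation is our own and not taken from the paper. $N$ is the number of units in the group, and $Y_i(g^*)$ is the outcome of unit $i$ when all exposures are assigned by the stochastic intervention $g^*$. $g_1^*, g_2^*$ are two interventions to be compared.

```latex
\psi(g^*) \;=\; \mathbb{E}\!\left[\frac{1}{N}\sum_{i=1}^{N} Y_i(g^*)\right],
\qquad
\text{effect of } g_1^* \text{ versus } g_2^* \;=\; \psi(g_1^*) - \psi(g_2^*).
```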

Such smoothed/regularized maximum likelihood estimators are not targeted: their bias-variance trade-off is tuned for the whole data distribution rather than for the target parameter. They are therefore overly biased with respect to the target parameter and, as a consequence, generally not asymptotically normally distributed. We therefore formulate targeted maximum likelihood estimators (TMLE) of this estimand. We show that the robustness of the efficient influence curve implies that the bias of the TMLE is a second-order term involving squared differences of two nuisance parameters. To deal with the curse of dimensionality, we use super learning based on cross-validation for the initial estimators. The targeted bias-reduction step then makes the TMLE less biased than the maximum likelihood estimators. Because of the causal dependencies between units, the data set may correspond to the realization of a single experiment. Establishing a limit distribution (e.g., a normal one) for the estimators, and the corresponding statistical inference, is therefore challenging. To establish a formal theorem, we focus on the point-treatment special case of the longitudinal data structure. This also lays a foundation for the generalization to the general longitudinal data structure, which we reserve for future research. We conclude with a discussion.
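As a minimal sketch of the targeting step, the Python code below implements a TMLE for the simplest case mentioned above: a point treatment with binary exposure `A`, binary outcome `Y`, and a covariate `W`. Unlike the paper, it assumes independent units (no network). It also assumes that the observed treatment mechanism `g1` is known, and it uses a deliberately crude initial outcome estimate in place of a super learner. The function names and simulation settings are hypothetical. The clever covariate g*(A|W)/g(A|W) with a one-dimensional logistic fluctuation is the standard TMLE construction for a stochastic-intervention mean.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def tmle_stochastic(W, A, Y, g1, gstar1, Qbar_init, max_iter=50, tol=1e-10):
    """TMLE of E[sum_a g*(a|W) Qbar(a, W)] for binary A and binary Y.

    Hypothetical helper; assumes i.i.d. units (no network dependence).
    g1(W):          P(A=1 | W) under the observed mechanism (assumed known)
    gstar1(W):      P(A=1 | W) under the stochastic intervention g*
    Qbar_init(a, W): initial estimate of E[Y | A=a, W], values in (0, 1)
    """
    def H(a, w):
        # Clever covariate g*(a|w) / g(a|w).
        num = np.where(a == 1, gstar1(w), 1.0 - gstar1(w))
        den = np.where(a == 1, g1(w), 1.0 - g1(w))
        return num / den

    h = H(A, W)
    offset = logit(Qbar_init(A, W))
    # Targeting step: logistic regression of Y on h with offset logit(Qbar_init),
    # no intercept, fitted by Newton-Raphson in the scalar eps.
    eps = 0.0
    for _ in range(max_iter):
        p = expit(offset + eps * h)
        step = np.sum(h * (Y - p)) / np.sum(h * h * p * (1.0 - p))
        eps += step
        if abs(step) < tol:
            break

    def Qstar(a, w):
        # Updated (targeted) outcome regression.
        return expit(logit(Qbar_init(a, w)) + eps * H(a, w))

    ones, zeros = np.ones_like(A), np.zeros_like(A)
    q_gstar = gstar1(W) * Qstar(ones, W) + (1.0 - gstar1(W)) * Qstar(zeros, W)
    psi = q_gstar.mean()
    # Estimated efficient influence curve; its SD gives an i.i.d.-based SE.
    D = h * (Y - Qstar(A, W)) + q_gstar - psi
    return psi, D.std(ddof=1) / np.sqrt(len(Y))


# Hypothetical simulation in which W confounds the A-Y relation.
rng = np.random.default_rng(0)
n = 50_000
W = rng.binomial(1, 0.5, n)
g1 = lambda w: 0.1 + 0.8 * w              # observed P(A=1 | W), assumed known
gstar1 = lambda w: np.full(w.shape, 0.8)  # stochastic intervention: treat w.p. 0.8
A = rng.binomial(1, g1(W))
Y = rng.binomial(1, expit(-1.0 + 0.5 * A + 2.0 * W))

# Crude initial estimate that ignores W (stands in for a misspecified fit).
m1, m0 = Y[A == 1].mean(), Y[A == 0].mean()
Qbar_init = lambda a, w: np.where(a == 1, m1, m0)

psi_crude = 0.8 * m1 + 0.2 * m0
psi, se = tmle_stochastic(W, A, Y, g1, gstar1, Qbar_init)
print(f"crude plug-in {psi_crude:.3f}, TMLE {psi:.3f} (SE {se:.3f})")
```

Because `g1` is correct here, the targeted estimate stays consistent even though the initial estimate ignores `W`. The true value in this simulation is about 0.578, and the crude plug-in is noticeably biased. The i.i.d.-based standard error would not be valid under the network dependence that the paper addresses.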


