I would like to thank Maya Petersen, Victor DeGruttola, Alan Hubbard, and Ira Tager for helpful comments. NIH Award #R01 AI074345.


Suppose one assigns two interventions to a small number K of different populations or communities, and one measures covariates and outcomes on a random sample of independent individuals from each of the K populations. We investigate the identification and estimation of the causal effect on the outcome of the choice of intervention assigned at the community level and, if the intervention is time-dependent, of the changes in the intervention at time t. The challenge is that different populations have different environmental factors, and that the intervention and environment are assigned to the whole population rather than to the individual. The question we address is whether one can still estimate the causal effect one would have obtained from the following ideal experiment: combine all units across the multiple populations, each unit retaining its assigned environment and individual covariates; randomly assign one of the two interventions to each unit; and compare the outcome distributions of the two treatment groups. Such a randomization of treatment allocation over the units of the combined population would remove the confounding that arises because different units have different environments and corresponding individual covariates.

We apply the roadmap based on causal modeling with a nonparametric structural equation model, which involves 1) defining the target causal effect as a parameter of the nonparametric structural equation model, 2) establishing its identifiability from the observed data, and 3) given an identifiability result under the required assumptions, efficiently estimating the resulting statistical target parameter with targeted maximum likelihood substitution estimators, using cross-validation to fine-tune the estimators. The fundamental identifiability assumption is that one collects baseline covariates on the individual that block the effect of the environment on the outcome of interest; this is formulated as an exclusion restriction assumption in the nonparametric structural equation model.
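As an illustration only, the exclusion restriction and the resulting statistical target parameter can be sketched in a simple point-treatment form. The notation below (E for the community environment, W for individual baseline covariates, A for the community-level intervention, Y for the outcome, and U_W, U_Y for exogenous errors) is assumed here and is not taken from the paper. The final identity also relies on U_Y being independent of (E, A) given W and on both interventions being observable at every value of W, which are standard conditions and may differ from the exact statement in the paper.

```latex
% Assumed notation, for illustration only:
%   E = community environment, W = individual baseline covariates,
%   A \in \{0,1\} = community-level intervention, Y = outcome,
%   U_W, U_Y = exogenous errors, Y_a = counterfactual outcome under A = a.
\begin{align*}
W    &= f_W(E, U_W), \\
Y    &= f_Y(W, A, U_Y)
        \quad \text{(exclusion restriction: $E$ affects $Y$ only through $W$)}, \\
\psi &= E\,[\,Y_1 - Y_0\,]
      = E_W\bigl[\, E(Y \mid A = 1, W) - E(Y \mid A = 0, W) \,\bigr].
\end{align*}
```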

In addition, we utilize the understanding of the causal identifiability assumptions to evaluate the matched sampling design in which the units of different communities are matched on individual factors. We present efficient weighted targeted maximum likelihood estimators for these matched sampling designs, and we establish the concrete theoretical gain in information for the target parameter relative to independent sampling, by application of general results on case-control biased sampling in van der Laan (2008).

Our methods apply reasonably well when the intervention causes infectious behavior among individuals, possibly resulting in an enhanced effect, and when interaction between individuals creates dependence among them. However, the point estimates themselves do not account for the effect of this dependence on the assessment of their uncertainty. For that purpose we also propose an estimate of the standard error of the point estimate that accommodates arbitrary dependence structures (unknown to the user) that still permit a normal approximation based on a central limit theorem.

Our framework and methods are extended to the case that the communities are followed up over time and exposed to a single time-dependent treatment regimen, while also being subjected to changes in environment over time. In particular, we consider the case of estimation of a causal effect of a change in treatment over time based on observing a single community over time under a certain time-dependent treatment regimen.

We also generalize our results to causal effects of a combined community-based intervention and individually assigned treatment on an outcome of interest. We show that G-computation formulas and corresponding estimators developed for causal effects of individually assigned treatments can be fully utilized to estimate these causal effects.
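To make the idea of a G-computation substitution estimator concrete, here is a minimal sketch on simulated data, under the assumption that individual covariates W block the effect of the environment on the outcome. Everything in it is hypothetical: the four communities, their environment shifts, the data-generating equations, and the function names are invented for illustration. The linear working model is a simple stand-in and is not the targeted maximum likelihood estimator described in the paper.

```python
import numpy as np


def g_computation_ate(W, A, Y):
    """Substitution estimate of E_W[E(Y|A=1,W) - E(Y|A=0,W)].

    Uses a linear working model with an A*W interaction (an assumption
    made for this sketch, not the paper's estimator).
    """
    def design(a, w):
        return np.column_stack([np.ones_like(w), a, w, a * w])

    # Fit the outcome regression on data pooled across communities.
    beta, *_ = np.linalg.lstsq(design(A, W), Y, rcond=None)
    # Predict for every unit under each intervention, then average.
    ones, zeros = np.ones_like(W), np.zeros_like(W)
    return float(np.mean(design(ones, W) @ beta - design(zeros, W) @ beta))


def naive_difference(A, Y):
    """Unadjusted difference in mean outcome between intervention groups."""
    return float(Y[A == 1].mean() - Y[A == 0].mean())


def simulate(n_per_community=2000, seed=0):
    """Hypothetical data: 4 communities, intervention assigned per community."""
    rng = np.random.default_rng(seed)
    env = [-0.5, 0.0, 0.5, 1.0]   # environment shifts the covariate W
    trt = [0.0, 0.0, 1.0, 1.0]    # community-level intervention
    W = np.concatenate([e + rng.normal(size=n_per_community) for e in env])
    A = np.repeat(trt, n_per_community)
    # Exclusion restriction: Y depends on the environment only through W.
    Y = 1.0 + 2.0 * A + 1.5 * W + rng.normal(size=W.size)
    return W, A, Y


W, A, Y = simulate()
adjusted = g_computation_ate(W, A, Y)   # targets the simulated effect of 2.0
unadjusted = naive_difference(A, Y)     # confounded by environment through W
```

In this simulation the unadjusted comparison is biased upward because treated communities have systematically larger W, while the adjusted estimate averages predicted outcomes over the covariate distribution of the combined population.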

Finally, we consider the case in which one is not willing to make the exclusion restriction assumption, but many communities are sampled. For that purpose we propose statistical inference that naturally adapts to the degree to which the exclusion restriction assumption holds approximately and to the number of communities sampled. This allows for a unified framework for analyzing studies that involve community-based interventions.


