In many randomized and observational studies the allocation of treatment among a sample of n independent and identically distributed units is a function of the covariates of all sampled units. As a result, the treatment labels among the units are possibly dependent, complicating estimation and posing challenges for statistical inference. For example, cluster randomized trials frequently sample communities from some target population, construct matched pairs of communities from those included in the sample based on some metric of similarity in baseline community characteristics, and then randomly allocate a treatment and a control intervention within each matched pair. In this case, the observed data can neither be represented as the realization of n independent random variables, nor, contrary to current practice, as the realization of n/2 independent random variables (treating the matched pair as the independent sampling unit).

In this paper we study estimation of the average causal effect of a treatment under experimental designs in which treatment allocation potentially depends on the pre-intervention covariates of all units included in the sample. We define efficient targeted minimum loss based estimators for this general design, present a theorem that establishes the desired asymptotic normality of these estimators and allows for asymptotically valid statistical inference, and discuss implementation of these estimators. We further investigate the relative asymptotic efficiency of this design compared with a design in which unit-specific treatment assignment depends only on the units' covariates. Our findings have practical implications for the optimal design and analysis of pair matched cluster randomized trials, as well as for observational studies in which treatment decisions may depend on characteristics of the entire sample.



Included in

Biostatistics Commons