Suppose we observe n independent and identically distributed observations of a finite dimensional bounded random variable. This article is concerned with the construction of an efficient targeted minimum loss-based estimator (TMLE) of a pathwise differentiable target parameter based on a realistic statistical model.

The canonical gradient of the target parameter at a particular data distribution will depend on the data distribution through an infinite dimensional nuisance parameter which can be defined as the minimizer of the expectation of a loss function (e.g., log-likelihood loss). For many models and target parameters the nuisance parameter can be split up in two components, one required for evaluation of the target parameter and one real nuisance parameter. The only smoothness condition we will enforce on the statistical model is that these nuisance parameters are multivariate real valued cadlag functions and have a finite supremum and variation norm.

We propose a general one-step targeted minimum loss-based estimator (TMLE) based on an initial estimator of the nuisance parameters defined by a loss-based super-learner that uses cross-validation to combine a library of candidate estimators. We enforce this library to contain minimum loss based estimators minimizing the empirical risk over the parameter space under the additional constraint that the variation norm is bounded by a set constant, across a set of constants for which the maximal constant converges to infinity with sample size. We show that this super-learner is not only asymptotically equivalent with the best performing algorithm in the library, but also that it always converges to the true nuisance parameter values at a rate faster than $n^{-1/4}$. This minimal rate applies to each dimension of the data and even to nonparametric statistical models. We also demonstrate that the implementation of these constant-specific minimum loss-based estimators can be carried out by minimizing the empirical risk over linear combinations of basis functions under the constraint that the sum of the absolute value of the coefficients is smaller than the constant (e.g., Lasso regression), making our proposed estimators practically feasible.

Based on this rate of the super-learner of the nuisance parameter, we can establish that this one-step TMLE is asymptotically efficient at any data generating distribution in the model, under very weak structural conditions on the target parameter mapping and model. We demonstrate our general theorems by constructing such a one-step TMLE of the average causal effect in a nonparametric model, and presenting the corresponding efficiency theorem.



Included in

Biostatistics Commons