To answer scientific questions of interest, one often carries out an ordered sequence of experiments that generates the appropriate data over time. The design of each experiment involves various decisions, such as: 1) what variables to measure on the randomly sampled experimental unit; 2) how regularly to monitor the unit, and for how long; and 3) how to randomly assign a treatment or drug dose to the unit. That is, the design of each experiment involves selecting a so-called treatment mechanism, monitoring mechanism, or missingness/censoring mechanism, where each of these mechanisms represents a formally defined conditional distribution of the corresponding action (i.e., assignment of treatment, monitoring indicators, missingness indicators, or right-censoring indicators), given observed data characteristics of the unit. The choice of these design mechanisms is typically made a priori, and it is common that, during the course of the ordered sequence of experiments, the observed data suggest that the chosen design is ineffective in answering the scientific question of interest, or is unsatisfactory from other perspectives, and that a much better design (i.e., choice of mechanisms) should have been selected. This naturally raises the question: why not learn a user-supplied, unknown optimal choice of the controlled components of the design from the data collected in the previously initiated experiments, and thereby adjust/adapt these controlled components of the design for future experiments during the course of the study?
Although certain basic types of so-called "response-adaptive designs" in clinical trials, which allow treatment randomization probabilities to be a function of outcomes collected in previous experiments, have been proposed and studied from a frequentist perspective (Hu and Rosenberger (2006)), by far most designs in practice are static, and most of the adaptive design literature has focused on adaptive stopping times based on sequential testing or other adaptive stopping rules. In spite of the results on response-adaptive clinical trial design presented in Hu and Rosenberger (2006), among most practitioners there appears to be a widely accepted consensus that, for formal frequentist statistical inference, changing the design based on a look at the data in a clinical trial should be avoided, even if that look is not used for testing.
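To fix ideas, a minimal sketch of a response-adaptive randomization rule of the kind studied in this literature is given below. The specific rule (randomize to an arm with probability proportional to its current estimated success rate, with add-one smoothing and the particular simulation parameters) is our own illustrative choice, not a procedure taken from this article or from Hu and Rosenberger (2006); it only shows how a treatment randomization probability can be made a function of outcomes collected in previous experiments.

```python
import random

def adaptive_assignment(successes, failures):
    """Illustrative response-adaptive rule: randomize to arm 0 with
    probability p0 / (p0 + p1), where pk is the current estimated
    success rate of arm k, smoothed so early probabilities stay
    away from 0 and 1."""
    p = [(successes[k] + 1) / (successes[k] + failures[k] + 2) for k in (0, 1)]
    prob_arm0 = p[0] / (p[0] + p[1])
    return 0 if random.random() < prob_arm0 else 1

# Simulate a small two-arm trial in which arm 0 truly has the higher
# success probability; the allocation should drift toward arm 0.
random.seed(1)
true_p = [0.7, 0.4]  # hypothetical true success probabilities
successes, failures = [0, 0], [0, 0]
for _ in range(500):
    arm = adaptive_assignment(successes, failures)
    if random.random() < true_p[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

n_arm0 = successes[0] + failures[0]  # patients allocated to arm 0
```

Under this rule the randomization probability for the better arm converges to roughly 0.7/(0.7 + 0.4), so well over half of the 500 units end up on arm 0.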

We present a general statistical framework that allows us to study, in great generality and from a frequentist perspective, adaptive designs and estimators based on the data they generate. For each experimental unit we define a full-data random variable, and these full-data random variables are assumed to be independently and identically distributed. For example, we can define the full data as the collection of setting-specific data structures, where each such data structure represents the data one would have observed on the unit if one had applied that particular setting in the design of the experiment, across all possible settings. In addition, one defines the observed data structure on an experimental unit as a specified many-to-one mapping of a choice of design settings and the full-data random variable: this defines the observed data structure as a censored/missing data structure. The design settings (i.e., censoring variables) for experiment i are drawn from a conditional distribution, given the full data for the i-th unit, which satisfies the coarsening at random assumption (van der Laan and Robins (2003)) for the i-th censored data experiment. The choice of the conditional distribution of the design settings for the i-th experiment can be fully informed by the observed data collected in the previous i − 1 experiments, as well as by any external data sources. We refer to the collection of these i-specific design mechanisms as the adaptive design. In particular, we define and provide a template for constructing targeted adaptive designs, which aim to learn a particular unknown optimal fixed design from the incoming data during the trial, and we propose easy-to-implement influence-curve-based targeted adaptive designs. We provide a variety of examples of such targeted adaptive designs, such as designs targeting the fixed design that maximizes the asymptotic efficiency of a treatment-effect estimator in a clinical trial among all fixed designs.
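The idea of targeting an efficiency-optimal fixed design can be sketched concretely. For estimating a difference of means, the classical Neyman allocation (randomize to treatment with probability σ₁/(σ₀ + σ₁)) maximizes efficiency among fixed designs; the sketch below simply plugs in standard deviations estimated from the accrued data at each step. This is our own simplified illustration under an assumed outcome model, not the influence-curve-based construction of the article; the truncation constant and simulation parameters are arbitrary.

```python
import math
import random

def neyman_probability(outcomes, assignments, floor=0.1):
    """Plug-in estimate of the efficiency-optimal fixed design for a
    difference of means: randomize to treatment (arm 1) with
    probability sigma_1 / (sigma_0 + sigma_1), estimated from the data
    accrued so far.  `floor` truncates the probability away from 0/1."""
    def sd(arm):
        ys = [y for y, a in zip(outcomes, assignments) if a == arm]
        if len(ys) < 2:
            return 1.0  # default until the arm has enough data
        m = sum(ys) / len(ys)
        return math.sqrt(sum((y - m) ** 2 for y in ys) / (len(ys) - 1))
    p = sd(1) / (sd(0) + sd(1))
    return min(max(p, floor), 1 - floor)

# Hypothetical trial: the treatment arm is noisier (sd 2 vs. sd 1),
# so the optimal fixed design assigns treatment with probability 2/3.
random.seed(0)
outcomes, assignments = [], []
for _ in range(1000):
    p1 = neyman_probability(outcomes, assignments)
    a = 1 if random.random() < p1 else 0
    y = random.gauss(0.0, 2.0 if a == 1 else 1.0)
    assignments.append(a)
    outcomes.append(y)

p_final = neyman_probability(outcomes, assignments)
```

By the end of the simulated trial the adaptively learned randomization probability is close to the unknown optimum 2/3, illustrating the sense in which a targeted adaptive design converges to the optimal fixed design.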

Within this statistical framework we prove consistency, asymptotic linearity, and corresponding normality results for the maximum likelihood estimator according to a correctly specified parametric model. We present new double robust targeted maximum likelihood estimators for semi-parametric models, which are consistent if one either correctly specifies a lower-dimensional model for the common distribution of the full-data random variables or correctly specifies the design mechanisms, where the latter always holds in controlled adaptive designs in which the selected design mechanisms are known. These targeted maximum likelihood estimators for adaptive designs generalize the targeted maximum likelihood estimator for independent experiments introduced and developed in van der Laan and Rubin (2006). We also propose a new class of relatively easy-to-implement (double robust) iterative inverse-probability-of-censoring-weighted reduced data targeted maximum likelihood estimators. Finally, we present estimators based on martingale estimating functions, generalizing the estimating equation methodology for i.i.d. censored data structures fully presented in van der Laan and Robins (2003). Our generalized martingale estimating function methodology includes inverse-probability-of-censoring-weighted reduced data martingale estimating functions, which represent a new approach (also for i.i.d. data) in which estimating functions are decomposed as an orthogonal sum and the inverse probability of censoring (IPC) weighting is applied to each component, thereby achieving additional robustness not obtained with standard IPC weighting.
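A minimal sketch of the key point that known design mechanisms support valid weighting even under adaptivity: below, a treatment-arm mean is estimated by inverse probability weighting with the known, adaptively chosen randomization probabilities g_i. Because each g_i is fixed given the past, the centered weighted terms form a martingale difference sequence, which is the structure behind the martingale estimating function results. The adaptive rule and outcome model here are arbitrary illustrations of ours, not the article's estimators.

```python
import random

def ipw_treated_mean(outcomes, assignments, probs):
    """Inverse-probability-weighted estimator of the treatment-arm
    mean E[Y(1)], using the known adaptive randomization
    probabilities g_i under which each unit was assigned."""
    n = len(outcomes)
    return sum(a * y / g for y, a, g in zip(outcomes, assignments, probs)) / n

random.seed(2)
outcomes, assignments, probs = [], [], []
sum_treated, n_treated = 0.0, 0
g = 0.5  # initial randomization probability
for _ in range(2000):
    a = 1 if random.random() < g else 0
    # hypothetical outcome model: treated mean 1, control mean 0
    y = random.gauss(1.0, 1.0) if a == 1 else random.gauss(0.0, 1.0)
    outcomes.append(y)
    assignments.append(a)
    probs.append(g)  # record the probability actually used
    if a == 1:
        sum_treated += y
        n_treated += 1
        # hypothetical adaptive rule: g depends on the running mean of
        # past treated outcomes, truncated away from 0 and 1
        g = min(max(0.3 + 0.2 * sum_treated / n_treated, 0.2), 0.8)

est = ipw_treated_mean(outcomes, assignments, probs)
```

Even though the design adapts throughout, the estimator concentrates around the true treated mean of 1.0, since each weighted term is conditionally unbiased given the past.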

Our results show that one can learn a user-supplied definition of an unknown optimal target fixed design during the course of the study, thereby adapting/improving the design at any point in time based on the available data, while statistical inference based on a normal limit distribution remains readily available. We illustrate the theory and resulting methods with various examples of practical interest.

In addition, we present a targeted empirical Bayesian learning methodology that allows one to specify a prior on the target parameter of interest and maps it into a posterior distribution whose center and spread correspond with the frequentist targeted maximum likelihood estimator. We also show how adaptive designs and sequential testing procedures can be combined.

The general contributions of this article are: 1) a general definition and practical constructions of adaptive, and in particular targeted adaptive, group sequential designs targeting a user-supplied definition of the optimal fixed design; 2) a presentation of a variety of possible design adaptations of great practical interest to which our theory applies; 3) a presentation of maximum likelihood estimators, new robust (iterative) targeted maximum likelihood estimators, new (iterative) inverse-probability-of-censoring-weighted reduced data targeted maximum likelihood estimators, and estimators defined as solutions of martingale estimating equations, all based on the data collected under these general targeted adaptive designs; 4) a proof that targeted adaptive designs asymptotically converge to the desired optimal unknown design (i.e., we can learn the optimal design); 5) formal statistical inference for the scientific parameter of interest under general adaptive designs based on the above estimation methodologies, showing, in particular, that the asymptotic efficiency of the targeted maximum likelihood estimator under a targeted adaptive design equals its asymptotic efficiency under i.i.d. sampling from the unknown targeted optimal fixed design, as learned during the study; 6) a new targeted empirical Bayesian learning methodology mapping a prior on the parameter of interest into its posterior while enjoying the frequentist robustness and efficiency properties of the targeted MLE in large semi-parametric models; and 7) sequential testing methods in general adaptive designs controlling the Type I error at level alpha. In addition, we illustrate the results for a variety of examples of interest.


Statistical Methodology | Statistical Theory