Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, infers the mean reward under the current estimate of the optimal treatment rule. This data-adaptive statistical parameter is worthy of interest on its own. Our main result is a central limit theorem, which enables the construction of confidence intervals on the mean reward both under the current estimate of the optimal treatment rule and under the optimal treatment rule itself. The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, allowing us to discuss the efficiency of the inference.

As a by-product, we also derive confidence intervals on two cumulative pseudo-regrets, a key notion in the study of bandit problems. Seen as two additional data-adaptive statistical parameters, they compare the sum of the rewards actually received during the course of the experiment with either the sum of the mean rewards or the counterfactual rewards we would have obtained had we used the current estimate of the optimal treatment rule to assign treatment from the start.
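As a toy numerical illustration of these two cumulative pseudo-regrets, consider a two-arm setting; all arms, mean rewards, actions, and received rewards below are invented for the sketch and are not from the paper:

```python
# Hypothetical two-arm bandit; every number here is illustrative.
mean_reward = {0: 0.3, 1: 0.6}            # true mean reward of each arm
actions = [0, 1, 1, 0, 1]                 # arms actually assigned
rewards = [0.2, 0.7, 0.5, 0.4, 0.8]       # rewards actually received
rule = 1                                  # current estimate of the optimal arm

received = sum(rewards)

# First pseudo-regret: received rewards vs. the means of the arms assigned.
regret_vs_means = sum(mean_reward[a] for a in actions) - received

# Second pseudo-regret: received rewards vs. the counterfactual (expected)
# rewards had the estimated optimal rule assigned treatment from the start.
regret_vs_rule = len(actions) * mean_reward[rule] - received

print(regret_vs_means, regret_vs_rule)
```

In this toy run the first pseudo-regret is slightly negative (the realized rewards beat the arm means) while the second is positive (always playing the estimated optimal arm would have done better in expectation).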

A simulation study illustrates the procedure. One of the cornerstones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral.

Methods: We compared exact *P*-values, valid by definition, with normal and logit-normal approximations in a simulation study of 40 cases and 160 controls. The key measure of biomarker performance was sensitivity at 90% specificity. Data for 3000 uninformative markers and 30 true markers were generated randomly, with 10 replications of the simulation. We also analyzed real data on 2371 antibody array markers measured in plasma from 121 cases with ER/PR-positive breast cancer and 121 controls.

Results: Using the same discovery criterion, the valid exact *P*-values led to the discovery of 24 true and 82 false biomarkers, while approximate *P*-values yielded 15 true and 15 false biomarkers (normal approximation) and 20 true and 86 false biomarkers (logit-normal approximation). Moreover, the estimated numbers of true markers among those discovered were substantially incorrect for approximate *P*-values: the normal approximation estimated 0 true markers among its discoveries but actually found 15; the logit-normal approximation estimated 42 but found 20. The exact method estimated 22, close to the actual number of 24 true discoveries. With real data, exact and approximate *P*-values ranked candidate breast cancer biomarkers very differently.
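The small-sample gap between exact and approximate tail probabilities, which drives discrepancies like these, can be sketched with a minimal example. The binomial setting and all counts below are illustrative only; they are not the study's actual statistic (sensitivity at 90% specificity involves the empirical ROC):

```python
import math

def exact_binom_pvalue(k, n, p0):
    """Exact upper-tail P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

def normal_approx_pvalue(k, n, p0):
    """Normal approximation to the same tail (no continuity correction)."""
    mu, sd = n * p0, math.sqrt(n * p0 * (1 - p0))
    return 0.5 * math.erfc((k - mu) / (sd * math.sqrt(2)))

# With n = 40 (matching the simulation's number of cases), the two values
# can differ severalfold in the far tail, exactly where stringent
# discovery thresholds operate.
n, p0, k = 40, 0.10, 9
print(exact_binom_pvalue(k, n, p0))    # exact tail probability
print(normal_approx_pvalue(k, n, p0))  # smaller approximate value
```

Here the normal approximation understates the tail probability, which in a discovery screen translates into overconfident *P*-values and a distorted ranking of candidates.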

Conclusions: Exact *P*-values should be used because they are universally valid. Approximate *P*-values can lead to inappropriate biomarker selection rules and incorrect conclusions.

Impact: Rigorous data analysis methodology in discovery research may improve the yield of biomarkers that validate clinically.

enrollment in the program. Targeted minimum loss-based estimation was used to estimate the mean outcome, while Super Learning was implemented to estimate the required nuisance parameters. Analyses were conducted with the ltmle R package; analysis code is available at an online repository as an R package. Results showed that at 450 days, the probability of in-care survival for subjects with immediate availability and enrollment was 0.93 (95% CI: 0.91, 0.95), and 0.87 (95% CI: 0.86, 0.87) for subjects with immediate availability never enrolling. For subjects without LREC availability, it was 0.91 (95% CI: 0.90, 0.92). Immediate program availability without individual enrollment, compared to no program availability, was estimated to slightly, albeit significantly, decrease survival by 4% (95% CI: 0.03, 0.06, p < 0.01). Immediate availability and enrollment resulted in a 7% higher in-care survival compared to immediate availability with non-enrollment after 450 days (95% CI: -0.08, -0.05, p < 0.01). The results are consistent with a fairly small impact of both availability and enrollment in the LREC program on in-care survival.

**Methods**: We consider cluster randomized trials with staggered enrollment, in each of which the order of enrollment is based on the total number of ties (contacts) from individuals within a cluster to individuals in other clusters. These designs can accommodate connectivity based either on the total number of inter-cluster connections at baseline or on connections only to untreated clusters, and include options analogous both to traditional Parallel and Stepped Wedge designs. We further allow for control clusters to be “held-back” from re-randomization for some period. We investigate the performance of these designs in terms of epidemic control (time to end of epidemic and cumulative incidence) and power to detect vaccine effect by simulating vaccination trials during an SEIR-type epidemic outbreak using a network-structured agent-based model.
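The ordering step behind these designs can be sketched minimally as follows; the cluster names and tie counts are hypothetical, and the paper's designs also accommodate ordering by ties to untreated clusters only:

```python
# Hypothetical inter-cluster tie (contact) counts at baseline.
inter_cluster_ties = {
    "A": {"B": 12, "C": 3},
    "B": {"A": 12, "C": 7, "D": 1},
    "C": {"A": 3, "B": 7},
    "D": {"B": 1},
}

def enrollment_order(ties):
    """Order clusters for staggered enrollment: most connected first."""
    totals = {c: sum(neighbors.values()) for c, neighbors in ties.items()}
    return sorted(totals, key=totals.get, reverse=True)

print(enrollment_order(inter_cluster_ties))  # → ['B', 'A', 'C', 'D']
```

Enrolling the most-connected clusters first aims to cut cross-cluster transmission paths early, which is the mechanism behind the epidemic-control gains reported below.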

**Results**: In our simulations, connectivity-informed designs lead to lower peak infectiousness than comparable traditional study designs and a 20% reduction in cumulative incidence, but have little impact on epidemic length. Power to detect differences in incidence across clusters is reduced in all connectivity-informed designs. However, the inclusion of even a brief "holdback" restores most of the power lost in comparison to a traditional Stepped Wedge approach.

**Conclusions**: Incorporating information about cluster connectivity in the design of cluster randomized trials can increase their public health impact, especially in acute outbreak settings. Using this information helps control outbreaks by minimizing the number of cross-cluster infections, at a modest cost in power to detect an effective intervention.