Methods: We compare the performance of three estimators: the unadjusted estimator, which ignores baseline variables, and two adjusted estimators, the standardized and logistic regression estimators, which leverage information in baseline variables. Our comparisons are based on a re-analysis of data from the CLEAR III trial, a phase 3 trial of treatment for severe stroke, and on a simulation study that mimics features of that dataset.
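As a minimal sketch of the contrast between the two estimator types, the code below computes the unadjusted and standardized (g-computation) estimators of a risk difference on hypothetical simulated trial data with one binary baseline variable; the data-generating process and effect size are illustrative assumptions, not the CLEAR III data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical trial data: W is a prognostic binary baseline variable,
# A is randomized treatment, Y is a binary outcome.
W = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.5, n)
Y = rng.binomial(1, 0.2 + 0.15 * A + 0.3 * W)

# Unadjusted estimator: difference in arm-specific outcome means,
# ignoring W.
unadjusted = Y[A == 1].mean() - Y[A == 0].mean()

# Standardized estimator with a saturated model: estimate
# E[Y | A = a, W = w] by stratum means, then average over the
# empirical distribution of W.
def standardized_mean(a):
    return sum(Y[(A == a) & (W == w)].mean() * (W == w).mean()
               for w in (0, 1))

standardized = standardized_mean(1) - standardized_mean(0)
# Both estimates target the true risk difference of 0.15; the
# standardized one is typically less variable in finite samples.
```

Because treatment is randomized, both estimators are consistent for the risk difference; adjustment for a prognostic W is what buys the precision gain the abstract reports.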

Results: Re-analysis of the CLEAR III data shows that confidence intervals from the standardized estimator are 10% narrower than those from the unadjusted estimator. In the simulations, the standardized estimator requires 29% less sample size to achieve the same power as the unadjusted estimator. The simulations also show that the standardized estimator has slightly better precision than the logistic regression estimator.

Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal treatment rule. This data-adaptive statistical parameter is worthy of interest on its own. Our main result is a central limit theorem that enables the construction of confidence intervals on the mean reward both under the current estimate of the optimal treatment rule and under the optimal treatment rule itself. The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, which allows us to discuss the efficiency of inference.

As a by-product, we also derive confidence intervals on two cumulated pseudo-regrets, a key notion in the study of bandit problems. Seen as two additional data-adaptive statistical parameters, they compare the sum of the rewards actually received during the course of the experiment with either the sum of the means of the rewards or the counterfactual rewards we would have obtained had we used, from the start, the current estimate of the optimal treatment rule to assign treatment.
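As a toy illustration only (a two-armed bandit run with an explore-then-commit rule, not the TMLE-based procedure studied here), the two cumulated pseudo-regrets can be tracked as follows; the arm means and design constants are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_explore = 5000, 100
mu = np.array([0.3, 0.5])      # hypothetical mean rewards per arm
pulls = np.zeros(2)
sums = np.zeros(2)
received = 0.0                 # sum of rewards actually received
mean_of_pulled = 0.0           # sum of the means of the pulled arms

for t in range(T):
    # Explore each arm n_explore times, then follow the current
    # estimate of the optimal arm.
    if pulls[0] < n_explore:
        a = 0
    elif pulls[1] < n_explore:
        a = 1
    else:
        a = int(np.argmax(sums / pulls))
    r = rng.binomial(1, mu[a])
    pulls[a] += 1
    sums[a] += r
    received += r
    mean_of_pulled += mu[a]

# Pseudo-regret 1: rewards received vs. the sum of the means of the
# arms that were pulled.
regret_vs_means = mean_of_pulled - received

# Pseudo-regret 2: rewards received vs. one counterfactual run that
# assigns the final estimate of the optimal arm from the start.
a_hat = int(np.argmax(sums / pulls))
counterfactual = rng.binomial(1, mu[a_hat], T).sum()
regret_vs_counterfactual = counterfactual - received
```

Both quantities are random and data-adaptive, which is why confidence intervals for them require the martingale arguments developed in the paper rather than i.i.d. theory.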

A simulation study illustrates the procedure. One of the cornerstones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral.

Methods: We compared exact *P*-values, valid by definition, with normal and logit-normal approximations in a simulation study of 40 cases and 160 controls. The key measure of biomarker performance was sensitivity at 90% specificity. Data for 3000 uninformative markers and 30 true markers were generated randomly, with 10 replications of the simulation. We also analyzed real data on 2371 antibody array markers measured in plasma from 121 cases with ER/PR-positive breast cancer and 121 controls.
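The gap between exact and approximate *P*-values is easy to see in a toy calculation: an exact binomial tail probability versus its normal approximation. The numbers below are illustrative assumptions, not the paper's test statistic:

```python
from math import comb, sqrt, erf

# Hypothetical one-sided test of H0: p = 0.1 against p > 0.1,
# with k = 9 successes out of n = 40 trials.
n, k, p0 = 40, 9, 0.1

# Exact upper-tail P-value: P(X >= k) for X ~ Binomial(n, p0).
exact = sum(comb(n, i) * p0**i * (1 - p0)**(n - i)
            for i in range(k, n + 1))

# Normal approximation via the z-statistic (no continuity correction).
z = (k - n * p0) / sqrt(n * p0 * (1 - p0))
approx = 0.5 * (1 - erf(z / sqrt(2)))

# The normal approximation understates the right tail of this skewed
# distribution, which can flip discovery calls near a threshold.
```

In this example the exact tail probability is several times larger than the normal approximation, the kind of discrepancy that changes which markers clear a fixed discovery criterion.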

Results: Using the same discovery criterion, the valid exact *P*-values led to discovery of 24 true and 82 false biomarkers, while approximate *P*-values yielded 15 true and 15 false biomarkers (normal approximation) and 20 true and 86 false biomarkers (logit-normal approximation). Moreover, the estimated numbers of true markers among those discovered were substantially incorrect for approximate *P*-values: the normal approximation estimated 0 true markers among its discoveries but actually found 15, and the logit-normal approximation estimated 42 but found 20. The exact method estimated 22, close to the actual number of 24 true discoveries. With real data, exact and approximate *P*-values ranked candidate breast cancer biomarkers very differently.

Conclusions: Exact *P*-values should be used because they are universally valid. Approximate *P*-values can lead to inappropriate biomarker selection rules and incorrect conclusions.

Impact: Rigorous data analysis methodology in discovery research may improve the yield of biomarkers that validate clinically.
