"Practical Targeted Learning from Large Data Sets by Survey Sampling" by Patrice Bertail, Antoine Chambaz et al.

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Practical Targeted Learning from Large Data Sets by Survey Sampling

Authors

Patrice Bertail, Modal'X, Université Paris Ouest Nanterre
Antoine Chambaz, Modal'X, Université Paris Ouest Nanterre and Division of Biostatistics, University of California, Berkeley
Emilien Joly, Modal'X, Université Paris Ouest Nanterre

Abstract

We address the practical construction of asymptotic confidence intervals for smooth (i.e., pathwise differentiable), real-valued statistical parameters by targeted learning from independent and identically distributed data in contexts where the sample size is so large that it poses computational challenges. We observe a summary measure of all the data and select a sub-sample from the complete data set by Poisson rejective sampling with unequal inclusion probabilities based on the summary measures. Targeted learning is then carried out on the easier-to-handle sub-sample. We derive a central limit theorem for the targeted minimum loss estimator (TMLE), which enables the construction of the confidence intervals. The inclusion probabilities can be optimized to reduce the asymptotic variance of the TMLE. We illustrate the procedure with two examples where the parameters of interest are variable importance measures of an exposure (binary or continuous) on an outcome. We also conduct a simulation study and comment on its results.
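The sub-sampling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Poisson rejective sampling is done by drawing independent Bernoulli(p_i) inclusions and rejecting the draw until exactly n units are kept. The rule making p_i proportional to a positive summary measure, the helper names `inclusion_probs`, `poisson_rejective_sample` and `ht_mean`, and the toy data are all hypothetical. The TMLE itself is not reproduced; a simple Horvitz-Thompson mean stands in to show how the sub-sample is reweighted by 1/p_i. Note that under rejective sampling the actual inclusion probabilities are only approximately equal to the p_i.

```python
import random

def inclusion_probs(summaries, n):
    """Hypothetical rule: p_i proportional to a positive summary measure,
    scaled so the p_i sum to n, then capped at 1."""
    total = sum(summaries)
    return [min(n * s / total, 1.0) for s in summaries]

def poisson_rejective_sample(probs, n, rng, max_tries=100_000):
    """Keep each unit i independently with probability probs[i] (Poisson
    design); reject and redraw until exactly n units are kept."""
    if not 0 < n <= len(probs):
        raise ValueError("n must be between 1 and len(probs)")
    for _ in range(max_tries):
        sample = [i for i, p in enumerate(probs) if rng.random() < p]
        if len(sample) == n:
            return sample
    raise RuntimeError("no sample of size n; check that sum(probs) is close to n")

def ht_mean(y, probs, sample):
    """Horvitz-Thompson estimate of the full-data mean of y, weighting each
    sampled unit by 1/probs[i] (approximate under rejective sampling)."""
    return sum(y[i] / probs[i] for i in sample) / len(y)

# Toy example with hypothetical data.
rng = random.Random(0)
N, n = 10_000, 500
summaries = [1.0 + (i % 10) for i in range(N)]  # hypothetical summary measure
y = [2.0 * s for s in summaries]                # outcome proportional to it
probs = inclusion_probs(summaries, n)
S = poisson_rejective_sample(probs, n, rng)
estimate = ht_mean(y, probs, S)
```

In this contrived example y is exactly proportional to the p_i, so every sampled term y_i/p_i is the same constant and the estimate equals the full-data mean (11.0) whatever sub-sample is drawn. It is a toy illustration of why the choice of inclusion probabilities matters for variance, not a statement about how the paper optimizes them for the TMLE.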

Disciplines

Biostatistics

Suggested Citation

Bertail, Patrice; Chambaz, Antoine; and Joly, Emilien, "Practical Targeted Learning from Large Data Sets by Survey Sampling" (July 2016). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 353.
https://biostats.bepress.com/ucbbiostat/paper353

Included in

Biostatistics Commons

Collection of Biostatistics Research Archive