"Estimating Function Based Cross-Validation and Learning" by Mark J. van der Laan and Daniel Rubin

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Estimating Function Based Cross-Validation and Learning

Authors

Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
Daniel Rubin, Division of Biostatistics, School of Public Health, University of California, Berkeley

Abstract

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Given a model for the data generating distribution, assume that the parameter of interest can be characterized as the parameter value which makes the population mean of a possibly infinite dimensional estimating function equal to zero. Given a collection of candidate estimators of this parameter, and specification of the vector estimating function, we propose a cross-validation criterion for selecting among these estimators. This cross-validation criterion is defined as the Euclidean norm of the empirical mean over the validation sample of the estimating function at the candidate estimator based on the training sample. We establish a finite sample inequality for this method relative to an oracle selector, and illustrate it with some examples. This finite sample inequality provides us with asymptotic equivalence of the selector with the oracle selector under general conditions. We also study the performance of this method in the case that the parameter of interest itself is path-wise differentiable (and thus, in principle, root-$n$ estimable), and show that the cross-validation selected estimator is typically efficient, and, at certain data generating distributions, superefficient (and thus non-regular). Finally, we combine 1) the selection of a sequence of subspaces of the parameter space (i.e., a sieve), 2) the estimating equation as an empirical criterion to generate a candidate estimator for each subspace, and 3) the estimating function based cross-validation selector to select among the candidate estimators, in order to provide a new unified estimating function based methodology. In particular, we formally establish a finite sample inequality for this general estimator in the case that one uses epsilon-nets as the sieve, and point out that this finite sample inequality corresponds with minimax adaptive rates of convergence with respect to the norm implied by the estimating function.
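
The selection criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `ef_cv_select`, the toy estimating function D(theta)(X) = X - theta for a mean vector, the three candidate estimators, and the choice of V-fold splits with the norms averaged over folds are all assumptions made for illustration. For each split, every candidate is fit on the training sample, the estimating function is averaged over the validation sample at that fit, and the Euclidean norm of this average is the candidate's score.

```python
import numpy as np

def ef_cv_select(x, candidates, est_fn, n_folds=5, seed=0):
    """Return (index of the selected candidate, per-candidate criterion values).

    x          : array of shape (n, p), one observation per row
    candidates : list of functions mapping a training array to an estimate
    est_fn     : est_fn(theta, data) -> array (n_data, q), the estimating
                 function evaluated at every observation
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random partition of the indices into validation folds (assumed scheme).
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = np.zeros(len(candidates))
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        train, val = x[train_mask], x[val_idx]
        for k, fit in enumerate(candidates):
            theta = fit(train)                       # estimator from the training sample
            val_mean = est_fn(theta, val).mean(axis=0)  # empirical mean over validation sample
            scores[k] += np.linalg.norm(val_mean) / n_folds
    return int(np.argmin(scores)), scores

# Hypothetical toy problem: estimate a mean vector, D(theta)(X) = X - theta.
def mean_estimating_function(theta, data):
    return data - theta

candidates = [
    lambda d: d.mean(axis=0),          # sample mean
    lambda d: np.median(d, axis=0),    # coordinate-wise median
    lambda d: np.zeros(d.shape[1]),    # deliberately poor: always zero
]

data_rng = np.random.default_rng(1)
x = data_rng.normal(loc=[1.0, -2.0], scale=1.0, size=(200, 2))
best, scores = ef_cv_select(x, candidates, mean_estimating_function)
print(best, scores)
```

In this toy setting the constant-zero candidate receives a much larger score than the sample mean, because the validation mean of X - 0 stays near the true mean vector rather than near zero.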

Disciplines

Biostatistics

Suggested Citation

van der Laan, Mark J. and Rubin, Daniel, "Estimating Function Based Cross-Validation and Learning" (May 2005). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 180.
https://biostats.bepress.com/ucbbiostat/paper180

Included in

Biostatistics Commons

Collection of Biostatistics Research Archive
