Suppose we observe a sample of independent and identically distributed realizations of a random variable. Given a model for the data-generating distribution, assume that the parameter of interest can be characterized as the parameter value that makes the population mean of a (possibly infinite-dimensional) estimating function equal to zero. Given a collection of candidate estimators of this parameter, and a specification of the vector estimating function, we propose a cross-validation criterion for selecting among these estimators. This criterion is defined as the Euclidean norm of the empirical mean, over the validation sample, of the estimating function evaluated at the candidate estimator based on the training sample. We establish a finite-sample inequality for this method relative to an oracle selector, and illustrate it with some examples. This finite-sample inequality provides asymptotic equivalence of the selector with the oracle selector under general conditions. We also study the performance of this method in the case that the parameter of interest is itself pathwise differentiable (and thus, in principle, root-$n$ estimable), and show that the cross-validation-selected estimator is typically efficient and, at certain data-generating distributions, superefficient (and thus non-regular). Finally, we combine 1) the selection of a sequence of subspaces of the parameter space (i.e., a sieve), 2) the estimating equation as an empirical criterion to generate a candidate estimator for each subspace, and 3) the estimating-function-based cross-validation selector to select among the candidate estimators, in order to provide a new unified estimating-function-based methodology. In particular, we formally establish a finite-sample inequality for this general estimator in the case that one uses epsilon-nets as the sieve, and point out that this finite-sample inequality corresponds with minimax adaptive rates of convergence with respect to the norm implied by the estimating function.
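The selection criterion described above can be sketched in a few lines of code. The following is an illustrative Python sketch, not the authors' implementation: the names `cv_criterion` and `est_fn` are hypothetical, the toy estimating function $D(\theta)(X) = X - \theta$ (whose population mean is zero at the true mean) and the candidate estimators (sample mean and sample median) are chosen only for illustration.

```python
import numpy as np

def cv_criterion(data, estimator, estimating_fn, n_splits=5, seed=0):
    """Estimating-function-based cross-validation criterion (sketch).

    For each fold: fit the candidate estimator on the training sample,
    then take the Euclidean norm of the empirical mean of the estimating
    function over the validation sample. Scores are averaged over folds.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), n_splits)
    scores = []
    for v in range(n_splits):
        val = data[folds[v]]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != v])
        theta_hat = estimator(data[train_idx])        # fit on training sample
        # empirical mean of the estimating function over the validation sample
        d = np.atleast_1d(np.mean(estimating_fn(val, theta_hat), axis=0))
        scores.append(np.linalg.norm(d))              # Euclidean norm
    return float(np.mean(scores))

# Toy example: the population mean solves E[X - theta] = 0.
est_fn = lambda x, theta: x - theta
candidates = {"mean": np.mean, "median": np.median}
x = np.array([0.1, 0.9, 1.2, 2.0, 1.5, 0.7, 1.1, 1.8, 0.4, 1.3])
selected = min(candidates, key=lambda k: cv_criterion(x, candidates[k], est_fn))
```

The selector simply returns the candidate whose cross-validated criterion is smallest; in the paper's general formulation the estimating function may be vector valued, in which case `estimating_fn` returns an array per observation and the Euclidean norm is taken over its components.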