Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space typically results in ill-defined or too variable estimators of the parameter of interest (i.e., the risk minimizer for the true data generating distribution). In this article, we propose a cross-validated epsilon-net estimation methodology that covers a broad class of estimation problems, including multivariate outcome prediction and multivariate density estimation. An epsilon-net sieve of a subspace of the parameter space is defined as a collection of finite sets of points, the epsilon-nets indexed by epsilon, which approximate the subspace up till a resolution of epsilon. Given a collection of subspaces of the parameter space, one constructs an epsilon-net sieve for each of the subspaces. For each choice of subspace and each value of the resolution epsilon, one defines a candidate estimator as the minimizer of the empirical risk over the corresponding epsilon-net. The cross-validated epsilon-net estimator is then defined as the candidate estimator corresponding to the choice of subspace and epsilon-value minimizing the cross-validated empirical risk. We derive a finite sample inequality which proves that the proposed estimator achieves the adaptive optimal minimax rate of convergence, where the adaptivity is achieved by considering epsilon-net sieves for various subspaces. We also address the implementation of the cross-validated epsilon-net estimation procedure. In the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated epsilon-net estimator to the cross-validated L^1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS). Finally, we discuss generalizations of the proposed estimation methodology to censored data structures.
Statistical Methodology | Statistical Theory | Survival Analysis
van der Laan, Mark J.; Dudoit, Sandrine; and van der Vaart, Aad W., "The Cross-Validated Adaptive Epsilon-Net Estimator" (February 2004). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 142.