Published 2004 as "The deletion/substitution/addition algorithm in loss function based estimation" in Journal of Statistical Methods in Molecular Biology 3(1), article 18.


In van der Laan and Dudoit (2003) we proposed and theoretically studied a unified loss function based statistical methodology, which provides a road map for estimation and performance assessment. Given a parameter of interest that can be described as the minimizer of the population mean of a loss function, the road map has two important ingredients: cross-validation for estimator selection, and minimization of the empirical risk of the subset-specific estimator of the parameter of interest over subsets of basis functions, where the basis functions correspond to a parameterization of a specified subspace of the complete parameter space. In this article we first review this approach. We then propose a general deletion/substitution/addition algorithm for minimizing the empirical risk of subset-specific estimators of the parameter of interest over subsets of variables (e.g., basis functions). In particular, in the regression context, this algorithm corresponds to minimizing the sum of squared residuals of the subset-specific linear regression estimator over subsets of variables. The algorithm provides a new class of loss-based cross-validated algorithms for prediction of univariate and multivariate outcomes and for conditional density and hazard estimation, and we generalize it to censored outcomes such as survival times. In the context of regression, using polynomial basis functions, we study the properties of the deletion/substitution/addition algorithm in simulations and apply the method to detect binding sites in yeast gene expression experiments.
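
To make the regression case concrete, the following is a minimal Python sketch of a deletion/substitution/addition search over subsets of variables, scored by the residual sum of squares of the subset-specific least-squares fit. It is an illustration under simplifying assumptions, not the authors' implementation: the names `rss` and `dsa_search` are hypothetical, the candidates are raw variables rather than polynomial basis functions, the search simply stops when no move improves the best fit recorded for its subset size, and the cross-validation step that would choose among subset sizes is omitted.

```python
import random


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit of y on an
    intercept plus the columns of X indexed by `subset`.
    Assumes the selected columns are not collinear."""
    cols = sorted(subset)
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    # Augmented normal equations [Z'Z | Z'y].
    A = [[sum(r[i] * r[k] for r in rows) for k in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for k in range(i, p + 1):
                A[r][k] -= f * A[i][k]
    # Back substitution for the coefficients.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][k] * beta[k] for k in range(i + 1, p))
        beta[i] = (A[i][p] - tail) / A[i][i]
    return sum((yi - sum(b * z for b, z in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def dsa_search(X, y, max_size):
    """Simplified deletion/substitution/addition search.
    Returns {size: (rss, subset)} with the best subset found per size."""
    n_vars = len(X[0])
    current = frozenset()
    best = {0: (rss(X, y, current), current)}
    while True:
        outside = [j for j in range(n_vars) if j not in current]
        deletions = [current - {j} for j in current]
        substitutions = [(current - {j}) | {k}
                         for j in current for k in outside]
        additions = ([current | {k} for k in outside]
                     if len(current) < max_size else [])
        moved = False
        # Try deletions first, then substitutions, then additions.
        for moves in (deletions, substitutions, additions):
            if not moves:
                continue
            cand = min(moves, key=lambda s: rss(X, y, s))
            r = rss(X, y, cand)
            size = len(cand)
            # Accept only a strict improvement for that subset size.
            if size not in best or r < best[size][0] - 1e-12:
                best[size] = (r, cand)
                current = cand
                moved = True
                break
        if not moved:
            return best


# Toy data (an assumption for illustration): y depends on variables 0 and 2.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * row[0] - 3.0 * row[2] + random.gauss(0, 0.1) for row in X]
best = dsa_search(X, y, max_size=3)
for size in sorted(best):
    print(size, sorted(best[size][1]), round(best[size][0], 3))
```

The search returns the best subset found for each size. In the road map described above, cross-validation would then be used to select among these sizes, a step this sketch leaves out.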


Numerical Analysis and Computation | Statistical Methodology | Statistical Models | Statistical Theory | Survival Analysis