Abstract
In binary classification problems, the area under the ROC curve (AUC) is an effective means of measuring the performance of a model. Most often, cross-validation is also used in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate of cross-validated AUC, we must obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a large amount of computation time. Thus, when the cost of obtaining a single estimate of cross-validated AUC is significant, the bootstrap, as a means of variance estimation, can be computationally intractable. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
Disciplines
Biostatistics
Suggested Citation
LeDell, Erin; Petersen, Maya L.; and van der Laan, Mark J., "Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates" (December 2012). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 304.
https://biostats.bepress.com/ucbbiostat/paper304