In statistical medicine, comparing the predictive performance or fit of two models can help determine whether a set of prognostic variables contains additional information about medical outcomes, or whether one of two different model fits (perhaps based on different algorithms, or on different sets of variables) should be preferred for clinical use. Clinical medicine has tended to rely on comparisons of clinical metrics such as C-statistics and, more recently, reclassification. Such metrics require the outcome to be categorical and rely on a specific and often obscure loss function. In classical statistics one can use likelihood ratio tests and information-based criteria when the comparison permits. However, for many data-adaptive models such approaches are not suitable, and practitioners have traditionally used cross-validation to choose between models in such settings. In this paper we propose a test for the improvement in prediction risk that focuses on the "conditional" risk difference (conditional on the models being fixed) and is valid under cross-validation. We derive Wald-type test statistics and confidence intervals for cross-validated test sets, exploiting the independent validation within cross-validation in conjunction with a correction for multiple comparisons. We show that this test maintains proper Type I error under the null fit, and that it can serve as a general test of relative fit for any semi-parametric model alternative, using nearly any loss function. We apply the test to a candidate gene study to test for the association of a set of genes in a genetic pathway.
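As a rough illustration of the kind of procedure the abstract describes, the sketch below computes a cross-validated risk difference between two prediction algorithms and forms a Wald-type test statistic and confidence interval from observation-level loss differences on the held-out folds. This is a minimal sketch under illustrative assumptions, not the authors' exact construction: squared-error loss, scikit-learn-style estimators, the pooled variance estimator, and the function name `cv_risk_difference_test` are all hypothetical choices for exposition.

```python
# Minimal sketch (illustrative, not the paper's exact construction):
# Wald-type test that two prediction algorithms have equal conditional
# risk, using loss differences evaluated on cross-validated test sets.
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold

def cv_risk_difference_test(model_a, model_b, X, y, n_splits=10, seed=1):
    """Test H0: equal prediction risk, via cross-validated loss differences."""
    diffs = []  # per-observation loss differences on held-out folds
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        fit_a = model_a.fit(X[train], y[train])
        fit_b = model_b.fit(X[train], y[train])
        loss_a = (y[test] - fit_a.predict(X[test])) ** 2  # squared-error loss
        loss_b = (y[test] - fit_b.predict(X[test])) ** 2
        diffs.append(loss_a - loss_b)
    d = np.concatenate(diffs)
    est = d.mean()                           # cross-validated risk difference
    se = d.std(ddof=1) / np.sqrt(len(d))     # Wald standard error
    z = est / se
    p = 2 * stats.norm.sf(abs(z))            # two-sided p-value
    ci = (est - 1.96 * se, est + 1.96 * se)  # 95% Wald confidence interval
    return est, ci, p
```

In the spirit of the candidate gene application, `model_a` might be fit on clinical covariates alone and `model_b` on clinical covariates plus the gene set; a risk difference whose confidence interval excludes zero would suggest the genes carry additional predictive information. When several such comparisons are run, the p-values would further need a multiplicity correction, as the abstract notes.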
Biostatistics | Statistical Theory
Goldstein, Benjamin A.; Polley, Eric; Briggs, Farren; and van der Laan, Mark J., "Testing the Relative Performance of Data Adaptive Prediction Algorithms: A Generalized Test of Conditional Risk Differences" (July 2013). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 316.