In statistical medicine, comparing the predictive ability or fit of two models can help determine whether a set of prognostic variables contains additional information about medical outcomes, or whether one of two model fits (perhaps based on different algorithms, or on different sets of variables) should be preferred for clinical use. Clinical medicine has tended to rely on comparisons of clinical metrics such as C-statistics and, more recently, reclassification. Such metrics require the outcome to be categorical and implicitly use a specific and often obscure loss function. In classical statistics one can use likelihood ratio tests and information-based criteria when the comparison allows for it. However, such approaches are not suitable for many data-adaptive models, and cross-validation has traditionally been used to choose between models in such settings. In this paper we propose a test for the improvement in prediction risk that focuses on the “conditional” risk difference (conditional on the fitted models being fixed) and is valid under cross-validation. We derive Wald-type test statistics and confidence intervals for cross-validated test sets, utilizing the independent validation within cross-validation in conjunction with a test for multiple comparisons. We show that this test maintains the proper Type I error rate under the null fit, and that it can be used as a general test of relative fit for any semi-parametric model alternative with essentially any loss function. We apply the test to a candidate gene study to test for the association of a set of genes in a genetic pathway.
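The cross-validated comparison described above can be illustrated with a minimal sketch. This is not the authors' exact estimator: it uses K-fold cross-validation, pools the per-observation validation-set loss differences between two candidate fitters, and forms a simple Wald statistic from their mean and standard error. The fitter names (`fit_mean`, `fit_linear`), the squared-error loss, and the pooled-variance approximation are illustrative assumptions; the paper's construction additionally exploits the independence structure across validation folds.

```python
import numpy as np

def cv_risk_difference_test(X, y, fit_a, fit_b, loss, K=10, seed=0):
    """Wald-type test for the cross-validated risk difference (a sketch).

    fit_a, fit_b: callables that take (X_train, y_train) and return a
    prediction function.  loss: per-observation loss, loss(y, pred).
    Returns the estimated risk difference (risk of A minus risk of B),
    its standard error, and the Wald z-statistic.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), K)
    diffs = []
    for k in range(K):
        val = folds[k]
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        pred_a = fit_a(X[tr], y[tr])  # models fixed on training data only
        pred_b = fit_b(X[tr], y[tr])
        # loss differences evaluated on the independent validation set
        diffs.append(loss(y[val], pred_a(X[val])) - loss(y[val], pred_b(X[val])))
    d = np.concatenate(diffs)
    est = d.mean()
    se = d.std(ddof=1) / np.sqrt(n)  # simple pooled-variance approximation
    return est, se, est / se

# Illustrative fitters (assumed for this sketch, not from the paper).
def fit_mean(Xtr, ytr):
    m = ytr.mean()
    return lambda X: np.full(len(X), m)

def fit_linear(Xtr, ytr):
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda X: np.column_stack([np.ones(len(X)), X]) @ beta

sq_loss = lambda y, p: (y - p) ** 2

# Synthetic data with a real linear signal, so the linear fit should win.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 1))
y = 2.0 * X[:, 0] + rng.normal(size=500)
est, se, z = cv_risk_difference_test(X, y, fit_mean, fit_linear, sq_loss, K=5)
```

A large positive `z` here indicates that the second model has smaller cross-validated risk; in practice the loss can be swapped for any outcome-appropriate choice (e.g. log loss for binary outcomes).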


Biostatistics | Statistical Theory