Medical advances continue to provide new and potentially better means for detecting disease. This is true in cancer, for example, where biomarkers are sought for early detection and where improvements in imaging methods may pick up the initial functional and molecular changes associated with cancer development. In other binary classification tasks, computational algorithms such as Neural Networks, Support Vector Machines and Evolutionary Algorithms have been applied to areas as diverse as credit scoring, object recognition, and peptide-binding prediction. Before a classifier becomes an accepted technology, it must undergo rigorous evaluation to determine its ability to discriminate between states. Characterization of factors influencing classifier performance is an important step in this process. Analysis of covariates may reveal sub-populations in which classifier performance is greatest or identify features of the classifier that improve accuracy.
We develop regression methods for the non-parametric area under the receiver operating characteristic (ROC) curve, a well-accepted summary measure of classifier accuracy. The estimating function generalizes standard approaches and, interestingly, is related to the two-sample Mann-Whitney U-statistic. Implementation is straightforward because the method adapts binary regression methods. Asymptotic theory is non-standard because the regressor variables are cross-correlated. Nevertheless, simulation studies show that the method produces estimates with small bias and reasonable coverage probability. Application of the method to evaluate covariate effects on a new device for diagnosing hearing impairment reveals that the device performs better in the more severely impaired subjects and that certain test parameters, which are adjustable by the device operator, are key to test performance.
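As an illustration of the link between the non-parametric area under the ROC curve and the Mann-Whitney U-statistic, the following minimal sketch computes the empirical AUC as the proportion of (diseased, non-diseased) pairs in which the diseased subject has the higher test result, counting ties as one half. The function name `empirical_auc`, the tie convention, and the example values are assumptions made for illustration; they are not taken from the paper, and the sketch does not implement the regression method itself.

```python
from itertools import product


def empirical_auc(diseased, non_diseased):
    """Empirical AUC in its Mann-Whitney form.

    Averages, over all (diseased, non-diseased) pairs, an indicator that
    the diseased result exceeds the non-diseased result. Ties contribute
    0.5, which is an assumed (though common) convention.
    """
    if not diseased or not non_diseased:
        raise ValueError("both groups must contain at least one result")

    total = 0.0
    for y_d, y_n in product(diseased, non_diseased):
        if y_d > y_n:
            total += 1.0
        elif y_d == y_n:
            total += 0.5

    # Dividing by the number of pairs rescales the U-statistic to [0, 1].
    return total / (len(diseased) * len(non_diseased))


# Hypothetical test results, made up for illustration only.
auc = empirical_auc([3, 4, 5], [1, 2, 3])
```

Each pairwise indicator in the sum is a binary outcome, which is consistent with the abstract's remark that implementation adapts binary regression methods; the same pairs are reused across many comparisons, which is one way to see why the resulting terms are cross-correlated.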
Clinical Epidemiology | Multivariate Analysis | Statistical Methodology | Statistical Models | Statistical Theory
Dodd, Lori E. and Pepe, Margaret S., "Semi-parametric Regression for the Area Under the Receiver Operating Characteristic Curve" (January 2003). UW Biostatistics Working Paper Series. Working Paper 186.