"Combining Predictors for Classification Using the Area Under the ROC Curve" by Margaret S. Pepe, Tianxi Cai et al.

UW Biostatistics Working Paper Series

Title

Combining Predictors for Classification Using the Area Under the ROC Curve

Authors

Margaret S. Pepe, University of Washington
Tianxi Cai, Harvard University
Zheng Zhang, University of Washington

Comments

Note: This Working Paper is a revised version of the previously posted "Robust Binary Regression for Optimally Combining Predictors."

Abstract

We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two-state classification task. Theoretical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in the following sense: it yields prediction performance comparable with that of our theoretically robust procedure when the logistic model fails, and even when the form of the linear predictor is incorrectly specified. This raises some intriguing questions about using logistic regression for prediction. We give some preliminary explanations that draw on recent literature.
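
The abstract does not spell out the robust estimator, so the following is only a minimal sketch of an AUC-based linear combination under stated assumptions: two predictors, a brute-force grid search over directions, and logistic regression fitted by plain gradient ascent for comparison. All function names and the simulated data are hypothetical, not taken from the paper. The sketch shows why an AUC objective needs no link function and tolerates case-control sampling: the empirical AUC (Mann-Whitney statistic) depends on the scores only through their ranks, and it is computed separately within cases and controls.

```python
import math
import random


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2."""
    total = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))


def linear_score(beta, x):
    """Linear predictor beta'x (no intercept: a constant shift does not change ranks)."""
    return sum(b * xi for b, xi in zip(beta, x))


def auc_of_combination(beta, case_x, control_x):
    return empirical_auc([linear_score(beta, x) for x in case_x],
                         [linear_score(beta, x) for x in control_x])


def max_auc_direction(case_x, control_x, n_grid=360):
    """Grid search over unit directions (cos t, sin t) for two predictors.

    The AUC is unchanged by positive rescaling of beta, so only the direction matters.
    """
    best_beta, best_auc = None, -1.0
    for k in range(n_grid):
        t = 2.0 * math.pi * k / n_grid
        beta = (math.cos(t), math.sin(t))
        auc = auc_of_combination(beta, case_x, control_x)
        if auc > best_auc:
            best_beta, best_auc = beta, auc
    return best_beta, best_auc


def logistic_fit(case_x, control_x, lr=0.5, n_iter=1000):
    """Gradient ascent on the mean logistic log-likelihood; returns [intercept, slopes...]."""
    data = [(x, 1.0) for x in case_x] + [(x, 0.0) for x in control_x]
    p = len(data[0][0])
    coef = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, y in data:
            eta = coef[0] + linear_score(coef[1:], x)
            resid = y - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += resid
            for j in range(p):
                grad[j + 1] += resid * x[j]
        coef = [c + lr * g / len(data) for c, g in zip(coef, grad)]
    return coef


# Hypothetical simulated data: cases shifted in mean relative to controls.
random.seed(1)
case_x = [(random.gauss(1.0, 1.0), random.gauss(0.5, 1.0)) for _ in range(60)]
control_x = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(80)]

beta_auc, auc_max = max_auc_direction(case_x, control_x)
coef = logistic_fit(case_x, control_x)
auc_logistic = auc_of_combination(coef[1:], case_x, control_x)
print(f"AUC-optimal direction: {auc_max:.3f}; logistic linear predictor: {auc_logistic:.3f}")
```

In this toy setting the logistic model actually holds (equal-covariance normal predictors), so the example only illustrates the mechanics of comparing the two linear predictors; it does not reproduce the paper's findings about misspecified models.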

Next, we suggest that it may not be necessary to fit the linear function over the whole predictor space to achieve adequate classification properties. We propose procedures that restrict modeling to a subspace defined by minimally acceptable false-positive and false-negative error rates. We find that restricting the linearity assumption to this subspace confers further robustness, and that the logistic likelihood calculated over the restricted region provides a robust objective function for determining classification rules.
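
The abstract does not give the exact form of the restricted objective. As one hedged illustration, assume the acceptable region is the part of the ROC curve with false-positive rate at most `max_fpr` (the false-negative constraint is omitted here). The sketch below computes the empirical partial AUC over that range by counting only case-control pairs whose control score lies among the top `max_fpr` fraction of control scores; such a quantity could replace the full AUC in a linear-combination search. It is not the restricted logistic likelihood described above, and the function names are hypothetical.

```python
import math


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of the full AUC; ties count 1/2."""
    total = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))


def partial_auc(case_scores, control_scores, max_fpr):
    """Empirical area under the ROC curve for false-positive rates in [0, max_fpr].

    Only the floor(max_fpr * n_controls) highest control scores contribute, i.e.
    thresholds whose false-positive rate does not exceed max_fpr. The result is at
    most max_fpr; divide by max_fpr to rescale it to [0, 1].
    """
    k = math.floor(max_fpr * len(control_scores))
    top_controls = sorted(control_scores, reverse=True)[:k]
    total = 0.0
    for d in top_controls:
        for c in case_scores:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))


cases = [3.0, 4.0]
controls = [0.0, 1.0, 2.0, 5.0]
print(partial_auc(cases, controls, 0.5))  # 0.25: only controls 5.0 and 2.0 count
print(empirical_auc(cases, controls))     # 0.75
```

With `max_fpr = 1.0` every control contributes and the partial AUC reduces to the full empirical AUC.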

Overall, our new procedure performs well, but not substantially better than logistic regression. Further work is warranted to clarify the relationship between these two conceptually distinct procedures; such work may provide a new conceptual basis for using the logistic likelihood to combine predictors.

Disciplines

Clinical Epidemiology | Epidemiology | Multivariate Analysis | Statistical Models

Suggested Citation

Pepe, Margaret S.; Cai, Tianxi; and Zhang, Zheng, "Combining Predictors for Classification Using the Area Under the ROC Curve" (June 2004). UW Biostatistics Working Paper Series. Working Paper 198.
https://biostats.bepress.com/uwbiostat/paper198

Previous Versions

April 18, 2003

Included in

Clinical Epidemiology Commons, Epidemiology Commons, Multivariate Analysis Commons, Statistical Models Commons

Collection of Biostatistics Research Archive