We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two-state classification task. Theoretical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in the following sense: it yields prediction performance comparable with our theoretically robust procedure when the logistic model fails and even when the form of the linear predictor is incorrectly specified. This raises some intriguing questions about using logistic regression for prediction. Some preliminary explanations are given that draw from recent literature.
Next, we suggest that it may not be necessary to fit the linear function over the whole predictor space to achieve adequate classification properties. Procedures that restrict modeling to a subspace defined by minimally acceptable false-positive and false-negative error rates are suggested. We find that relaxing linearity assumptions to a subspace confers further robustness and that the logistic likelihood calculated over the restricted region provides a robust objective function for determining classification rules.
Overall, our new procedure performs well but not substantially better than logistic regression. Further work is warranted to clarify the relationship between the two conceptually distinct procedures, and may provide a new conceptual basis for using the logistic likelihood to combine predictors.
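The abstract does not spell out the robust procedure; the paper's title suggests it is built on the area under the ROC curve (AUC). As a rough illustration only, and not the authors' implementation, the sketch below assumes two predictors and picks the coefficient `beta` in the score `x1 + beta * x2` that maximizes the empirical (Mann-Whitney) AUC by grid search. The function names, the grid, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def empirical_auc(scores, labels):
    """Mann-Whitney estimate of P(case score > control score), ties count 1/2."""
    cases = scores[labels == 1]
    controls = scores[labels == 0]
    # Compare every case score with every control score.
    diff = cases[:, None] - controls[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def best_auc_combination(x1, x2, labels, grid=None):
    """Grid search for the beta that maximizes the empirical AUC of x1 + beta * x2.

    The AUC depends only on the ranking of the scores, so the scale of the
    linear predictor is not identifiable. Fixing the coefficient of x1 at 1
    is an assumption of this sketch (it presumes x1 is positively associated
    with case status).
    """
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 201)  # hypothetical search range
    aucs = np.array([empirical_auc(x1 + b * x2, labels) for b in grid])
    best = int(np.argmax(aucs))
    return float(grid[best]), float(aucs[best])


def simulate(n=400, seed=0):
    """Hypothetical data: both predictors are shifted upward in cases."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    x1 = rng.normal(loc=1.0 * labels, scale=1.0)
    x2 = rng.normal(loc=0.5 * labels, scale=1.0)
    return x1, x2, labels


if __name__ == "__main__":
    x1, x2, labels = simulate()
    beta, auc = best_auc_combination(x1, x2, labels)
    print(f"beta = {beta:.2f}, empirical AUC = {auc:.3f}")
```

Because the objective uses only the ordering of case and control scores, no link function has to be specified, which is in the spirit of advantage (i) above. A comparison with logistic regression would fit the usual likelihood to the same data and compare the AUC of the resulting linear predictor.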
Note: This Working Paper is a revised version of the previously posted "Robust Binary Regression for Optimally Combining Predictors."
Clinical Epidemiology | Epidemiology | Multivariate Analysis | Statistical Models
Pepe, Margaret S.; Cai, Tianxi; and Zhang, Zheng, "Combining Predictors for Classification Using the Area Under the ROC Curve" (June 2004). UW Biostatistics Working Paper Series. Working Paper 198.