Binary classification problems are ubiquitous in health and social science applications. In many cases, one wishes to balance two conflicting criteria for an optimal binary classifier. For instance, in resource-limited settings, an HIV prevention program based on offering Pre-Exposure Prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) against the total number of PrEP regimens that is financially and logistically feasible for the program to deliver. In this article, we consider a general class of performance-constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold function. These include the minimization of the Rate of Positive Predictions subject to a lower bound on the sensitivity, and vice versa, as well as the Neyman-Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner algorithm. The resulting classifier is characterized by weights that combine the constituent risk prediction algorithms and by a discriminating risk threshold for classification, which together aim to minimize the given constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individual PrEP targeting strategy in a resource-limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof-of-concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study.
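
To make the constrained criterion concrete, the following is a minimal sketch of the threshold-selection step alone, assuming risk scores have already been produced by some fitted predictor. It is not the Super Learner procedure proposed in the article: the function names (`sensitivity`, `positive_rate`, `choose_threshold`) are hypothetical, the ensemble weights are not estimated, and no cross-validation is performed. It only illustrates why monotonicity matters: both the sensitivity and the Rate of Positive Predictions are non-increasing in the threshold, so the largest threshold that satisfies the sensitivity bound also minimizes the Rate of Positive Predictions.

```python
from typing import Sequence


def sensitivity(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of true cases (label 1) whose score is at or above the threshold."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not case_scores:
        raise ValueError("sensitivity is undefined without at least one case")
    return sum(s >= threshold for s in case_scores) / len(case_scores)


def positive_rate(scores: Sequence[float], threshold: float) -> float:
    """Rate of Positive Predictions: fraction of all individuals classified positive."""
    return sum(s >= threshold for s in scores) / len(scores)


def choose_threshold(
    scores: Sequence[float], labels: Sequence[int], min_sensitivity: float
) -> float:
    """Return the largest observed score that, used as a threshold, meets the bound.

    Assumption (illustrative only): candidate thresholds are restricted to the
    distinct observed scores, and individuals with score >= threshold are
    classified as positive.
    """
    # Scan candidates from highest to lowest; the first feasible one is the
    # largest feasible threshold, hence the one with the fewest positives.
    for candidate in sorted(set(scores), reverse=True):
        if sensitivity(scores, labels, candidate) >= min_sensitivity:
            return candidate
    raise ValueError("no threshold satisfies the sensitivity bound")


# Toy data, invented purely for illustration (not from any study).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 1, 0]
toy_threshold = choose_threshold(toy_scores, toy_labels, min_sensitivity=0.6)
print(toy_threshold, positive_rate(toy_scores, toy_threshold))
```

In the PrEP setting described above, `scores` would play the role of predicted seroconversion risks and `positive_rate` the share of individuals offered PrEP. Selecting the threshold and evaluating it on the same data, as this sketch does, would give an optimistic picture in practice.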


Biostatistics | Other Statistics and Probability