Latent supervised learning is a machine learning technique for binary classification that uses a surrogate variable in place of the unobserved training label. We extend latent supervised learning to the case where the surrogate variable is a right-censored survival time. A motivating application of the proposed methodology is stratifying patients into two risk groups given a set of biomarkers. The model is fitted by sieve maximum likelihood estimation, with special care taken to account for censoring. Consistency of the proposed estimator is established. Simulations show that the proposed estimator is accurate under a range of settings. Applications to real data demonstrate its advantages over a competing method: the proposed method yields more significant separation in survival between the two risk groups on both training sets and independent held-out test sets.


