Abstract
High-throughput gene expression technologies such as microarrays have been used in a variety of scientific applications. Most work has focused either on assessing univariate associations between gene expression and clinical outcome (variable selection) or on developing classification procedures from gene expression data (supervised learning). We consider a hybrid variable selection/classification approach based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized by the receiver operating characteristic (ROC) curve. Under a specific probability model, this leads to the consideration of linear discriminant functions. We incorporate automated variable selection using the LASSO. An equivalence between LASSO estimation and support vector machines allows the model to be fit with standard software. We apply the proposed method to simulated data as well as to data from a recently published prostate cancer study.
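The abstract combines three ingredients: a linear combination of gene expression values used as a classifier score, an ROC-based accuracy measure for that score, and LASSO shrinkage that sets most coefficients exactly to zero. The sketch below illustrates those ingredients under stated assumptions; it is not the authors' implementation. It assumes least-squares regression on class labels coded +1/-1 (without a penalty, this gives a direction proportional to Fisher's linear discriminant), adds an L1 penalty fit by plain coordinate descent instead of the support vector machine software the abstract mentions, and estimates the area under the ROC curve with the Mann-Whitney statistic. The simulated data, the penalty value `lam=0.2`, and all function names are hypothetical.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the LASSO soft-thresholding step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    Assumes the columns of X and the response y are centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X beta; equals y while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of gene j with the partial residual that excludes gene j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def empirical_auc(scores, labels):
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    total = 0.0
    for a in cases:
        for b in controls:
            total += 1.0 if a > b else 0.5 if a == b else 0.0
    return total / (len(cases) * len(controls))

def simulate(n=100, p=20, n_signal=3, shift=2.0, seed=1):
    """Hypothetical data: cases shift the mean of the first n_signal genes."""
    rng = random.Random(seed)
    X, labels = [], []
    for i in range(n):
        label = i % 2  # alternate cases (1) and controls (0): balanced classes
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if label == 1:
            for j in range(n_signal):
                row[j] += shift
        X.append(row)
        labels.append(label)
    return X, labels

X, labels = simulate()
n, p = len(X), len(X[0])
means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - means[j] for j in range(p)] for row in X]
y = [1.0 if l == 1 else -1.0 for l in labels]  # +/-1 coding; mean is 0 here

beta = lasso_cd(Xc, y, lam=0.2)
scores = [sum(b * x for b, x in zip(beta, row)) for row in Xc]
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected genes:", selected)
print("empirical AUC:", round(empirical_auc(scores, labels), 3))
```

Increasing `lam` drives more coefficients to zero and so selects fewer genes. In practice the penalty would typically be chosen by cross-validation rather than fixed in advance as it is here.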
Suggested Citation
Ghosh, Debashis and Chinnaiyan, Arul, "Classification and selection of biomarkers in genomic data using LASSO" (June 2004). The University of Michigan Department of Biostatistics Working Paper Series. Working Paper 42.
https://biostats.bepress.com/umichbiostat/paper42