"Targeted Methods for Biomarker Discovery, the Search for a Standard" by Catherine Tuglus and Mark J. van der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Targeted Methods for Biomarker Discovery, the Search for a Standard

Authors

Catherine Tuglus, Division of Biostatistics, University of California, Berkeley
Mark J. van der Laan, Division of Biostatistics and Department of Statistics, University of California, Berkeley

Abstract

More often than not, biomarker studies analyze large numbers of variables with a complicated and generally unknown correlation structure. Numerous statistical methods attempt to untangle these variables and determine the underlying mechanism by identifying causally related biomarkers. Results from these methods are generally difficult to interpret and nearly impossible to compare across studies. The FDA has recently called for standardized methods and protocols for biomarker detection. In response, we propose targeted variable importance (tVIM) as a standardized method for biomarker discovery. Through the use of targeted maximum likelihood, tVIM provides double robust estimates of variable importance along with formal inference. Under specified conditions these measures are biologically interpretable as causal effects, allowing for reproducibility across populations. In this analysis we compare tVIM to four measures of importance provided by three statistical methods: univariate linear regression (LM), LASSO penalized multiple regression (Q), and two importance measures from randomForest (RF1 and RF2). Their performance is compared in simulation under conditions of increasing correlation, with a focus on their ability to distinguish "true" relevant biomarkers from correlated decoy biomarkers. The comparisons are based on the ranked variable list each method produces from its importance measures, and from p-values when available. In simulation, tVIM coupled with a data-adaptive model selection method outperforms linear regression, LASSO, and randomForest, and is more resilient to increases in correlation. In application, we apply all methods to the Golub et al. (1999) leukemia data and compare the resulting gene lists based on biological relevance. LM and tVIM are also applied to the van't Veer breast cancer data and compared on their top 10 most important genes.
From these results, tVIM appears to rank more biologically relevant genes at the top of its list than the other methods. Methods to reduce bias and produce realistic gene lists in the presence of extreme correlations are also discussed.
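The core simulation question (can a method separate a true biomarker from a correlated decoy?) can be illustrated with a small sketch. This is not the authors' tVIM implementation. It assumes the variable importance parameter is beta in E[Y|A,W] - E[Y|A=0,W] = beta*A, where A is one biomarker and W the others, and it uses linear working models for both the initial fit and E[A|W] plus a single least-squares targeting step. The function names `ols`, `tvim`, and `simulate` and the simulation settings (n, rho, effect size 2.0) are hypothetical choices for illustration.

```python
import numpy as np


def ols(X, y):
    """Least squares with an intercept column; returns (fitted values, coefficients)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ coef, coef


def tvim(Y, A, W):
    """Sketch of a targeted estimate of beta in E[Y|A,W] - E[Y|A=0,W] = beta * A.

    Assumptions (not from the paper): the initial fit Q0 is a deliberately crude
    univariate regression of Y on A, and g(W) = E[A|W] is a linear fit on W.
    """
    Q0, coef = ols(A, Y)                          # initial fit Q0 = b0 + beta0 * A
    gW, _ = ols(W, A)                             # estimate of g(W) = E[A | W]
    h = A - gW                                    # covariate used for targeting
    eps = np.sum(h * (Y - Q0)) / np.sum(h * h)    # least-squares fluctuation of Q0
    beta = coef[1] + eps                          # Q* = Q0 + eps*h has A-slope beta0 + eps
    Qstar = Q0 + eps * h
    ic = h * (Y - Qstar) / np.mean(h * h)         # approximate influence curve (ignores estimation of g)
    se = ic.std(ddof=1) / np.sqrt(len(Y))
    return beta, se


def simulate(n=500, rho=0.8, seed=1):
    """Hypothetical design: X1 drives Y, X2 is a decoy with corr(X1, X2) = rho, X3 is noise."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, 3))
    X = np.column_stack([Z[:, 0],
                         rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1],
                         Z[:, 2]])
    Y = 2.0 * X[:, 0] + rng.standard_normal(n)
    return X, Y


X, Y = simulate()
for j in range(X.shape[1]):
    A, W = X[:, j], np.delete(X, j, axis=1)       # biomarker j versus all the others
    lm_slope = ols(A, Y)[1][1]                    # univariate (LM-style) importance
    beta, se = tvim(Y, A, W)
    print(f"X{j + 1}: LM slope={lm_slope:6.2f}  tVIM={beta:6.2f} (z={beta / se:6.1f})")
```

In this setup the univariate slope for the decoy X2 is pulled toward rho times the true effect, while the targeted estimate for X2 stays near zero because the update adjusts for the correlated biomarkers in W. With purely linear working models the targeted estimate coincides with the coefficient of A in the multiple regression of Y on A and W (the Frisch-Waugh-Lovell identity); more flexible learners for Q0 and g(W) could be swapped in for `ols`, which is where a data-adaptive model selection method would enter.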

Disciplines

Biostatistics | Clinical Trials | Epidemiology | Microarrays

Suggested Citation

Tuglus, Catherine and van der Laan, Mark J., "Targeted Methods for Biomarker Discovery, the Search for a Standard" (March 2008). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 233.
https://biostats.bepress.com/ucbbiostat/paper233

Included in

Biostatistics Commons, Clinical Trials Commons, Epidemiology Commons, Microarrays Commons

Collection of Biostatistics Research Archive