"Loss-Based Estimation with Cross-Validation: Applications to Microarra" by Sandrine Dudoit, Mark J. van der Laan et al.

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding

Authors

Sandrine Dudoit, Division of Biostatistics, School of Public Health, University of California, Berkeley
Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley
Sunduz Keles, Division of Biostatistics, School of Public Health, University of California, Berkeley
Annette M. Molinaro, Division of Biostatistics, School of Public Health, University of California, Berkeley
Sandra E. Sinisi, Division of Biostatistics, School of Public Health, University of California, Berkeley
Siew Leng Teng, Division of Biostatistics, School of Public Health, University of California, Berkeley

Comments

Published 2003 in G. Piatetsky-Shapiro and P. Tamayo (eds.), Microarray Data Mining, Special Issue of SIGKDD Explorations, Vol. 5, No. 2, pp. 56-68.

Abstract

Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable loss function and candidate estimators are generated using this (or possibly another) loss function. Cross-validation is applied to select an optimal estimator among the candidates and to assess the overall performance of the resulting estimator. This general estimation framework encompasses a number of problems which have traditionally been treated separately in the statistical literature, including multivariate outcome prediction and density estimation based on either uncensored or censored data. This article provides an overview of the methodology and describes its application to two problems in genomic data analysis: the prediction of biological and clinical outcomes (possibly censored) using microarray gene expression measures and the identification of regulatory motifs (i.e., transcription factor binding sites) in DNA sequences.
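
As an illustration of the selection step described above, the following is a minimal sketch, not the authors' implementation. It assumes a univariate regression setting with the squared error loss, takes polynomial fits of increasing degree as the candidate estimators, and uses V-fold cross-validation to pick the candidate with the smallest cross-validated risk. The function names (`cv_risk`, `select_by_cv`), the simulated data, and the choice of candidates are hypothetical and serve only to make the idea concrete.

```python
import numpy as np


def squared_error_loss(y, y_hat):
    """Squared error loss, evaluated observation by observation."""
    return (y - y_hat) ** 2


def cv_risk(x, y, degree, n_folds=5, seed=0):
    """V-fold cross-validated risk of a polynomial fit of the given degree.

    Assumption: the candidate estimator is an ordinary least squares
    polynomial fit; any other fitting procedure could be substituted.
    """
    rng = np.random.default_rng(seed)
    # Same seed for every candidate, so all candidates share the same splits.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    losses = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        # Fit on the training sample only.
        coefs = np.polyfit(x[train_mask], y[train_mask], degree)
        # Evaluate the loss on the held-out validation sample.
        y_hat = np.polyval(coefs, x[val_idx])
        losses.append(squared_error_loss(y[val_idx], y_hat))
    return float(np.mean(np.concatenate(losses)))


def select_by_cv(x, y, degrees, n_folds=5, seed=0):
    """Return the candidate degree with the smallest cross-validated risk."""
    risks = {d: cv_risk(x, y, d, n_folds, seed) for d in degrees}
    best = min(risks, key=risks.get)
    return best, risks


# Simulated data (hypothetical): quadratic signal plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 200)

best_degree, risks = select_by_cv(x, y, degrees=range(0, 6))
print("selected degree:", best_degree)
for d, r in risks.items():
    print(f"degree {d}: cross-validated risk {r:.4f}")
```

In this sketch the cross-validated risk of the selected candidate is an optimistic summary of its performance, because the same splits were used for selection. An honest performance assessment would evaluate the whole selection procedure on data not used for selection, for example by an outer layer of cross-validation.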

Suggested Citation

Dudoit, Sandrine; van der Laan, Mark J.; Keles, Sunduz; Molinaro, Annette M.; Sinisi, Sandra E.; and Teng, Siew Leng, "Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding" (December 2003). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 137.
https://biostats.bepress.com/ucbbiostat/paper137

Included in

Genetics Commons, Microarrays Commons, Multivariate Analysis Commons, Statistical Methodology Commons, Statistical Theory Commons, Survival Analysis Commons
