Optimal Sample Size for Multiple Testing: The Case of Gene Expression Microarrays

Peter Müller, Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center
Giovanni Parmigiani, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University
Christian Robert, CEREMADE, Université Paris Dauphine, and CREST, INSEE, France
Judith Rousseau, Université René Descartes, Paris, and CREST, INSEE, France

The current version of this paper can be downloaded at http://www.bepress.com/jhubiostat/paper31.


We consider the choice of an optimal sample size for multiple comparison problems. The motivating application is the choice of the number of microarray experiments to carry out when learning about differential gene expression. However, the approach is valid in any application that involves multiple comparisons across a large number of hypothesis tests.

We discuss two decision problems in this setup: the selection of the sample size and the decisions about the multiple comparisons. The focus of the discussion is on sample size selection. For the multiple comparisons we assume an approach as in Genovese and Wasserman (2002), based on controlling the posterior expected false discovery rate (FDR). For the sample size selection we adopt a decision-theoretic solution that uses the expected false negative rate (FNR) as the decision criterion, combined with a power analysis as a sensitivity diagnostic. Posterior expected FDR and marginal FNR are computed with respect to an assumed parametric probability model. In our implementation we use a version of the model proposed in Newton et al. (2001), but the discussion is independent of the chosen probability model. The approach is valid for any model that assigns positive prior probability to the null hypotheses in the multiple comparisons and that allows efficient marginal and posterior simulation. Posterior and marginal simulation can be carried out by dependent Markov chain Monte Carlo simulation.
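To make the two criteria concrete, the following minimal Python sketch shows one common way to compute them once posterior probabilities of differential expression are available. It is an illustration under stated assumptions, not the implementation used in the paper: the vector `v` of posterior probabilities, the level `alpha`, and the function names `fdr_rule`, `posterior_fdr`, and `posterior_fnr` are all hypothetical, and the probabilities are simply taken as given rather than derived from the model of Newton et al. (2001).

```python
import numpy as np


def fdr_rule(v, alpha):
    """Flag the largest set of genes whose posterior expected FDR is <= alpha.

    v[i] is an ASSUMED posterior probability that gene i is differentially
    expressed (for example, an average over MCMC draws); it is an input here.
    """
    v = np.asarray(v, dtype=float)
    order = np.argsort(-v)  # most probable discoveries first
    # Posterior expected FDR when flagging the top k genes, for k = 1..n.
    running_fdr = np.cumsum(1.0 - v[order]) / np.arange(1, v.size + 1)
    ok = np.nonzero(running_fdr <= alpha)[0]
    k = int(ok[-1]) + 1 if ok.size else 0  # largest admissible k, or none
    d = np.zeros(v.size, dtype=bool)
    d[order[:k]] = True
    return d


def posterior_fdr(v, d):
    """Posterior expected proportion of false discoveries among flagged genes."""
    v, d = np.asarray(v, dtype=float), np.asarray(d, dtype=bool)
    return float(np.sum(1.0 - v[d]) / d.sum()) if d.any() else 0.0


def posterior_fnr(v, d):
    """Posterior expected proportion of missed genes among unflagged genes."""
    v, d = np.asarray(v, dtype=float), np.asarray(d, dtype=bool)
    return float(np.sum(v[~d]) / (~d).sum()) if (~d).any() else 0.0


# Hypothetical posterior probabilities for five genes.
v = np.array([0.99, 0.95, 0.80, 0.40, 0.05])
d = fdr_rule(v, alpha=0.10)
print(d, posterior_fdr(v, d), posterior_fnr(v, d))
```

Under these assumptions, a sample size comparison would repeat this calculation on data sets simulated from the marginal model at each candidate sample size and average the resulting FNR values; the simulation step itself depends on the chosen probability model and is not shown here.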