Published in Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer, 2005.


This chapter proposes widely applicable resampling-based single-step and stepwise multiple testing procedures (MTPs) for controlling a broad class of Type I error rates in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; van der Laan et al., 2004a,b; Pollard and van der Laan, 2004; Pollard et al., 2005). Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of Type I errors, V_n, and rejected hypotheses, R_n. These error rates include the generalized family-wise error rate, gFWER(k) = Pr(V_n > k), or chance of at least (k+1) false positives (the special case k=0 corresponds to the usual family-wise error rate, FWER), and tail probabilities for the proportion of false positives among the rejected hypotheses, TPPFP(q) = Pr(V_n/R_n > q). Single-step and step-down common-cut-off (maxT) and common-quantile (minP) procedures that take into account the joint distribution of the test statistics are proposed to control the FWER. In addition, augmentation multiple testing procedures are provided to control the gFWER and TPPFP, based on any initial FWER-controlling procedure. The results of a multiple testing procedure can be summarized using rejection regions for the test statistics, confidence regions for the parameters of interest, or adjusted p-values. A key ingredient of the proposed MTPs is the test statistics null distribution (and a consistent bootstrap estimator thereof), which is used to derive rejection regions and the corresponding confidence regions and adjusted p-values. This chapter illustrates an implementation in SAS (Version 9) of the bootstrap-based single-step maxT procedure and of the gFWER- and TPPFP-controlling augmentation procedures. These multiple testing procedures are applied to an HIV-1 sequence dataset to identify codon positions associated with viral replication capacity.
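
The augmentation step described above can be illustrated with a minimal Python sketch. It is not the chapter's SAS implementation. It assumes that FWER-adjusted p-values from some initial procedure (for example, single-step maxT) are already available, and the function names `augment_gfwer` and `augment_tppfp` are hypothetical. Starting from the R_0 hypotheses rejected at level alpha, the gFWER(k) version additionally rejects the next k most significant hypotheses, and the TPPFP(q) version adds the largest number a of hypotheses such that a/(R_0 + a) <= q.

```python
def _order_and_r0(adj_p, alpha):
    """Rank hypotheses by adjusted p-value and count initial FWER rejections."""
    order = sorted(range(len(adj_p)), key=lambda j: adj_p[j])
    r0 = sum(p <= alpha for p in adj_p)
    return order, r0


def augment_gfwer(adj_p, alpha, k):
    """Indices rejected after gFWER(k) augmentation (hypothetical helper).

    Adds the k next most significant hypotheses to the initial rejections,
    capped at the total number of hypotheses.
    """
    order, r0 = _order_and_r0(adj_p, alpha)
    return order[: min(r0 + k, len(adj_p))]


def augment_tppfp(adj_p, alpha, q):
    """Indices rejected after TPPFP(q) augmentation (hypothetical helper).

    Adds the largest number a of hypotheses with a / (r0 + a) <= q.
    """
    order, r0 = _order_and_r0(adj_p, alpha)
    a = 0
    while r0 + a < len(adj_p) and (a + 1) / (r0 + a + 1) <= q:
        a += 1
    return order[: r0 + a]


# Illustrative adjusted p-values (made up for this sketch, not from the chapter).
adj_p = [0.001, 0.2, 0.01, 0.03, 0.5, 0.04]
gfwer_rejections = augment_gfwer(adj_p, alpha=0.05, k=1)
tppfp_rejections = augment_tppfp(adj_p, alpha=0.05, q=0.25)
```

With these made-up inputs the initial procedure rejects four hypotheses, and each augmentation adds one more. When the initial procedure rejects nothing, the TPPFP version adds nothing, since any added rejection would make the proportion equal to one.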


Numerical Analysis and Computation | Statistical Methodology | Statistical Theory