"Resampling-based Multiple Testing: Asymptotic Control of Type I Error " by Katherine S. Pollard and Mark J. van der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Title

Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data

Authors

Katherine S. Pollard, Division of Biostatistics, School of Public Health, University of California, Berkeley
Mark J. van der Laan, Division of Biostatistics, School of Public Health, University of California, Berkeley

Comments

Published 2005 in J. Statistical Planning and Inference, 125, pp. 85-100.

Abstract

We define a general statistical framework for multiple hypothesis testing and show that the correct null distribution for the test statistics is obtained by projecting the true distribution of the test statistics onto the space of mean-zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and the covariance of the vector influence curve for the parameter estimator. This test statistic null distribution can be estimated by applying the non-parametric or parametric bootstrap to correctly centered test statistics. We prove that this bootstrap-estimated null distribution provides asymptotic control of most type I error rates. We show that obtaining a test statistic null distribution from a data null distribution (e.g. by projecting the data generating distribution onto the space of all distributions satisfying the complete null) only provides the correct test statistic null distribution if the covariance of the vector influence curve is the same under the data null distribution as under the true data distribution. This condition is a weak version of the subset pivotality condition. We show that our multiple testing methodology controlling type I error is equivalent to constructing an error-specific confidence region for the true parameter and checking if it contains the hypothesized value. We also study the two-sample problem and show that the permutation distribution produces an asymptotically correct null distribution if (i) the sample sizes are equal or (ii) the populations have the same covariance structure. We include a discussion of the application of multiple testing to gene expression data, where the dimension typically far exceeds the sample size. An analysis of a cancer gene expression data set illustrates the methodology.
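As a rough illustration of the bootstrap step described in the abstract, the Python sketch below estimates a null distribution by resampling rows of the data and centering each bootstrap statistic at the observed estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the parameter is taken to be a vector of means, the hypotheses are that each mean equals zero, the statistics are one-sample t-statistics, and the error rate is the family-wise error rate controlled with a single-step common cutoff on the maximum absolute statistic. The function names, arguments, and simulated data are hypothetical.

```python
import numpy as np


def bootstrap_maxt_cutoff(X, alpha=0.05, n_boot=1000, seed=0):
    """Common cutoff from a bootstrap null of centered t-statistics.

    X is an (n, m) array: n observations of m variables.
    Assumed hypotheses (illustrative only): H0_j: mean_j = 0.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    mean_obs = X.mean(axis=0)
    max_abs = np.empty(n_boot)
    for b in range(n_boot):
        # Non-parametric bootstrap: resample rows with replacement,
        # which preserves the dependence between the m variables.
        Xb = X[rng.integers(0, n, size=n)]
        se_b = Xb.std(axis=0, ddof=1) / np.sqrt(n)
        # Center at the observed estimate rather than the hypothesized
        # value, so the bootstrap statistics have mean approximately zero.
        z_b = (Xb.mean(axis=0) - mean_obs) / se_b
        max_abs[b] = np.abs(z_b).max()
    # Single-step cutoff: (1 - alpha) quantile of the maximum statistic.
    return np.quantile(max_abs, 1 - alpha)


def maxt_test(X, alpha=0.05, n_boot=1000, seed=0):
    """Observed t-statistics, the bootstrap cutoff, and rejections."""
    n = X.shape[0]
    t_obs = X.mean(axis=0) / (X.std(axis=0, ddof=1) / np.sqrt(n))
    cutoff = bootstrap_maxt_cutoff(X, alpha, n_boot, seed)
    return t_obs, cutoff, np.abs(t_obs) > cutoff


# Simulated data for illustration only: 60 samples, 20 variables,
# with the first 3 variables shifted away from zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
X[:, :3] += 2.0
t_obs, cutoff, reject = maxt_test(X, n_boot=300)
print("cutoff:", cutoff)
print("rejected hypotheses:", np.flatnonzero(reject))
```

Centering at the observed means, rather than transforming the data to satisfy the complete null, is what keeps the bootstrap covariance close to that of the true data distribution. Comparing each centered bootstrap statistic with the same cutoff also mirrors the confidence-region interpretation mentioned in the abstract.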

Disciplines

Numerical Analysis and Computation | Statistical Methodology | Statistical Theory

Suggested Citation

Pollard, Katherine S. and van der Laan, Mark J., "Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data" (June 2003). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 121.
https://biostats.bepress.com/ucbbiostat/paper121

Previous Versions

December 02, 2002

Included in

Numerical Analysis and Computation Commons, Statistical Methodology Commons, Statistical Theory Commons