Abstract
Suppose we have a data set of n observations whose dependence structure is poorly understood. We assume we have an estimator that is root-n consistent for a particular estimand, and that the dependence is weak enough that the standardized estimator is asymptotically normally distributed. Our goal is to estimate the asymptotic variance of the standardized estimator so that we can construct a Wald-type confidence interval for the estimand. In this paper we present an approach that allows us to learn this asymptotic variance from a sequence of influence-function-based candidate variance estimators. We focus on time dependence, but the method we propose generalizes to data with arbitrary dependence structure. We show theoretically that our approach is consistent under appropriate conditions, and we evaluate its practical performance in a simulation study, in which our method compares favorably with various existing subsampling and bootstrap approaches. We also include a real-world data analysis, estimating an average treatment effect (with a confidence interval) of ventilation rate on illness absence for a classroom observed over time.
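To make concrete why the asymptotic variance is the quantity of interest, the following is a minimal sketch of a Wald-type interval. It is not the variance estimator proposed in the paper: it takes an estimate of the asymptotic variance as given and only shows how that value turns into an interval. The function name `wald_ci` and its parameters are hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(estimate, asymptotic_variance, n, level=0.95):
    """Wald-type confidence interval for an asymptotically normal estimator.

    Illustrative helper, not taken from the paper. `asymptotic_variance`
    is an estimate of the variance of sqrt(n) * (estimator - estimand),
    obtained by whatever method is appropriate for the dependence in the data.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if asymptotic_variance < 0:
        raise ValueError("asymptotic_variance must be non-negative")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")

    # Two-sided standard normal quantile, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    # The standard error of the estimator itself is sqrt(variance / n).
    half_width = z * sqrt(asymptotic_variance / n)
    return estimate - half_width, estimate + half_width
```

For example, `wald_ci(0.3, 2.5, 400)` returns an interval centered at 0.3. If the variance estimate ignores positive dependence between observations, it is typically too small, and the resulting interval is too narrow to achieve its nominal coverage.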
Disciplines
Biostatistics
Suggested Citation
Davies, Molly M. and van der Laan, Mark J., "Sieve Plateau Variance Estimators: A New Approach to Confidence Interval Estimation for Dependent Data" (May 2014). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 322.
https://biostats.bepress.com/ucbbiostat/paper322
Comments
Molly M. Davies is a doctoral candidate in the Group in Biostatistics, University of California at Berkeley (email: mollymdavies@gmail.com). Mark J. van der Laan is a Professor of Statistics and Biostatistics, University of California at Berkeley (email: laan@berkeley.edu). This work was supported in part by NIH grant 2R01AI074345-06A1. We are grateful to Mark J. Mendell of the Indoor Environment Group, Environmental Energy Technologies Division, Lawrence Berkeley National Laboratory, for providing the data used in the practical data analysis in this paper. Funds for the project that generated these data came from the California Energy Commission and from the Assistant Secretary for Energy Efficiency and Renewable Energy, Office of Building Technology, State, and Community Programs, of the US Department of Energy under Contract No. DE-AC02-05CH11231. The authors would also like to acknowledge Nathan Kurz for his invaluable computational insight and assistance.