Improved Semi-Parametric Time Series Models of Air Pollution and Mortality

Francesca Dominici, The Johns Hopkins Bloomberg School of Public Health
Aidan McDermott, The Johns Hopkins Bloomberg School of Public Health
Trevor J. Hastie, Stanford University

Abstract

In 2002, methodological issues around time series analyses of air pollution and health attracted the attention of the scientific community, policy makers, the press, and the diverse stakeholders concerned with air pollution. As the Environmental Protection Agency (EPA) was finalizing its most recent review of epidemiological evidence on particulate matter air pollution (PM), statisticians and epidemiologists found that the S-Plus implementation of Generalized Additive Models (GAM) can overestimate effects of air pollution and understate statistical uncertainty in time series studies of air pollution and health. This discovery delayed the completion of the PM Criteria Document prepared as part of the review of the U.S. National Ambient Air Quality Standard (NAAQS), as the time-series findings were a critical component of the evidence. In addition, it raised concerns about the adequacy of current model formulations and their software implementations.

In this paper we provide improvements in semi-parametric regression directly relevant to risk estimation in time series studies of air pollution. First, we introduce a closed-form estimate of the asymptotically exact covariance matrix of the linear component of a GAM. To ease the implementation of these calculations, we develop the S package gam.exact, an extended version of gam. Use of gam.exact allows a more robust assessment of the statistical uncertainty of the estimated pollution coefficients. Second, we develop a bandwidth selection method to reduce confounding bias in the pollution-mortality relationship due to unmeasured time-varying factors such as season and influenza epidemics. Our method selects the number of degrees of freedom in the smooth part of the model that minimizes the mean squared error of the air pollution coefficient. Third, we introduce a conceptual framework to fully explore the sensitivity of the air pollution risk estimates to model choice. We apply our methods to data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS), which includes time series data from the 90 largest US cities for the period 1987-1994.
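To make the first contribution concrete, the following is a minimal sketch of why a closed-form variance exists for the linear component of a semi-parametric model. It is not the gam.exact implementation. It assumes a Gaussian response with identity link, a single pollution covariate x, and a simple running-mean smoother of time with smoother matrix S; a Poisson GAM would add iterative weights. Under these assumptions the coefficient estimate is linear in the response, beta_hat = h'y with h = (I - S)'x / (x'(I - S)x), so its variance is sigma^2 h'h. The function names, the simulated series, and the error variance below are all hypothetical.

```python
import random


def running_mean_smoother(n, half_width):
    """Return the n x n matrix S of a running-mean smoother over time."""
    S = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        w = 1.0 / (hi - lo)  # equal weights inside the window; each row sums to 1
        S.append([w if lo <= j < hi else 0.0 for j in range(n)])
    return S


def linear_weights(x, S):
    """Weights h such that beta_hat = sum(h[t] * y[t]).

    h = (I - S)^T x / (x^T (I - S) x), for one linear covariate x.
    """
    n = len(x)
    Sx = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]   # S x
    denom = sum(x[i] * (x[i] - Sx[i]) for i in range(n))             # x^T (I - S) x
    Stx = [sum(S[i][j] * x[i] for i in range(n)) for j in range(n)]  # S^T x
    return [(x[j] - Stx[j]) / denom for j in range(n)]


random.seed(0)
n = 60
x = [random.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical pollution series
y = [0.5 * xt + 3.0 for xt in x]                # noise-free response, flat time trend

S = running_mean_smoother(n, half_width=5)
h = linear_weights(x, S)

beta_hat = sum(ht * yt for ht, yt in zip(h, y))  # estimate of the linear coefficient
sigma2 = 1.0                                     # assumed error variance (illustrative)
var_beta = sigma2 * sum(ht * ht for ht in h)     # closed-form variance of beta_hat
```

Because the rows of S sum to one, the weights h sum to zero, so a constant trend does not contaminate the estimate and the noise-free example recovers the coefficient 0.5. The point of the sketch is only that the variance follows directly from h once the smoother matrix is available.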