In air pollution epidemiology, improvements in statistical analysis tools can translate into significant scientific advances, because signal-to-noise ratios are unfavorable and exposures are strongly correlated with confounders. The use of a novel model selection approach to identify the time windows of pollutant exposure that lead to adverse health effects is therefore important and welcome. However, previous literature has raised concerns about approaches that select a model based on a given data set and then estimate health effects in the same data as if the chosen model were known to be correct. Problems can be particularly severe when: 1) the sample size is small relative to the magnitude of the true health effects to be detected; and 2) candidate predictors are highly correlated and likely to have similar effects on the health outcome. Bayesian Model Averaging (BMA) has been advocated as a way of estimating health effects while accounting for model uncertainty. However, BMA might not be as effective for effect estimation as it has proven to be for prediction, because posterior model probabilities might not reflect a model's ability to provide an estimate of the health effect that is properly adjusted for confounding. In studies of air pollution and health, the focus should ideally be on estimating health effects while accounting for uncertainty in the adjustment for confounding factors, especially when model choice and estimation are performed on the same data. The development of appropriate statistical tools for this purpose remains an area of open investigation.
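The concern about model-averaged effect estimates can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' method: a single exposure `x`, a single confounder `u` correlated with it, a linear outcome model, and BIC-based weights used as a common approximation to posterior model probabilities under equal prior model probabilities. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate(n, beta, gamma, rho, seed=0):
    """Simulate exposure x, confounder u (correlation rho with x), outcome y.

    Assumed data-generating model (illustrative only):
        y = beta * x + gamma * u + standard normal noise
    """
    rng = random.Random(seed)
    xs, us, ys = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        x = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y = beta * x + gamma * u + rng.gauss(0.0, 1.0)
        xs.append(x)
        us.append(u)
        ys.append(y)
    return xs, us, ys


def centered_cross(a, b):
    """Sum of cross-products of a and b after centering each at its mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def bma_exposure_effect(xs, us, ys):
    """Average the exposure coefficient over two candidate models.

    Model 1: y ~ x        (no confounder adjustment)
    Model 2: y ~ x + u    (adjusted for the confounder)
    Weights are proportional to exp(-BIC / 2), an approximation to
    posterior model probabilities assuming equal prior model probabilities.
    """
    n = len(ys)
    sxx, suu, syy = centered_cross(xs, xs), centered_cross(us, us), centered_cross(ys, ys)
    sxu, sxy, suy = centered_cross(xs, us), centered_cross(xs, ys), centered_cross(us, ys)

    # Model 1: least-squares slope and residual sum of squares.
    b1 = sxy / sxx
    rss1 = syy - b1 * sxy

    # Model 2: solve the 2x2 normal equations for the centered predictors.
    det = sxx * suu - sxu ** 2
    b2 = (suu * sxy - sxu * suy) / det
    g2 = (sxx * suy - sxu * sxy) / det
    rss2 = syy - b2 * sxy - g2 * suy

    # BIC up to a constant shared by both models (intercept counted as a parameter).
    bic1 = n * math.log(rss1 / n) + 2 * math.log(n)
    bic2 = n * math.log(rss2 / n) + 3 * math.log(n)

    # Subtract the smaller BIC before exponentiating for numerical stability.
    best = min(bic1, bic2)
    r1 = math.exp(-0.5 * (bic1 - best))
    r2 = math.exp(-0.5 * (bic2 - best))
    w1, w2 = r1 / (r1 + r2), r2 / (r1 + r2)

    return {
        "unadjusted": b1,
        "adjusted": b2,
        "weights": (w1, w2),
        "bma": w1 * b1 + w2 * b2,
    }


if __name__ == "__main__":
    # Small sample, weak effects, highly correlated exposure and confounder.
    xs, us, ys = simulate(n=60, beta=0.1, gamma=0.1, rho=0.9, seed=1)
    print(bma_exposure_effect(xs, us, ys))
```

In a setting like this one (small sample, weak effects, strong exposure-confounder correlation), the BIC penalty can favor the smaller model even though it omits the confounder, so the averaged estimate can be pulled toward the unadjusted coefficient. The weights track fit to the outcome, not the quality of the confounding adjustment.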
Dominici, Francesca; Wang, Chi; Crainiceanu, Ciprian; and Parmigiani, Giovanni, "MODEL SELECTION AND HEALTH EFFECT ESTIMATION IN ENVIRONMENTAL EPIDEMIOLOGY" (January 2008). Johns Hopkins University, Dept. of Biostatistics Working Papers. Working Paper 164.