In epidemiology, public health, and social science, mediation analysis is often undertaken to investigate the extent to which the effect of a risk factor on an outcome of interest is mediated by other covariates. A pivotal quantity of interest in such an analysis is the mediation proportion. A common method for estimating it, termed the "difference method", compares estimates from models with and without the hypothesized mediator. However, rigorous methodology for estimation and statistical inference for this quantity has not previously been available. We formulate the problem for the Cox model and generalized linear models, and use a data duplication algorithm together with a generalized estimating equations approach to estimate the mediation proportion and its variance. We further consider the assumption that the same link function holds for the marginal and conditional models, a property we term "g-linkability". We show that our approach is valid whenever g-linkability holds, exactly or approximately, and present results from an extensive simulation study to explore finite-sample properties. We develop estimation and inference methods that reflect the fact that the mediation proportion is bounded between zero and one. In particular, we develop statistical testing procedures for the existence of mediation that honor these bounds, and compare the empirical behavior of crude and logit-based confidence intervals. The methodology is illustrated by an analysis of pre-menopausal breast cancer incidence in the Nurses' Health Study. User-friendly, publicly available software implementing these methods can be downloaded from the last author's website.
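
As a rough illustration of the quantities named in the abstract, the sketch below computes a difference-method point estimate and contrasts a crude (Wald) interval with a logit-based interval. It is not the authors' software. It assumes the common form of the difference method, one minus the ratio of the exposure coefficient from the mediator-adjusted (conditional) model to that from the unadjusted (marginal) model, and it takes the standard error as a given input rather than deriving it through the data duplication and generalized estimating equations approach. All function and argument names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def mediation_proportion(beta_marginal, beta_conditional):
    """Difference-method estimate (assumed form): 1 - beta_conditional / beta_marginal.

    beta_marginal: exposure coefficient from the model without the mediator.
    beta_conditional: exposure coefficient from the model with the mediator.
    """
    if beta_marginal == 0:
        raise ValueError("marginal exposure coefficient must be nonzero")
    return 1.0 - beta_conditional / beta_marginal


def crude_ci(p_hat, se, z=Z_95):
    """Wald interval on the original scale; may fall outside [0, 1]."""
    return (p_hat - z * se, p_hat + z * se)


def logit_ci(p_hat, se, z=Z_95):
    """Interval built on the logit scale and back-transformed.

    Uses the delta method: se(logit(p)) is approximately se / (p * (1 - p)).
    The back-transformed limits always lie strictly between 0 and 1.
    """
    if not 0.0 < p_hat < 1.0:
        raise ValueError("logit interval requires 0 < p_hat < 1")
    centre = math.log(p_hat / (1.0 - p_hat))
    se_logit = se / (p_hat * (1.0 - p_hat))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return (expit(centre - z * se_logit), expit(centre + z * se_logit))
```

For an estimate near the boundary, for example `crude_ci(0.9, 0.1)`, the crude upper limit exceeds one, whereas `logit_ci(0.9, 0.1)` stays inside the unit interval. This is the kind of contrast between the two interval types that the abstract refers to.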


