COBRA Preprint Series

The handling of missing data in molecular epidemiologic studies

Manisha Desai, Stanford University
Jessica Kubo, Stanford University
Denise Esserman, University of North Carolina
Mary Beth Terry, Columbia University

Abstract

Background: Molecular epidemiologic studies face a missing data problem as biospecimen data are often collected on only a proportion of subjects eligible for study.

Methods: We investigated all molecular epidemiologic studies published in Cancer Epidemiology, Biomarkers & Prevention (CEBP) in 2009 to characterize the prevalence of missing data and to elucidate how the issue was addressed. We considered multiple imputation (MI), a missing data technique that is readily available and easy to implement, as a possible solution.

Results: While the majority of studies had missing data, only 16% compared subjects with and without missing data. Furthermore, 95% of the studies with missing data performed a complete-case (CC) analysis, a method known to yield biased and inefficient estimates.

Conclusions: Missing data methods are not customarily being incorporated into the analyses of molecular epidemiologic studies. Barriers may include a lack of awareness that missing data exist, particularly when availability of data is part of the inclusion criteria; the need for specialized software; and a perception that the CC approach is the gold standard. Standard MI is a reasonable solution that is valid when the data are missing at random (MAR). If the data are not missing at random (NMAR), we recommend MI over CC when strong auxiliary data are available. MI, with the missing data mechanism specified, is another alternative when the data are NMAR. In all cases, we recommend taking advantage of MI’s ability to account for the uncertainty of these assumptions.

Impact: Missing data methods are underutilized, which can deleteriously affect the interpretation of results.
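The contrast the abstract draws between CC analysis and MI under MAR can be illustrated with a small simulation. The sketch below is not the authors' analysis or software: the data are simulated, the imputation model is assumed to be a normal linear regression of the biomarker on one fully observed covariate, and all function names (`simulate`, `complete_case_mean`, `mi_mean`) are hypothetical. Estimates from the imputed data sets are pooled with Rubin's rules.

```python
import numpy as np

def simulate(n=2000, seed=0):
    """Simulated cohort (an assumption for illustration, not study data)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                     # fully observed covariate
    y = 1.0 + 2.0 * x + rng.normal(size=n)     # biomarker; true mean is 1.0
    # MAR: the chance that y is missing depends only on the observed x
    p_miss = 1.0 / (1.0 + np.exp(-x))
    missing = rng.random(n) < p_miss
    y_obs = np.where(missing, np.nan, y)       # hide the unobserved values
    return x, y_obs, missing

def complete_case_mean(y, missing):
    """CC estimate: mean of y among subjects with an observed value."""
    return y[~missing].mean()

def mi_mean(x, y, missing, m=20, seed=1):
    """MI estimate of the mean of y, pooled with Rubin's rules."""
    rng = np.random.default_rng(seed)
    xo, yo = x[~missing], y[~missing]
    X = np.column_stack([np.ones(xo.size), xo])
    beta_hat, *_ = np.linalg.lstsq(X, yo, rcond=None)
    resid = yo - X @ beta_hat
    df = xo.size - 2
    s2 = resid @ resid / df
    xtx_inv = np.linalg.inv(X.T @ X)
    Xm = np.column_stack([np.ones(missing.sum()), x[missing]])

    estimates, variances = [], []
    for _ in range(m):
        # Draw regression parameters so imputations carry parameter uncertainty
        sigma2 = s2 * df / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        y_imp = y.copy()
        noise = rng.normal(scale=np.sqrt(sigma2), size=Xm.shape[0])
        y_imp[missing] = Xm @ beta + noise
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / y_imp.size)

    q_bar = np.mean(estimates)                 # pooled point estimate
    within = np.mean(variances)                # within-imputation variance
    between = np.var(estimates, ddof=1)        # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, np.sqrt(total)

if __name__ == "__main__":
    x, y, missing = simulate()
    print("complete-case mean:", complete_case_mean(y, missing))
    print("MI mean and SE:", mi_mean(x, y, missing))
```

In this setup subjects with larger covariate values are more likely to lack the biomarker, so the CC mean drifts away from the true value of 1.0, while the MI estimate uses the observed covariate to recover it. The between-imputation term in the pooled variance is how MI reflects the uncertainty due to the missing values.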

Disciplines

Epidemiology

Suggested Citation

Desai, Manisha; Kubo, Jessica; Esserman, Denise; and Terry, Mary Beth, "The handling of missing data in molecular epidemiologic studies" (November 2010). COBRA Preprint Series. Working Paper 72.
https://biostats.bepress.com/cobra/art72

Included in

Epidemiology Commons

Collection of Biostatistics Research Archive
