Longitudinal studies of older adults usually need to account for deaths and missing data. Study databases also often include multiple health-related variables whose trends over time are hard to compare because they were measured on different scales. Here we present a unified approach to these three problems that was developed and used in the Cardiovascular Health Study. Data were first transformed to a new scale that had interval/ratio properties and on which “dead” logically takes the value zero. Missing data were then imputed on this new scale using each person’s own data over time, so that imputation could be informed by impending death. The transformed and imputed variable has a value for every person at every potential time and accounts for death. It can also be considered a measure of “standardized health” that permits comparison of variables originally measured on different scales. The imputed variable can also be transformed back to the original scale; the result differs from the original data only in that missing values have been imputed. Imputed values near death required an additional “post-adjustment”; one approach is shown in sections 5 and 6. In the resulting “tidy” dataset, every observation is labeled according to whether it was observed, was imputed (and how), or the person was dead at the time. This dataset can be considered complete, but it is flexible enough to let analysts handle missing data and deaths in other ways. The approach may be useful for other longitudinal studies as well as for the Cardiovascular Health Study.
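The abstract does not give the exact transformation or imputation rules, so the following is only a minimal sketch of the general idea under stated assumptions. It maps a hypothetical 5-level self-rated health item onto a 0 to 100 scale with dead = 0, fills each person's gaps from that person's own values (linear interpolation between known waves, otherwise copying the nearest known value), labels every wave, and maps values back to the nearest original category. The category-to-score values, the interpolation and carry rules, and the names `TO_SCALE`, `transform`, `impute`, and `back_transform` are illustrative assumptions, not the paper's method. The paper's "post-adjustment" near death is not implemented.

```python
from typing import List, Optional, Tuple

# ASSUMED, illustrative scores for a 5-level self-rated health item on a
# 0-100 "standardized health" scale where dead = 0 (not taken from the paper).
TO_SCALE = {"excellent": 95.0, "very good": 90.0, "good": 80.0, "fair": 30.0, "poor": 15.0}
DEAD = "dead"


def transform(raw: List[Optional[str]]) -> List[Optional[float]]:
    """Map each wave's response to the new scale: 'dead' -> 0, missing -> None."""
    return [None if r is None else 0.0 if r == DEAD else TO_SCALE[r] for r in raw]


def impute(scaled: List[Optional[float]], raw: List[Optional[str]]) -> List[Tuple[float, str]]:
    """Fill gaps using only this person's own values over time (hypothetical rules).

    Interior gaps are linearly interpolated. Because dead = 0, a gap just before
    death is pulled toward 0. Leading/trailing gaps copy the nearest known value.
    Every wave at or after the first death is labeled 'dead' with value 0.
    """
    death_at = raw.index(DEAD) if DEAD in raw else None
    known = [i for i, v in enumerate(scaled) if v is not None]
    if not known:
        raise ValueError("no observed values for this person")
    out: List[Tuple[float, str]] = []
    for t, v in enumerate(scaled):
        if death_at is not None and t >= death_at:
            out.append((0.0, "dead"))
        elif v is not None:
            out.append((v, "observed"))
        else:
            before = [i for i in known if i < t]
            after = [i for i in known if i > t]
            if before and after:
                a, b = before[-1], after[0]
                w = (t - a) / (b - a)
                out.append((scaled[a] + w * (scaled[b] - scaled[a]), "interpolated"))
            elif before:
                out.append((scaled[before[-1]], "carried forward"))
            else:
                out.append((scaled[after[0]], "carried back"))
    return out


def back_transform(value: float, status: str) -> str:
    """Map a scaled value to the nearest original category; dead stays dead."""
    if status == DEAD:
        return DEAD
    return min(TO_SCALE, key=lambda k: abs(TO_SCALE[k] - value))


if __name__ == "__main__":
    raw = ["very good", None, "fair", None, "dead", None]
    tidy = impute(transform(raw), raw)
    for (value, status), original in zip(tidy, raw):
        print(original, value, status, back_transform(value, status))
```

In this example the missing wave just before death is imputed as 15.0, halfway between "fair" (30) and dead (0), which shows how a scale with dead = 0 lets impending death inform imputation. Mapping such values back to the nearest category is crude, which is one reason values near death may need the separate post-adjustment the abstract mentions.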
Diehr, Paula, "Methods for Dealing with Death and Missing Data, and for Standardizing Different Health Variables in Longitudinal Datasets: The Cardiovascular Health Study" (April 2016). UW Biostatistics Working Paper Series. Working Paper 390.