Longitudinal studies of older adults usually need to account for deaths and missing data. The databases often include multiple health-related variables, which are hard to compare because they were measured on different scales. Here we present a unified approach to these three problems, developed and used in the Cardiovascular Health Study (CHS). Data were first transformed to a new scale that had integer/ratio properties, and on which “dead” takes the value zero. Missing data were then imputed on this new scale, using each person’s own data over time. Imputation could thus be informed by impending death. The new transformed and imputed variable has a value for every person at every potential time, accounts for death, and can also be considered a measure of “standardized health” that permits comparison of variables that were originally measured on different scales. The new variable can also be transformed back to the original scale, where it differs from the original data only in that missing values have been imputed. Each observation is labeled as observed, imputed (and by which method), or dead at the time. An example using real CHS data is given. The resulting “tidy” dataset can be considered complete, but is flexible enough to permit analysts to handle missing data and deaths in other ways. This approach may be useful for other longitudinal studies as well as for the Cardiovascular Health Study.
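The imputation and labeling steps summarized above can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes the values are already on a transformed scale where dead is zero, and it substitutes simple within-person linear interpolation (with carry-forward or carry-backward at the ends) for whatever imputation rules the paper actually applies. The function name `fill_series`, the `death_index` parameter, and the status labels are hypothetical.

```python
from typing import List, Optional, Tuple

DEAD_VALUE = 0.0  # on the transformed scale, "dead" takes the value zero


def fill_series(
    values: List[Optional[float]],
    death_index: Optional[int] = None,
) -> List[Tuple[Optional[float], str]]:
    """Fill one person's series of transformed values, one entry per wave.

    values      -- observed values, with None where the value is missing
    death_index -- first wave at which the person is dead (None if alive)
    Returns (value, status) pairs. Linear interpolation is an assumption
    of this sketch, not the paper's stated imputation method.
    """
    n = len(values)
    filled = list(values)
    status = ["observed" if v is not None else "missing" for v in values]

    # Waves at or after death are set to zero and labeled "dead".
    if death_index is not None:
        for t in range(death_index, n):
            filled[t] = DEAD_VALUE
            status[t] = "dead"

    # Anchors are observed values plus dead waves, so a missing value
    # just before death is pulled toward zero.
    known = [t for t in range(n) if filled[t] is not None]

    for t in range(n):
        if filled[t] is not None:
            continue
        before = [k for k in known if k < t]
        after = [k for k in known if k > t]
        if before and after:
            lo, hi = before[-1], after[0]
            frac = (t - lo) / (hi - lo)
            filled[t] = filled[lo] + frac * (filled[hi] - filled[lo])
            status[t] = "imputed_interpolated"
        elif before:
            filled[t] = filled[before[-1]]
            status[t] = "imputed_carried_forward"
        elif after:
            filled[t] = filled[after[0]]
            status[t] = "imputed_carried_backward"
        # If the person has no known values at all, the wave stays missing.

    return list(zip(filled, status))


example = fill_series([80.0, None, 40.0, None, None], death_index=4)
for wave, (value, label) in enumerate(example):
    print(wave, value, label)
```

In this example the missing value at wave 1 is filled with 60.0 (midway between 80 and 40), and the missing value at wave 3 is filled with 20.0 because the death at wave 4 acts as an anchor at zero. Every wave carries a label, so an analyst who prefers a different treatment of deaths or missing data can drop or replace the imputed and dead rows.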
Biostatistics | Longitudinal Data Analysis and Time Series
Diehr, Paula, "Methods for Dealing with Death and Missing Data, and for Standardizing Different Health Variables in Longitudinal Datasets: The Cardiovascular Health Study" (February 2013). UW Biostatistics Working Paper Series. Working Paper 390.