Abstract
We consider the problem of obtaining population-based inference in the presence of missing data and outliers, in the context of estimating obesity prevalence and body-mass index (BMI) measures from the Healthy For Life Study. Identifying multiple outliers in a multivariate setting is difficult because of masking, in which groups of outliers inflate the covariance matrix in a fashion that prevents their identification when included, and swamping, in which outliers skew covariances in a fashion that makes non-outlying observations appear to be outliers. We develop a latent class model that assumes each observation belongs to one of $K$ unobserved latent classes, with each latent class having a distinct covariance matrix. We consider the latent class whose covariance matrix has the largest determinant to form an ``outlier class.'' By separating the covariance matrix for the outliers from the covariance matrices for the remainder of the data, we avoid the problems of masking and swamping. By further utilizing a multiple imputation approach, we simultaneously 1) conduct inference after removing cases that appear to be outliers; 2) propagate uncertainty in the outlier status through the model inference; and 3) account for the sample design in the population inference. We also construct the imputation model in a fashion that accounts for the sample design.
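The rule for designating the outlier class can be illustrated with a minimal sketch. It covers only the final selection step, assuming the $K$ class covariance matrices have already been estimated; it is not the paper's estimation or imputation procedure. The function names and the example matrices are hypothetical, chosen purely for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def determinant(matrix: Matrix) -> float:
    """Determinant of a square matrix via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest absolute entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def outlier_class_index(class_covariances: List[Matrix]) -> int:
    """Index of the latent class whose covariance matrix has the largest determinant."""
    dets = [determinant(cov) for cov in class_covariances]
    return max(range(len(dets)), key=dets.__getitem__)


# Hypothetical covariance matrices for K = 3 latent classes (illustrative values only).
example_covariances = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[9.0, 1.0], [1.0, 8.0]],
    [[2.0, 0.5], [0.5, 1.0]],
]
outlier_k = outlier_class_index(example_covariances)
```

In this toy input the second matrix has by far the largest generalized variance, so its class would be treated as the outlier class under the determinant rule described in the abstract.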
Disciplines
Design of Experiments and Sample Surveys
Suggested Citation
Elliott, Michael, "Multiple Imputation in the Presence of Outliers" (May 2006). The University of Michigan Department of Biostatistics Working Paper Series. Working Paper 59.
https://biostats.bepress.com/umichbiostat/paper59