Abstract

We consider the problem of obtaining population-based inference in the presence of missing data and outliers, in the context of estimating obesity prevalence and body-mass index (BMI) measures from the Healthy For Life Study. Identifying multiple outliers in a multivariate setting is difficult because of masking and swamping. In masking, a group of outliers inflates the covariance matrix so that, when they are included in its estimation, they cannot be identified. In swamping, outliers distort the covariances so that non-outlying observations appear to be outliers. We develop a latent class model that assumes each observation belongs to one of $K$ unobserved latent classes, with each latent class having a distinct covariance matrix. We treat the latent class whose covariance matrix has the largest determinant as an "outlier class." By separating the covariance matrix for the outliers from the covariance matrices for the remainder of the data, we avoid the problems of masking and swamping. By further using a multiple imputation approach, we simultaneously 1) conduct inference after removing cases that appear to be outliers; 2) propagate uncertainty in outlier status through the model inference; and 3) account for the sample design in the population inference. We also construct the imputation model in a fashion that accounts for the sample design.
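The outlier-class idea can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not taken from the study: complete numeric data (no missing values), an unweighted Gaussian mixture fitted by EM, no survey weights or design-based variance, and outlier status imputed from the fitted posterior class probabilities with the mixture parameters held fixed (a fully proper multiple imputation would also draw the parameters). The function names `fit_gmm`, `outlier_class` and `mi_mean_excluding_outliers`, the initialization scheme, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_gmm(X, K, n_iter=200, ridge=1e-6):
    """EM for a K-class Gaussian mixture with a separate covariance per class.

    Returns mixing weights (K,), means (K, p), covariances (K, p, p) and
    posterior class probabilities (n, K).
    """
    n, p = X.shape
    # Assumed initialization: every class starts at the sample mean, with
    # covariances on nested scales (0.5x and 2x the sample covariance when K = 2).
    base = np.cov(X, rowvar=False)
    scales = 4.0 ** (np.arange(K) - (K - 1) / 2)
    mu = np.tile(X.mean(axis=0), (K, 1))
    cov = scales[:, None, None] * base[None, :, :]
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: log of w_k * N(x_i | mu_k, Sigma_k) for every observation and class.
        logp = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            _, logdet = np.linalg.slogdet(cov[k])
            maha = np.sum(diff * np.linalg.solve(cov[k], diff.T).T, axis=1)
            logp[:, k] = np.log(w[k]) - 0.5 * (logdet + maha + p * np.log(2 * np.pi))
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted class weights, means and covariances.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + ridge * np.eye(p)
    return w, mu, cov, resp


def outlier_class(cov):
    """Index of the class whose covariance matrix has the largest determinant."""
    return int(np.argmax([np.linalg.slogdet(c)[1] for c in cov]))


def mi_mean_excluding_outliers(y, resp, out, M=20, seed=1):
    """Impute outlier status M times, estimate the mean of y without the
    imputed outliers, and combine the M estimates with Rubin's rules."""
    rng = np.random.default_rng(seed)
    p_out = resp[:, out]
    est, var = [], []
    for _ in range(M):
        is_out = rng.random(len(y)) < p_out  # draw outlier status from its posterior
        keep = y[~is_out]
        est.append(keep.mean())
        var.append(keep.var(ddof=1) / keep.size)  # simple-random-sampling variance
    est, var = np.array(est), np.array(var)
    W = var.mean()           # within-imputation variance
    B = est.var(ddof=1)      # between-imputation variance
    T = W + (1 + 1 / M) * B  # Rubin's total variance
    return est.mean(), T


# Hypothetical data: a bivariate normal core, a few wide-spread outliers,
# and one clearly extreme point appended last.
rng = np.random.default_rng(42)
core = rng.normal(0.0, 1.0, size=(300, 2))
wide = rng.normal(0.0, 5.0, size=(15, 2))
X = np.vstack([core, wide, [[8.0, 8.0]]])

w, mu, cov, resp = fit_gmm(X, K=2)
out = outlier_class(cov)
qbar, T = mi_mean_excluding_outliers(X[:, 0], resp, out)
print("P(outlier class) for the extreme point:", resp[-1, out])
print("MI estimate of the mean:", qbar, "standard error:", np.sqrt(T))
```

Each imputation drops only the cases drawn into the outlier class, so a case with an intermediate outlier probability is excluded in some imputations and kept in others. The between-imputation variance `B` is how that uncertainty in outlier status reaches the final standard error.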

Disciplines

Design of Experiments and Sample Surveys
