For clustered data in the medical sciences, disease is considered present in a cluster when one or more of its observations has the disease condition. This paper focuses on estimation of periodontal disease prevalence, defined as the probability that one or more tooth sites have disease in a randomly selected subject. The prohibitive exam time and monetary cost of the full-mouth examination make partial-mouth recording protocols attractive alternatives for assessing chronic periodontitis. In particular, Beck et al. (2006) proposed the random site selection method (RSSM), which pre-specifies a fixed number of tooth sites to be selected randomly from each subject. RSSM can reduce examination time, but standard estimators that define an individual's disease status solely in terms of the selected sites tend to underestimate disease prevalence. We define each mouth as a cluster and disease status (presence or absence) at each tooth site as a binary variable. We describe a prevalence estimator based on the conditional linear family (CLF) of correlated binary distributions under the working assumptions of equal site-level means and exchangeable pairwise correlation for all within-cluster pairs of sites. We derive a variance estimator for the CLF-RSSM prevalence estimator by the delta method. In simulations, our prevalence estimator and its variance estimator have small to negligible bias, and confidence intervals for prevalence have coverage near the nominal 95% level when the working model is correct. When missing teeth are taken into account, confidence intervals based on the CLF-RSSM prevalence estimator have approximately 90% coverage in our simulations. Under model misspecification, with more realistic unequal site-level means and a dental correlation structure, the CLF-RSSM prevalence estimator and its standard deviation estimator do not perform well.
While the overall approach to estimating disease prevalence at the cluster level using partial cluster sampling is promising, new estimators that incorporate more realistic distributional assumptions for correlated binary data (e.g., tooth surfaces in a mouth) may be needed, depending on the application.
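The underestimation by standard RSSM estimators noted above can be illustrated with a small simulation. The sketch below is not the CLF-RSSM estimator described in this paper. It generates exchangeable correlated binary site data from a beta-binomial model (a substitute chosen for convenience) and compares the full-mouth prevalence with the naive estimate that classifies a subject using only the randomly selected sites. The function name `simulate_rssm_bias` and all parameter values (168 sites per mouth, 20 sampled sites, site-level mean 0.1, correlation 0.3) are assumptions made for illustration only.

```python
import numpy as np


def simulate_rssm_bias(n_subjects=2000, n_sites=168, n_sampled=20,
                       mu=0.1, rho=0.3, seed=0):
    """Compare full-mouth prevalence with the naive RSSM estimate.

    Assumption: site outcomes follow a beta-binomial model, which gives
    equal site-level means (mu) and exchangeable pairwise correlation (rho).
    This is NOT the conditional linear family used in the paper.
    """
    rng = np.random.default_rng(seed)

    # Beta parameters chosen so that E[p] = mu and 1 / (a + b + 1) = rho.
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho

    # Subject-specific disease probability, then site-level binary outcomes.
    p = rng.beta(a, b, size=n_subjects)
    sites = rng.random((n_subjects, n_sites)) < p[:, None]

    # Full-mouth definition: disease present if any site is diseased.
    full = sites.any(axis=1)

    # RSSM: select n_sampled sites per subject at random without replacement
    # by ranking independent uniform draws within each row.
    order = rng.random((n_subjects, n_sites)).argsort(axis=1)
    selected = order[:, :n_sampled]
    sampled = np.take_along_axis(sites, selected, axis=1)

    # Naive definition: disease present if any *selected* site is diseased.
    naive = sampled.any(axis=1)

    return full.mean(), naive.mean()


full_prev, naive_prev = simulate_rssm_bias()
print(f"full-mouth prevalence:      {full_prev:.3f}")
print(f"naive RSSM prevalence:      {naive_prev:.3f}")
```

Because the selected sites are a subset of all sites, a subject classified as diseased from the sample is always diseased under the full-mouth definition, so the naive estimate can never exceed the full-mouth prevalence in this setup. The size of the gap depends on the assumed mean, correlation, and number of sampled sites.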
Wang, Rujin and Preisser, John S., "Prevalence Estimation at the Cluster Level for Correlated Binary Data Using Random Partial-Cluster Sampling" (September 2016). The University of North Carolina at Chapel Hill Department of Biostatistics Technical Report Series. Working Paper 46.