We introduce a method for estimating the prevalence of disease using data from a case-control family study performed to investigate the aggregation of disease in families. The families are sampled via case and control probands, and the resulting data consist of information on disease status and covariates for the probands and their relatives. We introduce estimators for overall prevalence and for covariate stratum-specific prevalence (e.g., sex-specific prevalence) that yield approximately unbiased estimates of their population counterparts. We also introduce corresponding confidence intervals that have good coverage properties even for small prevalences. The estimators and intervals address the over-representation of diseased individuals in case-control family data by using only the relatives (of the probands) and by taking into account whether each relative was selected via a case or a control proband. Finally, we describe a simulation experiment in which the estimators and intervals were applied to case-control family datasets sampled from a fictional population that resembled the catchment area for an Austrian family study of major depressive disorder. The resulting estimates varied closely and symmetrically around their population counterparts, and the resulting intervals had good coverage properties.



Included in

Epidemiology Commons