Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, i.e., combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using SNP genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiquity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-liklihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.



Included in

Biostatistics Commons