Statistical challenges arise in identifying meaningful patterns and structures in high-dimensional genomic data sets. Relating HIV genotype (the sequence of amino acids) to phenotypic drug resistance is a typical problem. When HIV is under antiretroviral drug pressure, unfavorable mutations of the target genes often lead to greatly increased resistance of the virus to drugs, including drugs to which the virus has not been exposed. Identifying mutation combinations and their association with drug resistance is critical in guiding efficient prescription of HIV drugs. Identifying a subset of codons associated with drug resistance from a set of several hundred codons is a multiple testing problem. Statistical issues arising in multiple testing procedures for genomic data include the choice of the null distribution of the test statistics used to define cut-offs. Controlling the familywise error rate means controlling the probability of at least one false positive among the true null hypotheses. With a large number of hypotheses to be tested, the number of true nulls is unknown. We apply two multiple testing procedures (MTPs) controlling the familywise error rate: an ad hoc augmented-Bonferroni method and an empirical Bayes procedure originally proposed in van der Laan, Birkner and Hubbard (2005). Using simulations, we demonstrate that the proposed MTPs are less conservative than traditional methods such as the Bonferroni and Holm procedures. We apply the methods to HIV resistance data, where we wish to identify mutations in the protease gene associated with Amprenavir resistance.
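For orientation, the two traditional familywise-error-rate procedures named above as baselines can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the level alpha = 0.05, and the five p-values are hypothetical and are not taken from the HIV data or the simulations. The augmented-Bonferroni and empirical Bayes procedures are not reproduced here.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Single-step Bonferroni: reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm_reject(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first non-rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values, one per codon-level hypothesis (illustrative only).
pvals = [0.001, 0.012, 0.014, 0.04, 0.30]
bonf = bonferroni_reject(pvals)
holm = holm_reject(pvals)
print("Bonferroni rejections:", sum(bonf))
print("Holm rejections:", sum(holm))
```

With these made-up p-values, Holm rejects every hypothesis that Bonferroni rejects and two more, which shows in miniature what "less conservative" means when both procedures control the same error rate.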


