Published 2005 in Statistical Applications in Genetics and Molecular Biology, Vol. 4, issue 1, Article 8.


Analysis of viral strand sequence data and viral replication capacity could potentially lead to biological insights regarding the replication ability of HIV-1. Determining specific target codons on the viral strand will facilitate the manufacturing of target specific antiretrovirals. Various algorithmic and analysis techniques can be applied to this application. We propose using multiple testing to find codons which have significant univariate associations with replication capacity of the virus. We also propose using a data adaptive multiple regression algorithm to obtain multiple predictions of viral replication capacity based on an entire mutant/non-mutant sequence profile. The data set to which these techniques were applied consists of 317 patients, each with 282 sequenced protease and reverse transcriptase codons. Initially, the multiple testing procedure (Pollard and van der Laan, 2003) was applied to the individual specific viral sequence data. A single-step multiple testing procedure method was used to control the family wise error rate (FWER) at the five percent alpha level. Additional augmentation multiple testing procedures were applied to control the generalized family wise error (gFWER) or the tail probability of the proportion of false positives (TPPFP). Finally, the loss-based, cross-validated Deletion/Substitution/Addition regression algorithm (Sinisi and van der Laan, 2004) was applied to the dataset separately. This algorithm builds candidate estimators in the prediction of a univariate outcome by minimizing an empirical risk, and it uses cross-validation to select fine-tuning parameters such as: size of the regression model, maximum allowed order of interaction of terms in the regression model, and the dimension of the vector of covariates. This algorithm also is used to measure variable importance of the codons. Findings from these multiple analyses are consistent with biological findings and could possibly lead to further biological knowledge regarding HIV-1 viral data.


Disease Modeling | Statistical Methodology | Statistical Theory