In GWAS, “generalization” is the replication of a genotype-phenotype association in a population with different ancestry than the population in which it was first identified. The standard for reporting findings from a GWAS requires a two-stage design, in which discovered associations are replicated in an independent follow-up study. Current practices for declaring generalizations rely on testing associations while controlling the Family-Wise Error Rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. While this approach limits false generalizations, we show that it does not guarantee control over the FWER or False Discovery Rate (FDR) of the generalization null hypotheses. In addition, it fails to leverage the two-stage design to increase power for detecting generalized associations. We develop a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop r-values that control the directional generalization FWER (FWERg) and FDR (FDRg); these r-values are used to declare associations as generalized. The framework also applies to generalization testing of a published list of SNP-trait associations. We show that our framework accommodates various rules for selecting SNPs for generalization testing based on p-values in the discovery study, and still controls FWERg or FDRg. A key finding is that it is often beneficial to use a more lenient p-value threshold than the genome-wide significance threshold. For instance, in a GWAS of Total Cholesterol (TC) in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with p-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with p-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.



Included in

Biostatistics Commons