Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution of alleles at these loci, present noisy patterns. In this paper, we propose a new distance-based LD measure, R, which takes into account multilocus haplotypes around the two loci in order to exploit information from neighboring loci. The LD measure R yields a matrix of pairwise distances between markers, based on the correlation between the lengths of shared haplotypes among chromosomes around these markers. Data analysis demonstrates that visualization of LD patterns through the R matrix reveals more deterministic patterns, with much less noise, than using classical LD measures. Moreover, the patterns are highly compatible with recently suggested models of haplotype block structure. We propose to apply the new LD measure to define haplotype blocks through cluster analysis. Specifically, we present a distance-based clustering algorithm, DHPBlocker, which performs hierarchical partitioning of an ordered sequence of markers into disjoint and adjacent blocks with a hierarchical structure. The proposed method integrates information on the two main existing criteria in defining haplotype blocks, namely, LD and haplotype diversity, through the use of silhouette width and description length as cluster validity measures, respectively. The new LD measure and clustering procedure are applied to single nucleotide polymorphism (SNP) datasets from the human 5q31 region (Daly et al. 2001) and the class II region of the human major histocompatibility complex (Jeffreys et al. 2001). Our results are in good agreement with published results. In addition, analyses performed on different subsets of markers indicate that the method is robust with regards to the allele frequency and density of the genotyped markers. Unlike previously proposed methods, our new cluster-based method can uncover hierarchical relationships among blocks and can be applied to polymorphic DNA markers or amino acid sequence data.


Genetics | Multivariate Analysis | Statistical Methodology | Statistical Theory