Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high-throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMMs) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve on previous approaches that use an HMM framework for inference in high-throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
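To make the idea of confidence-controlled smoothing concrete, the following is a minimal Python sketch, not the method implemented in the ICE package. It assumes a hypothetical two-state HMM ("normal" versus loss of heterozygosity, "loh"), binary genotype calls ("het" or "hom"), and a confidence score in [0, 1] that is read as the probability that a call is correct. The state names, heterozygosity rates, transition probability, and the functions `emission` and `viterbi` are all illustrative assumptions.

```python
import math

STATES = ("normal", "loh")
# Assumed probability that a SNP is truly heterozygous in each hidden state.
P_HET = {"normal": 0.3, "loh": 0.001}


def emission(call, conf, state):
    """P(observed call | state), treating conf as P(call is correct)."""
    p_het = P_HET[state]
    p_true = p_het if call == "het" else 1.0 - p_het
    # Mixture over "call is right" and "call is wrong" (binary genotype).
    return conf * p_true + (1.0 - conf) * (1.0 - p_true)


def viterbi(calls, confs, stay=0.9):
    """Most likely hidden state path; stay is an assumed self-transition probability."""
    log_trans = {
        (a, b): math.log(stay if a == b else 1.0 - stay)
        for a in STATES
        for b in STATES
    }
    # Uniform initial distribution over states.
    score = {
        s: math.log(0.5) + math.log(emission(calls[0], confs[0], s))
        for s in STATES
    }
    back = []
    for call, conf in zip(calls[1:], confs[1:]):
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log_trans[(r, s)])
            ptr[s] = best
            score[s] = (
                prev[best] + log_trans[(best, s)] + math.log(emission(call, conf, s))
            )
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# A run of homozygous calls with a single heterozygous call in the middle.
calls = ["hom"] * 10 + ["het"] + ["hom"] * 10
high = viterbi(calls, [0.99] * 10 + [0.999] + [0.99] * 10)
low = viterbi(calls, [0.99] * 10 + [0.6] + [0.99] * 10)
print(high[10], low[10])
```

Under these assumed parameters, a heterozygous call made with high confidence interrupts the inferred LOH region, whereas the same call made with low confidence is smoothed over by its neighbors.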
Bioinformatics | Computational Biology
Scharpf, Robert B.; Parmigiani, Giovanni; Pevsner, Jonathan; and Ruczinski, Ingo, "A HIDDEN MARKOV MODEL FOR JOINT ESTIMATION OF GENOTYPE AND COPY NUMBER IN HIGH-THROUGHPUT SNP CHIPS" (February 2007). Johns Hopkins University, Dept. of Biostatistics Working Papers. Working Paper 136.