We have developed a statistical method for the analysis of array-based CGH data to detect genomic DNA copy number changes. Our method answers the biologically relevant question (what is the probability that a given gene or region has increased or decreased copy number?) in a clear and simple way, within a rigorous statistical framework. We use a non-homogeneous Hidden Markov Model that incorporates the distance between genes, a crucial requirement for analyzing data from platforms where the distances between probes are highly variable. Because the true number of hidden states (copy number states) is not known in advance in biological samples, we do not fix the number of hidden states of the model, but instead use Reversible Jump Markov Chain Monte Carlo for inference. We can therefore investigate the likely number of hidden states in the data and, more importantly, provide posterior probabilities that a gene or a set of genes is in a given state. To summarize results, we employ Bayesian Model Averaging, averaging over models with different numbers of hidden states and thus incorporating model uncertainty. Our method can be used to analyze data from each chromosome independently or from all chromosomes together, offering both flexibility in the biological phenomena studied and increased statistical precision. Thus, our method provides a rigorous statistical foundation for locating genes and chromosomal regions that have altered copy number and are potentially related to cancer and other complex diseases.
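
The Bayesian Model Averaging step can be illustrated with a minimal sketch. It assumes that, for each candidate number of hidden states, we already have the posterior probability of that model and the per-probe probability of copy number gain under it. The function name `bma_state_probability` and all numbers are hypothetical, chosen for illustration, and are not taken from our software or results.

```python
import numpy as np

def bma_state_probability(model_post, prob_given_model):
    """Average per-probe probabilities over models.

    model_post:        posterior probability of each model (one per model).
    prob_given_model:  array of shape (n_models, n_probes) holding the
                       probability that each probe is gained under each model.
    """
    w = np.asarray(model_post, dtype=float)
    p = np.asarray(prob_given_model, dtype=float)
    w = w / w.sum()  # normalize in case the weights do not sum to exactly 1
    return w @ p     # weighted average across models, one value per probe

# Hypothetical posterior over models with 2, 3 and 4 hidden states.
model_post = [0.2, 0.5, 0.3]
# Hypothetical P(gain | model, data) for two probes under each model.
prob_gain = [[0.9, 0.1],
             [0.8, 0.2],
             [0.6, 0.0]]

averaged = bma_state_probability(model_post, prob_gain)
print(averaged)
```

Each averaged value weights the evidence from every model by how plausible that model is, so no single choice of the number of hidden states has to be committed to.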


