Johns Hopkins University, Dept. of Biostatistics Working Papers

Title

REMOVING TECHNICAL VARIABILITY IN RNA-SEQ DATA USING CONDITIONAL QUANTILE NORMALIZATION

Authors

Kasper D. Hansen, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Rafael A. Irizarry, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Zhijin Wu, Department of Community Health, Section of Biostatistics, Brown University

Comments

This is a preprint

Abstract

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement, in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data exhibit unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find that GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression, which removes systematic bias introduced by deterministic features such as GC-content, with quantile normalization, which corrects for global distortions.
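
For readers unfamiliar with the second ingredient named above, the following is a minimal sketch of plain quantile normalization only. It is not the CQN algorithm and does not include the GC-content regression step; the function name `quantile_normalize`, the list-of-lists input layout (one list of per-gene values per sample), and the tie-handling rule are assumptions made for illustration.

```python
from statistics import mean


def quantile_normalize(samples):
    """Give every sample the same empirical distribution.

    `samples` is a list of equal-length lists, one per sample, where
    position i in each list holds the value for gene i (hypothetical layout).
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must have the same number of genes")

    # Reference distribution: average of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(column) for column in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # Gene indices ordered by value; ties keep input order (a simplification).
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Replace each value by the reference value at the same rank.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalize(toy))
```

After this step each sample contains the same set of values, and only the ranking of genes within a sample differs. The abstract's point is that this global correction alone does not address sample-specific GC-content effects, which is why CQN pairs it with a regression step.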

Disciplines

Bioinformatics | Computational Biology

Suggested Citation

Hansen, Kasper D.; Irizarry, Rafael A.; and Wu, Zhijin, "REMOVING TECHNICAL VARIABILITY IN RNA-SEQ DATA USING CONDITIONAL QUANTILE NORMALIZATION" (May 2011). Johns Hopkins University, Dept. of Biostatistics Working Papers. Working Paper 227.
https://biostats.bepress.com/jhubiostat/paper227

Included in

Bioinformatics Commons, Computational Biology Commons

Collection of Biostatistics Research Archive
