"OPTIMIZED CROSS-STUDY ANALYSIS OF MICROARRAY-BASED PREDICTORS" by Xiaogang Zhong, Luigi Marchionni et al.

Johns Hopkins University, Dept. of Biostatistics Working Papers

Title

OPTIMIZED CROSS-STUDY ANALYSIS OF MICROARRAY-BASED PREDICTORS

Authors

Xiaogang Zhong, Department of Applied Mathematics and Statistics, Johns Hopkins University
Luigi Marchionni, Department of Oncology, Johns Hopkins University
Leslie Cope, Departments of Oncology and Biostatistics, Johns Hopkins University
Edwin S. Iversen, Institute of Statistics and Decision Sciences, Duke University
Elizabeth S. Garrett-Mayer, Departments of Oncology and Biostatistics, Johns Hopkins University
Edward Gabrielson, Departments of Oncology and Pathology, Johns Hopkins University
Giovanni Parmigiani, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Department of Pathology & Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Abstract

Background: Microarray-based gene expression analysis is widely used in cancer research to discover molecular signatures for cancer classification and prediction. In addition to numerous independent profiling projects, a number of investigators have analyzed multiple published data sets for purposes of cross-study validation. However, the diversity of microarray platforms and technical approaches makes direct comparisons across studies difficult and, without a means of identifying aberrant data patterns, less than optimal. To address this issue, we previously developed an integrative correlation approach to systematically assess agreement of gene expression measurements across studies, providing a basis for cross-study validation analysis. Here we generalize this methodology to provide a metric for evaluating the overall efficacy of preprocessing and cross-referencing, and explore optimal combinations of filtering and cross-referencing strategies. We operate in the context of validating prognostic breast cancer gene expression signatures on data reported by three different groups, each using a different platform.
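The abstract does not define integrative correlation, so the following is only a minimal sketch under an assumption: that, for a given gene, it is the correlation between that gene's within-study correlations with all other shared genes, computed separately in two studies. The function names and the expression values are hypothetical and serve illustration only; they are not the authors' implementation or data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def integrative_correlation(study_a, study_b, gene):
    """Assumed definition: correlate, across two studies, the vector of
    within-study correlations between `gene` and every other shared gene.

    Each study is a dict mapping gene name -> list of expression values,
    one value per sample; the studies may have different sample counts.
    """
    others = sorted((set(study_a) & set(study_b)) - {gene})
    corr_a = [pearson(study_a[gene], study_a[o]) for o in others]
    corr_b = [pearson(study_b[gene], study_b[o]) for o in others]
    return pearson(corr_a, corr_b)


# Made-up expression values: 5 samples in study A, 4 samples in study B.
study_a = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 5, 4, 5],
    "g3": [5, 3, 4, 2, 1],
    "g4": [1, 3, 2, 5, 4],
}
study_b = {
    "g1": [10, 12, 15, 19],
    "g2": [3, 4, 4, 6],
    "g3": [9, 8, 6, 2],
    "g4": [1, 2, 4, 3],
}

ic_g1 = integrative_correlation(study_a, study_b, "g1")
```

Under this assumed definition, a gene whose score falls below a chosen threshold would be a candidate for filtering as poorly reproducible across studies; the choice of threshold is left open here.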

Results: To evaluate overall cross-platform reproducibility in the context of a specific prediction problem, we suggest integrative association, that is, the cross-study correlation of gene-specific measures of association with the predicted phenotype. Specifically, in this paper we use the correlation among the Cox proportional hazards coefficients for association of gene expression with relapse-free survival (RFS). Gene filtering by integrative correlation to select reproducible genes emerged as the key factor in increasing the integrative association, while alternative methods of gene cross-referencing and gene filtering only modestly improved the overall reproducibility. Patient selection was another major factor affecting the validation process. In particular, in one of the studies considered, gene expression association with RFS varied across subsets of patients that differ by their ascertainment criteria. One of the subsets proved to be highly consistent with other studies, while others showed significantly lower consistency. Third, as expected, use of cluster-specific mean expression profiles in the Cox model yielded more generalizable results than expression data from individual genes. Finally, by using our approach we were able to validate the association between the breast cancer molecular classes proposed by Sorlie et al. and RFS.
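As a minimal sketch of integrative association as described above, the code below correlates gene-specific coefficients between two studies over the genes they share. It assumes the per-gene Cox coefficients have already been estimated elsewhere; the function names and the coefficient values are hypothetical, and Pearson correlation is an assumption, since the abstract does not state which correlation is used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def integrative_association(coefs_a, coefs_b):
    """Correlate gene-specific association measures across two studies.

    Each argument maps gene name -> coefficient (here, a Cox coefficient
    for association with RFS, estimated separately within each study).
    Only genes present in both studies contribute.
    """
    shared = sorted(set(coefs_a) & set(coefs_b))
    return pearson([coefs_a[g] for g in shared], [coefs_b[g] for g in shared])


# Made-up coefficients for illustration only; g4 and g5 are not shared.
study_a = {"g1": 0.8, "g2": -0.3, "g3": 0.1, "g4": 0.5}
study_b = {"g1": 0.6, "g2": -0.2, "g3": 0.0, "g5": 0.9}

score = integrative_association(study_a, study_b)
```

A value near 1 indicates that genes associated with shorter or longer RFS in one study show a similar association in the other; with three or more studies, the same function could be applied to each pair.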

Conclusions: This paper provides a simple, practical and comprehensive technique for measuring consistency of molecular classification results across microarray platforms, without requiring subjective judgments about membership of samples in putative clusters. This methodology will be of value in consistently typing breast and other cancers across different studies and platforms in the future. Although the tumor subtypes considered here have been previously validated by their proponents, this is the first independent validation, and the first to include the Affymetrix platform.

Disciplines

Bioinformatics | Computational Biology

Suggested Citation

Zhong, Xiaogang; Marchionni, Luigi; Cope, Leslie; Iversen, Edwin S.; Garrett-Mayer, Elizabeth S.; Gabrielson, Edward; and Parmigiani, Giovanni, "OPTIMIZED CROSS-STUDY ANALYSIS OF MICROARRAY-BASED PREDICTORS" (January 2007). Johns Hopkins University, Dept. of Biostatistics Working Papers. Working Paper 129.
https://biostats.bepress.com/jhubiostat/paper129

Included in

Bioinformatics Commons, Computational Biology Commons

Collection of Biostatistics Research Archive