Title

A Data-Analytic Strategy for Protein-Biomarker Discovery: Profiling of High-Dimensional Proteomic Data for Cancer Detection

Authors

Yutaka Yasui, Fred Hutchinson Cancer Research Center
Margaret S. Pepe, University of Washington
Mary Lou Thompson, University of Washington
Bao-Ling Adam, Eastern Virginia Medical School
George L. Wright Jr., Eastern Virginia Medical School
Yinsheng Qu, Fred Hutchinson Cancer Research Center
John D. Potter, Fred Hutchinson Cancer Research Center
Marcy Winget, Fred Hutchinson Cancer Research Center
Mark Thornquist, Fred Hutchinson Cancer Research Center
Ziding Feng, University of Washington & Fred Hutchinson Cancer Research Center

Comments

Published in Biostatistics 4:449-63 (2003).

Abstract

With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, the discovery from high-dimensional data of "signature" protein profiles specific to each pathologic state (e.g., normal vs. cancer) or of differential profiles between experimental conditions (e.g., treated by a drug of interest vs. untreated). We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass-spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of the prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique.

Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points, where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity at the corresponding mass-per-charge value, x, in that specimen. Given the high coefficients of variation and other characteristics of the protein intensity measures (y values), we reduce the intensity measures to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass-per-charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of the data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of the prostate. Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate the sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent of the training dataset used to construct them. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
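
As an illustration only, the reduction of intensities to binary peak indicators and the evaluation of a classifier on a test set might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `half_window` parameter, and the rule "a point is a peak if it is the maximum within its neighborhood" are all hypothetical choices made for the example, and the abstract does not specify the neighborhood size or the exact peak definition used in the paper.

```python
import numpy as np

def local_peak_indicators(y, half_window):
    """Mark index i with 1 if y[i] is the maximum of y within
    +/- half_window neighboring points (hypothetical peak rule), else 0.
    Flat neighborhoods (max == min) are not counted as peaks."""
    y = np.asarray(y, dtype=float)
    n = y.size
    peaks = np.zeros(n, dtype=int)
    for i in range(n):
        lo = max(0, i - half_window)          # clip window at the left edge
        hi = min(n, i + half_window + 1)      # clip window at the right edge
        window = y[lo:hi]
        if y[i] == window.max() and y[i] > window.min():
            peaks[i] = 1
    return peaks

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels are assumed to be coded 1 = diseased, 0 = non-diseased."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Toy intensity trace (made-up numbers, not SELDI data).
toy_intensity = [0, 1, 3, 1, 0, 2, 5, 2, 0]
toy_peaks = local_peak_indicators(toy_intensity, half_window=2)

# Toy test-set labels and predictions (made-up numbers).
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

In a sketch like this, each specimen's roughly 48,000 intensity values would become a vector of 0/1 peak indicators, which could then serve as the binary predictors passed to a boosting procedure. The x-axis shift correction described in the abstract is not shown here.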

Suggested Citation

Yasui, Yutaka; Pepe, Margaret S.; Thompson, Mary Lou; Adam, Bao-Ling; Wright, George L. Jr.; Qu, Yinsheng; Potter, John D.; Winget, Marcy; Thornquist, Mark; and Feng, Ziding, "A Data-Analytic Strategy for Protein-Biomarker Discovery: Profiling of High-Dimensional Proteomic Data for Cancer Detection" (January 2002). UW Biostatistics Working Paper Series. Working Paper 177.
https://biostats.bepress.com/uwbiostat/paper177