Abstract

Variable screening has emerged as a crucial first step in the analysis of high-throughput data, but existing procedures can be computationally cumbersome, difficult to justify theoretically, or inapplicable to certain types of analyses. Motivated by a high-dimensional censored quantile regression problem in multiple myeloma genomics, this paper makes three contributions. First, we establish a score test-based screening framework, which is widely applicable, extremely computationally efficient, and relatively simple to justify. Secondly, we propose a resampling-based procedure for selecting the number of variables to retain after screening according to the principle of reproducibility. Finally, we propose a new iterative score test screening method which is closely related to sparse regression. In simulations we apply our methods to four di erent regression models and show that they can outperform existing procedures. We also apply score test screening to an analysis of gene expression data from multiple myeloma patients using a censored quantile regression model to identify high-risk genes.

Disciplines

Biostatistics

Included in

Biostatistics Commons

Share

COinS