"Optimized Variable Selection Via Repeated Data Splitting" by Marinela Capanu, Colin B. Begg et al.

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Title

Optimized Variable Selection Via Repeated Data Splitting

Authors

Marinela Capanu, Memorial Sloan-Kettering Cancer Center
Colin B. Begg, Memorial Sloan-Kettering Cancer Center
Mithat Gonen, Memorial Sloan-Kettering Cancer Center

Abstract

We introduce a new variable selection procedure that repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold which is then used to screen for variables to include in the final model. Simulation results show that the proposed variable selection technique enjoys superior performance compared to candidate methods, being amongst those with the lowest inclusion of noisy predictors while having the highest power to detect the correct model and being unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy at our institution.
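The abstract describes the procedure only at a high level, so the following is a minimal sketch of the general idea and not the authors' algorithm. It assumes a linear model, absolute OLS t-statistics as the screening statistic, validation mean squared error as the criterion, a half/half split, and a user-supplied grid of thresholds; none of these choices is stated in the abstract. The function names (`t_statistics`, `validation_mse`, `select_by_repeated_splitting`) are hypothetical.

```python
import numpy as np

def t_statistics(X, y):
    """Absolute OLS t-statistics per column of X (intercept added internally)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    return np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))

def validation_mse(X_est, y_est, X_val, y_val, keep):
    """Fit OLS on the estimation set using columns `keep`; score on the validation set."""
    Xe = np.column_stack([np.ones(len(y_est)), X_est[:, keep]])
    Xv = np.column_stack([np.ones(len(y_val)), X_val[:, keep]])
    beta, *_ = np.linalg.lstsq(Xe, y_est, rcond=None)
    return float(np.mean((y_val - Xv @ beta) ** 2))

def select_by_repeated_splitting(X, y, thresholds, n_splits=100, seed=0):
    """Pick the threshold with the lowest summed validation error over random
    splits, then screen variables on the full data with that threshold."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = np.zeros(len(thresholds))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        est, val = perm[: n // 2], perm[n // 2:]  # assumed half/half split
        t_abs = t_statistics(X[est], y[est])
        for j, c in enumerate(thresholds):
            keep = np.flatnonzero(t_abs > c)  # may be empty: intercept-only model
            errors[j] += validation_mse(X[est], y[est], X[val], y[val], keep)
    best = thresholds[int(np.argmin(errors))]
    selected = np.flatnonzero(t_statistics(X, y) > best)
    return best, selected

# Simulated illustration (assumed data): 3 signal predictors, 7 noise predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(size=200)
thresholds = np.linspace(0.5, 4.0, 8)
best, selected = select_by_repeated_splitting(X, y, thresholds, n_splits=20)
print("chosen threshold:", best, "selected columns:", selected)
```

Averaging the validation error over many random splits, instead of relying on a single split, is what makes the chosen threshold less sensitive to any one partition of the data. The paper itself should be consulted for the actual screening statistic and optimization criterion.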

Disciplines

Biostatistics | Statistical Methodology | Statistical Models

Suggested Citation

Capanu, Marinela; Begg, Colin B.; and Gonen, Mithat, "Optimized Variable Selection Via Repeated Data Splitting" (January 2017). Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series. Working Paper 34.
https://biostats.bepress.com/mskccbiostat/paper34

suplemmentary tables.pdf (135 kB)

Included in

Biostatistics Commons, Statistical Methodology Commons, Statistical Models Commons

Collection of Biostatistics Research Archive