Abstract
Advances in computational biology have made it possible to monitor thousands of features simultaneously. High-throughput technologies not only provide a much richer information context in which to study various aspects of gene function, but also present the challenge of analyzing data with a large number of covariates and few samples. Classification of samples into two or more categories, an integral part of machine learning, is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, Iteratively ReWeighted Partial Least Squares (IRWPLS; Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often obtain lower classification error rates.
Disciplines
Bioinformatics | Computational Biology | Genetics | Microarrays | Multivariate Analysis | Statistical Models
Suggested Citation
Ding, Beiying and Gentleman, Robert, "Classification Using Generalized Partial Least Squares" (May 2004). Bioconductor Project Working Papers. Working Paper 5.
https://biostats.bepress.com/bioconductor/paper5
Included in
Bioinformatics Commons, Computational Biology Commons, Genetics Commons, Microarrays Commons, Multivariate Analysis Commons, Statistical Models Commons
Comments
Submitted to Journal of Computational and Graphical Statistics.