Abstract

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuro-science. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. In this article, we first propose the GMD regression (GMDR) as an estimation/prediction tool that seamlessly incorporates two-way structures into high-dimensional linear models. The proposed GMDR directly regresses the outcome on a set of GMD components, selected by a novel procedure that guarantees the best prediction performance. We then propose the GMD inference (GMDI) framework to identify variables that are associated with the outcome for any model in a large family of regression models that includes GMDR. As opposed to most existing tools for high-dimensional inference, GMDI efficiently accounts for pre-specified two-way structures and can provide asymptotically valid inference even for non-sparse coefficient vectors. We study the theoretical properties of GMDI in terms of both the type-I error rate and power. We demonstrate the effectiveness of GMDR and GMDI on simulated data and an application to microbiome data.

Disciplines

Biostatistics | Multivariate Analysis | Statistical Methodology | Statistical Theory

Share

COinS