Abstract
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, such that V ≈ WH. NMF has been shown to produce a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical interpretations. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data.
We present a generalized approach to NMF based on Rényi's divergence between two non-negative matrices, a divergence related to the Poisson likelihood. Our approach unifies various competing models and provides a unique framework for NMF. Furthermore, we generalize the equivalence between NMF and probabilistic latent semantic indexing, a well-known method used in text mining and document clustering applications. We evaluate the performance of our method in the unsupervised setting using consensus clustering and demonstrate its applicability using real-life and simulated data.
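For orientation, the multiplicative updates idea mentioned above can be sketched for the familiar Kullback-Leibler (Poisson likelihood) objective. This is a minimal illustration of the standard updates for that single case, not the Rényi-based generalization presented in the paper; the function names `nmf_kl` and `kl_divergence`, the random initialization, the iteration count and the small constant `eps` are all assumptions made for this sketch.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence between nonnegative V and its approximation WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative matrix V (n x m) as W (n x rank) times H (rank x m).

    Uses the standard multiplicative updates for the KL (Poisson) objective.
    eps guards against division by zero; it is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start so every entry can be rescaled.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Update H: rescale each entry by a ratio of nonnegative quantities.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W using the refreshed H.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


if __name__ == "__main__":
    # Simulated nonnegative data, for illustration only.
    V = np.random.default_rng(1).random((20, 12))
    W, H = nmf_kl(V, rank=3)
    print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

Because each update multiplies the current entries by a nonnegative ratio, W and H remain nonnegative throughout without any explicit projection step, which is what makes this style of algorithm a natural fit for the additive, parts-based representation described in the abstract.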
Disciplines
Bioinformatics | Categorical Data Analysis | Computational Biology | Multivariate Analysis | Statistical Methodology | Statistical Models | Statistical Theory
Suggested Citation
Devarajan, Karthik; Wang, Guoli; and Ebrahimi, Nader, "A unified approach to non-negative matrix factorization and probabilistic latent semantic indexing" (July 2011). COBRA Preprint Series. Working Paper 80.
https://biostats.bepress.com/cobra/art80