Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has been applied successfully in a variety of fields, including computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data, as well as in protein sequence and structure analysis. Although NMF performs well on high-dimensional data sets and complex data structures, its computational cost is high and can become a bottleneck when analysis results must be delivered in a timely manner. In this paper, we describe hpcNMF, a high-performance C++ toolbox for NMF designed for use on desktop computers and distributed computer clusters. The toolbox implements algorithms based on different statistical models and cost functions, as well as various metrics for model selection and for evaluating goodness-of-fit. hpcNMF is platform independent and does not require any special libraries. It is compatible with the Windows, Linux and Mac operating systems; a message-passing interface (MPI) is required only when hpcNMF is deployed on computer clusters to take advantage of parallel computing. We illustrate the utility of the toolbox using several real examples encompassing a broad range of applications.
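For readers unfamiliar with the factorization itself, the following is a minimal sketch of NMF using the standard multiplicative updates for the squared Euclidean (Frobenius) cost, one of many possible cost functions. It is not hpcNMF's code or API: the `Matrix` type and the functions `multiply`, `transpose`, `frobenius_error` and `nmf_step` are hypothetical names introduced here for illustration, and the sketch is serial, with no MPI. It approximates a non-negative matrix V (n x m) by the product of non-negative factors W (n x k) and H (k x m), using only the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix. Hypothetical helper type, not part of hpcNMF.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double fill = 0.0)
        : rows(r), cols(c), data(r * c, fill) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Plain triple-loop matrix product a * b.
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c(i, j) += a(i, k) * b(k, j);
    return c;
}

Matrix transpose(const Matrix& a) {
    Matrix t(a.cols, a.rows);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            t(j, i) = a(i, j);
    return t;
}

// Squared Frobenius norm of V - W*H, the cost minimized in this sketch.
double frobenius_error(const Matrix& v, const Matrix& w, const Matrix& h) {
    const Matrix wh = multiply(w, h);
    double sum = 0.0;
    for (std::size_t i = 0; i < v.data.size(); ++i) {
        const double d = v.data[i] - wh.data[i];
        sum += d * d;
    }
    return sum;
}

// One round of multiplicative updates for the Frobenius cost:
//   H <- H .* (W^T V) ./ (W^T W H),  then  W <- W .* (V H^T) ./ (W H H^T)
// A small eps in the denominator guards against division by zero (an
// assumption of this sketch).
void nmf_step(const Matrix& v, Matrix& w, Matrix& h) {
    const double eps = 1e-12;

    const Matrix wt = transpose(w);
    const Matrix num_h = multiply(wt, v);
    const Matrix den_h = multiply(multiply(wt, w), h);
    for (std::size_t i = 0; i < h.data.size(); ++i)
        h.data[i] *= num_h.data[i] / (den_h.data[i] + eps);

    const Matrix ht = transpose(h);
    const Matrix num_w = multiply(v, ht);
    const Matrix den_w = multiply(w, multiply(h, ht));
    for (std::size_t i = 0; i < w.data.size(); ++i)
        w.data[i] *= num_w.data[i] / (den_w.data[i] + eps);
}
```

Calling `nmf_step` repeatedly from a strictly positive initialization of W and H keeps both factors non-negative, because each entry is only ever multiplied by a non-negative ratio. A practical implementation would add a convergence test, multiple random restarts and, for large data, a blocked or parallel matrix product.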
Artificial Intelligence and Robotics | Astrophysics and Astronomy | Biochemistry, Biophysics, and Structural Biology | Bioinformatics | Biomechanics | Biometry | Biostatistics | Categorical Data Analysis | Computational Biology | Computational Linguistics | Computational Neuroscience | Computer Sciences | Discourse and Text Linguistics | Environmental Sciences | Genetics and Genomics | Kinesiology | Medicine and Health Sciences | Microarrays | Molecular Biology | Motor Control | Multivariate Analysis | Numerical Analysis and Computation | Numerical Analysis and Scientific Computing | Oceanography and Atmospheric Sciences and Meteorology | Semantics and Pragmatics | Statistical Models | Statistics and Probability | Theory and Algorithms
Devarajan, Karthik and Wang, Guoli, "hpcNMF: A high-performance toolbox for non-negative matrix factorization" (February 2016). COBRA Preprint Series. Working Paper 115.