Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has been applied successfully in a variety of fields, including computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data, as well as in protein sequence and structure analysis. Although NMF performs well on high-dimensional data sets and complex data structures, its computational cost is high and can become the limiting factor in delivering analysis results in a timely manner. In this paper, we describe a high-performance C++ toolbox for NMF, called hpcNMF, that is designed for use on desktop computers and distributed computer clusters. The toolbox implements algorithms based on different statistical models and cost functions, together with various metrics for model selection and for evaluating goodness-of-fit. hpcNMF is platform independent and does not require any special libraries for desktop use. It is compatible with the Windows, Linux and Mac operating systems; a message-passing interface (MPI) is required only when hpcNMF is deployed on computer clusters to leverage parallel computing. We illustrate the utility of this toolbox using several real examples encompassing a broad range of applications.
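To make the idea of an NMF algorithm built on a particular cost function concrete, the following is a minimal sketch of one well-known variant: the multiplicative update rules for the squared Euclidean (Frobenius) cost, which approximate a non-negative matrix V by a product W H of two non-negative factors. It is not taken from hpcNMF; the `Matrix` type and the functions `multiply`, `transpose`, `squared_error`, `scale_by_ratio` and `nmf_update` are hypothetical names introduced here for illustration, and the small constant `eps` is an assumed safeguard against division by zero.

```cpp
#include <cstddef>
#include <vector>

// Dense row-major matrix. Hypothetical helper type, not part of hpcNMF.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double fill = 0.0)
        : rows(r), cols(c), data(r * c, fill) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Plain triple-loop matrix product C = A * B.
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

Matrix transpose(const Matrix& a) {
    Matrix t(a.cols, a.rows);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            t.at(j, i) = a.at(i, j);
    return t;
}

// Squared Frobenius norm of V - W * H (the cost being reduced).
double squared_error(const Matrix& v, const Matrix& w, const Matrix& h) {
    Matrix wh = multiply(w, h);
    double err = 0.0;
    for (std::size_t k = 0; k < v.data.size(); ++k) {
        double d = v.data[k] - wh.data[k];
        err += d * d;
    }
    return err;
}

// Element-wise x <- x * num / (den + eps); keeps non-negative entries non-negative.
void scale_by_ratio(Matrix& x, const Matrix& num, const Matrix& den, double eps) {
    for (std::size_t k = 0; k < x.data.size(); ++k)
        x.data[k] *= num.data[k] / (den.data[k] + eps);
}

// One round of multiplicative updates for the squared Euclidean cost.
void nmf_update(const Matrix& v, Matrix& w, Matrix& h, double eps = 1e-12) {
    // H <- H .* (W^T V) ./ (W^T W H)
    Matrix wt = transpose(w);
    scale_by_ratio(h, multiply(wt, v), multiply(multiply(wt, w), h), eps);
    // W <- W .* (V H^T) ./ (W H H^T), using the freshly updated H
    Matrix ht = transpose(h);
    scale_by_ratio(w, multiply(v, ht), multiply(w, multiply(h, ht)), eps);
}
```

In use, W and H would be initialized with positive values and `nmf_update` called repeatedly until `squared_error` stops improving. The dense triple-loop product is what makes each iteration expensive on large data, which is the cost a high-performance implementation aims to reduce.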


Artificial Intelligence and Robotics | Astrophysics and Astronomy | Biochemistry, Biophysics, and Structural Biology | Bioinformatics | Biomechanics | Biometry | Biostatistics | Categorical Data Analysis | Computational Biology | Computational Linguistics | Computational Neuroscience | Computer Sciences | Discourse and Text Linguistics | Environmental Sciences | Genetics and Genomics | Kinesiology | Medicine and Health Sciences | Microarrays | Molecular Biology | Motor Control | Multivariate Analysis | Numerical Analysis and Computation | Numerical Analysis and Scientific Computing | Oceanography and Atmospheric Sciences and Meteorology | Semantics and Pragmatics | Statistical Models | Statistics and Probability | Theory and Algorithms