Abstract

Principal component analysis (PCA) has been widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, a subset of metabolites (e.g., the top 10) is often selected subjectively by factor loading in PCA, and biological inferences are then made for these metabolites. However, this approach can lead to biased biological inferences because the metabolites are not selected by an objective statistical criterion. We propose a statistical procedure that selects metabolites by a statistical hypothesis test of factor loadings in PCA and makes biological inferences by metabolite set enrichment analysis (MSEA) of these significant metabolites. The procedure relies on the fact that, for autoscaled data, the PCA eigenvector is proportional to the vector of correlation coefficients between the PC score and each metabolite's level. We applied this approach to two metabolomic datasets of mouse liver samples. In the first case study, 136 of 282 metabolites were statistically significant; in the second, 66 of 275 were. Because the number of significant metabolites differs between studies, these results suggest that fixing the number of metabolites in advance is not appropriate when using factor loadings in PCA. MSEA of these significant metabolites then detected significant metabolic pathways, and these results are consistent with previous biological knowledge. When using factor loadings in PCA, it is essential to select metabolites statistically in order to make unbiased biological inferences from metabolome data. We developed an R package, "mseapca", to perform this approach; it is publicly available on CRAN.
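A minimal Python sketch of the two steps, assuming NumPy and SciPy are available. It is not the implementation in the R package "mseapca"; the helper names (`autoscale`, `pca_loading_test`, `ora_pvalue`), the Pearson correlation t-test, the 0.05 cutoff, and the one-sided hypergeometric over-representation test used for the MSEA step are illustrative assumptions. For autoscaled data with eigenvalue λ and unit eigenvector v, the correlation between the PC score and metabolite j equals √λ · v_j, which the demo checks numerically. The sign of an eigenvector is arbitrary, so the signs of v and r may flip between runs or libraries.

```python
from math import comb

import numpy as np
from scipy import stats


def autoscale(X):
    """Center each metabolite (column) and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_loading_test(X, pc=0):
    """Hypothetical helper: test the factor loadings of one principal component.

    Returns the eigenvalue, eigenvector, score-metabolite correlations and
    two-sided p-values for H0: rho = 0 (pc is a 0-based component index).
    """
    Z = autoscale(X)
    n = Z.shape[0]
    # Correlation matrix of the autoscaled data and its eigen-decomposition
    C = Z.T @ Z / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    lam, v = eigvals[order[pc]], eigvecs[:, order[pc]]
    scores = Z @ v                              # PC score of each sample
    # Pearson correlation between the PC score and each metabolite
    r = np.array([np.corrcoef(scores, Z[:, j])[0, 1] for j in range(Z.shape[1])])
    # t statistic for a correlation coefficient with n - 2 degrees of freedom
    t = r * np.sqrt((n - 2) / np.clip(1.0 - r**2, 1e-300, None))
    p = 2.0 * stats.t.sf(np.abs(t), df=n - 2)
    return lam, v, r, p


def ora_pvalue(n_total, n_set, n_sig, n_overlap):
    """Hypothetical helper: one-sided hypergeometric P(X >= n_overlap).

    n_total: metabolites measured, n_set: metabolites in the set,
    n_sig: significant metabolites, n_overlap: significant ones in the set.
    """
    upper = min(n_set, n_sig)
    hits = sum(comb(n_set, k) * comb(n_total - n_set, n_sig - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_total, n_sig)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(30, 1))
    # Simulated data: 5 metabolites share a latent factor, 5 are pure noise
    X = np.hstack([latent + 0.5 * rng.normal(size=(30, 5)),
                   rng.normal(size=(30, 5))])
    lam, v, r, p = pca_loading_test(X)
    print(np.allclose(r, np.sqrt(lam) * v))   # True: r_j = sqrt(lambda) * v_j
    print("significant metabolites:", np.flatnonzero(p < 0.05))
    # Hypothetical metabolite set of 20 members, 15 of them significant,
    # against the first case study's totals (136 significant of 282)
    print("ORA p-value:", ora_pvalue(282, 20, 136, 15))
```

Testing the correlation rather than the raw eigenvector entry is what makes a per-metabolite p-value possible: because r is a rescaled loading, the number of selected metabolites is decided by the test, not fixed in advance.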

Disciplines

Biostatistics
