FAQ

What is NMF?

NMF (Nonnegative Matrix Factorization) is a mathematical technique with a long history in genomics for the analysis of bulk RNA sequencing data [1,2], and it has been widely adopted as a powerful dimensionality reduction tool for single-cell data as well [3–10]. It is unique among matrix factorization methods in identifying overlapping patterns in high-throughput data to define biological processes that can co-occur across cell types or biological conditions [11].
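As a rough illustration of what a nonnegative factorization does, the sketch below factors a small synthetic matrix V into two nonnegative matrices using the classic multiplicative-update rule. This is a generic NMF routine, not the Bayesian MCMC algorithm that CoGAPS uses. The function name `nmf_multiplicative`, the matrix names `A` and `P`, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def nmf_multiplicative(V, n_patterns, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (features x samples) as V ~ A @ P.

    Uses multiplicative updates that keep every entry of A and P
    nonnegative. Hypothetical helper for illustration; not CoGAPS.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Start from small positive random matrices.
    A = rng.random((n_features, n_patterns)) + eps
    P = rng.random((n_patterns, n_samples)) + eps
    for _ in range(n_iter):
        # Update the pattern matrix, then the weight matrix.
        P *= (A.T @ V) / (A.T @ A @ P + eps)
        A *= (V @ P.T) / (A @ P @ P.T + eps)
    return A, P


# Toy data: 30 "genes" x 12 "samples" built from 3 overlapping patterns.
rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 12))

A, P = nmf_multiplicative(V, n_patterns=3)
rel_error = np.linalg.norm(V - A @ P) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Because each row of `A` can carry nonzero weight on several patterns at once, a single gene can contribute to more than one pattern. That is the "overlapping patterns" property described above.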

What is CoGAPS?

CoGAPS (Coordinated Gene Activity across Pattern Subsets) is a Bayesian NMF (Nonnegative Matrix Factorization) algorithm. It can perform sparse matrix factorization on any data and, when the data represent biomolecules, gene set analysis. CoGAPS improves on other enrichment measurement methods by combining a Markov chain Monte Carlo (MCMC) matrix factorization algorithm (GAPS) with a threshold-independent statistic that infers activity on gene sets.

Who should use CoGAPS?

Anyone can use CoGAPS; no machine learning experience is required.

What kind of data does CoGAPS work on?

CoGAPS can perform sparse matrix factorization on any data. When the data represent biomolecules, it can also be used for gene set analysis.

How do I cite CoGAPS?

If you use the CoGAPS package for your analysis, please cite:

Fertig EJ, Ding J, Favorov AV, Parmigiani G, Ochs MF (2010). “CoGAPS: an integrated R/C++ package to identify overlapping patterns of activation of biological processes from expression data.” Bioinformatics, 26(21), 2792–2793.

If you use the gene set statistic, please cite Ochs et al. (2009).

References

  1. Brunet, J.-P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. U. S. A. 101, 4164–4169 (2004) doi:10.1073/pnas.0308531101.

  2. Moloshok, T. D. et al. Application of Bayesian decomposition for analysing microarray data. Bioinformatics 18, 566–575 (2002).

  3. Stein-O’Brien, G. L. et al. Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species. Cell Syst 8, 395–411.e8 (2019).

  4. Clark, B. S. et al. Single-Cell RNA-Seq Analysis of Retinal Development Identifies NFI Factors as Regulating Mitotic Exit and Late-Born Cell Specification. Neuron 102, 1111–1126.e5 (2019).

  5. Lê Cao, K.-A. et al. Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol. 22, 220 (2021).

  6. Zhu, X., Ching, T., Pan, X., Weissman, S. M. & Garmire, L. Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization. PeerJ 5, e2888 (2017).

  7. Duren, Z. et al. Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations. Proc. Natl. Acad. Sci. U. S. A. 115, 7723–7728 (2018).

  8. DeBruine, Z. J., Melcher, K. & Triche, T. J. Fast and robust non-negative matrix factorization for single-cell experiments. bioRxiv 2021.09.01.458620 (2021) doi:10.1101/2021.09.01.458620.

  9. Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient Generation of Transcriptomic Profiles by Random Composite Measurements. Cell 171, 1424–1436.e18 (2017).

  10. Wu, Y., Tamayo, P. & Zhang, K. Visualizing and Interpreting Single-Cell Gene Expression Datasets with Similarity Weighted Nonnegative Embedding. Cell Syst 7, 656–666.e4 (2018).

  11. Stein-O’Brien, G. L. et al. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet. 34, 790–805 (2018).