The maximum entropy principle for compositional data.

Corey Weistuch, Jiening Zhu, Joseph O Deasy, Allen R Tannenbaum
Author Information
  1. Corey Weistuch: Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, USA.
  2. Jiening Zhu: Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, USA.
  3. Joseph O Deasy: Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, USA.
  4. Allen R Tannenbaum: Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, USA. allen.tannenbaum@stonybrook.edu.

Abstract

BACKGROUND: Compositional systems, represented as parts of some whole, are ubiquitous. They encompass the abundances of proteins in a cell, the distribution of organisms in nature, and the stoichiometry of the most basic chemical reactions. A central goal is therefore to understand how such processes emerge from the behaviors of their components and their pairwise interactions. Such a study is challenging for two key reasons. First, such systems are complex and depend, often stochastically, on their constituent parts. Second, the data lie on a simplex, which influences their correlations.
RESULTS: To resolve both of these issues, we provide a general and data-driven modeling tool for compositional systems called Compositional Maximum Entropy (CME). By integrating the prior geometric structure of compositions with sample-specific information, CME infers the underlying multivariate relationships between the constituent components. We provide two proofs of principle. First, we measure the relative abundances of different bacteria and infer how they interact. Second, we show that our method outperforms a common alternative for the extraction of gene-gene interactions in triple-negative breast cancer.
CONCLUSIONS: CME provides novel and biologically intuitive insights and is promising as a comprehensive quantitative framework for compositional data.
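
The simplex constraint noted in the background can be illustrated with a small simulation. The sketch below is not the CME method; it only shows, under assumed toy parameters (three parts, independent Gamma(2, 1) abundances, 5000 samples, all chosen for illustration), that normalizing independent abundances to sum to one induces negative correlations between the parts. For this choice the normalized data follow a Dirichlet(2, 2, 2) distribution, whose pairwise correlation is -0.5 in theory. The function names `pearson`, `close_to_simplex`, and `simulate` are hypothetical helpers.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def close_to_simplex(row):
    """Rescale positive abundances so that they sum to one."""
    total = sum(row)
    return [value / total for value in row]


def simulate(n_samples=5000, n_parts=3, seed=0):
    """Draw independent abundances (assumed Gamma(2, 1)) and their compositions."""
    rng = random.Random(seed)
    raw = [
        [rng.gammavariate(2.0, 1.0) for _ in range(n_parts)]
        for _ in range(n_samples)
    ]
    comp = [close_to_simplex(row) for row in raw]
    return raw, comp


raw, comp = simulate()

# Correlation between parts 0 and 1 before and after normalization.
r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_comp = pearson([row[0] for row in comp], [row[1] for row in comp])

print(f"raw abundances:  r = {r_raw:+.3f}")
print(f"compositions:    r = {r_comp:+.3f}")
```

The raw abundances are independent by construction, so their sample correlation should be close to zero, while the compositions are negatively correlated purely because of the sum-to-one constraint. Any method that infers interactions from compositional data has to account for this geometric effect.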

Grants

  1. BCRF-17-193/Breast Cancer Research Foundation
  2. P30 CA008748/NCI NIH HHS
  3. R01AT01141/NIH HHS
  4. W911NF2210292/Army Research Office
  5. FA9550-17-1-043/Air Force Office of Scientific Research

MeSH Terms

Entropy
Proteins
Bacteria

Chemicals

Proteins
