Representing and extending ensembles of parsimonious evolutionary histories with a directed acyclic graph.

Will Dumm, Mary Barker, William Howard-Snyder, William S DeWitt Iii, Frederick A Matsen Iv
Author Information
  1. Will Dumm: Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA. ORCID
  2. Mary Barker: Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA. ORCID
  3. William Howard-Snyder: Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA. ORCID
  4. William S DeWitt Iii: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA. ORCID
  5. Frederick A Matsen Iv: Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA. matsen@fredhutch.org. ORCID

Abstract

In many situations, it would be useful to know not just the best phylogenetic tree for a given data set, but the collection of high-quality trees. This goal is typically addressed using Bayesian techniques, however, current Bayesian methods do not scale to large data sets. Furthermore, for large data sets with relatively low signal one cannot even store every good tree individually, especially when the trees are required to be bifurcating. In this paper, we develop a novel object called the "history subpartition directed acyclic graph" (or "history sDAG" for short) that compactly represents an ensemble of trees with labels (e.g. ancestral sequences) mapped onto the internal nodes. The history sDAG can be built efficiently and can also be efficiently trimmed to only represent maximally parsimonious trees. We show that the history sDAG allows us to find many additional equally parsimonious trees, extending combinatorially beyond the ensemble used to construct it. We argue that this object could be useful as the "skeleton" of a more complete uncertainty quantification.

Keywords

References

  1. Mol Biol Evol. 2018 May 1;35(5):1253-1265 [PMID: 29474671]
  2. Bioinformatics. 2022 Aug 2;38(15):3734-3740 [PMID: 35731204]
  3. Genome Biol. 2021 Jul 1;22(1):196 [PMID: 34210356]
  4. Syst Biol. 2007 Jun;56(3):485-95 [PMID: 17562472]
  5. Syst Biol. 2022 Feb 10;71(2):426-438 [PMID: 34398231]
  6. Genomics Proteomics Bioinformatics. 2020 Dec;18(6):749-759 [PMID: 33704069]
  7. Syst Biol. 2015 May;64(3):472-91 [PMID: 25631175]
  8. Mol Biol Evol. 2018 Feb 1;35(2):518-522 [PMID: 29077904]
  9. Syst Biol. 2015 Sep;64(5):709-26 [PMID: 25999395]
  10. Nat Genet. 2021 Jun;53(6):809-816 [PMID: 33972780]
  11. Brief Bioinform. 2022 Mar 10;23(2): [PMID: 35043153]
  12. Nat Genet. 2019 Sep;51(9):1321-1329 [PMID: 31477933]
  13. Mol Biol Evol. 2019 Sep 1;36(9):2069-2085 [PMID: 31127303]
  14. J Comput Biol. 2011 Mar;18(3):445-57 [PMID: 21385046]
  15. Zool Res. 2020 Nov 18;41(6):705-708 [PMID: 33045776]
  16. Yi Chuan. 2020 Feb 20;42(2):212-221 [PMID: 32102777]
  17. Syst Biol. 2023 May 26;: [PMID: 37232476]
  18. Cladistics. 1999 Dec;15(4):415-428 [PMID: 34902941]
  19. Science. 2011 Jul 22;333(6041):448-50 [PMID: 21680810]
  20. Algorithms Mol Biol. 2023 Jul 31;18(1):10 [PMID: 37525243]
  21. Evolution. 1985 Jul;39(4):783-791 [PMID: 28561359]
  22. Nucleic Acids Res. 2017 Jan 4;45(D1):D482-D490 [PMID: 27899678]
  23. Nat Genet. 2019 Sep;51(9):1330-1338 [PMID: 31477934]
  24. Nat Microbiol. 2020 Nov;5(11):1403-1407 [PMID: 32669681]
  25. Syst Biol. 2020 Sep 1;69(5):1016-1032 [PMID: 31985810]

Grants

  1. F31 AI150163/NIAID NIH HHS
  2. R01 AI162611/NIAID NIH HHS
  3. S10 OD028685/NIH HHS
  4. /Howard Hughes Medical Institute

MeSH Term

Phylogeny
Bayes Theorem
Biological Evolution
Radiopharmaceuticals
Uncertainty

Chemicals

Radiopharmaceuticals

Word Cloud

Created with Highcharts 10.0.0treesdataacyclicparsimoniousmanyusefultreeBayesianlargesetsobject"historydirectedensemblehistorysDAGcanefficientlyextendinguncertaintygraphPhylogeneticsituationsknowjustbestphylogeneticgivensetcollectionhigh-qualitygoaltypicallyaddressedusingtechniqueshowevercurrentmethodsscaleFurthermorerelativelylowsignaloneevenstoreeverygoodindividuallyespeciallyrequiredbifurcatingpaperdevelopnovelcalledsubpartitiongraph"sDAG"shortcompactlyrepresentslabelsegancestralsequencesmappedontointernalnodesbuiltalsotrimmedrepresentmaximallyshowallowsusfindadditionalequallycombinatoriallybeyondusedconstructargue"skeleton"completequantificationRepresentingensemblesevolutionaryhistoriesDirectedMaximumparsimonyinference

Similar Articles

Cited By (4)