Novel symmetry-preserving neural network model for phylogenetic inference.

Xudong Tang, Leonardo Zepeda-Nuñez, Shengwen Yang, Zelin Zhao, Claudia Solís-Lemus
Author Information
  1. Xudong Tang: Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53706, United States.
  2. Leonardo Zepeda-Nuñez: Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, United States.
  3. Shengwen Yang: Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53706, United States.
  4. Zelin Zhao: Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, United States.
  5. Claudia Solís-Lemus: Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53706, United States.

Abstract

Motivation: Scientists worldwide are undertaking massive efforts to understand how the biodiversity we see on Earth evolved from single-cell organisms at the origin of life; this diversification process is represented through the Tree of Life. Low sampling rates and high heterogeneity in the rate of evolution across sites and lineages produce a phenomenon known as "long branch attraction" (LBA), in which long nonsister lineages are estimated to be sisters regardless of their true evolutionary relationship. LBA has been a pervasive problem in phylogenetic inference, affecting methodologies ranging from distance-based to likelihood-based.
Results: Here, we present a novel neural network model that outperforms standard phylogenetic methods and other neural network implementations under LBA settings. Furthermore, unlike existing neural network models in phylogenetics, our model naturally accounts for tree isomorphisms via permutation-invariant functions, which ultimately results in lower memory usage and allows seamless extension to larger trees.
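The idea of handling tree isomorphisms through permutation-invariant functions can be illustrated on the smallest informative case, a four-taxon (quartet) tree. The sketch below is a hypothetical toy, not the architecture from this paper: it swaps in DeepSets-style sum pooling, and the names `QuartetScorer`, `cherry`, `score`, and `topology_probs`, the tanh layers, the one-hot taxon features, and the random untrained weights are all assumptions made for illustration. An unrooted topology such as ab|cd has eight labeled orderings (swap within either cherry, or swap the two cherries). Summing embeddings inside each cherry and then across cherries makes the score identical for all eight, so these equivalences are built into the function rather than learned from data.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Hypothetical dense layer: an n_out x n_in matrix of random (untrained) weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def dense_tanh(weights, x):
    """Apply a bias-free dense layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def vec_add(u, v):
    """Elementwise sum; commutative, so vec_add(u, v) == vec_add(v, u)."""
    return [ui + vi for ui, vi in zip(u, v)]


class QuartetScorer:
    """Toy scorer for unrooted quartet topologies (illustrative only, not the paper's model)."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = random.Random(seed)
        self.phi = make_layer(n_features, hidden, rng)  # shared per-taxon encoder
        self.rho = make_layer(hidden, hidden, rng)      # shared per-cherry encoder
        self.readout = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    def cherry(self, x, y):
        # Sum pooling over the two taxa: swapping x and y gives the same embedding.
        return dense_tanh(self.rho, vec_add(dense_tanh(self.phi, x), dense_tanh(self.phi, y)))

    def score(self, a, b, c, d):
        # Score for topology ab|cd. Summing the two cherry embeddings also makes
        # the score invariant to exchanging the cherries (ab with cd).
        pooled = vec_add(self.cherry(a, b), self.cherry(c, d))
        return sum(w * h for w, h in zip(self.readout, pooled))

    def topology_probs(self, a, b, c, d):
        # Softmax over the three unrooted quartet topologies ab|cd, ac|bd, ad|bc.
        scores = [self.score(a, b, c, d), self.score(a, c, b, d), self.score(a, d, b, c)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical per-taxon features (here simply one-hot vectors).
    a, b, c, d = ([1.0 if i == j else 0.0 for i in range(4)] for j in range(4))
    model = QuartetScorer(n_features=4)
    print(model.score(a, b, c, d) == model.score(d, c, b, a))  # True
    print(model.topology_probs(a, b, c, d))
```

Sum pooling is only one way to obtain this invariance; it is used here because the invariance then holds exactly, even in floating point. Because the weights are untrained, the printed probabilities carry no phylogenetic meaning; the point is only that relabelings preserving a topology cannot change its score.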
Availability and implementation: We implement our novel theory in an open-source, publicly available GitHub repository: https://github.com/crsl4/nn-phylogenetics.
