PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders.

Tianyu Xie, Harry Richman, Jiansi Gao, Frederick A Matsen, Cheng Zhang
Author Information
  1. Tianyu Xie: School of Mathematical Sciences, Peking University.
  2. Harry Richman: Computational Biology Program, Fred Hutchinson Cancer Research Center.
  3. Jiansi Gao: Computational Biology Program, Fred Hutchinson Cancer Research Center.
  4. Frederick A Matsen: Computational Biology Program, Fred Hutchinson Cancer Research Center.
  5. Cheng Zhang: School of Mathematical Sciences, Peking University.

Abstract

Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack sufficient resolution. In this paper, we introduce phylogenetic variational autoencoders (PhyloVAEs), an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. Leveraging an efficient encoding mechanism inspired by autoregressive tree topology generation, we develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. PhyloVAE combines this generative model with a collaborative inference model based on learnable topological features, allowing for high-resolution representations of phylogenetic tree samples. Extensive experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies.
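
The abstract does not spell out the encoding mechanism. As a rough illustration of the general idea behind autoregressive topology generation, the sketch below assumes the standard leaf-by-leaf construction: start from the unique three-leaf unrooted tree and graft each further leaf onto one existing edge, so that a topology corresponds to a vector of integer edge choices. The function names, node labeling, and edge ordering here are hypothetical and are not taken from the paper.

```python
from math import prod


def decode_topology(decisions):
    """Build an unrooted bifurcating tree from a vector of edge choices.

    Illustrative assumption, not the paper's encoding: leaves are labeled
    0..n-1, internal nodes n, n+1, ..., and decisions[i] is the index of the
    edge onto which leaf i + 3 is grafted.
    """
    n = len(decisions) + 3
    center = n  # internal node of the initial three-leaf star
    edges = [(0, center), (1, center), (2, center)]
    next_internal = n + 1
    for leaf, d in enumerate(decisions, start=3):
        if not 0 <= d < len(edges):
            raise ValueError(f"decision {d} out of range for leaf {leaf}")
        u, v = edges[d]
        w = next_internal
        next_internal += 1
        # Split edge (u, v) with a new internal node w and hang the leaf on w.
        edges[d] = (u, w)
        edges.append((w, v))
        edges.append((leaf, w))
    return edges


def num_topologies(n):
    """Number of unrooted bifurcating topologies on n >= 3 leaves: (2n-5)!!."""
    return prod(range(3, 2 * n - 4, 2))


if __name__ == "__main__":
    tree = decode_topology([0, 4])  # a five-leaf topology
    print(len(tree), "edges;", num_topologies(5), "possible topologies")
```

A tree with k leaves has 2k - 3 edges, so the choice for the next leaf ranges over 2k - 3 values. In this sketch, each decision vector is therefore a fixed-length integer code for one topology, which gives a sense of how such codes could be modeled by a latent-variable decoder.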

Grants

  1. R01 AI162611/NIAID NIH HHS
