An alignment-free method for phylogeny estimation using maximum likelihood.

Tasfia Zahin, Md Hasin Abrar, Mizanur Rahman Jewel, Tahrina Tasnim, Md Shamsuzzoha Bayzid, Atif Rahman
Author Information
  1. Tasfia Zahin: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  2. Md Hasin Abrar: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  3. Mizanur Rahman Jewel: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  4. Tahrina Tasnim: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  5. Md Shamsuzzoha Bayzid: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
  6. Atif Rahman: Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh. atif@cse.buet.ac.bd.

Abstract

BACKGROUND: While alignment has traditionally been the primary approach for establishing homology prior to phylogenetic inference, alignment-free methods offer a simplified alternative, particularly beneficial when handling genome-wide data involving long sequences and complex events such as rearrangements. Moreover, alignment-free methods become crucial for data types like genome skims, where assembly is impractical. However, despite these benefits, alignment-free techniques have not gained widespread acceptance since they lack the accuracy of alignment-based techniques, primarily due to their reliance on simplified models of pairwise distance calculation.
RESULTS: Here, we present a likelihood-based alignment-free technique for phylogenetic tree construction. We encode the presence or absence of k-mers in genome sequences in a binary matrix and estimate phylogenetic trees using a maximum likelihood approach. PEAFOWL, the first software implementation of a likelihood-based alignment-free method for phylogeny estimation, is available at https://github.com/hasin-abrar/Peafowl-repo. We analyze the performance of our method on seven real datasets and compare the results with state-of-the-art alignment-free methods.
CONCLUSIONS: Results suggest that our method is competitive with existing alignment-free tools. This indicates that maximum likelihood-based alignment-free methods may, in the future, be refined to outperform alignment-free methods that rely on distance calculation, as has been the case in the alignment-based setting.
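
The k-mer encoding described in RESULTS can be illustrated with a minimal sketch. This is not PEAFOWL's code: the function names, the toy sequences, the choice k=3, and the decision to skip k-mers containing characters other than A, C, G, T are all assumptions made for illustration. Each taxon becomes one row of 0/1 values, one column per distinct k-mer observed in any input sequence.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the distinct k-mers in seq, skipping any with non-ACGT characters."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    }


def presence_absence_matrix(
    genomes: Dict[str, str], k: int
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a taxa-by-k-mer binary matrix (1 = k-mer present, 0 = absent).

    Hypothetical helper for illustration; not taken from the PEAFOWL source.
    """
    names = list(genomes)
    sets = [kmer_set(genomes[name], k) for name in names]
    # Columns: every k-mer seen in at least one genome, in sorted order.
    kmers = sorted(set().union(*sets))
    matrix = [[1 if kmer in s else 0 for kmer in kmers] for s in sets]
    return names, kmers, matrix


# Toy input (made-up sequences, far shorter than real genomes).
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTT", "taxonC": "TTTACG"}
names, kmers, matrix = presence_absence_matrix(toy, k=3)

for name, row in zip(names, matrix):
    print(name, "".join(map(str, row)))
```

Each row can then be treated as a sequence of two-state characters and passed to a maximum likelihood tree search that supports binary data, which is the step the abstract refers to as estimating trees by maximum likelihood.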

MeSH Term

Phylogeny
Likelihood Functions
Software
Algorithms
Sequence Alignment
