Tasfia Zahin, Md Hasin Abrar, Mizanur Rahman Jewel, Tahrina Tasnim, Md Shamsuzzoha Bayzid, Atif Rahman
BACKGROUND: While alignment has traditionally been the primary approach for establishing homology prior to phylogenetic inference, alignment-free methods offer a simplified alternative, particularly beneficial when handling genome-wide data involving long sequences and complex events such as rearrangements. Moreover, alignment-free methods become crucial for data types like genome skims, where assembly is impractical. However, despite these benefits, alignment-free techniques have not gained widespread acceptance since they lack the accuracy of alignment-based techniques, primarily due to their reliance on simplified models of pairwise distance calculation.
RESULTS: Here, we present a likelihood-based alignment-free technique for phylogenetic tree construction. We encode the presence or absence of k-mers in genome sequences in a binary matrix, and estimate phylogenetic trees using a maximum likelihood approach. The method is implemented in PEAFOWL, the first software to provide likelihood-based alignment-free phylogeny estimation, which is available at: https://github.com/hasin-abrar/Peafowl-repo . We analyze the performance of our method on seven real datasets and compare the results with state-of-the-art alignment-free methods.
CONCLUSIONS: Results suggest that our method is competitive with existing alignment-free tools. This indicates that maximum-likelihood-based alignment-free methods may in the future be refined to outperform alignment-free methods that rely on distance calculation, as has been the case in the alignment-based setting.
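The k-mer encoding described under RESULTS can be illustrated with a minimal sketch. This is not PEAFOWL's implementation: the function names `kmer_set` and `presence_absence_matrix`, the choice to skip k-mers containing non-ACGT characters, and the plain list-of-lists matrix are all assumptions made for illustration. Each row corresponds to a genome and each column to a k-mer, with 1 marking presence and 0 marking absence.

```python
from typing import Dict, List, Set, Tuple

VALID_BASES = set("ACGT")


def kmer_set(seq: str, k: int) -> Set[str]:
    """Collect the distinct k-mers of a sequence (hypothetical helper).

    Assumption: k-mers containing characters other than A, C, G, T
    (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            kmers.add(kmer)
    return kmers


def presence_absence_matrix(
    genomes: Dict[str, str], k: int
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary genome-by-k-mer matrix (illustrative only).

    Returns the genome names (row labels), the sorted k-mers
    (column labels), and the 0/1 matrix itself.
    """
    names = sorted(genomes)
    per_genome = {name: kmer_set(genomes[name], k) for name in names}
    all_kmers = sorted(set().union(*per_genome.values()))
    matrix = [
        [1 if kmer in per_genome[name] else 0 for kmer in all_kmers]
        for name in names
    ]
    return names, all_kmers, matrix


if __name__ == "__main__":
    # Toy sequences, not real data.
    toy = {"taxonA": "ACGT", "taxonB": "CGTA"}
    names, kmers, matrix = presence_absence_matrix(toy, k=2)
    print(kmers)
    for name, row in zip(names, matrix):
        print(name, row)
```

In a sketch like this, each column acts as one binary character, so the matrix has the same shape as a binary character alignment that a maximum likelihood tree search could take as input. A real pipeline would need a dedicated k-mer counter and a more compact matrix representation to cope with genome-scale data.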