Introduction

A large number of bioinformatics applications require counting k-length substrings in long, genetically important strings. A k-mer counter computes the frequency of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. To be useful in such applications, k-mer counting algorithms must be very fast and efficient on large data sets. We propose a novel trie-based algorithm for this k-mer counting problem. We compare our algorithm, k-mer Counter based on Multiple Burst Trees (KCMBT), with all well-known available algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on a human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both algorithms are run using multiple threads.

Availability: KCMBT is freely available on GitHub (https://github.com/abdullah009/kcmbt_mt).

Contact: rajasek@engr.uconn.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
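To make concrete what a k-mer counter outputs, here is a minimal baseline sketch. It is not KCMBT's method: KCMBT is described above as using multiple burst tries, while this sketch swaps in a plain hash map (Python's `collections.Counter`). It assumes each base is packed into 2 bits (A=0, C=1, G=2, T=3) and that any window containing a non-ACGT character such as `N` is skipped. The names `count_kmers` and `decode` are hypothetical helpers for illustration only.

```python
from collections import Counter

# Assumed 2-bit encoding per nucleotide (illustrative, not taken from KCMBT).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every k-length substring of `sequence` made only of A/C/G/T.

    Each k-mer is packed into an integer of 2*k bits with a rolling update,
    so sliding the window by one base costs O(1) rather than O(k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    mask = (1 << (2 * k)) - 1  # keep only the low 2*k bits (one k-mer)
    counts: Counter = Counter()
    code = 0   # packed value of the current window
    valid = 0  # number of consecutive valid bases ending here
    for base in sequence.upper():
        bits = ENCODE.get(base)
        if bits is None:  # e.g. 'N': no k-mer may span this position
            code, valid = 0, 0
            continue
        code = ((code << 2) | bits) & mask  # shift in the new base
        valid += 1
        if valid >= k:  # window holds k valid bases
            counts[code] += 1
    return counts


def decode(code: int, k: int) -> str:
    """Convert a packed 2-bit k-mer back into its nucleotide string."""
    letters = "ACGT"
    return "".join(letters[(code >> (2 * (k - 1 - i))) & 3] for i in range(k))


if __name__ == "__main__":
    for code, n in sorted(count_kmers("ACGTACGNACG", 3).items()):
        print(decode(code, 3), n)
```

For the input `ACGTACGNACG` with k = 3, the `N` splits the sequence, so `ACG` is counted three times and `CGT`, `GTA` and `TAC` once each. Many production counters also merge each k-mer with its reverse complement ("canonical" k-mers); this sketch does not.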

Publications

  1. KCMBT: a k-mer Counter based on Multiple Burst Trees.
    Mamun AA, Pal S, Rajasekaran S, 2016-09-01 - Bioinformatics (Oxford, England)

Credits

  1. Abdullah-Al Mamun
    Developer

    Department of Computer Science and Engineering, University of Connecticut, United States of America

  2. Soumitra Pal
    Developer

    Department of Computer Science and Engineering, University of Connecticut, United States of America

  3. Sanguthevar Rajasekaran
    Investigator

    Department of Computer Science and Engineering, University of Connecticut, United States of America

Summary
Accession: BT006292
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Country/Region: United States of America
Submitted By: Sanguthevar Rajasekaran