Introduction

We propose a lightweight data structure for indexing and querying collections of next-generation sequencing (NGS) reads in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), is based on finding overlapping reads and is competitive with existing algorithms in space use, query time, or both. The main applications of our index include variant calling, error correction, and analysis of reads from RNA-seq experiments.
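
To make the counting and locating queries concrete, the following is a minimal sketch of a suffix-array index over a set of reads. It is not the PgSA implementation or its API: the name ToyReadIndex, the '$' separator, and the naive sort-based construction are assumptions made for illustration. In particular, the sketch simply concatenates the reads, whereas PgSA builds a pseudogenome by merging overlapping reads to save space.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy index (illustration only, not the PgSA API).
// Assumes reads and queries use an alphabet that excludes '$'.
struct ToyReadIndex {
    std::string text;                 // reads joined, each followed by '$'
    std::vector<std::size_t> sa;      // suffix array over text
    std::vector<std::size_t> starts;  // start offset of each read in text

    explicit ToyReadIndex(const std::vector<std::string>& reads) {
        for (const auto& r : reads) {
            starts.push_back(text.size());
            text += r;
            text += '$';
        }
        sa.resize(text.size());
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        // Naive construction by comparing whole suffixes; fine for a toy input.
        std::sort(sa.begin(), sa.end(), [this](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });
    }

    // Half-open range [first, second) of suffix-array slots whose suffix starts with kmer.
    std::pair<std::size_t, std::size_t> range(const std::string& kmer) const {
        auto lo = std::lower_bound(sa.begin(), sa.end(), kmer,
            [this](std::size_t pos, const std::string& q) {
                return text.compare(pos, q.size(), q) < 0;
            });
        auto hi = std::upper_bound(sa.begin(), sa.end(), kmer,
            [this](const std::string& q, std::size_t pos) {
                return text.compare(pos, q.size(), q) > 0;
            });
        return std::make_pair(static_cast<std::size_t>(lo - sa.begin()),
                              static_cast<std::size_t>(hi - sa.begin()));
    }

    // Number of occurrences of kmer across all reads.
    std::size_t count(const std::string& kmer) const {
        const auto r = range(kmer);
        return r.second - r.first;
    }

    // (read index, offset within read) for every occurrence, sorted.
    std::vector<std::pair<std::size_t, std::size_t>> locate(const std::string& kmer) const {
        const auto r = range(kmer);
        std::vector<std::pair<std::size_t, std::size_t>> hits;
        for (std::size_t i = r.first; i < r.second; ++i) {
            const std::size_t pos = sa[i];
            // The read containing pos is the last one whose start is <= pos.
            const std::size_t id = static_cast<std::size_t>(
                std::upper_bound(starts.begin(), starts.end(), pos) - starts.begin()) - 1;
            hits.emplace_back(id, pos - starts[id]);
        }
        std::sort(hits.begin(), hits.end());
        return hits;
    }
};
```

Because the separator never appears in a query, a match cannot span two reads, so each query reduces to two binary searches over the suffix array. A real index would replace the naive construction with a linear-time suffix sorting algorithm and would not store a full copy of every read.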

Publications

  1. Indexing Arbitrary-Length k-Mers in Sequencing Reads.
    Kowalski T, Grabowski S, Deorowicz S, 2015-01-01 - PLoS ONE

Credits

  1. Tomasz Kowalski
    Developer

    Institute of Applied Computer Science, Lodz University of Technology, Poland

  2. Szymon Grabowski
    Developer

    Institute of Applied Computer Science, Lodz University of Technology, Poland

  3. Sebastian Deorowicz
    Investigator

    Institute of Informatics, Silesian University of Technology, Poland

Summary
Accession: BT001178
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Country/Region: Poland
Submitted By: Sebastian Deorowicz