PgRC2: engineering the compression of sequencing reads.

Tomasz M Kowalski, Szymon Grabowski
Author Information
  1. Tomasz M Kowalski: Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland.
  2. Szymon Grabowski: Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland.

Abstract

SUMMARY: The FASTQ format remains at the heart of high-throughput sequencing. Despite advances in specialized FASTQ compressors, they are still imperfect in terms of practical performance tradeoffs. We present a multi-threaded version of Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of approximating the shortest common superstring over high-quality reads. Redundancy in the obtained string is efficiently removed by using a compact temporary representation. The current version, v2.0, preserves the compression ratio of the previous one, reducing the compression (resp. decompression) time by a factor of 8-9 (resp. 2-2.5) on a 14-core/28-thread machine.
AVAILABILITY AND IMPLEMENTATION: PgRC 2.0 can be downloaded from https://github.com/kowallus/PgRC and https://zenodo.org/records/14882486 (DOI: 10.5281/zenodo.14882486).
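
ILLUSTRATION: The summary describes approximating the shortest common superstring over reads. The sketch below shows one classic way to do this, the greedy maximum-overlap merge, on a toy input. It is only an assumption-laden illustration of the general idea and not the PgRC implementation: the function names `overlap` and `greedy_superstring` are hypothetical, the search is quadratic in the number of reads, and it ignores read quality, reverse complements, and mismatches.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Greedy approximation of a shortest common superstring (toy version)."""
    # Drop exact duplicates, then reads fully contained in another read.
    unique = list(dict.fromkeys(reads))
    frags = [r for r in unique if not any(r != s and r in s for s in unique)]

    while len(frags) > 1:
        # Find the ordered pair (i, j) with the largest suffix/prefix overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping part only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)

    return frags[0] if frags else ""


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGG", "ACGGTT"]
    pseudogenome = greedy_superstring(toy_reads)
    # Every read can now be stored as an offset into the merged string.
    for read in toy_reads:
        print(read, "at offset", pseudogenome.find(read))
```

In this toy setting each read is recoverable from the merged string plus an offset, which is the intuition behind storing reads against a pseudogenome rather than verbatim.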

Grants

  1. /Lodz University of Technology
  2. 501/12-24-1-5418/Faculty of Electrical, Electronic, Computer and Control Engineering

MeSH Terms

Algorithms
High-Throughput Nucleotide Sequencing
Sequence Analysis, DNA
Data Compression
Software
