Introduction

Storage and transmission of the data produced by modern DNA sequencing instruments have become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference-based compression (CRAM, Goby) and non-reference-based compression (DSRC, BAM), and against other recently published competition entries (Quip, SCALCE). The tools are shown to form the new Pareto frontier for FASTQ compression, offering state-of-the-art compression ratios at affordable CPU costs. All programs are freely available on SourceForge: fastqz at https://sourceforge.net/projects/fastqz/, fqzcomp at https://sourceforge.net/projects/fqzcomp/, and samcomp at https://sourceforge.net/projects/samcomp/.

Publications

  1. Compression of FASTQ and SAM format sequencing data.
    Bonfield JK, Mahoney MV, 2013-01-01 - PLoS ONE

Credits

  1. James K Bonfield
    Developer

    Wellcome Trust Sanger Institute, Cambridge, United Kingdom of Great Britain and Northern Ireland

  2. Matthew V Mahoney
    Investigator

Summary

Accession: BT002355
Tool Type: Application
Platforms: Linux/Unix
Technologies: C
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Matthew V Mahoney