Introduction

Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times. We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, and it dramatically reduces overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results. SAMBLASTER is open-source C++ code and freely available for download from https://github.com/GregoryFaust/samblaster.
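To make the idea of streaming duplicate marking on read-sorted SAM more concrete, the following is a minimal Python sketch. It is not SAMBLASTER's C++ implementation, and the function names `signature` and `mark_duplicates` are hypothetical. It assumes that all alignments of a read pair appear on consecutive lines, and it simplifies by using the leftmost mapping position (the SAM POS field) plus strand as the signature; production duplicate markers typically work from clip-adjusted 5' positions instead.

```python
# Illustrative sketch only: streaming duplicate marking over read-grouped SAM.
# Assumption: records sharing a read ID (QNAME) are adjacent in the input.
# Simplification: the signature uses POS and strand, ignoring CIGAR clipping.

DUP_FLAG = 0x400        # SAM FLAG bit: PCR or optical duplicate
REVERSE_FLAG = 0x10     # SAM FLAG bit: read mapped to the reverse strand
UNMAPPED_FLAG = 0x4     # SAM FLAG bit: read unmapped
NON_PRIMARY = 0x900     # SAM FLAG bits: secondary (0x100) or supplementary (0x800)


def signature(group):
    """Return a hashable signature for one read-ID group of SAM records."""
    ends = []
    for fields in group:
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG or flag & NON_PRIMARY:
            continue  # only primary, mapped alignments contribute
        strand = "-" if flag & REVERSE_FLAG else "+"
        ends.append((fields[2], int(fields[3]), strand))
    return tuple(sorted(ends))


def _emit(group, seen):
    """Flag the group as duplicate if its signature was already seen."""
    sig = signature(group)
    is_dup = bool(sig) and sig in seen
    if sig:
        seen.add(sig)
    out = []
    for fields in group:
        if is_dup:
            fields[1] = str(int(fields[1]) | DUP_FLAG)
        out.append("\t".join(fields))
    return out


def mark_duplicates(lines):
    """Yield SAM lines, setting the duplicate flag on repeated signatures."""
    seen = set()
    group = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        if group and fields[0] != group[0][0]:
            yield from _emit(group, seen)
            group = []
        group.append(fields)
    if group:
        yield from _emit(group, seen)
```

Because the function is a generator over an iterable of lines, it can be fed from standard input and keeps only the set of signatures in memory, which is the property that lets this kind of pass sit in a pipe between an aligner and BAM compression. The first pair seen with a given signature is kept unflagged; later pairs with the same signature are flagged.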

Publications

  1. SAMBLASTER: fast duplicate marking and structural variant read extraction.
    Faust GG, Hall IM, 2014-09-01 - Bioinformatics (Oxford, England)

Credits

  1. Gregory G Faust
    Developer

    Department of Biochemistry and Molecular Genetics and Center for Public Health Genomics, University of Virginia, United States of America

  2. Ira M Hall
    Investigator

    Department of Biochemistry and Molecular Genetics and Center for Public Health Genomics, University of Virginia, United States of America

Summary

Accession: BT006731
Tool Type: Application
Category: (none listed)
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Country/Region: United States of America
Submitted By: Ira M Hall