Introduction

Existing sequence assembly editors struggle with the volumes of data now readily available from the latest generation of DNA sequencing instruments. We describe the Gap5 software, along with the data structures and algorithms that allow it to scale. We demonstrate this with an assembly of 1.1 billion sequence fragments and compare its performance with that of several other programs. We also analyse the memory, CPU, and I/O usage of Gap5, as well as the file sizes it produces.

Gap5 is part of the Staden Package and is available under an Open Source licence from http://staden.sourceforge.net. It is implemented in C and Tcl/Tk and currently runs on Unix systems only.

Publications

  1. Gap5--editing the billion fragment sequence assembly.
    Bonfield JK, Whitwham A, 2010-07-01 - Bioinformatics (Oxford, England)
  2. The Staden package, 1998.
    Staden R, Beal KF, Bonfield JK, 2000-01-01 - Methods in molecular biology (Clifton, N.J.)
  3. The Staden sequence analysis package.
    Staden R, 1996-06-01 - Molecular biotechnology

Credits

  1. James K Bonfield
    Developer

    Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, United Kingdom of Great Britain and Northern Ireland

  2. Andrew Whitwham
    Investigator

Summary

Accession: BT001757
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Andrew Whitwham