Introduction

Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow a single run to contain hundreds of different samples. With such data comes a significant challenge: understanding its quality, and understanding how quality and yield change across instruments and over time. Beyond the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for the storage and analysis of next generation sequencing run metrics. The system is designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database, and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler and broader querying across multiple fields than the manual steps and calculations required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storing statistics from FastQC, a commonly used tool for the analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
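
The kind of cross-run query quoted above can be illustrated with a minimal sketch. This is not StatsDB's schema or API: the table run_metric, the functions store_metric and mean_metric, the metric name gc_percent (used here as a stand-in for a nucleotide bias measure) and all sample values are hypothetical, and SQLite is used only so that the example is self-contained. It shows the general idea of hiding the SQL behind two small functions, one to store a value and one to summarise it by instrument, barcode and date.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical, simplified schema for illustration only (not StatsDB's schema).
SCHEMA = """
CREATE TABLE run_metric (
    run_id     TEXT NOT NULL,
    instrument TEXT NOT NULL,
    barcode    TEXT NOT NULL,
    run_date   TEXT NOT NULL,  -- ISO 8601 date, so string comparison orders correctly
    metric     TEXT NOT NULL,
    value      REAL NOT NULL
);
"""


def store_metric(conn, run_id, instrument, barcode, run_date, metric, value):
    """Store one metric value; callers never write SQL themselves."""
    conn.execute(
        "INSERT INTO run_metric VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, instrument, barcode, run_date.isoformat(), metric, value),
    )


def mean_metric(conn, metric, instrument, barcode, since):
    """Return (mean, count) of a metric for one instrument and barcode since a date.

    The mean is None when no rows match.
    """
    return conn.execute(
        "SELECT AVG(value), COUNT(*) FROM run_metric "
        "WHERE metric = ? AND instrument = ? AND barcode = ? AND run_date >= ?",
        (metric, instrument, barcode, since.isoformat()),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data; a fixed reference date keeps the example deterministic.
today = date(2013, 6, 30)
store_metric(conn, "run1", "A", "X", today - timedelta(days=5), "gc_percent", 42.0)
store_metric(conn, "run2", "A", "X", today - timedelta(days=10), "gc_percent", 44.0)
store_metric(conn, "run3", "A", "X", today - timedelta(days=60), "gc_percent", 60.0)  # older than a month
store_metric(conn, "run4", "B", "X", today - timedelta(days=3), "gc_percent", 50.0)   # different sequencer
store_metric(conn, "run5", "A", "Y", today - timedelta(days=2), "gc_percent", 55.0)   # different barcode

# Barcode X, sequencer A, within the last 30 days.
mean, n = mean_metric(conn, "gc_percent", "A", "X", today - timedelta(days=30))
print(f"{n} runs, mean gc_percent = {mean}")
```

Only run1 and run2 satisfy all three conditions, so the summary is computed over those two rows. A real deployment would point the same two functions at the production database, and callers would not need to change.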

Publications

  1. StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics.
    Ramirez-Gonzalez RH, Leggett RM, Waite D, Thanki A, Drou N, Caccamo M, Davey R, 2013-01-01 - F1000Research

Credits

  1. Ricardo H Ramirez-Gonzalez
    Developer

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

  2. Richard M Leggett
    Developer

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

  3. Darren Waite
    Developer

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

  4. Anil Thanki
    Developer

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

  5. Nizar Drou
    Developer

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

  6. Mario Caccamo
    Developer

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

  7. Robert Davey
    Investigator

    The Genome Analysis Centre, Norwich Research Park, United Kingdom of Great Britain and Northern Ireland

Summary

  Accession: BT001541
  Tool Type: Application
  Category:
  Platforms: Linux/Unix
  Technologies: Perl
  User Interface: Terminal Command Line
  Download Count: 0
  Submitted By: Robert Davey