Introduction

The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible.

To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers.

Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
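
The idea of scoring several candidate assemblies on multiple validation metrics and then selecting one can be illustrated with a small sketch. This is not the scoring scheme iMetAMOS uses; it is a simple rank-sum consensus chosen only for illustration. The metric names (`n50`, `num_contigs`, `misassemblies`), the function `rank_assemblies`, and all numbers are hypothetical.

```python
from typing import Dict, List

# Hypothetical validation metrics (assumed for illustration only).
# True means "higher is better"; False means "lower is better".
METRICS: Dict[str, bool] = {
    "n50": True,
    "num_contigs": False,
    "misassemblies": False,
}


def rank_assemblies(assemblies: Dict[str, Dict[str, float]]) -> List[str]:
    """Order assembly names best-first by summed per-metric rank.

    For each metric, an assembly's rank is the number of other assemblies
    that are strictly better on that metric, so ties share a rank.
    The assembly with the lowest rank sum comes first; remaining ties
    are broken alphabetically by name.
    """
    totals = {name: 0 for name in assemblies}
    for metric, higher_is_better in METRICS.items():
        for name, values in assemblies.items():
            value = values[metric]
            strictly_better = sum(
                1
                for other in assemblies.values()
                if (other[metric] > value if higher_is_better
                    else other[metric] < value)
            )
            totals[name] += strictly_better
    return sorted(totals, key=lambda name: (totals[name], name))


if __name__ == "__main__":
    # Made-up values for three candidate assemblies of the same genome.
    candidates = {
        "assembly_a": {"n50": 100, "num_contigs": 50, "misassemblies": 0},
        "assembly_b": {"n50": 200, "num_contigs": 30, "misassemblies": 5},
        "assembly_c": {"n50": 150, "num_contigs": 40, "misassemblies": 1},
    }
    print(rank_assemblies(candidates))
```

A rank sum avoids having to put metrics with different units on a common scale, but it weights every metric equally; a user who cares mostly about correctness rather than contiguity might prefer a different ordering, which is consistent with the point above that the best assembly depends on the intended analysis.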

Publications

  1. Automated ensemble assembly and validation of microbial genomes.
    Koren S, Treangen TJ, Hill CM, Pop M, Phillippy AM, 2014-05-01 - BMC Bioinformatics

Credits

  1. Sergey Koren
    Developer

    National Biodefense Analysis and Countermeasures Center, 110 Thomas Johnson Drive, United States of America

  2. Todd J Treangen
    Developer

  3. Christopher M Hill
    Developer

  4. Mihai Pop
    Developer

  5. Adam M Phillippy
    Investigator

Summary
Accession: BT000430
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++, Perl
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Adam M Phillippy