Introduction

Genome assemblies generated with next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close the gaps in these assemblies with NGS reads. Although these gap-closing tools close gaps efficiently, they entail a high rate of misassembly at gap-closing sites. We have found that the assembly error rates caused by these tools are 20- to 500-fold higher than the rate of errors introduced into contigs by de novo assemblers. Here we describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long-read set (i.e., error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3- to 100-fold higher than that of other available tools, with similar efficiency.

GMcloser and an accompanying tool (GMvalue), which evaluates the assembly and corrects misassemblies other than SNPs and short indels, are available at https://sourceforge.net/projects/gmcloser/.

Contact: shunichi.kosugi@riken.jp

Supplementary data are available at Bioinformatics online.
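The idea of a likelihood-based classifier over alignment statistics, as mentioned in the abstract, can be illustrated with a minimal sketch. This is not GMcloser's actual model: the feature names (`identity`, `pair_support`), the Gaussian class models, their parameters, and the function names are all hypothetical and chosen only to show how a log-likelihood ratio could rank candidate alignments for a gap and reject weak ones.

```python
import math

# Hypothetical per-class Gaussian models (mean, sd) for two alignment features.
# These numbers are illustrative assumptions, not values taken from GMcloser.
MODELS = {
    "correct": {"identity": (0.99, 0.01), "pair_support": (0.90, 0.10)},
    "misassembled": {"identity": (0.95, 0.03), "pair_support": (0.40, 0.20)},
}


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_likelihood_ratio(features):
    """Sum of per-feature log-likelihood ratios: correct vs. misassembled."""
    llr = 0.0
    for name, value in features.items():
        llr += gaussian_logpdf(value, *MODELS["correct"][name])
        llr -= gaussian_logpdf(value, *MODELS["misassembled"][name])
    return llr


def select_alignment(candidates, min_llr=0.0):
    """Return the name of the best-scoring candidate, or None if none passes min_llr."""
    best = max(candidates, key=lambda c: log_likelihood_ratio(c["features"]), default=None)
    if best is None or log_likelihood_ratio(best["features"]) < min_llr:
        return None
    return best["name"]


# Two hypothetical contigs competing for the same gap.
candidates = [
    {"name": "contig_A", "features": {"identity": 0.99, "pair_support": 0.90}},
    {"name": "contig_B", "features": {"identity": 0.94, "pair_support": 0.35}},
]
chosen = select_alignment(candidates)
```

In this sketch a candidate is accepted only when the evidence favors the "correct" class, so a gap is left open rather than closed with a poorly supported alignment.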

Publications

  1. GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.
    Kosugi S, Hirakawa H, Tabata S, 2015-12-01 - Bioinformatics (Oxford, England)

Credits

  1. Shunichi Kosugi
    Developer

    Department of Technology Development, Kazusa DNA Research Institute, Japan

  2. Hideki Hirakawa
    Developer

    Department of Technology Development, Kazusa DNA Research Institute, Japan

  3. Satoshi Tabata
    Investigator

    Department of Technology Development, Kazusa DNA Research Institute, Japan

Summary

Accession: BT006849
Tool Type: Application
Category: (none listed)
Platforms: Linux/Unix
Technologies: (none listed)
User Interface: Terminal Command Line
Download Count: 0
Country/Region: Japan
Submitted By: Satoshi Tabata