Introduction

Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Taxonomic analysis of microbial communities, a process referred to as binning, is one of the most challenging tasks when analyzing metagenomic read data. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, and the limitations due to short read lengths and sequencing errors.

MetaProb is a novel assembly-assisted tool for unsupervised metagenomic binning. Its novelty derives from solving three problems: how to divide reads into groups of independent reads, so that k-mer frequencies are not overestimated; how to convert k-mer counts into probabilistic sequence signatures that correct for the variable distribution of k-mers and for unbalanced groups of reads, in order to produce better estimates of the underlying genome statistic; and how to estimate the number of species in a dataset. We show that MetaProb is more accurate and efficient than other state-of-the-art tools in binning both short-read datasets (F-measure 0.87) and long-read datasets (F-measure 0.97) for various abundance ratios. Its estimate of the number of species is also more accurate than that of MetaCluster. On a real human stool dataset, MetaProb identifies the most predominant species, in line with previous human gut studies.

Availability: https://bitbucket.org/samu661/metaprob
Contact: cinzia.pizzi@dei.unipd.it or comin@dei.unipd.it
Supplementary information: Supplementary data are available at Bioinformatics online.
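
To make the idea of a k-mer based signature more concrete, the following minimal sketch counts k-mers across a group of reads and normalizes the counts into a frequency vector. It is only an illustration of plain normalized k-mer frequencies, not MetaProb's actual probabilistic signature, which additionally corrects for k-mer distribution and group size. The function name `kmer_signature` and the default `k=4` are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_signature(reads, k=4):
    """Return normalized k-mer frequencies over a group of reads.

    Hypothetical helper for illustration; not part of MetaProb.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1

    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        # No valid k-mers: return an all-zero signature.
        return {km: 0.0 for km in all_kmers}
    return {km: counts[km] / total for km in all_kmers}
```

Calling `kmer_signature(["ACGTAC", "CGTACG"], k=2)` returns a dictionary with one entry per possible 2-mer whose values sum to 1. Two groups of reads could then be compared by a distance between their signatures, though the choice of distance is left open here.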

Publications

  1. MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures.
    Girotto S, Pizzi C, Comin M, 2016-09-01 - Bioinformatics (Oxford, England)

Credits

  1. Samuele Girotto
    Developer

    Department of Information Engineering, University of Padova, Italy

  2. Cinzia Pizzi
    Developer

    Department of Information Engineering, University of Padova, Italy

  3. Matteo Comin
    Investigator

    Department of Information Engineering, University of Padova, Italy

Summary

Accession: BT003334
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Country/Region: Italy
Submitted By: Matteo Comin