Introduction

With the advent of next-generation sequencing, traditional bioinformatics tools are challenged by massive raw metagenomic datasets. One bottleneck of metagenomic studies is the lack of data analysis tools suited to large-scale and cloud computing. In this paper, we propose a Spark-based tool, called MetaSpark, to recruit metagenomic reads to reference genomes. MetaSpark benefits from Spark's resilient distributed dataset (RDD), which lets it cache datasets in memory across cluster nodes and scale well with dataset size. Compared with previous metagenomic recruitment tools, MetaSpark recruited significantly more reads than programs such as SOAP2, BWA and LAST, and recruited ∼4% more reads than FR-HIT in a test with 1 million reads and 0.75 GB of reference sequences. Different test cases demonstrate MetaSpark's scalability and overall high performance.

Availability and implementation: https://github.com/zhouweiyg/metaspark
Contact: bniu@sccas.cn, jingluo@ynu.edu.cn
Supplementary data are available at Bioinformatics online.
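The abstract does not describe how MetaSpark aligns reads. As a rough illustration of what "recruiting" a read to a reference genome involves, the following Python sketch assigns each read to its best-matching reference placement when the identity reaches a threshold. It uses a hypothetical k-mer seed index with ungapped scoring; the function names, the k-mer length `k` and the `min_identity` threshold are illustrative assumptions, not MetaSpark's algorithm or parameters.

```python
from collections import defaultdict

# Hypothetical sketch: k-mer seeding plus ungapped identity scoring.
# This is NOT MetaSpark's algorithm; names and defaults are illustrative.


def build_kmer_index(references, k):
    """Map each k-mer to the (reference name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def ungapped_identity(read, ref_seq, start):
    """Fraction of identical bases when read is laid on ref_seq at start (no gaps)."""
    if start < 0 or start + len(read) > len(ref_seq):
        return 0.0  # placement runs off either end of the reference
    window = ref_seq[start:start + len(read)]
    matches = sum(1 for a, b in zip(read, window) if a == b)
    return matches / len(read)


def recruit(reads, references, k=11, min_identity=0.9):
    """Assign each read to its best-scoring reference placement.

    Returns {read_id: (reference name, start offset, identity)} for reads whose
    best placement reaches min_identity; other reads are left unrecruited.
    """
    index = build_kmer_index(references, k)
    recruited = {}
    for read_id, read in reads.items():
        best = None
        # Each shared k-mer (seed) proposes a placement: read offset j sits at ref offset pos.
        for j in range(len(read) - k + 1):
            for name, pos in index.get(read[j:j + k], ()):
                start = pos - j
                score = ungapped_identity(read, references[name], start)
                if score >= min_identity and (best is None or score > best[2]):
                    best = (name, start, score)
        if best is not None:
            recruited[read_id] = best
    return recruited


references = {
    "genomeA": "ACGTACGTTAGCCGATCGATCGGATCCA",
    "genomeB": "TTTTGGGGCCCCAAAATTTTGGGGCCCC",
}
reads = {
    "r1": "TAGCCGATCGATCG",  # exact copy of genomeA[8:22]
    "r2": "GGGGCCCCAAAAT",   # exact copy of genomeB[4:17]
    "r3": "ACACACACACACAC",  # shares no 5-mer with either reference
    "r4": "TAGCCGATCGATCC",  # r1 with its last base substituted
}

print(recruit(reads, references, k=5, min_identity=0.9))
```

With `k=5` and a 90% identity threshold, r1 and r2 are recruited at identity 1.0, r4 at 13/14 (about 0.93), and r3 is left unrecruited. Because each read is scored independently, one natural (assumed) Spark layout would be to broadcast the reference index and map this per-read logic over partitions of an RDD of reads; whether MetaSpark is organised this way is not stated here.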

Publications

  1. MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes.
    Zhou W, Li R, Yuan S, Liu C, Yao S, Luo J, Niu B, 2017-04-01 - Bioinformatics (Oxford, England)

Credits

  1. Wei Zhou
    Developer

    School of Software, Yunnan University, China

  2. Ruilin Li
    Developer

    University of Chinese Academy of Sciences, Beijing 100190, China

  3. Shuo Yuan
    Developer

    School of Software, Yunnan University, China

  4. ChangChun Liu
    Developer

    School of Software, Yunnan University, China

  5. Shaowen Yao
    Developer

    School of Software, Yunnan University, China

  6. Jing Luo
    Developer

    School of Life Sciences and State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, China

  7. Beifang Niu
    Investigator

    University of Chinese Academy of Sciences, Beijing 100190, China

Summary

Accession: BT001958
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: Perl
User Interface: Terminal Command Line
Download Count: 0
Country/Region: China
Submitted By: Beifang Niu