SFA-SPA

Introduction

The determination of protein sequences from a metagenomic dataset enables the study of the metabolism and functional roles of the organisms present in the sampled microbial community. We previously introduced an algorithm and software for the accurate reconstruction of protein sequences from short peptides identified on nucleotide reads in a metagenomic dataset. Here, we present significant computational improvements to the short peptide assembly algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads, while maintaining accuracy. The improved computational efficiency is achieved using a suffix array data structure that allows for fast querying during the assembly process, and a significant redesign of the assembly steps that enables multi-threaded execution. The program is available under the GPLv3 license from sourceforge.net/projects/spa-assembler.
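
To illustrate why a suffix array supports fast querying over short peptides, the following is a minimal sketch in Python. It is not taken from SFA-SPA, which is written in C++ and Perl. The function names, the `$` separator, and the example peptides are assumptions made for illustration, and the naive construction shown here would not scale to hundreds of millions of reads.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction (sorting whole suffixes); real tools use
    far more efficient algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for suffixes that start with pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All matching suffixes are contiguous in the array.
    return sorted(sa[start:lo])


# Hypothetical short peptides, joined with a separator that is not an amino acid code.
peptides = ["MKVLA", "VLAGT", "KVLAG"]
text = "".join(p + "$" for p in peptides)
sa = build_suffix_array(text)

hits = find_occurrences(text, sa, "VLA")
print(hits)
```

Once the array is built, each query needs a number of comparisons logarithmic in the total text length, so repeated lookups of overlapping peptide fragments during assembly avoid rescanning the input.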

Publications

SFA-SPA: a suffix array based short peptide assembler for metagenomic data.
Yang Y, Zhong C, Yooseph S, 2015-06-01 - Bioinformatics (Oxford, England)

Credits

Youngik Yang
Developer
Informatics Department, J. Craig Venter Institute, United States of America
Cuncong Zhong
Developer
Informatics Department, J. Craig Venter Institute, United States of America
Shibu Yooseph
Investigator
Informatics Department, J. Craig Venter Institute, United States of America

Summary

Accession	BT001479
Tool Type	Application
Platforms	Linux/Unix
Technologies	C++, Perl
User Interface	Terminal Command Line
Download Count	0
Country/Region	United States of America
Submitted By	Shibu Yooseph