Starcode

Introduction

The increasing throughput of sequencing technologies offers new applications and challenges for computational biology. In many of those applications, sequencing errors need to be corrected. This is particularly important when sequencing reads from an unknown reference such as random DNA barcodes. In this case, error correction can be done by performing a pairwise comparison of all the barcodes, which is a computationally complex problem. Here, we address this challenge and describe an exact algorithm to determine which pairs of sequences lie within a given Levenshtein distance. For error correction or redundancy reduction purposes, matched pairs are then merged into clusters of similar sequences. The efficiency of starcode is attributable to the poucet search, a novel implementation of the Needleman-Wunsch algorithm performed on the nodes of a trie. On the task of matching random barcodes, starcode outperforms sequence clustering algorithms in both speed and precision. The C source code is available at http://github.com/gui11aume/starcode.
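
The general idea of running an edit-distance computation on the nodes of a trie can be sketched as follows. This is only a simplified illustration in Python, not starcode's poucet search (which is written in C and includes optimizations not shown here). All function names (`build_trie`, `search`, `cluster`) are hypothetical, and the union-find merging step is an assumption made for the example rather than starcode's documented clustering method. The sketch stores the sequences in a trie, computes one dynamic-programming row per trie node so that sequences sharing a prefix share the work, and abandons a branch as soon as no descendant can fall within the distance threshold.

```python
def build_trie(seqs):
    """Store each sequence in a nested-dict trie; '$' marks a complete sequence.

    Assumes '$' never occurs in the sequences themselves (true for DNA barcodes).
    """
    root = {}
    for s in seqs:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node["$"] = s
    return root


def search(trie, query, k):
    """Return the stored sequences within Levenshtein distance k of query."""
    hits = []
    # Row for the empty prefix: distance to query[:j] is j.
    first_row = list(range(len(query) + 1))

    def walk(node, prev_row):
        # prev_row[j] is the distance between this node's prefix and query[:j].
        if "$" in node and prev_row[-1] <= k:
            hits.append(node["$"])
        for ch, child in node.items():
            if ch == "$":
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,            # skip a query character
                               prev_row[j] + 1,           # skip a trie character
                               prev_row[j - 1] + cost))   # match or substitution
            # Row minima never decrease along a branch, so prune when all exceed k.
            if min(row) <= k:
                walk(child, row)

    walk(trie, first_row)
    return hits


def cluster(seqs, k):
    """Merge sequences linked by a match within distance k (union-find)."""
    unique = sorted(set(seqs))
    trie = build_trie(unique)
    parent = {s: s for s in unique}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for s in unique:
        for t in search(trie, s, k):
            parent[find(t)] = find(s)

    groups = {}
    for s in unique:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

Calling `cluster(barcodes, 1)` on a list of barcode strings returns a sorted list of clusters, each a sorted list of distinct sequences. Because every sequence is queried against the full trie, the search is exact for the chosen threshold, at the cost of one recursive walk per query.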

Publications

Starcode: sequence clustering based on all-pairs search.
Zorita E, Cuscó P, Filion GJ, 2015-06-01 - Bioinformatics (Oxford, England)

Credits

Eduard Zorita
Developer
Genome Architecture, Gene Regulation, Spain
Pol Cuscó
Developer
Genome Architecture, Gene Regulation, Spain
Guillaume J Filion
Investigator
Genome Architecture, Gene Regulation, Spain

Summary

Accession	BT005905
Tool Type	Application
Category
Platforms	Linux/Unix
Technologies	C
User Interface	Terminal Command Line
Download Count	0
Country/Region	Spain
Submitted By	Guillaume J Filion
