Introduction

BACKGROUND: Advances in sequencing technologies make it increasingly difficult to import and validate FASTA-formatted sequence data efficiently, yet this step remains a prerequisite for most bioinformatic tools and pipelines. A comparative analysis of commonly used Bio* frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are limited.

FINDINGS: FastaValidator is a platform-independent, standardized, lightweight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software that needs to parse large amounts of sequence data quickly and accurately. For end users, FastaValidator includes an interactive, out-of-the-box mode for validating FASTA-formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines.

CONCLUSIONS: The accuracy and performance of the FastaValidator library qualify it for large data sets such as those commonly produced by massively parallel sequencing (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA-formatted sequence data.
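To make the idea of streaming FASTA validation concrete, here is a minimal sketch in Java. It is not the FastaValidator API: the class `FastaCheck`, its methods, and the simplified rules it enforces (a `>` header with a non-empty identifier, followed by at least one sequence line made of ASCII letters, `-` or `*`) are assumptions chosen for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper for illustration; not part of the FastaValidator library. */
final class FastaCheck {

    /** Thrown when the input breaks one of the simplified rules assumed here. */
    static final class FastaFormatException extends Exception {
        FastaFormatException(int line, String message) {
            super("line " + line + ": " + message);
        }
    }

    /**
     * Reads the input one line at a time, so memory use does not grow with file size.
     * Returns the number of records found (0 for input without any records).
     */
    static int countRecords(Reader in) throws IOException, FastaFormatException {
        BufferedReader reader = new BufferedReader(in);
        int records = 0;
        int lineNo = 0;
        boolean hasSequence = true; // starts true because no record is pending yet
        String raw;
        while ((raw = reader.readLine()) != null) {
            lineNo++;
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            if (line.charAt(0) == '>') {
                if (!hasSequence) {
                    throw new FastaFormatException(lineNo, "previous record has no sequence");
                }
                if (line.length() == 1 || Character.isWhitespace(line.charAt(1))) {
                    throw new FastaFormatException(lineNo, "header has no identifier");
                }
                records++;
                hasSequence = false;
            } else {
                if (records == 0) {
                    throw new FastaFormatException(lineNo, "sequence data before first header");
                }
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                            || c == '-' || c == '*';
                    if (!ok) {
                        throw new FastaFormatException(lineNo, "illegal character '" + c + "'");
                    }
                }
                hasSequence = true;
            }
        }
        if (!hasSequence) {
            throw new FastaFormatException(lineNo, "last record has no sequence");
        }
        return records;
    }

    /** Convenience wrapper for in-memory text: true only if at least one valid record exists. */
    static boolean isValid(String fasta) {
        try {
            return countRecords(new StringReader(fasta)) > 0;
        } catch (IOException | FastaFormatException e) {
            return false;
        }
    }
}
```

In a pipeline, a check of this kind would typically be called on a file reader (for example `FastaCheck.countRecords(new java.io.FileReader(path))`), with the exception message reporting the offending line. A production validator would also need to distinguish nucleotide from protein alphabets, which this sketch does not attempt.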

Publications

  1. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.
    Waldmann J, Gerken J, Hankeln W, Schweer T, Glöckner FO. BMC Research Notes, 2014-01-01.

Credits

  1. Jost Waldmann
    Developer

  2. Jan Gerken
    Developer

  3. Wolfgang Hankeln
    Developer

  4. Timmy Schweer
    Developer

  5. Frank Oliver Glöckner
    Investigator

    Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Germany

Summary

Accession: BT007009
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies:
User Interface: Terminal Command Line
Download Count: 0
Country/Region: Germany
Submitted By: Frank Oliver Glöckner