CARE: context-aware sequencing read error correction.

Felix Kallenborn, Andreas Hildebrandt, Bertil Schmidt

Author Information

Felix Kallenborn: Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany.
Andreas Hildebrandt: Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany.
Bertil Schmidt: Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany.

PMID: 32818262 DOI: 10.1093/bioinformatics/btaa738

MOTIVATION: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes.
RESULTS: We present CARE, an alignment-based, scalable error correction algorithm for Illumina data that uses minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors are corrected by detailed inspection of the corresponding alignments. Our performance evaluation shows that CARE generates significantly fewer false-positive corrections than state-of-the-art tools (Musket, SGA, BFC, Lighter, Bcool, Karect) while maintaining a competitive number of true positives. When used prior to assembly, it can achieve superior de novo assembly results for a number of real datasets. CARE is also the first multiple sequence alignment-based error corrector that is able to process a human genome Illumina NGS dataset in only 4 h on a single workstation using GPU acceleration.
AVAILABILITY AND IMPLEMENTATION: CARE is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CARE.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
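
ILLUSTRATIVE SKETCH: The abstract states that minhashing enables efficient similarity search within large read collections. The following minimal Python sketch shows one generic way such a search could work; it is not CARE's implementation. The function names, the k-mer length k=8, the use of 16 hash functions and the choice of BLAKE2b as the hash are all assumptions made for illustration. Each read is reduced to a short signature (the minimum hash value of its k-mers under each hash function), and reads that share signature entries with a query become candidates for alignment.

```python
import hashlib
from collections import defaultdict


def kmers(read, k):
    """Return the set of all k-mers (substrings of length k) in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def _hash(kmer, seed):
    """64-bit hash of a k-mer; the seed selects one of several hash functions."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(read, k=8, num_hashes=16):
    """For each hash function, keep the smallest hash value over all k-mers."""
    ks = kmers(read, k)
    if not ks:
        raise ValueError("read is shorter than k")
    return tuple(min(_hash(km, seed) for km in ks) for seed in range(num_hashes))


def build_index(reads, k=8, num_hashes=16):
    """Map (hash function number, minimum hash value) to the ids of matching reads."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for table, value in enumerate(minhash_signature(read, k, num_hashes)):
            index[(table, value)].add(read_id)
    return index


def candidates(query, index, k=8, num_hashes=16):
    """Count, per indexed read, how many signature entries it shares with the query."""
    hits = defaultdict(int)
    for table, value in enumerate(minhash_signature(query, k, num_hashes)):
        for read_id in index.get((table, value), ()):
            hits[read_id] += 1
    return dict(hits)
```

In this sketch, a read with many shared signature entries is likely to overlap the query, so only those reads would need to be aligned in detail; reads with no k-mer in common with the query are not returned at all (barring a hash collision).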

Algorithms

High-Throughput Nucleotide Sequencing

Humans

Sequence Alignment

Sequence Analysis, DNA

Software

OpenLB
Open Library of Bioscience
