Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Michael B Hall, Ryan R Wick, Louise M Judd, An N Nguyen, Eike J Steinig, Ouli Xie, Mark Davies, Torsten Seemann, Timothy P Stinear, Lachlan Coin
Author Information
  1. Michael B Hall: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  2. Ryan R Wick: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  3. Louise M Judd: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  4. An N Nguyen: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  5. Eike J Steinig: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  6. Ouli Xie: Department of Infectious Diseases, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  7. Mark Davies: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  8. Torsten Seemann: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  9. Timothy P Stinear: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  10. Lachlan Coin: Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.

Abstract

Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance detection. This study presents a comprehensive benchmarking of variant calling accuracy in bacterial genomes using Oxford Nanopore Technologies (ONT) sequencing data. We evaluated three ONT basecalling models and both simplex (single-strand) and duplex (dual-strand) read types across 14 diverse bacterial species. Our findings reveal that deep learning-based variant callers, particularly Clair3 and DeepVariant, significantly outperform traditional methods and even exceed the accuracy of Illumina sequencing, especially when applied to ONT's super-high accuracy model. ONT's superior performance is attributed to its ability to overcome Illumina's errors, which often arise from difficulties in aligning reads in repetitive and variant-dense genomic regions. Moreover, the use of high-performing variant callers with ONT's super-high accuracy data mitigates ONT's traditional errors in homopolymers. We also investigated the impact of read depth on variant calling, demonstrating that 10× depth of ONT super-accuracy data can achieve precision and recall comparable to, or better than, full-depth Illumina sequencing. These results underscore the potential of ONT sequencing, combined with advanced variant calling algorithms, to replace traditional short-read sequencing methods in bacterial genomics, particularly in resource-limited settings.
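
The abstract reports precision and recall without stating how they were computed. As a minimal sketch of the standard definitions, the Python below scores a set of called variants against a truth set. The `Variant` tuple, the `score_calls` function, and the exact-match comparison are illustrative assumptions, not the authors' pipeline; a real benchmark would typically normalise variant representation before comparing.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (illustrative, not from the paper)."""
    contig: str
    pos: int
    ref: str
    alt: str


def score_calls(truth: set, calls: set) -> dict:
    """Return precision, recall and F1 for exact-match variant calls."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data, invented for illustration only.
truth = {
    Variant("chr", 100, "A", "G"),
    Variant("chr", 250, "C", "T"),
    Variant("chr", 400, "G", "GA"),
    Variant("chr", 900, "T", "C"),
}
calls = {
    Variant("chr", 100, "A", "G"),
    Variant("chr", 250, "C", "T"),
    Variant("chr", 400, "G", "GA"),
    Variant("chr", 777, "A", "T"),  # false positive
}

if __name__ == "__main__":
    print(score_calls(truth, calls))
```

Precision here is the fraction of called variants that are correct, and recall is the fraction of true variants that were called; comparing these two numbers at different read depths is the kind of comparison the abstract summarises.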

Grants

  1. FSPGN000045/National Health and Medical Research Council

MeSH Term

Deep Learning
Benchmarking
Genome, Bacterial
Nanopore Sequencing
Bacteria
Nanopores
High-Throughput Nucleotide Sequencing
Genomics
Genetic Variation
