Long-read sequencing for reliably calling the mompS allele in Legionella pneumophila sequence-based typing.

Anne Vatland Krøvel, Marit A K Hetland, Eva Bernhoff, Anna Steensen Bjørheim, Markus André Soma, Iren H Löhr
Author Information
  1. Anne Vatland Krøvel: Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway.
  2. Marit A K Hetland: Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway.
  3. Eva Bernhoff: Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway.
  4. Anna Steensen Bjørheim: Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway.
  5. Markus André Soma: Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway.
  6. Iren H Löhr: Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway.

Abstract

Sequence-based typing (SBT) of Legionella pneumophila is a valuable tool in epidemiological studies and outbreak investigations of Legionnaires' disease. In the SBT scheme, mompS is one of seven genes that determine the sequence type (ST). The L. pneumophila genome typically contains two copies of mompS, mompS1 and mompS2. When they are non-identical, it can be challenging to determine the mompS allele, and subsequently the ST, from Illumina short reads. In our collection of 233 genomes, there were 62 STs, 18 of which carried non-identical mompS copies. Using short reads, the mompS allele was misassembled or untypeable in several STs. Genomes belonging to ST154 and ST574, which carried mompS allele 7 and allele 15, were assigned an incorrect allele and/or gene copy number when assembled from short reads. For other isolates, mainly those carrying non-identical mompS copies, short-read assemblers occasionally failed to resolve the structure of the region, which also made them untypeable from the short-read data. In this study, we aimed to understand the challenges we observed with calling the mompS2 allele from short reads, to assess whether other short-read methods could resolve the mompS region, and to investigate whether long reads could be used to obtain the mompS alleles and thereby perform SBT from long reads only. We found that the choice of short-read assembler had a major impact on resolving the mompS region, and thus on SBT from short reads, but no method consistently resolved the mompS allele. Using Oxford Nanopore Technologies (ONT) sequencing together with Trycycler and Medaka for long-read assembly and polishing, we were able to resolve the mompS copies and correctly identify the mompS allele, in agreement with Sanger sequencing/external quality assessment (EQA) results for all tested isolates (n=35). The remaining six genes of the SBT profile could also be determined from ONT-only reads. The STs called from ONT-only assemblies were also consistent with hybrid assemblies of Illumina and ONT reads.
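The copy-resolution problem can be illustrated with a minimal Python sketch of the decision a caller faces once the mompS copies have been located in an assembly. It assumes each copy has already been matched to an allele number and, where possible, labelled as mompS2 by some external criterion; the `MompSHit` structure, the `is_mompS2` flag and the `call_mompS2` function are hypothetical and are not taken from the authors' pipeline or from ONTmompS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MompSHit:
    """One mompS copy found in an assembly (hypothetical structure)."""
    allele: Optional[int]      # matched allele number, None if no exact match
    is_mompS2: Optional[bool]  # True/False if the copy could be identified, None if unknown


def call_mompS2(hits: list[MompSHit]) -> Optional[int]:
    """Return the mompS2 allele number, or None if the isolate is untypeable."""
    # Prefer a copy that has been positively identified as mompS2.
    tagged = [h for h in hits if h.is_mompS2]
    if len(tagged) == 1:
        return tagged[0].allele
    if len(tagged) > 1:
        return None  # several copies claim to be mompS2: ambiguous
    # Otherwise fall back on copies of unknown identity; copies known to be
    # mompS1 (is_mompS2 False) cannot supply the mompS2 allele.
    unknown = [h for h in hits if h.is_mompS2 is None]
    alleles = {h.allele for h in unknown}
    if len(alleles) == 1:
        return alleles.pop()  # identical copies (or a single copy)
    return None  # no usable copy, or non-identical copies that cannot be told apart


# Toy examples (allele numbers are illustrative only).
print(call_mompS2([MompSHit(7, None), MompSHit(7, None)]))    # 7
print(call_mompS2([MompSHit(7, None), MompSHit(15, None)]))   # None
print(call_mompS2([MompSHit(7, False), MompSHit(15, True)]))  # 15
```

Identical copies give a call, non-identical unlabelled copies are untypeable, and a copy labelled as mompS2 resolves the conflict. Note that a short-read assembly that collapses two non-identical copies into a single hit would pass the identical-copy branch silently, which is one way an incorrect allele or copy number can be reported without any warning.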
We therefore propose ONT sequencing as an alternative method for SBT that overcomes the challenges observed with short reads. To facilitate this, we have developed ONTmompS (https://github.com/marithetland/ONTmompS), a tool to determine the ST from long-read or hybrid assemblies.
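The general idea of calling an ST from an assembly can be sketched in Python as exact matching of allele sequences against the contigs on both strands, followed by a lookup of the seven-allele profile. This is a minimal sketch and not how ONTmompS is implemented: the allele database, profile table and function names are hypothetical, and a real caller would use the curated SBT allele and profile definitions and handle partial or inexact hits. The gene order follows the standard L. pneumophila SBT profile (flaA, pilE, asd, mip, mompS, proA, neuA).

```python
from typing import Optional

# Standard gene order of the L. pneumophila SBT allelic profile.
SBT_GENES = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (other characters, e.g. N, are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_allele(contigs: list[str], alleles: dict[int, str]) -> Optional[int]:
    """Return the single allele found as an exact match on either strand, else None."""
    found = {
        number
        for number, seq in alleles.items()
        if any(seq in contig or revcomp(seq) in contig for contig in contigs)
    }
    # Zero hits (missing or divergent gene) or several different alleles
    # (e.g. non-identical copies) both leave the gene untyped.
    return found.pop() if len(found) == 1 else None


def call_st(
    contigs: list[str],
    allele_db: dict[str, dict[int, str]],  # hypothetical: gene -> {allele number: sequence}
    profiles: dict[tuple[int, ...], int],  # hypothetical: allelic profile -> ST
) -> tuple[tuple[Optional[int], ...], Optional[int]]:
    """Return the allelic profile and the ST (None if incomplete or not in the table)."""
    profile = tuple(call_allele(contigs, allele_db[gene]) for gene in SBT_GENES)
    if None in profile:
        return profile, None
    return profile, profiles.get(profile)
```

Because `call_allele` returns `None` when more than one allele of a gene is present, an assembly containing two non-identical mompS copies yields an incomplete profile and no ST. Deciding which copy is mompS2 therefore has to happen before this lookup, and that step is where accurate resolution of both copies matters.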

MeSH Terms

Humans
Legionella pneumophila
Alleles
Sequence Analysis, DNA
Legionnaires' Disease
High-Throughput Nucleotide Sequencing