PanGraph: scalable bacterial pan-genome graph construction.

Nicholas Noll, Marco Molari, Liam P Shaw, Richard A Neher
Author Information
  1. Nicholas Noll: Kavli Institute for Theoretical Physics, University of California, Santa Barbara, CA, USA.
  2. Marco Molari: Swiss Institute of Bioinformatics, Basel, Switzerland.
  3. Liam P Shaw: Department of Biology, University of Oxford, Oxford, UK.
  4. Richard A Neher: Swiss Institute of Bioinformatics, Basel, Switzerland.

Abstract

The genomic diversity of microbes is commonly parameterized as single-nucleotide polymorphisms (SNPs) relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the set of genes observed in a given species. Reference-based approaches are thus blind to the dynamics of the accessory genome, as well as to variation in gene order and copy number. With the widespread use of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. In addition to pangenomic approaches that focus on variation in the sets of genes present in different genomes, complete assemblies allow investigation of the evolution of genome structure and gene order. This latter problem, however, is computationally demanding, and few tools are available to shed light on these dynamics. Here, we present PanGraph, a Julia-based library and command line interface for aligning whole genomes into a graph. Each genome is represented as a path along vertices, which in turn encapsulate multiple sequence alignments of homologous sequences. The resultant data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization.
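
The graph representation described in the abstract can be illustrated with a minimal Python sketch. This is not the tool's API or on-disk format: the names `Block`, `Node`, `PanGenomeGraph`, `sequence`, and `core_blocks` are hypothetical, and the alignment inside each vertex is reduced to a consensus sequence plus per-genome substitutions (no insertions or deletions). It only shows how a genome can be stored as an oriented path over shared blocks and recovered from it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical illustration only; these types are NOT the library's data model.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Block:
    """A vertex: a homologous region summarized as a consensus sequence.

    Assumption: the alignment is simplified to substitutions only,
    stored per genome as {position: base} relative to the consensus.
    """
    consensus: str
    variants: Dict[str, Dict[int, str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Node:
    """One step of a genome's path: which block, and in which orientation."""
    block_id: str
    forward: bool = True


@dataclass
class PanGenomeGraph:
    blocks: Dict[str, Block]
    paths: Dict[str, List[Node]]  # genome name -> ordered walk over blocks

    def sequence(self, genome: str) -> str:
        """Rebuild a genome by walking its path and applying its variants."""
        parts = []
        for node in self.paths[genome]:
            block = self.blocks[node.block_id]
            bases = list(block.consensus)
            for pos, base in block.variants.get(genome, {}).items():
                bases[pos] = base
            seq = "".join(bases)
            parts.append(seq if node.forward else reverse_complement(seq))
        return "".join(parts)

    def core_blocks(self) -> Set[str]:
        """Blocks visited by every genome (a crude stand-in for the core genome)."""
        visited = [{n.block_id for n in path} for path in self.paths.values()]
        return set.intersection(*visited) if visited else set()


# Toy example: g2 lacks block B, carries one SNP in block A,
# and traverses block C in the reverse orientation.
graph = PanGenomeGraph(
    blocks={
        "A": Block("ACGT", {"g2": {1: "T"}}),
        "B": Block("GGA"),
        "C": Block("TTC"),
    },
    paths={
        "g1": [Node("A"), Node("B"), Node("C")],
        "g2": [Node("A"), Node("C", forward=False)],
    },
)
```

In this toy graph, nucleotide variation lives inside the blocks (the SNP in block A), while structural variation lives in the paths (the absence of block B and the inversion of block C in `g2`), which mirrors the separation of nucleotide and structural polymorphisms that the abstract describes.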

Grants

  1. /Wellcome Trust
  2. 220422/Z/20/Z/Wellcome Trust

MeSH Term

Genomics
Genome, Bacterial
