CovidPhy: A tool for phylogeographic analysis of SARS-CoV-2 variation.

Xabier Bello, Jacobo Pardo-Seco, Alberto Gómez-Carballa, Hansi Weissensteiner, Federico Martinón-Torres, Antonio Salas
Author Information
  1. Xabier Bello: Genetics, Vaccines and Pediatric Infectious Diseases Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago (IDIS) and Universidad de Santiago de Compostela (USC), Galicia, Spain; Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain.
  2. Jacobo Pardo-Seco: Genetics, Vaccines and Pediatric Infectious Diseases Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago (IDIS) and Universidad de Santiago de Compostela (USC), Galicia, Spain; Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain.
  3. Alberto Gómez-Carballa: Genetics, Vaccines and Pediatric Infectious Diseases Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago (IDIS) and Universidad de Santiago de Compostela (USC), Galicia, Spain; Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain.
  4. Hansi Weissensteiner: Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, 6020, Innsbruck, Austria.
  5. Federico Martinón-Torres: Genetics, Vaccines and Pediatric Infectious Diseases Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago (IDIS) and Universidad de Santiago de Compostela (USC), Galicia, Spain; Translational Pediatrics and Infectious Diseases, Department of Pediatrics, Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain.
  6. Antonio Salas: Genetics, Vaccines and Pediatric Infectious Diseases Research Group (GENVIP), Instituto de Investigación Sanitaria de Santiago (IDIS) and Universidad de Santiago de Compostela (USC), Galicia, Spain; Unidade de Xenética, Instituto de Ciencias Forenses (INCIFOR), Facultade de Medicina, Universidade de Santiago de Compostela, and GenPoB Research Group, Instituto de Investigación Sanitaria (IDIS), Hospital Clínico Universitario de Santiago (SERGAS), Galicia, Spain. Electronic address: antonio.salas@usc.es.

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the pathogen responsible for the coronavirus disease 2019 (COVID-19) pandemic. SARS-CoV-2 genomes have been sequenced on a massive scale worldwide and are now available in several public genome repositories. There is much interest in developing bioinformatic tools capable of analyzing and interpreting SARS-CoV-2 variation. We have designed CovidPhy (http://covidphy.eu), a web interface that can process SARS-CoV-2 genome sequences supplied either as plain-text FASTA or as identity codes from the Global Initiative on Sharing Avian Influenza Data (GISAID) or GenBank. CovidPhy aggregates information available in the large GISAID database (>1.49 M genomes). Sequences are first aligned against the reference sequence; the interface then provides several kinds of information, including automatic classification of genomes into a pre-computed phylogeny, phylogeographic information, haplogroup/lineage frequencies, and sequence variation, and it also indicates whether the genome carries known variants of concern (VOCs). Additionally, CovidPhy allows users to search for variants and haplotypes of their own choosing, and it includes a list of genomes that are good candidates for having seeded large outbreaks worldwide, most likely mediated by important superspreading events, together with their possible geographic epicenters and their relative impact as recorded in the GISAID database.
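
The step in which an aligned genome is compared with the reference to list its variants can be illustrated with a minimal Python sketch. This is not CovidPhy's implementation: the function names (parse_fasta, call_substitutions, carries_markers), the toy sequences, and the marker list are all hypothetical, and the sketch assumes the query has already been aligned to the reference with no gaps in the reference, so that alignment columns equal reference positions.

```python
from typing import Dict, Iterable, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse plain-text FASTA into a {record name: sequence} mapping."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def call_substitutions(reference: str, query: str) -> List[str]:
    """List substitutions (e.g. 'G3T') between two pre-aligned sequences.

    Assumption: both sequences have the same length and the reference has
    no gaps, so the 1-based column index is the reference position.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    variants: List[str] = []
    for pos, (ref_base, qry_base) in enumerate(zip(reference, query), start=1):
        # Ignore matches, gaps, and ambiguous bases such as N.
        if ref_base == qry_base or ref_base not in "ACGT" or qry_base not in "ACGT":
            continue
        variants.append(f"{ref_base}{pos}{qry_base}")
    return variants


def carries_markers(variants: Iterable[str], markers: Iterable[str]) -> bool:
    """Return True if every marker variant is present in the genome."""
    return set(markers).issubset(set(variants))


# Toy data for illustration only; these are not real SARS-CoV-2 sequences.
TOY_FASTA = """>toy_reference
ACGTACGTAC
>toy_sample
ACTTACGNAA
"""

records = parse_fasta(TOY_FASTA)
toy_variants = call_substitutions(records["toy_reference"], records["toy_sample"])
print(toy_variants)  # ['G3T', 'C10A']

# Hypothetical marker set standing in for a lineage- or VOC-defining motif.
print(carries_markers(toy_variants, ["G3T"]))  # True
```

A real pipeline would also need to handle insertions, deletions, and the coordinate shifts they introduce, which this sketch deliberately leaves out.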

MeSH Terms

COVID-19
Databases, Genetic
Genome, Viral
Humans
Internet
Pandemics
Phylogeny
Phylogeography
SARS-CoV-2
Software
