Coronavirus GenBrowser for monitoring the transmission and evolution of SARS-CoV-2.

Dalang Yu, Xiao Yang, Bixia Tang, Yi-Hsuan Pan, Jianing Yang, Guangya Duan, Junwei Zhu, Zi-Qian Hao, Hailong Mu, Long Dai, Wangjie Hu, Mochen Zhang, Ying Cui, Tong Jin, Cui-Ping Li, Lina Ma, Language translation team, Xiao Su, Guoqing Zhang, Wenming Zhao, Haipeng Li
Author Information
  1. Dalang Yu: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  2. Xiao Yang: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  3. Bixia Tang: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  4. Yi-Hsuan Pan: Key Laboratory of Brain Functional Genomics of Ministry of Education, School of Life Science, East China Normal University, Shanghai 200062, China.
  5. Jianing Yang: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  6. Guangya Duan: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  7. Junwei Zhu: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  8. Zi-Qian Hao: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  9. Hailong Mu: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  10. Long Dai: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  11. Wangjie Hu: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  12. Mochen Zhang: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  13. Ying Cui: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  14. Tong Jin: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  15. Cui-Ping Li: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  16. Lina Ma: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  17. Xiao Su: Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai 200031, China.
  18. Guoqing Zhang: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.
  19. Wenming Zhao: National Genomics Data Center, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing 100101, China.
  20. Haipeng Li: National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China.

Abstract

Genomic epidemiology is important for studying the COVID-19 pandemic, and more than two million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequences have been deposited in public databases. However, the exponential increase in sequences poses unprecedented bioinformatic challenges. Here, we present the Coronavirus GenBrowser (CGB), based on a highly efficient analysis framework and a node-picking rendering strategy. In total, 1,002,739 high-quality genomic sequences with transmission-related metadata were analyzed and visualized. The core data file is only 12.20 MB, which makes sharing clean data highly efficient. Quick visualization modules and rich interactive operations are provided for exploring the annotated SARS-CoV-2 evolutionary tree. A CGB binary nomenclature is proposed to name each internal lineage. The pre-analyzed data can be filtered according to user-defined criteria to explore the transmission of SARS-CoV-2. Different evolutionary analyses can also be performed easily, such as the detection of accelerated evolution and ongoing positive selection. Moreover, 75 genomic spots that are conserved in SARS-CoV-2 but not in other coronaviruses were identified, which may indicate functional elements specifically important for SARS-CoV-2. The CGB was written in Java and JavaScript. It not only enables users without programming skills to analyze millions of genomic sequences but also offers a panoramic view of the transmission and evolution of SARS-CoV-2.

Grants

  1. 2020YFC084-7000/National Key Research and Development Project of China
  2. XDB38030100/Chinese Academy of Sciences
  3. 2017SHZDZX01/Shanghai Municipal Science and Technology Major Project
  4. JBGSRWBD-SINH-2021-10/Shanghai Institute of Nutrition and Health

MeSH Terms

COVID-19
Computational Biology
DNA Mutational Analysis
Databases, Genetic
Genome, Viral
Genomics
Humans
Molecular Epidemiology
Molecular Sequence Annotation
Mutation
Public Health Surveillance
SARS-CoV-2
Software
Web Browser

Links to CNCB-NGDC Resources

Database Commons: DBC007545 (CGB)
