Compositional Variability and Mutation Spectra of Monophyletic SARS-CoV-2 Clades.

Xufei Teng, Qianpeng Li, Zhao Li, Yuansheng Zhang, Guangyi Niu, Jingfa Xiao, Jun Yu, Zhang Zhang, Shuhui Song
Author Information
  1. Xufei Teng: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  2. Qianpeng Li: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  3. Zhao Li: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  4. Yuansheng Zhang: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  5. Guangyi Niu: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  6. Jingfa Xiao: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  7. Jun Yu: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: junyu@big.ac.cn.
  8. Zhang Zhang: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: zhangzhang@big.ac.cn.
  9. Shuhui Song: China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: songshh@big.ac.cn.

Abstract

COVID-19 and its causative pathogen, SARS-CoV-2, plunged the world into a staggering pandemic within a few months, and the global fight against both has been intensifying. Here, we describe an analysis procedure that relates genome composition and its variables, through the genetic code, to molecular mechanisms. The procedure builds on an understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity, including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, an identity-based phylogeny of 22,051 SARS-CoV-2 sequences, and an evaluation of sequence variation patterns as mutation spectra and their 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are as follows: 1) The most dominant mutation is the C-to-U permutation, whose abundant second-codon-position counts alter amino acid composition toward higher molecular weight and lower hydrophobicity, albeit assumed to be mostly slightly deleterious. 2) The second most abundant group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and one positive-strand mutation (G-to-U) due to DNA repair mechanisms after cellular abasic events. 3) A clade-associated biased mutation trend is found, attributable to an elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is highly informative for associating non-synonymous mutations with viral proteome changes. These findings call for a platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, to provide biological and clinical information after logical convergence for effective pharmaceutical and diagnostic applications. Such actions are desperately needed, especially in the middle of the war against COVID-19.
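
As a rough illustration of what a mutation spectrum with 12 permutations means in practice, the sketch below tallies every directed substitution (for example C-to-U) between a reference and a sample sequence, and separately counts substitutions by codon position. It is a minimal sketch and not the authors' pipeline: the function names `mutation_spectrum` and `codon_position_counts` are hypothetical, the two sequences are assumed to be already aligned to equal length, and a single ungapped reading frame starting at `orf_start` is assumed.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"


def _to_rna(base):
    """Upper-case a base and treat DNA-style T as RNA U."""
    return base.upper().replace("T", "U")


def mutation_spectrum(reference, sample):
    """Count each of the 12 directed substitutions between two aligned sequences.

    Assumption: both sequences are pre-aligned and of equal length.
    Gaps and ambiguous bases (anything outside A, C, G, U) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    # Start every one of the 12 permutations at zero so absent ones are visible.
    counts = Counter({f"{a}>{b}": 0 for a, b in permutations(BASES, 2)})
    for ref_base, alt_base in zip(reference, sample):
        ref_base, alt_base = _to_rna(ref_base), _to_rna(alt_base)
        if ref_base in BASES and alt_base in BASES and ref_base != alt_base:
            counts[f"{ref_base}>{alt_base}"] += 1
    return counts


def codon_position_counts(reference, sample, orf_start=0):
    """Tally substitutions by codon position (1, 2 or 3).

    Assumption: one ungapped reading frame beginning at index orf_start.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    positions = {1: 0, 2: 0, 3: 0}
    for index in range(orf_start, len(reference)):
        ref_base, alt_base = _to_rna(reference[index]), _to_rna(sample[index])
        if ref_base in BASES and alt_base in BASES and ref_base != alt_base:
            positions[(index - orf_start) % 3 + 1] += 1
    return positions


if __name__ == "__main__":
    spectrum = mutation_spectrum("ACGUCC", "AUGUCU")
    print({k: v for k, v in spectrum.items() if v})
    print(codon_position_counts("ACGUCC", "AUGUCU"))
```

Applied per clade rather than per sequence pair, counts of this kind could be compared across the 12 permutations to look for the strand-biased and codon-position patterns the abstract describes; a real analysis would additionally need gene annotations to assign reading frames.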

MeSH Term

COVID-19
Evolution, Molecular
Genome, Viral
Humans
Mutation
SARS-CoV-2
