Using intrahost single nucleotide variant data to predict SARS-CoV-2 detection cycle threshold values.

Lea Duesterwald, Marcus Nguyen, Paul Christensen, S Wesley Long, Randall J Olsen, James M Musser, James J Davis
Author Information
  1. Lea Duesterwald: College of Engineering, Cornell University, Ithaca, NY, United States of America.
  2. Marcus Nguyen: Northwestern-Argonne Institute for Science and Engineering, Evanston, IL, United States of America.
  3. Paul Christensen: Laboratory of Human Molecular and Translational Human Infectious Diseases, Center for Infectious Diseases, Houston Methodist Research Institute and Department of Pathology and Genomic Medicine, Houston Methodist Hospital, Houston, TX, United States of America.
  4. S Wesley Long: Laboratory of Human Molecular and Translational Human Infectious Diseases, Center for Infectious Diseases, Houston Methodist Research Institute and Department of Pathology and Genomic Medicine, Houston Methodist Hospital, Houston, TX, United States of America.
  5. Randall J Olsen: Laboratory of Human Molecular and Translational Human Infectious Diseases, Center for Infectious Diseases, Houston Methodist Research Institute and Department of Pathology and Genomic Medicine, Houston Methodist Hospital, Houston, TX, United States of America.
  6. James M Musser: Laboratory of Human Molecular and Translational Human Infectious Diseases, Center for Infectious Diseases, Houston Methodist Research Institute and Department of Pathology and Genomic Medicine, Houston Methodist Hospital, Houston, TX, United States of America.
  7. James J Davis: Northwestern-Argonne Institute for Science and Engineering, Evanston, IL, United States of America.

Abstract

Over the last four years, each successive wave of the COVID-19 pandemic has been caused by variants with mutations that improve the transmissibility of the virus. Despite this, we still lack tools for predicting clinically important features of the virus. In this study, we show that it is possible to predict PCR cycle threshold (Ct) values from clinical detection assays using sequence data. Ct values often correspond with patient viral load and with the epidemiological trajectory of the pandemic. Using a collection of 36,335 high-quality genomes, we built XGBoost models from SARS-CoV-2 intrahost single nucleotide variant (iSNV) data, using as features the frequencies of A, T, G, C, insertions, and deletions at each position relative to the Wuhan-Hu-1 reference genome. Our best model had an R² of 0.604 [0.593-0.616, 95% confidence interval] and a root mean square error (RMSE) of 5.247 [5.156-5.337], demonstrating modest predictive power. Overall, we show that the results are stable on an external holdout set of genomes selected from the Sequence Read Archive (SRA) and are robust to patient status and to the detection instruments used. This study highlights the importance of developing modeling strategies that can be applied to publicly available genome sequence data for use in disease prevention and control.
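
The feature construction and evaluation described above can be sketched in a few lines. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' pipeline: the per-read pileup input format and the helper names (position_frequencies, feature_vector, bootstrap_ci) are hypothetical, and because the abstract does not say how the 95% confidence intervals were computed, a percentile bootstrap over held-out predictions is assumed here. The model-fitting step itself (XGBoost) is left out to keep the example dependency-free.

```python
import math
import random
from collections import Counter

# The six states whose frequencies are tracked at each reference position,
# following the abstract: A, T, G, C, insertions, and deletions.
STATES = ("A", "T", "G", "C", "ins", "del")


def position_frequencies(calls):
    """Fraction of reads supporting each state at one reference position.

    Hypothetical input format: `calls` is a list of per-read observations at
    this position, each already reduced to one of STATES (a simplification;
    real pileups report an insertion alongside a base call).
    """
    counts = Counter(calls)
    depth = sum(counts[s] for s in STATES)
    if depth == 0:
        # No coverage at this position: emit zeros rather than dividing by 0.
        return [0.0] * len(STATES)
    return [counts[s] / depth for s in STATES]


def feature_vector(pileup):
    """Flatten one genome's pileup into 6 features per reference position."""
    row = []
    for calls in pileup:
        row.extend(position_frequencies(calls))
    return row


def r2_score(y_true, y_pred):
    """Coefficient of determination; raises ZeroDivisionError if y_true is constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y (here, Ct cycles)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` over held-out predictions (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample (true, predicted) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (constant y_true); skip it
    stats.sort()
    k = len(stats)
    lo = stats[int(alpha / 2 * k)]
    hi = stats[min(k - 1, int((1 - alpha / 2) * k))]
    return lo, hi
```

In a full pipeline, the rows returned by feature_vector would form the training matrix for a gradient-boosted regressor such as xgboost.XGBRegressor, and bootstrap_ci would then be applied to its Ct predictions on held-out genomes. At six features per position, a full-length alignment to Wuhan-Hu-1 (29,903 nt) yields roughly 179,000 columns, which is one reason a tree ensemble that tolerates many sparse, mostly uninformative features is a natural fit.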

Grants

  1. 75N93019C00076/NIAID NIH HHS

MeSH Terms

SARS-CoV-2
Humans
COVID-19
Genome, Viral
Polymorphism, Single Nucleotide
Viral Load
Pandemics
