Tutorial on survival modeling with applications to omics data.

Zhi Zhao, John Zobolas, Manuela Zucknick, Tero Aittokallio
Author Information
  1. Zhi Zhao: Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, Faculty of Medicine, University of Oslo, Oslo 0372, Norway.
  2. John Zobolas: Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, Faculty of Medicine, University of Oslo, Oslo 0372, Norway.
  3. Manuela Zucknick: Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, Faculty of Medicine, University of Oslo, Oslo 0372, Norway.
  4. Tero Aittokallio: Oslo Centre for Biostatistics and Epidemiology (OCBE), Department of Biostatistics, Faculty of Medicine, University of Oslo, Oslo 0372, Norway.

Abstract

MOTIVATION: Identifying genomic, molecular and clinical markers that are prognostic of patient survival is important for developing personalized approaches to disease prevention, diagnosis and treatment. Modern omics technologies make it possible to investigate the prognostic impact of markers at multiple molecular levels, including genomics, epigenomics, transcriptomics, proteomics and metabolomics. They also make it possible to assess how these potential risk factors complement clinical characteristics in survival prognosis. However, the large size of omics datasets, together with their correlation structures, poses challenges for relating molecular information to patients' survival outcomes.
RESULTS: We present a general workflow for survival analysis that takes high-dimensional omics data as input, both for identifying survival-associated features and for validating survival models. In particular, we focus on the commonly used Cox-type penalized regressions and hierarchical Bayesian models for feature selection in survival analysis. These are especially useful for high-dimensional data, but the framework is applicable more generally.
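As a rough illustration of how a Cox-type penalized regression performs feature selection, the Python sketch below fits a lasso-penalized Cox model on simulated data. It is a toy under stated assumptions, not the workflow's implementation; the authors' tutorial is written in R. The model is fitted by proximal gradient descent (ISTA), minimizing -l(beta)/n + lam * ||beta||_1, where l is the Cox partial log-likelihood. The function names `cox_neg_loglik_grad`, `soft_threshold` and `fit_lasso_cox`, the Breslow handling of tied times, the fixed step size, the simulated data and the penalty value `lam` are all hypothetical choices made here. The L1 penalty sets many coefficients exactly to zero, and this is what makes the fit a feature selector.

```python
import numpy as np


def cox_neg_loglik_grad(beta, X, time, event):
    """Negative Cox partial log-likelihood (Breslow ties) and its gradient, both scaled by 1/n."""
    n = X.shape[0]
    eta = X @ beta
    shift = eta.max()  # subtract the max before exp for numerical stability
    w = np.exp(eta - shift)
    # at_risk[i, j] = 1 if subject j is still at risk at time[i], i.e. time[j] >= time[i]
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    denom = at_risk @ w  # sum of w over the risk set of each subject
    mean_x = (at_risk @ (w[:, None] * X)) / denom[:, None]  # weighted covariate mean per risk set
    ev = event.astype(bool)
    loglik = np.sum(eta[ev] - np.log(denom[ev]) - shift)
    grad = np.sum(X[ev] - mean_x[ev], axis=0)
    return -loglik / n, -grad / n


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1: shrinks each entry towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_lasso_cox(X, time, event, lam, n_iter=2000):
    """Minimise -loglik/n + lam * ||beta||_1 by proximal gradient descent (ISTA)."""
    n, p = X.shape
    # Conservative Lipschitz bound on the gradient: (events / n) * max_j ||x_j||^2
    lipschitz = event.sum() / n * np.max(np.sum(X ** 2, axis=1))
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        _, grad = cox_neg_loglik_grad(beta, X, time, event)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Hypothetical toy data: 10 features, only feature 0 affects the hazard
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[0] = 1.0
event_time = rng.exponential(1.0 / np.exp(X @ true_beta))  # exponential times with hazard exp(x'beta)
censor_time = rng.exponential(2.0, size=n)  # independent random censoring
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(int)

beta_hat = fit_lasso_cox(X, time, event, lam=0.1)
print(np.round(beta_hat, 3))
```

In practice the penalty would be tuned, for example by cross-validation, and the features standardized first. Dedicated software is far more efficient than this sketch, which costs O(n^2) per iteration. For instance, the R package glmnet fits lasso and elastic-net Cox models.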
AVAILABILITY AND IMPLEMENTATION: A step-by-step R tutorial on fitting and evaluating survival models, using survival and omics data from The Cancer Genome Atlas (TCGA), is available at https://ocbe-uio.github.io/survomics.
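The abstract does not say which metrics are used to evaluate the survival models. As one common example, assumed here, Harrell's concordance index (C-index) measures how well predicted risk scores rank subjects by survival time. The Python sketch below uses a hypothetical function `harrell_c_index` with the following conventions:

- A pair counts as comparable when the subject with the shorter observed time had an event.
- Tied risk predictions score 1/2.
- Pairs with tied observed times are skipped.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        later = time > time[i]  # subjects observed for longer than subject i
        comparable += int(later.sum())
        # higher predicted risk for the earlier failure is concordant; ties count 1/2
        concordant += np.sum(risk[i] > risk[later]) + 0.5 * np.sum(risk[i] == risk[later])
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return float(concordant / comparable)


time = [1, 2, 3, 4]
event = [1, 1, 0, 1]
print(harrell_c_index(time, event, [4, 3, 2, 1]))  # 1.0: higher risk always fails earlier
print(harrell_c_index(time, event, [4, 1, 2, 3]))  # 0.6: 3 of 5 comparable pairs are concordant
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. To estimate a model's prognostic performance, it would typically be computed on held-out data, for example using the linear predictor of a fitted Cox model as the risk score.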

Grants

  1. 2020026/Helse Sør-Øst
  2. 216104/Norwegian Cancer Society
  3. /Radium Hospital Foundation
  4. 326238/Academy of Finland
  5. /Cancer Society of Finland
  6. /European Union's Horizon 2020
  7. /European Union's Horizon 2020

MeSH Terms

Humans
Bayes Theorem
Genomics
Proteomics
Genome
Epigenomics
Metabolomics