Comparative study of machine learning and statistical survival models for enhancing cervical cancer prognosis and risk factor assessment using SEER data.

Anjana Eledath Kolasseri, Venkataramana B
Author Information
  1. Anjana Eledath Kolasseri: Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India.
  2. Venkataramana B: Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India. venkataramana.b@vit.ac.in.

Abstract

Cervical cancer is a common malignant tumor of the female reproductive system and a leading cause of cancer death among women worldwide. Survival prediction methods analyze time-to-event outcomes, which are central to clinical studies. This study aims to bridge the gap between traditional statistical methods and machine learning in survival analysis by identifying which techniques predict survival most effectively, with particular emphasis on improving prediction accuracy and identifying key risk factors for cervical cancer. The study included women diagnosed with cervical cancer between 2013 and 2015, drawn from the Surveillance, Epidemiology, and End Results (SEER) database. Using this dataset, it compares the Weibull model, the Cox proportional hazards model, and Random Survival Forests (RSF) in terms of predictive accuracy and risk factor identification. The findings indicate that machine learning models, particularly RSF, outperform traditional statistical methods in both predictive accuracy and the identification of key prognostic factors, underscoring the advantages of machine learning for complex survival data. For survival datasets with a small number of predictors, however, statistical models should be tried first. The study concludes that RSF models yield more accurate predictions and better insight into survival risk factors, but it highlights the need for larger datasets and further research on model interpretability and clinical applicability.
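
The abstract does not state which metric was used to compare predictive accuracy. A common choice for survival models is Harrell's concordance index (C-index), and the following is a minimal sketch of it under that assumption. The function name `concordance_index` and the toy follow-up data are hypothetical illustrations; they are not taken from the study or from SEER.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs whose predicted risks are ordered correctly.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter follow-up time than subject j. The pair is concordant
    when subject i also has the higher predicted risk; tied risks count 0.5.
    Tied follow-up times are ignored in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
# Hypothetical risk scores from some fitted model (higher = worse prognosis).
risk_scores = [0.9, 0.4, 0.7, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the same function can score risk predictions from a Weibull model, a Cox model, or an RSF on a held-out set, provided each model's output is oriented so that higher means higher risk.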

MeSH Terms

Humans
Uterine Cervical Neoplasms
Female
Machine Learning
SEER Program
Risk Factors
Prognosis
Middle Aged
Proportional Hazards Models
Risk Assessment
Survival Analysis
Models, Statistical
Adult
Aged
