Lung Cancer Detection Using Bayesian Networks: A Retrospective Development and Validation Study on a Danish Population of High-Risk Individuals.

Margrethe Bang Henriksen, Florian Van Daalen, Leonard Wee, Torben Frøstrup Hansen, Lars Henrik Jensen, Claus Lohman Brasen, Ole Hilberg, Inigo Bermejo
Author Information
  1. Margrethe Bang Henriksen: Department of Oncology, Vejle University Hospital, Vejle, Denmark.
  2. Florian Van Daalen: Department of Radiation Oncology (MAASTRO) GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, the Netherlands.
  3. Leonard Wee: Department of Radiation Oncology (MAASTRO) GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, the Netherlands.
  4. Torben Frøstrup Hansen: Department of Oncology, Vejle University Hospital, Vejle, Denmark.
  5. Lars Henrik Jensen: Department of Oncology, Vejle University Hospital, Vejle, Denmark.
  6. Claus Lohman Brasen: Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark.
  7. Ole Hilberg: Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark.
  8. Inigo Bermejo: Department of Radiation Oncology (MAASTRO) GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, the Netherlands.

Abstract

BACKGROUND: Lung cancer (LC) is the leading cause of cancer death worldwide, and many countries have therefore adopted LC screening programs. Screening eligibility typically relies on age and smoking intensity, although more efficient risk models exist. We developed a Bayesian network (BN) for LC detection, tested its robustness to varying proportions of missing data, and compared it with a previously developed machine learning (ML) model.
METHODS: We analyzed data from 9940 patients referred for LC assessment in Southern Denmark between 2009 and 2018. Variables included age, sex, smoking, and laboratory test results. Our experiments varied the proportion of missing data (0%-30%), the BN structure (expert-based vs. data-driven), and the discretization method (standard vs. data-driven).
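Two mechanics behind these experiments can be made concrete with a small example: continuous variables must be discretized before they enter a discrete BN, and a BN can still produce a risk estimate when inputs are missing, because unobserved variables are marginalized out rather than imputed. The Python sketch below is illustrative only. It assumes a naive-Bayes-shaped structure (LC as the sole parent of every variable), which is neither the expert-based nor the data-driven structure used in the study. The variables `age_band`, `smoking`, and `crp` (a hypothetical laboratory variable), all probabilities, the cut points, and the helper names (`discretize`, `equal_frequency_cuts`, `posterior_lc`, `mask_mcar`) are hypothetical. Equal-frequency binning stands in for "data-driven" discretization, and missingness is simulated as missing completely at random (MCAR); the study's exact procedures may differ.

```python
import random
import statistics

# Hypothetical naive-Bayes-shaped BN: LC is the parent of every discretized
# variable. All probabilities are illustrative placeholders, NOT study estimates.
PRIOR_LC = 0.10
CPTS = {
    "age_band": {True: {"<60": 0.20, "60-70": 0.40, ">70": 0.40},
                 False: {"<60": 0.45, "60-70": 0.35, ">70": 0.20}},
    "smoking": {True: {"never": 0.05, "former": 0.35, "current": 0.60},
                False: {"never": 0.30, "former": 0.40, "current": 0.30}},
    "crp": {True: {"normal": 0.50, "elevated": 0.50},
            False: {"normal": 0.80, "elevated": 0.20}},
}


def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the first bin whose upper cut exceeds it."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]  # value is at or above the last cut point


def equal_frequency_cuts(values, n_bins):
    """One simple data-driven option (an assumption): cut points at empirical quantiles."""
    return statistics.quantiles(values, n=n_bins)


def posterior_lc(evidence):
    """P(LC | observed evidence) by exact enumeration.

    A missing variable (absent or None) is marginalized out: its conditional
    probabilities sum to 1 over its states, so it contributes a factor of 1
    under both hypotheses and simply drops out of the product.
    """
    joint_lc, joint_no = PRIOR_LC, 1.0 - PRIOR_LC
    for var, cpt in CPTS.items():
        state = evidence.get(var)
        if state is None:
            continue
        joint_lc *= cpt[True][state]
        joint_no *= cpt[False][state]
    return joint_lc / (joint_lc + joint_no)


def mask_mcar(record, rate, rng):
    """Blank each variable independently with probability `rate` (MCAR)."""
    return {k: (None if rng.random() < rate else v) for k, v in record.items()}


if __name__ == "__main__":
    # Hypothetical patient; age 72 is discretized with illustrative cut points
    patient = {"age_band": discretize(72, [60, 70], ["<60", "60-70", ">70"]),
               "smoking": "current", "crp": "elevated"}
    rng = random.Random(0)
    for rate in (0.0, 0.1, 0.3):
        masked = mask_mcar(patient, rate, rng)
        print(rate, masked, round(posterior_lc(masked), 3))
```

Because a missing variable contributes the same factor under both hypotheses, the posterior falls back on the remaining evidence instead of failing, which offers one intuition for why BN performance can degrade gracefully as missingness increases.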
RESULTS: Across all missing-data levels, the area under the curve (AUC) remained stable, ranging from 0.737 to 0.757, compared with an AUC of 0.77 for the ML model. BN structure and discretization method had minimal impact on performance. The BNs were well calibrated overall and showed net benefit in decision curve analysis at risk thresholds above 5%.
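The results above rest on discrimination (AUC) and clinical utility (net benefit from decision curve analysis). A minimal Python sketch of the standard definitions follows, assuming binary outcomes (1 = LC, 0 = no LC) and predicted risks between 0 and 1; the function names and toy inputs are hypothetical, and the software used in the study is not specified here. AUC is computed as the Mann-Whitney probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. Net benefit at a threshold probability p_t is TP/n - (FP/n) * p_t / (1 - p_t), where every patient with predicted risk at or above p_t counts as test-positive.

```python
def auc(y_true, y_score):
    """ROC AUC as the Mann-Whitney probability that a randomly chosen case
    scores higher than a randomly chosen non-case (ties count as 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit of referring everyone whose predicted risk
    is at or above `threshold`: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by pt
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that refers everyone (treat-none has net benefit 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy, made-up outcomes and risks for illustration only
    y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    risk = [0.30, 0.12, 0.04, 0.02, 0.08, 0.20, 0.03, 0.06, 0.01, 0.05]
    print("AUC:", round(auc(y, risk), 3))
    for pt in (0.05, 0.10, 0.20):
        print(pt, round(net_benefit(y, risk, pt), 4),
              round(net_benefit_treat_all(y, pt), 4))
```

In decision curve analysis, a model is usually judged useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is 0.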
CONCLUSION: The BN models remained robust with up to 30% missing values. They also achieved performance, calibration, and clinical utility similar to those of the ML model developed on the same dataset. Given their ability to handle missing data, BNs are a relevant method for developing future lung cancer detection models.

Grants

  1. The Danish National Research Center for Lung Cancer, Danish Cancer Society (grant R198-A14299)
  2. Beckett-Fonden
  3. Syddansk Universitet
  4. Region Syddanmark
  5. Lilly and Herbert Hansens Foundation
  6. Dagmar Marshalls Fond
  7. Familien Hede Nielsens Fond

MeSH Terms

Humans
Lung Neoplasms
Bayes Theorem
Female
Male
Denmark
Middle Aged
Aged
Retrospective Studies
Early Detection of Cancer
Machine Learning
Risk Assessment
Risk Factors
