CODA: an open-source platform for federated analysis and machine learning on distributed healthcare data.

Louis Mullie, Jonathan Afilalo, Patrick Archambault, Rima Bouchakri, Kip Brown, David L Buckeridge, Yiorgos Alexandros Cavayas, Alexis F Turgeon, Denis Martineau, François Lamontagne, Martine Lebrasseur, Renald Lemieux, Jeffrey Li, Michaël Sauthier, Pascal St-Onge, An Tang, William Witteman, Michaël Chassé
Author Information
  1. Louis Mullie: Department of Medicine, Centre Hospitalier de l'Université de Montréal, Montréal, H2X 3E4, Canada.
  2. Jonathan Afilalo: Department of Medicine, Jewish General Hospital, Montréal, H3T 1E4, Canada.
  3. Patrick Archambault: Department of Emergency Medicine and Family Medicine, Université Laval, Québec, G1V 0A6, Canada.
  4. Rima Bouchakri: Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Université de Montréal, Montréal, H2X 0A9, Canada.
  5. Kip Brown: Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Université de Montréal, Montréal, H2X 0A9, Canada.
  6. David L Buckeridge: Mila Quebec Artificial Intelligence Institute, Montréal, H2S 3H1, Canada.
  7. Yiorgos Alexandros Cavayas: Department of Medicine, Hôpital du Sacré-Coeur de Montréal, Montréal, H4J 1C5, Canada.
  8. Alexis F Turgeon: Department of Anesthesiology and Critical Care Medicine, Université Laval, Québec, G1V 0A6, Canada.
  9. Denis Martineau: Centre de recherche du CHU de Québec-Université Laval, Université Laval, Québec, G1V 4G2, Canada.
  10. François Lamontagne: Centre de recherche du CHUS, Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, J1G 2E8, Canada.
  11. Martine Lebrasseur: Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Université de Montréal, Montréal, H2X 0A9, Canada.
  12. Renald Lemieux: Centre de recherche du CHUS, Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, J1G 2E8, Canada.
  13. Jeffrey Li: Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Université de Montréal, Montréal, H2X 0A9, Canada.
  14. Michaël Sauthier: Faculty of Medicine, Université de Montréal, Montréal, H3C 3J7, Canada.
  15. Pascal St-Onge: Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Université de Montréal, Montréal, H2X 0A9, Canada.
  16. An Tang: Faculty of Medicine, Université de Montréal, Montréal, H3C 3J7, Canada.
  17. William Witteman: Centre de Recherche Intégré pour un Système Apprenant en santé et Services Sociaux, Centre intégré de santé et de Services Sociaux de Chaudière-Appalaches, Lévis, G6V 3Z1, Canada.
  18. Michaël Chassé: Department of Medicine, Centre Hospitalier de l'Université de Montréal, Montréal, H2X 3E4, Canada.

Abstract

OBJECTIVES: Distributed computations facilitate multi-institutional data analysis while avoiding the costs and complexity of data pooling. Existing approaches lack crucial features, such as built-in medical standards and terminologies, no-code data visualizations, explicit disclosure control mechanisms, and support for basic statistical computations, in addition to gradient-based optimization capabilities.
MATERIALS AND METHODS: We describe the development of the Collaborative Data Analysis (CODA) platform and the design choices made to address the key needs identified during our survey of stakeholders. We use a public dataset (MIMIC-IV) to demonstrate end-to-end multi-modal federated learning (FL) using CODA. We assess the technical feasibility of deploying the CODA platform at 9 hospitals in Canada, describe implementation challenges, and evaluate its scalability on large patient populations.
RESULTS: The CODA platform was designed, developed, and deployed between January 2020 and January 2023. Software code, documentation, and technical documents were released under an open-source license. Multi-modal federated averaging is illustrated using the MIMIC-IV and MIMIC-CXR datasets. To date, 8 out of the 9 participating sites have successfully deployed the platform, with a total enrolment of >1M patients. Mapping data from legacy systems to FHIR was the biggest barrier to implementation.
DISCUSSION AND CONCLUSION: The CODA platform was developed and successfully deployed in a public healthcare setting in Canada, with heterogeneous information technology systems and capabilities. Ongoing efforts will use the platform to develop and prospectively validate models for risk assessment, proactive monitoring, and resource usage. Further work will also make tools available to facilitate migration from legacy formats to FHIR and DICOM.
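
The federated averaging step mentioned in the RESULTS can be illustrated with a minimal sketch. This is not CODA's implementation: the function name `federated_average`, the dictionary-of-lists weight format, and the two example sites are hypothetical, chosen only to show the idea that each site shares model weights and a sample count rather than patient records, and that the coordinator combines them in proportion to each site's sample count.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical weight format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(site_updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Combine per-site model weights into one global model.

    Each entry is (weights, n_samples). Sites contribute in proportion
    to their sample count; only weights and counts leave a site.
    """
    if not site_updates:
        raise ValueError("at least one site update is required")
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    reference = site_updates[0][0]
    averaged: Weights = {}
    for name, ref_values in reference.items():
        acc = [0.0] * len(ref_values)
        for weights, n in site_updates:
            values = weights[name]
            if len(values) != len(ref_values):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            # Weighted running sum: each site's share is n / total.
            for i, v in enumerate(values):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Hypothetical example: two sites with different cohort sizes.
site_a = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
site_b = ({"w": [3.0, 4.0], "b": [1.0]}, 300)
global_weights = federated_average([site_a, site_b])
print(global_weights)
```

In a full training loop, the averaged weights would be sent back to every site for another round of local training; that loop, and any disclosure control applied to the shared values, is omitted here.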

Grants

  1. 172742/CIHR

MeSH Term

Humans
Software
Health Facilities
Delivery of Health Care
Machine Learning
Canada
