Multimodal masked siamese network improves chest X-ray representation learning.

Saeed Shurrab, Alejandro Guerra-Manzanares, Farah E Shamout
Author Information
  1. Saeed Shurrab: New York University Abu Dhabi, Computer Engineering, Abu Dhabi, 129188, UAE.
  2. Alejandro Guerra-Manzanares: New York University Abu Dhabi, Computer Engineering, Abu Dhabi, 129188, UAE.
  3. Farah E Shamout: New York University Abu Dhabi, Computer Engineering, Abu Dhabi, 129188, UAE. farah.shamout@nyu.edu.

Abstract

Self-supervised learning methods for medical images primarily rely on the imaging modality during pretraining. Although such approaches deliver promising results, they do not take advantage of the associated patient or scan information collected within Electronic Health Records (EHR). This study aims to develop a multimodal pretraining approach for chest radiographs that incorporates EHR data as an additional modality during training. We propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest radiograph representations. We investigate three types of EHR data: demographics, scan metadata, and inpatient stay information. We evaluate the multimodal MSN on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, specifically ViT-Tiny and ViT-Small. In assessing the quality of the representations through linear evaluation, our proposed method demonstrates significant improvement compared to vanilla MSN and state-of-the-art self-supervised learning baselines. In particular, our proposed method achieves an improvement of 2% in the Area Under the Receiver Operating Characteristic Curve (AUROC) compared to vanilla MSN and 5% to 8% compared to other baselines, including uni-modal ones. Furthermore, our findings reveal that demographic features provide the most significant performance improvement. Our work highlights the potential of EHR-enhanced self-supervised pretraining for medical imaging and opens opportunities for future research to address limitations in existing representation learning methods for other medical imaging modalities, such as neuro-, ophthalmic, and sonar imaging.

Grants

  1. CG010/NYUAD Center for AI & Robotics
  2. CG001/Center for Interacting Urban Networks
  3. G1104/Center for Cyber Security

MeSH Term

Humans
Radiography, Thoracic
Neural Networks, Computer
Electronic Health Records
Female
Male
Supervised Machine Learning
ROC Curve
