ChatGPT achieves comparable accuracy to specialist physicians in predicting the efficacy of high-flow oxygen therapy.

Taotao Liu, Yaocong Duan, Yanchun Li, Yingying Hu, Lingling Su, Aiping Zhang
Author Information
  1. Taotao Liu: Department of Surgical Intensive Care Unit, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, 100730, China.
  2. Yaocong Duan: School of Psychology and Neuroscience, University of Glasgow, Glasgow, G12 8QQ, UK.
  3. Yanchun Li: The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, 471000, China.
  4. Yingying Hu: The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology, Luoyang, 471000, China.
  5. Lingling Su: Department of Respiratory and Critical Care Medicine, Jiangyan Hospital Affiliated to Nanjing University of Chinese Medicine, Taizhou, 225500, China.
  6. Aiping Zhang: Department of Respiratory and Critical Care Medicine, Jiangyan Hospital Affiliated to Nanjing University of Chinese Medicine, Taizhou, 225500, China.

Abstract

Background: Failure of high-flow nasal cannula (HFNC) oxygen therapy can necessitate endotracheal intubation. Timely prediction of intubation risk after HFNC therapy begins is therefore crucial for reducing the mortality associated with delayed intubation.
Objectives: To investigate the accuracy of ChatGPT in predicting the risk of endotracheal intubation within 48 h of HFNC therapy, and to compare it with the predictive accuracy of specialist and non-specialist physicians.
Methods: We conducted a prospective multicenter cohort study using data from 71 adult patients who received HFNC therapy. For each patient, baseline data and physiological parameters after 6 h of HFNC therapy were recorded. These data were used to create a six-alternative forced-choice questionnaire that asked participants to rate the 48-h endotracheal intubation risk on a scale from 1 to 6, with higher scores indicating greater risk. GPT-3.5, GPT-4.0, respiratory and critical care specialist physicians, and non-specialist physicians each completed the same questionnaire (N = 71). We then used the Youden index to determine the optimal diagnostic cutoff point for each predictor and for the 6-h ROX index. We compared their predictive performance using receiver operating characteristic (ROC) analysis.
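The cutoff and ROC steps described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names (`rox_index`, `youden_cutoff`, `roc_auc`) are hypothetical, and the ratings and outcomes are invented toy values, not study data. The sketch assumes that a rating at or above the cutoff counts as a positive prediction. The Youden index is J = sensitivity + specificity - 1, and ties in J are broken toward the lower cutoff; this tie-break is an assumption. The AUC is computed as the Mann-Whitney probability that a positive case outranks a negative one, with ties counting one half. The ROX index uses its standard definition, (SpO2/FiO2)/respiratory rate.

```python
# Sketch of the cutoff and ROC steps described in the Methods.
# ASSUMPTIONS: toy data only (not study data); a rating >= cutoff counts as a
# positive call; ties in Youden's J keep the lower cutoff. All function names
# are hypothetical illustrations, not the authors' code.
from typing import Sequence


def rox_index(spo2_pct: float, fio2: float, resp_rate: float) -> float:
    """ROX index = (SpO2 / FiO2) / respiratory rate, with FiO2 as a fraction."""
    return (spo2_pct / fio2) / resp_rate


def youden_cutoff(scores: Sequence[float], labels: Sequence[int],
                  candidates: Sequence[float]) -> tuple[float, float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = candidates[0], float("-inf")
    for cut in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lower cutoff on ties
            best_cut, best_j = cut, j
    return best_cut, best_j


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Each positive/negative pair scores 1 if ordered correctly, 0.5 if tied.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    ratings = [1, 2, 2, 3, 4, 4, 5, 5, 6, 6]  # hypothetical 1-6 risk ratings
    outcome = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # hypothetical outcomes (1 = event)
    cut, j = youden_cutoff(ratings, outcome, candidates=[2, 3, 4, 5, 6])
    print(f"cutoff >= {cut}, J = {j:.2f}, AUC = {roc_auc(ratings, outcome):.2f}")
    print(f"ROX = {rox_index(94, 0.5, 25):.2f}")
```

On these toy values the script prints `cutoff >= 4, J = 0.80, AUC = 0.98` and `ROX = 7.52`. Cutoffs ≥4 and ≥5 tie at J = 0.80, so the tie-break rule decides which one is reported. A real analysis would need to state its own rule.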
Results: The optimal diagnostic cutoff point was ≥4 for both GPT-4.0 and specialist physicians. GPT-4.0 achieved an accuracy of 76.1%, with a specificity of 78.6% (95% CI 52.4-92.4%) and a sensitivity of 75.4% (95% CI 62.9-84.8%). Specialist physicians achieved an accuracy of 80.3%, with a specificity of 71.4% (95% CI 45.4-88.3%) and a sensitivity of 82.5% (95% CI 70.6-90.2%). For GPT-3.5 and non-specialist physicians, the optimal cutoff point was ≥5, with accuracies of 73.2% and 64.8%, respectively. In ROC analysis, GPT-4.0 had the highest area under the curve (AUC) among the predictors, at 0.821 (95% CI 0.698-0.943). This was significantly higher than the AUC of non-specialist physicians [0.662 (95% CI 0.518-0.805), P = 0.011].
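The reported percentages can be cross-checked with a short Python sketch. The abstract does not give the 2x2 counts. The sketch assumes 57 positive-class and 14 negative-class patients, back-calculated from the percentages and interval widths rather than quoted from the paper. With these assumed counts, the overall figures of 76.1% and 80.3% equal (TP + TN)/71, which is overall accuracy. The Wilson score interval reproduces the reported 95% CIs to one decimal place. The abstract states neither the CI method the authors used nor which outcome was coded as positive, so both remain assumptions.

```python
# Sketch: cross-check of the GPT-4.0 and specialist metrics reported above.
# ASSUMPTION: the 2x2 counts are back-calculated from the published
# percentages (57 positive-class and 14 negative-class cases); they are not
# quoted from the paper, and the CI method (Wilson score) is inferred.
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def summarize(tp: int, fn: int, tn: int, fp: int) -> dict[str, tuple[float, ...]]:
    """Accuracy, plus sensitivity and specificity with Wilson 95% CIs."""
    return {
        "accuracy": ((tp + tn) / (tp + fn + tn + fp),),
        "sensitivity": (tp / (tp + fn), *wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), *wilson_ci(tn, tn + fp)),
    }


# Hypothetical back-calculated counts, ordered (TP, FN, TN, FP).
COUNTS = {
    "GPT-4.0": (43, 14, 11, 3),
    "Specialists": (47, 10, 10, 4),
}

if __name__ == "__main__":
    for name, counts in COUNTS.items():
        m = summarize(*counts)
        sens, s_lo, s_hi = m["sensitivity"]
        spec, c_lo, c_hi = m["specificity"]
        print(f"{name}: accuracy {m['accuracy'][0]:.1%}, "
              f"sensitivity {sens:.1%} ({s_lo:.1%}-{s_hi:.1%}), "
              f"specificity {spec:.1%} ({c_lo:.1%}-{c_hi:.1%})")
```

Run as a script, it prints `GPT-4.0: accuracy 76.1%, sensitivity 75.4% (62.9%-84.8%), specificity 78.6% (52.4%-92.4%)` and `Specialists: accuracy 80.3%, sensitivity 82.5% (70.6%-90.2%), specificity 71.4% (45.4%-88.3%)`. The wide specificity intervals follow directly from the small negative class of 14 patients.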
Conclusion: Based on patient baseline data and physiological parameters after 6 h of HFNC therapy, GPT-4.0 predicts the 48-h endotracheal intubation risk following HFNC therapy with an accuracy comparable to that of specialist physicians.

