Assessing the response quality and readability of chatbots in cardiovascular health, oncology, and psoriasis: A comparative study.

Robert Olszewski, Klaudia Watros, Małgorzata Mańczak, Jakub Owoc, Krzysztof Jeziorski, Jakub Brzeziński
Author Information
  1. Robert Olszewski: Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland; Department of Ultrasound, Institute of Fundamental Technological Research, Polish Academy of Sciences. Electronic address: robert.olszewski@spartanska.pl.
  2. Klaudia Watros: Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland. Electronic address: klaudia.watros@spartanska.pl.
  3. Małgorzata Mańczak: Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland. Electronic address: m.manczak@op.pl.
  4. Jakub Owoc: Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland. Electronic address: kowoc@wp.pl.
  5. Krzysztof Jeziorski: Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland; Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland. Electronic address: krzysztof.jeziorski@spartanska.pl.
  6. Jakub Brzeziński: Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland. Electronic address: jakub.brzezinski@spartanska.pl.

Abstract

BACKGROUND: Chatbots based on large language models (LLMs) generate human-like responses to questions across all categories. Owing to staff shortages in healthcare systems, patients waiting for an appointment increasingly use chatbots to obtain information about their condition. Given the number of chatbots currently available, assessing the responses they generate is essential.
METHODS: Five freely accessible chatbots were selected (Gemini, Microsoft Copilot, PiAI, ChatGPT, ChatSpot) and blinded with letters (A, B, C, D, E). Each chatbot was asked questions about cardiology, oncology, and psoriasis. Responses were compared with guidelines from the European Society of Cardiology, the American Academy of Dermatology, and the American Society of Clinical Oncology. All answers were assessed with four readability measures (Flesch Reading Ease, Gunning Fog Index, Flesch-Kincaid Grade Level, and Dale-Chall Score). Two independent medical professionals rated the compliance of the responses with the guidelines on a 3-point Likert scale.
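The abstract does not state which tool computed the readability scores; the formulas themselves, however, are standard and public. The following is a minimal, self-contained Python sketch of three of them (Dale-Chall is omitted because it requires the 3,000-word familiar-word list), using a naive vowel-group syllable heuristic in place of a pronunciation dictionary:

import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels.
    # Real readability tools use pronunciation dictionaries instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # "Complex" words (3+ syllables) drive the Gunning Fog Index.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    wps = n_words / sentences   # average words per sentence
    spw = syllables / n_words   # average syllables per word

    return {
        # Flesch Reading Ease: 0-100, higher means easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        # Gunning Fog Index: years of education needed to follow the text.
        "gunning_fog": 0.4 * (wps + 100.0 * complex_words / n_words),
    }

if __name__ == "__main__":
    sample = "Psoriasis is a chronic inflammatory skin disease."
    print(readability(sample))

Library implementations such as textstat apply the same formulas but with more careful sentence tokenization and syllable counting, so their scores may differ slightly from this sketch.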
RESULTS: A total of 45 questions were asked of each chatbot. Chatbot C gave the shortest answers, 7.0 (6.0 - 8.0), and Chatbot A the longest, 17.5 (13.0 - 24.5). Flesch Reading Ease ranged from 16.3 (12.2 - 21.9) (Chatbot D) to 39.8 (29.0 - 50.4) (Chatbot A). Flesch-Kincaid Grade Level ranged from 12.5 (10.6 - 14.6) (Chatbot A) to 15.9 (15.1 - 17.1) (Chatbot D). The Gunning Fog Index ranged from 15.77 (Chatbot A) to 19.73 (Chatbot D). The Dale-Chall Score ranged from 10.3 (9.3 - 11.3) (Chatbot A) to 11.9 (11.5 - 12.4) (Chatbot D).
CONCLUSION: This study indicates that chatbot responses vary in length, quality, and readability. Each chatbot answers a question in its own way, based on the data it has drawn from the web. Although the reliability of the generated responses was high, anyone seeking information from a chatbot should be careful and verify the answers received, particularly on medical and health topics.

MeSH Terms

Humans
Psoriasis
Cardiovascular Diseases
Comprehension
Medical Oncology
Health Literacy
