Paraphrasing to improve the performance of Electronic Health Records Question Answering.

Sarvesh Soni, Kirk Roberts
Author Information
  1. Sarvesh Soni: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston TX, USA.
  2. Kirk Roberts: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston TX, USA.

Abstract

This paper describes a paraphrasing approach to improve the performance of question answering (QA) for electronic health records (EHRs). QA systems for structured EHR data usually rely on semantic parsing, which aims to generate machine-understandable logical forms from free-text questions. Training semantic parsers requires large datasets of question-logical form (QL) pairs, which are labor-intensive to create. Considering the scarcity of large QL datasets in the clinical domain, we propose a framework for expanding an existing dataset using paraphrasing. We experiment with different heuristics across multiple sample sizes and iterations to assess the effect of adding paraphrases on the task of semantic parsing. We find that adding paraphrases to an existing dataset based on TERTHRESHOLD scores improves performance in the majority (74%) of the experimental runs. Hence, the proposed paraphrasing-based framework has the potential to improve the performance of QA systems using a limited set of existing QL annotations.
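
As a rough illustration of the dataset-expansion idea, the sketch below adds paraphrases of each question to a set of QL pairs, reusing the original logical form. It is not the authors' implementation. It assumes that TER refers to translation edit rate (the abstract does not expand the term), approximates it with a word-level edit distance that ignores block shifts, and assumes that candidates scoring at or above a threshold are kept. The names `word_edit_rate`, `expand_dataset`, `paraphrase`, and `min_rate` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

QLPair = Tuple[str, str]  # (question, logical form)


def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the reference length.

    Simplified stand-in for translation edit rate: real TER also
    counts block shifts, which are not modeled here.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1] / len(ref)


def expand_dataset(
    pairs: List[QLPair],
    paraphrase: Callable[[str], Iterable[str]],
    min_rate: float = 0.2,
) -> List[QLPair]:
    """Return the original pairs plus paraphrases that pass the threshold.

    Assumption: a paraphrase is kept when its edit rate against the
    original question is at least `min_rate`; the direction and value
    of the threshold are illustrative, not taken from the paper.
    """
    expanded = list(pairs)
    seen = {question for question, _ in pairs}
    for question, logical_form in pairs:
        for candidate in paraphrase(question):
            if candidate in seen:
                continue
            if word_edit_rate(candidate, question) >= min_rate:
                expanded.append((candidate, logical_form))
                seen.add(candidate)
    return expanded
```

Here `paraphrase` stands for any paraphrase generator supplied by the caller; each accepted paraphrase inherits the logical form of the question it was derived from, so no new annotation is needed.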

Grants

  1. R00 LM012104/NLM NIH HHS
