Data extraction methods for systematic review (semi)automation: Update of a living systematic review.

Lena Schmidt, Ailbhe N Finnerty Mutlu, Rebecca Elmore, Babatunde K Olorisade, James Thomas, Julian P T Higgins
Author Information
  1. Lena Schmidt: NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK.
  2. Ailbhe N Finnerty Mutlu: UCL Social Research Institute, University College London, London, WC1H 0AL, UK.
  3. Rebecca Elmore: Sciome LLC, Research Triangle Park, North Carolina, 27713, USA.
  4. Babatunde K Olorisade: Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK.
  5. James Thomas: UCL Social Research Institute, University College London, London, WC1H 0AL, UK.
  6. Julian P T Higgins: Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK.

Abstract

The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies.

We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the [...]. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This living review update includes publications up to December 2022 and OpenAlex content up to March 2023.

In total, 76 publications are included in this review. Of these, 64 (84%) addressed extraction of data from abstracts, while 19 (25%) used full texts. A total of 71 (93%) publications developed classifiers for randomised controlled trials. More than 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 25 (33%) publications and code from 30 (39%). Six (8%) implemented publicly available tools.

This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. Between review updates, the sharing of data and code increased strongly: in the base-review, data and code were available for 13% and 19% of publications, respectively; these figures rose to 78% and 87% among the 23 new publications. Relative to the base-review, we also observed a second research trend: a move away from straightforward data extraction and towards additionally extracting relations between entities or automatic text summarisation. With this living review we aim to review the literature continually.

Keywords

Data Extraction; Natural Language Processing; Reproducibility; Systematic Reviews; Text Mining
