Unsupervised title and abstract screening for systematic review: a retrospective case-study using topic modelling methodology.

Agnes Natukunda, Leacky K Muchene
Author Information
  1. Agnes Natukunda: Immunomodulation and Vaccines Programme, MRC/UVRI and LSHTM Uganda Research Unit, Entebbe, Uganda. natukundagnes2@gmail.com.
  2. Leacky K Muchene: StatsDecide Analytics and Consulting Limited, Nairobi, Kenya.

Abstract

BACKGROUND: Systematic reviews are essential for collating and summarising the available research output on a particular topic. However, initial screening of the retrieved literature is highly time- and labour-intensive. Attempts to automate parts of the systematic review process have had varying degrees of success, partly because they are domain-specific or require vendor-specific software or manually labelled training data. Our primary objective was to develop a statistical methodology for automated title and abstract screening in systematic reviews. Secondary objectives were (1) to retrospectively apply the automated screening methodology to systematic reviews that had previously been screened manually and (2) to characterize the performance of the methodology's scoring algorithm in a simulation study.
METHODS: In the first step, we implemented a Latent Dirichlet Allocation-based topic model to derive representative topics from the titles and abstracts of the retrieved documents. In the second step, we defined a score threshold for classifying each document as relevant for full-text review or not. The score is derived from a set of search keywords (often the database retrieval search terms). Two systematic reviews were used retrospectively to illustrate the methodology.
RESULTS: In the first case study (helminth dataset), a sensitivity of [Formula: see text] relative to manual title and abstract screening was achieved, against a false positive rate of [Formula: see text]. In the second case study (Wilson disease dataset), a sensitivity of [Formula: see text] and a specificity of [Formula: see text] were achieved.
CONCLUSIONS: Unsupervised title and abstract screening has the potential to reduce the workload involved in conducting systematic reviews. While the sensitivity of the methodology on the tested data is low, approximately [Formula: see text] specificity was achieved. Users should keep in mind that sensitivity may be low. One way to mitigate this might be to add further targeted search keywords, such as the indexing database terms, to the search term corpus. Moreover, automated screening can be used as an additional screener alongside the manual screeners.
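
The two steps described under METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring formula or how the threshold is chosen, so everything below is an assumption. The sketch fits a small Latent Dirichlet Allocation model with a plain collapsed Gibbs sampler, scores each record by the model-implied probability that a word drawn from it is a search keyword, and uses the mean score as a placeholder threshold. The records, keywords, manual decisions, and all function names are hypothetical.

```python
import random

def tokenize(text):
    """Lowercase, strip surrounding punctuation, drop very short tokens."""
    words = (w.strip(".,;:()'\"").lower() for w in text.split())
    return [w for w in words if len(w) > 2]

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]             # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                              # tokens assigned per topic
    z = []                                           # topic of every token
    for d, doc in enumerate(docs):                   # random initialisation
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][idx[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], idx[w]
                ndk[d][k] -= 1                       # remove current assignment
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta)
                           / (nk[t] + n_words * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                          # resample and restore counts
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
             for row, doc in zip(ndk, docs)]         # per-document topic mix
    phi = [[(c + beta) / (nk[k] + n_words * beta) for c in nkw[k]]
           for k in range(n_topics)]                 # per-topic word distribution
    return theta, phi, vocab

def keyword_scores(theta, phi, vocab, keywords):
    """Assumed score: sum over topics of P(topic | doc) * P(keyword | topic)."""
    kw_idx = [i for i, w in enumerate(vocab) if w in keywords]
    topic_mass = [sum(row[i] for i in kw_idx) for row in phi]
    return [sum(t * m for t, m in zip(doc_theta, topic_mass))
            for doc_theta in theta]

def sensitivity_specificity(predicted, actual):
    """Compare automated decisions with manual screening decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)

records = [  # hypothetical titles, not taken from either case study
    "Helminth infection and vaccine response in children",
    "Vaccine immune response during helminth infection",
    "Road traffic noise and sleep quality in adults",
    "Urban noise exposure and sleep disturbance",
]
manual = [True, True, False, False]   # hypothetical manual screening decisions
keywords = {"helminth", "vaccine"}    # hypothetical search terms

docs = [tokenize(r) for r in records]
theta, phi, vocab = fit_lda(docs, n_topics=2)
scores = keyword_scores(theta, phi, vocab, keywords)
threshold = sum(scores) / len(scores)  # placeholder cutoff, an assumption
predicted = [s >= threshold for s in scores]
sens, spec = sensitivity_specificity(predicted, manual)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity here is the share of manually included records that the automated screen also includes, and the false positive rate quoted in RESULTS corresponds to one minus specificity. On a real corpus the number of topics, the priors, and the threshold would all need tuning, and a lower threshold trades specificity for sensitivity.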

MeSH Terms

Humans
Retrospective Studies
Systematic Reviews as Topic
Algorithms
Software
Computer Simulation