An Automated Literature Review Tool (LiteRev) for Streamlining and Accelerating Research Using Natural Language Processing and Machine Learning: Descriptive Performance Evaluation Study.

Erol Orel, Iza Ciglenecki, Amaury Thiabaud, Alexander Temerev, Alexandra Calmy, Olivia Keiser, Aziza Merzouki
Author Information
  1. Erol Orel: Institute of Global Health, University of Geneva, Geneva, Switzerland.
  2. Iza Ciglenecki: Médecins Sans Frontières, Geneva, Switzerland.
  3. Amaury Thiabaud: Institute of Global Health, University of Geneva, Geneva, Switzerland.
  4. Alexander Temerev: Institute of Global Health, University of Geneva, Geneva, Switzerland.
  5. Alexandra Calmy: HIV/AIDS Unit, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland.
  6. Olivia Keiser: Institute of Global Health, University of Geneva, Geneva, Switzerland.
  7. Aziza Merzouki: Institute of Global Health, University of Geneva, Geneva, Switzerland.

Abstract

BACKGROUND: Literature reviews (LRs) identify, evaluate, and synthesize papers relevant to a particular research question to advance understanding and support decision-making. However, LRs, especially traditional systematic reviews, are slow, resource-intensive, and quickly become outdated.
OBJECTIVE: LiteRev is an advanced and enhanced version of an existing automation tool designed to assist researchers in conducting LRs using natural language processing and machine learning techniques. In this paper, we describe LiteRev's capabilities and methodology, and we evaluate its accuracy and efficiency against a manual LR, highlighting the benefits of using LiteRev.
METHODS: Based on the user's query, LiteRev performs an automated search on a wide range of open-access databases and retrieves relevant metadata on the resulting papers, including abstracts or full texts when available. These abstracts (or full texts) are preprocessed and represented as a term frequency-inverse document frequency matrix. Using dimensionality reduction (pairwise controlled manifold approximation) and clustering (hierarchical density-based spatial clustering of applications with noise) techniques, the corpus is divided into different topics, each described by a list of its most important keywords. The user can then select one or several topics of interest, enter additional keywords to refine their search, or provide key papers relevant to the research question. Based on these inputs, LiteRev performs a k-nearest neighbor (k-NN) search and suggests a list of potentially interesting papers. By tagging the relevant ones, the user triggers new k-NN searches until no additional papers are suggested for screening. To assess the performance of LiteRev, we ran it in parallel to a manual LR on the burden and care for acute and early HIV infection in sub-Saharan Africa. We assessed the performance of LiteRev using true and false predictive values, recall, and work saved over sampling.
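
The suggestion step described in METHODS can be illustrated with a minimal, dependency-free sketch. It is not LiteRev's code: the toy corpus, the function names (`tokenize`, `tfidf_matrix`, `cosine`, `suggest`), the smoothed IDF formula, and the use of cosine similarity are all assumptions made for illustration, and the dimensionality reduction and clustering stages are omitted entirely.

```python
import math
from collections import Counter


def tokenize(text):
    # Toy preprocessing (assumption): lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_matrix(docs):
    # Build one L2-normalised sparse TF-IDF vector (a dict) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF; one common variant, not necessarily the one LiteRev uses.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        rows.append({t: v / norm for t, v in vec.items()})
    return rows


def cosine(a, b):
    # Dot product of two already-normalised sparse vectors.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def suggest(rows, key_ids, k=2):
    # For each key paper, collect its k most similar papers that are not key papers.
    suggested = set()
    for i in key_ids:
        scored = sorted(
            ((cosine(rows[i], rows[j]), j)
             for j in range(len(rows)) if j not in key_ids),
            reverse=True,
        )
        suggested.update(j for _, j in scored[:k])
    return sorted(suggested)


# Hypothetical five-document corpus; document 0 plays the role of a key paper.
docs = [
    "acute hiv infection screening in kenya",
    "early hiv infection care in uganda",
    "deep learning for image segmentation",
    "convolutional networks for image classification",
    "acute hiv infection diagnosis in africa",
]
rows = tfidf_matrix(docs)
print(suggest(rows, key_ids=[0], k=1))  # [4]
```

In the workflow described above, papers the user tags as relevant would be added to `key_ids` and `suggest` called again, until a round returns nothing new.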
RESULTS: LiteRev extracted and processed the text of 631 unique papers from PubMed and transformed it into a term frequency-inverse document frequency matrix. The topic modeling module identified 16 topics and highlighted 2 topics of interest to the research question. Based on 18 key papers, the k-NN module suggested 193 papers for screening out of 613 papers in total (31.5% of the whole corpus) and correctly identified 64 of the 87 relevant papers found by the manual abstract screening (recall of 73.6%). Compared to the manual full-text screening, LiteRev identified 42 of the 48 relevant papers found manually (recall of 87.5%). This represents a total work saved over sampling of 56%.
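
The percentages reported in RESULTS can be checked with a few lines of arithmetic. The sketch below assumes the commonly used definition of work saved over sampling, WSS = (N - screened) / N - (1 - recall); the abstract does not state the formula, so this is an assumption, although it reproduces the reported 56%. The function names are illustrative.

```python
def recall(found, relevant):
    # Share of the manually identified relevant papers that the tool also found.
    return found / relevant


def work_saved_over_sampling(total, screened, recall_value):
    # Assumed definition: fraction of papers not screened, minus the recall shortfall.
    return (total - screened) / total - (1 - recall_value)


screened_share = 193 / 613            # papers suggested for screening
abstract_recall = recall(64, 87)      # against manual abstract screening
full_text_recall = recall(42, 48)     # against manual full-text screening
wss = work_saved_over_sampling(613, 193, full_text_recall)

print(f"{screened_share:.1%}")    # 31.5%
print(f"{abstract_recall:.1%}")   # 73.6%
print(f"{full_text_recall:.1%}")  # 87.5%
print(f"{wss:.0%}")               # 56%
```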
CONCLUSIONS: We presented the features and functionalities of LiteRev, an automation tool that uses natural language processing and machine learning methods to streamline and accelerate LRs and to help researchers obtain quick, in-depth overviews of any topic of interest.

MeSH Terms

Humans
Cluster Analysis
Databases, Factual
HIV Infections
Machine Learning
Natural Language Processing
Review Literature as Topic
