A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics.

Milan Gritta, Mohammad Taher Pilehvar, Nigel Collier
Author Information
  1. Milan Gritta: Language Technology Lab (LTL), Department of Theoretical and Applied Linguistics (DTAL), University of Cambridge, 9 West Road, Cambridge, CB3 9DP UK.
  2. Mohammad Taher Pilehvar: Language Technology Lab (LTL), Department of Theoretical and Applied Linguistics (DTAL), University of Cambridge, 9 West Road, Cambridge, CB3 9DP UK.
  3. Nigel Collier: Language Technology Lab (LTL), Department of Theoretical and Applied Linguistics (DTAL), University of Cambridge, 9 West Road, Cambridge, CB3 9DP UK.

Abstract

Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, even unrepresentative of real-world usage, by the lack of distinction between the , which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained . (Part 2) Metrics: discussed and reviewed for a rigorous evaluation, including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models.

Grants

  1. MR/M025160/1/Medical Research Council
