Keyphrase Extraction for Technical Language Processing.

Alden Dima, Aaron Massey
Author Information
  1. Alden Dima: National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
  2. Aaron Massey: National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.

Abstract

Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases; the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC), and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 %, while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui's F-measure of 18.8 %.
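The abstract describes the toolkit only at a high level: distributional features of words and phrases feed a classifier such as Naïve Bayes, and results are scored by F-measure against author-provided keyphrases. The Python sketch here is a hypothetical illustration of that pipeline, not the authors' implementation. It assumes that candidates are contiguous n-grams of up to three tokens and that the two features are TF-IDF and the relative position of first occurrence (a KEA-style choice assumed here). It also substitutes a from-scratch Gaussian Naïve Bayes for whatever Naïve Bayes variant the study used, and it computes F-measure with exact, case-insensitive matching and no stemming. All function and class names are illustrative.

```python
import math
from collections import Counter


def candidates(tokens, max_len=3):
    """Yield (phrase, start_index) for every contiguous n-gram, n = 1..max_len."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n]), i


def features(tokens, doc_freq, n_docs, max_len=3):
    """Map each candidate phrase to (tf-idf, relative first occurrence).

    Assumed features: doc_freq[phrase] is the number of corpus documents
    containing the phrase; the idf is add-one smoothed.
    """
    counts, first = Counter(), {}
    for phrase, start in candidates(tokens, max_len):
        counts[phrase] += 1
        first.setdefault(phrase, start)  # earliest start index of this phrase
    total = sum(counts.values())
    feats = {}
    for phrase, c in counts.items():
        idf = math.log((n_docs + 1) / (doc_freq.get(phrase, 0) + 1))
        feats[phrase] = ((c / total) * idf, first[phrase] / len(tokens))
    return feats


class GaussianNB:
    """Minimal Gaussian Naive Bayes (a stand-in; the study's variant is unspecified)."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # Small constant keeps variances strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_posterior(self, x, label):
        """Unnormalized log posterior: log prior + sum of Gaussian log likelihoods."""
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict(self, x):
        return max(self.stats, key=lambda label: self.log_posterior(x, label))


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall under exact, case-insensitive matching."""
    pred = {p.lower() for p in predicted}
    ref = {g.lower() for g in gold}
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In use, one would call `features` on each training document and label each candidate by whether it appears among the author-provided keyphrases. The labeled rows would then be used to fit `GaussianNB`. For a new document, one would keep the top-scoring candidates and compare them with the gold keyphrases using `f_measure`. Ranking candidates by the positive-class `log_posterior` rather than by hard `predict` labels is one way to return a fixed number of keyphrases per document.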
