A query suggestion workflow for life science IR-systems.

Maria Esch, Jinbo Chen, Stephan Weise, Keywan Hassani-Pak, Uwe Scholz, Matthias Lange
Author Information
  1. Maria Esch: Leibniz Institute of Plant Genetics and Crop Plant Research, Germany.
  2. Jinbo Chen: Leibniz Institute of Plant Genetics and Crop Plant Research, Germany.
  3. Stephan Weise: Leibniz Institute of Plant Genetics and Crop Plant Research, Germany.
  4. Keywan Hassani-Pak: Rothamsted Research, United Kingdom.
  5. Uwe Scholz: Leibniz Institute of Plant Genetics and Crop Plant Research, Germany.
  6. Matthias Lange: Leibniz Institute of Plant Genetics and Crop Plant Research, Germany.

Abstract

Information Retrieval (IR) plays a central role in the exploration and interpretation of integrated biological datasets that represent the heterogeneous ecosystem of life sciences. Here, keyword-based query systems are popular user interfaces. In turn, the query phrases used determine, to a large extent, the quality of the search result and the effort a scientist has to invest in query refinement. In this context, computer-aided query expansion and suggestion is one of the most challenging tasks for life science information systems. Existing query front-ends support aspects like spelling correction, query refinement or query expansion. However, the majority of these front-ends make only limited use of enhanced IR algorithms to implement comprehensive, computer-aided query refinement workflows. In this work, we present the design of a multi-stage query suggestion workflow and its implementation in the life science IR system LAILAPS. The presented workflow includes enhanced tokenisation, word breaking, spelling correction, query expansion and query suggestion ranking. A spelling correction benchmark with 5,401 queries and manually selected use cases for query expansion demonstrate the performance of the implemented workflow and its advantages compared with state-of-the-art systems.
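
The five stages named in the abstract can be illustrated with a toy pipeline. The sketch below is not the LAILAPS implementation: the vocabulary, the synonym table and all function names are hypothetical, Python's difflib stands in for whatever spelling corrector the authors use, and ranking by summed term frequency is an assumed placeholder for the actual suggestion ranking.

```python
from difflib import get_close_matches

# Hypothetical toy data (term -> frequency, term -> synonyms); not from LAILAPS.
VOCABULARY = {"barley": 120, "drought": 80, "tolerance": 60, "gene": 200, "stress": 90}
SYNONYMS = {"drought": ["water deficit"], "barley": ["hordeum vulgare"]}


def tokenise(query):
    """Stage 1: lower-case the query and split it on whitespace."""
    return query.lower().split()


def break_words(token):
    """Stage 2: split a run-together token into two known words, if possible."""
    if token in VOCABULARY:
        return [token]
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in VOCABULARY and right in VOCABULARY:
            return [left, right]
    return [token]


def correct(token):
    """Stage 3: replace an unknown token by its closest vocabulary entry."""
    if token in VOCABULARY:
        return token
    matches = get_close_matches(token, list(VOCABULARY), n=1, cutoff=0.75)
    return matches[0] if matches else token


def expand(tokens):
    """Stage 4: the base query plus variants with one term swapped for a synonym."""
    suggestions = [" ".join(tokens)]
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            suggestions.append(" ".join(tokens[:i] + [syn] + tokens[i + 1:]))
    return suggestions


def score(suggestion):
    """Stage 5 (assumed heuristic): sum of vocabulary frequencies of the words."""
    return sum(VOCABULARY.get(word, 0) for word in suggestion.split())


def suggest(query):
    """Run all stages and return suggestions, best-scoring first."""
    tokens = []
    for tok in tokenise(query):
        for part in break_words(tok):
            tokens.append(correct(part))
    return sorted(expand(tokens), key=score, reverse=True)


if __name__ == "__main__":
    print(suggest("barleygene"))
    print(suggest("drougth tolerance"))
```

In this sketch the run-together query "barleygene" is split into two known terms before expansion, and the misspelling "drougth" is corrected before synonyms are looked up. The order of the stages matters: expansion is only useful once the tokens match vocabulary entries.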

Grants

  1. BBS/E/C/00004941/Biotechnology and Biological Sciences Research Council

MeSH Term

Algorithms
Biological Science Disciplines
Computational Biology
Databases, Factual
Humans
Information Storage and Retrieval
Language
PubMed
Search Engine
Software
User-Computer Interface
Workflow
