Database Commons

a catalog of worldwide biological databases

Database Profile

Automatic concept recognition database

General information

URL: http://bio-lark.org/hpo_res
Full name: Automatic concept recognition using the Human Phenotype Ontology reference and test suite corpora
Description: The first corpus of abstracts manually annotated with Human Phenotype Ontology (HPO) concepts. The corpus is a resource for studying the linguistic characteristics of phenotypes, both overall and by HPO top-level category.
Year founded: 2015
Last update:
Version:
Accessibility: Unaccessible
Country/Region: Australia

Classification & Tag

Data type: DNA
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: University of Queensland
Address: Conjoint Senior Lecturer, St Vincent's Clinical School, Faculty of Medicine, UNSW Australia
City:
Province/State:
Country/Region: Australia
Contact name (PI/Team): t.groza@garvan.org.au
Contact email (PI/Helpdesk): t.groza@garvan.org.au

Publications

Automatic concept recognition using the human phenotype ontology reference and test suite corpora. [PMID: 25725061]
Groza T, Köhler S, Doelken S, Collier N, Oellrich A, Smedley D, Couto FM, Baynam G, Zankl A, Robinson PN.

Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suites corpora are available from http://bio-lark.org/hpo_res.html. Database URL: http://bio-lark.org/hpo_res.html.

Database (Oxford). 2015:2015() | 45 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 2595/6895 (62.379%)
Genotype phenotype and variation: 384/1005 (61.891%)
Metadata: 257/719 (64.395%)
Health and medicine: 652/1738 (62.543%)
Total rank: 2595
Citations: 44
z-index: 4.4

Community reviews

Not Rated

Record metadata

Created on: 2018-01-27
Curated by:
Sidra Younas [2018-04-12]
Sidra Younas [2018-04-09]
Zhaohua Li [2018-01-27]