Database Commons
Database Commons

a catalog of worldwide biological databases

Database Profile

CHEMDNER

General information

URL: http://biosemantics.org/chemdner-patents
Full name:
Description: We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small.Database URL: http://biosemantics.org/chemdner-patents.
Year founded: 2016
Last update:
Version:
Accessibility:
Accessible
Country/Region: Netherlands

Classification & Tag

Data type:
Data object:
NA
Database category:
Major species:
NA
Keywords:

Contact information

University/Institution: Erasmus University Rotterdam
Address: Department of Medical Informatics, Erasmus University Medical Center, PO Box 2040, 3000 CA Rotterdam
City:
Province/State: Rotterdam
Country/Region: Netherlands
Contact name (PI/Team): Jan A. Kors
Contact email (PI/Helpdesk): j.kors@erasmusmc.nl

Publications

27141091
Chemical entity recognition in patents by combining dictionary-based and statistical approaches. [PMID: 27141091]
Akhondi SA, Pons E, Afzal Z, van Haagen H, Becker BF, Hettne KM, van Mulligen EM, Kors JA.

We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small.Database URL: http://biosemantics.org/chemdner-patents.

Database (Oxford). 2016:2016() | 9 Citations (from Europe PMC, 2026-04-04)

Ranking

All databases:
5350/6932 (22.836%)
Health and medicine:
1339/1755 (23.761%)
5350
Total Rank
9
Citations
0.9
z-index

Community reviews

5 Stars (1)
Data quality & quantity:
Content organization & presentation
System accessibility & reliability:

Word cloud

Related Databases

Citing
Cited by

Record metadata

Created on: 2018-01-27
Curated by:
Farah Nazir [2018-04-06]