Database Commons

a catalog of worldwide biological databases

Database Profile

BioRED-BC8

General information

URL: https://codalab.lisn.upsaclay.fr/competitions/16381
Full name:
Description: BioRED-BC8 is a manually curated corpus of 1,000 PubMed abstracts annotated with six entity types (disease, gene/protein, chemical, cell line, gene variant, species) and eight pairwise relation classes, with all entities normalized to standard vocabularies. Each relation is labeled “novel” if it represents a new finding, enabling training and benchmarking of document-level biomedical relation-extraction systems.
Year founded: 2024
Last update: 2024-08-09
Version: v1.0
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: National Institutes of Health (NIH)
Address:
City: Bethesda
Province/State: Maryland
Country/Region: United States
Contact name (PI/Team): Zhiyong Lu
Contact email (PI/Helpdesk): Zhiyong.Lu@nih.gov

Publications

The biomedical relationship corpus of the BioRED track at the BioCreative VIII challenge and workshop. [PMID: 39126204]
Rezarta Islamaj, Chih-Hsuan Wei, Po-Ting Lai, Ling Luo, Cathleen Coss, Preeti Gokal Kochar, Nicholas Miliaras, Oleg Rodionov, Keiko Sekiya, Dorothy Trinh, Deborah Whitman, Zhiyong Lu

The automatic recognition of biomedical relationships is an important step in the semantic understanding of the information contained in the unstructured text of the published literature. The BioRED track at BioCreative VIII aimed to foster the development of such methods by providing the participants the BioRED-BC8 corpus, a collection of 1000 PubMed documents manually curated for diseases, gene/proteins, chemicals, cell lines, gene variants, and species, as well as pairwise relationships between them which are disease-gene, chemical-gene, disease-variant, gene-gene, chemical-disease, chemical-chemical, chemical-variant, and variant-variant. Furthermore, relationships are categorized into the following semantic categories: positive correlation, negative correlation, binding, conversion, drug interaction, comparison, cotreatment, and association. Unlike most of the previous publicly available corpora, all relationships are expressed at the document level as opposed to the sentence level, and as such, the entities are normalized to the corresponding concept identifiers of the standardized vocabularies, namely, diseases and chemicals are normalized to MeSH, genes (and proteins) to National Center for Biotechnology Information (NCBI) Gene, species to NCBI Taxonomy, cell lines to Cellosaurus, and gene/protein variants to Single Nucleotide Polymorphism Database. Finally, each annotated relationship is categorized as 'novel' depending on whether it is a novel finding or experimental verification in the publication it is expressed in. This distinction helps differentiate novel findings from other relationships in the same text that provide known facts and/or background knowledge. The BioRED-BC8 corpus uses the previous BioRED corpus of 600 PubMed articles as the training dataset and includes a set of 400 newly published articles to serve as the test data for the challenge.
All test articles were manually annotated for the BioCreative VIII challenge by expert biocurators at the National Library of Medicine, using the original annotation guidelines, where each article is doubly annotated in a three-round annotation process until full agreement is reached between all curators. This manuscript details the characteristics of the BioRED-BC8 corpus as a critical resource for biomedical named entity recognition and relation extraction. Using this new resource, we have demonstrated advancements in biomedical text-mining algorithm development. Database URL: https://codalab.lisn.upsaclay.fr/competitions/16381.

Database (Oxford). 2024:2024() | 3 Citations (from Europe PMC, 2026-03-28)
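
The annotation scheme described in the abstract above (six entity types, eight permitted entity-type pairs, eight semantic categories, and a novelty flag on each document-level relation) can be sketched as a small data model. This is an illustrative sketch only: the class names, field names, and identifier strings are hypothetical and do not reflect the corpus's actual distribution format.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    DISEASE = "disease"
    GENE = "gene"          # genes and proteins
    CHEMICAL = "chemical"
    CELL_LINE = "cell line"
    VARIANT = "variant"
    SPECIES = "species"


# Vocabulary each entity type is normalized to, as listed in the abstract.
VOCABULARY = {
    EntityType.DISEASE: "MeSH",
    EntityType.CHEMICAL: "MeSH",
    EntityType.GENE: "NCBI Gene",
    EntityType.SPECIES: "NCBI Taxonomy",
    EntityType.CELL_LINE: "Cellosaurus",
    EntityType.VARIANT: "Single Nucleotide Polymorphism Database",
}

D, G, C, V = (EntityType.DISEASE, EntityType.GENE,
              EntityType.CHEMICAL, EntityType.VARIANT)

# The eight entity-type pairs named in the abstract (order-insensitive).
ALLOWED_PAIRS = {
    frozenset({D, G}), frozenset({C, G}), frozenset({D, V}), frozenset({G}),
    frozenset({C, D}), frozenset({C}), frozenset({C, V}), frozenset({V}),
}

# The eight semantic categories named in the abstract.
CATEGORIES = {
    "positive correlation", "negative correlation", "binding", "conversion",
    "drug interaction", "comparison", "cotreatment", "association",
}


@dataclass(frozen=True)
class Entity:
    concept_id: str        # normalized identifier (placeholder values below)
    entity_type: EntityType


@dataclass(frozen=True)
class Relation:
    document_id: str       # relations are document-level, not sentence-level
    first: Entity
    second: Entity
    category: str
    novel: bool            # True for a novel finding in this publication


def is_valid(relation: Relation) -> bool:
    """Check the entity-type pair and category against the scheme above."""
    pair = frozenset({relation.first.entity_type, relation.second.entity_type})
    return pair in ALLOWED_PAIRS and relation.category in CATEGORIES


# Hypothetical example with placeholder identifiers.
example = Relation(
    document_id="PLACEHOLDER_DOC",
    first=Entity("PLACEHOLDER_CHEMICAL_ID", EntityType.CHEMICAL),
    second=Entity("PLACEHOLDER_DISEASE_ID", EntityType.DISEASE),
    category="negative correlation",
    novel=True,
)
```

Storing each pair as a `frozenset` makes the check order-insensitive and lets same-type pairs such as gene-gene collapse to a single-element set.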

Ranking

All databases: 4471/6932 (35.516%)
Health and medicine: 1124/1755 (36.011%)
Standard ontology and nomenclature: 166/239 (30.962%)
Literature: 383/577 (33.795%)
Total rank: 4471
Citations: 3
z-index: 1.5
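
The percentage shown next to each rank/total pair above appears to be the share of databases ranked at or below this entry, that is, (total - rank + 1) / total. That formula is an inference from the four values on this page, not a documented definition, and the function name below is hypothetical. A short check:

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Assumed formula: percentage of entries ranked at or below `rank`."""
    return round(100 * (total - rank + 1) / total, 3)


# Rank/total pairs as listed in the Ranking section.
rankings = {
    "All databases": (4471, 6932),
    "Health and medicine": (1124, 1755),
    "Standard ontology and nomenclature": (166, 239),
    "Literature": (383, 577),
}

for category, (rank, total) in rankings.items():
    print(f"{category}: {percentile_from_rank(rank, total)}%")
```

Under this assumption the computed values reproduce the listed percentages (35.516, 36.011, 30.962, and 33.795).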

Community reviews

Not Rated

Record metadata

Created on: 2025-06-28
Curated by:
shaosen zhang [2025-08-04]
Yiran Zhan [2025-07-12]
liu yuxi [2025-06-28]