Database Commons

a catalog of worldwide biological databases

Database Profile

BioC

General information

URL: http://bioc.sourceforge.net/
Full name:
Description: BioC is a simple format to share text data and annotations. It allows a large number of different annotations to be represented. We provide simple code to hold this data, read it and write it back to XML, and perform some sample processing.
Year founded: 2013
Last update: NA
Version: v1.0
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: National Library of Medicine
Address: Bethesda, MD 20894, USA
City: Bethesda
Province/State: Maryland
Country/Region: United States
Contact name (PI/Team): Rezarta Islamaj Doğan
Contact email (PI/Helpdesk): Rezarta.Islamaj@nih.gov

Publications

BioC: a minimalist approach to interoperability for biomedical text processing. [PMID: 24048470]
Comeau DC, Islamaj Doğan R, Ciccarese P, Cohen KB, Krallinger M, Leitner F, Lu Z, Peng Y, Rinaldi F, Torii M, Valencia A, Verspoor K, Wiegers TC, Wu CH, Wilbur WJ.

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/
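The abstract describes holding text and annotations, then reading them from and writing them back to XML. The sketch below illustrates that round trip with Python's standard library only. It is not the BioC project's own code: the element names (collection, document, passage, annotation, infon, location) are assumed here to follow the general shape of the BioC format and have not been checked against its DTD, and the helper names build_collection and read_annotations are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_collection(doc_id, text, annotations):
    """Build a BioC-style collection with one document and one passage.

    `annotations` is a list of (offset, length, type) tuples.
    Element names are assumptions modeled on the BioC format.
    """
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    for i, (start, length, ann_type) in enumerate(annotations):
        ann = ET.SubElement(passage, "annotation", id=str(i))
        # An infon is a key/value pair; here it stores the annotation type.
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(start), length=str(length))
        ET.SubElement(ann, "text").text = text[start:start + length]
    return collection


def read_annotations(xml_string):
    """Parse the XML and return (type, offset, length, text) per annotation."""
    root = ET.fromstring(xml_string)
    found = []
    for ann in root.iter("annotation"):
        loc = ann.find("location")
        found.append((
            ann.findtext("infon"),
            int(loc.get("offset")),
            int(loc.get("length")),
            ann.findtext("text"),
        ))
    return found


text = "BRCA1 mutations are linked to breast cancer."
collection = build_collection("doc1", text, [(0, 5, "gene"), (30, 13, "disease")])
xml_string = ET.tostring(collection, encoding="unicode")  # write to XML
round_trip = read_annotations(xml_string)                 # read it back
```

Storing each annotation as an offset and length into the passage text, rather than as inline markup, lets several annotation layers (tokens, named entities, relations) coexist over the same unmodified text.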

Database (Oxford). 2013:2013 | 114 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 1458/6895 (78.869%)
Literature: 134/577 (76.95%)
Total rank: 1458
Citations: 113
z-index: 9.417

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2015-06-20
Curated by:
Mengwei Li [2016-03-31]
Mengwei Li [2016-02-20]
Mengwei Li [2015-11-29]
Mengwei Li [2015-06-26]