Database Commons

a catalog of worldwide biological databases

Database Profile

iProLINK

General information

URL: http://proteininformationresource.org/iprolink/
Full name: integrated Protein Literature INformation and Knowledge
Description: iProLINK is a resource with access to text mining tools and annotated corpora developed in house. The collection of data sources can be utilized by computational and biological researchers to explore literature information on proteins and their features or properties.
Year founded: 2004
Last update:
Version:
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object: NA
Database category:
Major species: NA
Keywords:

Contact information

University/Institution: University of Delaware
Address:
City:
Province/State:
Country/Region: United States
Contact name (PI/Team): Cathy H. Wu
Contact email (PI/Helpdesk): wuc@udel.edu

Publications

iProLINK: an integrated protein resource for literature mining. [PMID: 15556482]
Hu ZZ, Mani I, Hermoso V, Liu H, Wu CH.

The exponential growth of large-scale molecular sequence data and of the PubMed scientific literature has prompted active research in biological literature mining and information extraction to facilitate genome/proteome annotation and improve the quality of biological databases. Motivated by the promise of text mining methodologies, but at the same time, the lack of adequate curated data for training and benchmarking, the Protein Information Resource (PIR) has developed a resource for protein literature mining--iProLINK (integrated Protein Literature INformation and Knowledge). As PIR focuses its effort on the curation of the UniProt protein sequence database, the goal of iProLINK is to provide curated data sources that can be utilized for text mining research in the areas of bibliography mapping, annotation extraction, protein named entity recognition, and protein ontology development. The data sources for bibliography mapping and annotation extraction include mapped citations (PubMed ID to protein entry and feature line mapping) and annotation-tagged literature corpora. The latter includes several hundred abstracts and full-text articles tagged with experimentally validated post-translational modifications (PTMs) annotated in the PIR protein sequence database. The data sources for entity recognition and ontology development include a protein name dictionary, word token dictionaries, protein name-tagged literature corpora along with tagging guidelines, as well as a protein ontology based on PIRSF protein family names. iProLINK is freely accessible at http://pir.georgetown.edu/iprolink, with hypertext links for all downloadable files.

Comput Biol Chem. 2004:28(5-6) | 27 Citations (from Europe PMC, 2025-12-20)

Ranking

All databases: 4906/6895 (28.861%)
Gene genome and annotation: 1480/2021 (26.818%)
Literature: 420/577 (27.383%)
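
The page does not explain how the percentage beside each rank is computed. All three figures are consistent with the share of databases ranked at or below this entry, that is (total - rank + 1) / total. This formula is inferred from the numbers listed here and is not a documented definition. A minimal Python sketch, with a hypothetical function name:

```python
def rank_percent(rank: int, total: int) -> float:
    """Percentage of databases ranked at or below `rank` (assumed formula)."""
    return round(100 * (total - rank + 1) / total, 3)


# (category, rank, total) triples copied from the ranking list above.
rankings = [
    ("All databases", 4906, 6895),
    ("Gene genome and annotation", 1480, 2021),
    ("Literature", 420, 577),
]

for category, rank, total in rankings:
    print(f"{category}: {rank}/{total} ({rank_percent(rank, total)}%)")
```

Under this reading a higher percentage means a better rank, so iProLINK sits in roughly the bottom 27 to 29 percent of each category.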
Total rank: 4906
Citations: 26
z-index: 1.238
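
The z-index is not defined on this page. The value shown matches the citation count in this ranking panel (26) divided by the number of years between the founding year (2004) and 2025. Treat that as an assumption drawn from the figures in this record, not as the catalog's stated method. A short Python sketch, with hypothetical function and parameter names:

```python
def citations_per_year(citations: int, year_founded: int, as_of_year: int) -> float:
    """Citations divided by the database's age in years (assumed z-index formula)."""
    age_years = as_of_year - year_founded
    if age_years <= 0:
        raise ValueError("as_of_year must be later than year_founded")
    return round(citations / age_years, 3)


# Values from this record: 26 citations, founded 2004; 2025 is the year of
# the citation snapshot quoted in the Publications section.
print(citations_per_year(26, 2004, 2025))
```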

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2018-02-09
Curated by:
Lina Ma [2018-12-17]
[2018-12-05]
Hao Zhang [2018-03-01]