DISEASES



Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. [PMID: 35348648]

Grissa D, Junge A, Oprea TI, Jensen LJ.

Abstract

The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and are available via a web interface at https://diseases.jensenlab.org, from where they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.

Database (Oxford). 2022:2022() | 68 Citations (from Europe PMC, 2025-12-20)

DISEASES: text mining and data integration of disease-gene associations. [PMID: 25484339]

Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ.

Abstract

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.

Methods. 2015:74() | 376 Citations (from Europe PMC, 2025-12-20)
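
The 2015 abstract above describes dictionary-based tagging combined with a scoring scheme that counts co-occurrences both within and between sentences. The following is only a minimal sketch of that idea, not the published implementation: the toy dictionaries, the weights W_DOCUMENT and W_SENTENCE, and the function names tag and cooccurrence_scores are all assumptions introduced for illustration.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy dictionaries (lowercase name -> entity label).
# The real resource relies on much larger gene and disease dictionaries.
GENE_NAMES = {"brca1": "BRCA1", "tp53": "TP53"}
DISEASE_NAMES = {
    "breast cancer": "breast cancer",
    "li-fraumeni syndrome": "Li-Fraumeni syndrome",
}

# Assumed weights, chosen only for illustration: a pair mentioned in the
# same abstract earns W_DOCUMENT, plus W_SENTENCE if it also shares a sentence.
W_DOCUMENT = 1.0
W_SENTENCE = 2.0


def tag(text, dictionary):
    """Return the set of dictionary entities whose name occurs in the text."""
    lowered = text.lower()
    found = set()
    for name, entity in dictionary.items():
        # Match the name only when it is not embedded in a longer word.
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", lowered):
            found.add(entity)
    return found


def cooccurrence_scores(abstracts):
    """Sum weighted gene-disease co-occurrence counts over all abstracts."""
    scores = defaultdict(float)
    for abstract in abstracts:
        # Naive sentence splitting on terminal punctuation.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
        doc_pairs = set(
            product(tag(abstract, GENE_NAMES), tag(abstract, DISEASE_NAMES))
        )
        sent_pairs = set()
        for sentence in sentences:
            sent_pairs |= set(
                product(tag(sentence, GENE_NAMES), tag(sentence, DISEASE_NAMES))
            )
        for pair in doc_pairs:
            scores[pair] += W_DOCUMENT
            if pair in sent_pairs:
                scores[pair] += W_SENTENCE
    return dict(scores)


if __name__ == "__main__":
    example = [
        "BRCA1 mutations raise the risk of breast cancer. TP53 was also sequenced.",
        "TP53 is mutated in Li-Fraumeni syndrome.",
    ]
    for pair, score in sorted(cooccurrence_scores(example).items()):
        print(pair, score)
```

In this sketch a pair that shares a sentence outranks a pair that only shares an abstract, which is the intuition behind scoring within-sentence and between-sentence co-occurrences differently. Turning such raw counts into calibrated confidence scores is a separate step that is not shown here.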

URL:	https://diseases.jensenlab.org
Full name:	Disease-gene associations mined from literature
Description:	A system for extracting disease–gene associations from biomedical abstracts. It consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, combined with a scoring scheme that takes into account co-occurrences both within and between sentences.
Year founded:	2015
Last update:
Version:
Accessibility:	Accessible
Country/Region:	Denmark

Data type:	Other
Data object:	Animal
Database category:	Health and medicine, Literature
Major species:	Homo sapiens
Keywords:	text mining, named entity recognition, information extraction, data integration, web resource

University/Institution:	University of Copenhagen
Address:	Novo Nordisk Foundation Center for Protein Research, Blegdamsvej 3b, 2200 Copenhagen N, Denmark.
City:	Copenhagen
Province/State:
Country/Region:	Denmark
Contact name (PI/Team):	Lars Juhl Jensen
Contact email (PI/Helpdesk):	lars.juhl.jensen@cpr.ku.dk

Database Commons
a catalog of worldwide biological databases
