Database Commons

a catalog of worldwide biological databases

Database Profile

Harmonizome

General information

URL: http://amp.pharm.mssm.edu/Harmonizome
Full name:
Description: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins from over 70 major online resources.
Year founded: 2016
Last update: 2025-01-06
Version: v3.0
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: Icahn School of Medicine at Mount Sinai
Address: Department of Pharmacology and Systems Therapeutics, Department of Genetics and Genomic Sciences, BD2K-LINCS Data Coordination and Integration Center (DCIC), Mount Sinai's Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG)
City: New York
Province/State: New York
Country/Region: United States
Contact name (PI/Team): Avi Ma’ayan
Contact email (PI/Helpdesk): avi.maayan@mssm.edu

Publications

Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources. [PMID: 39565209]
Diamant I, Clarke DJB, Evangelista JE, Lingam N, Ma'ayan A.

By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types such as cells and tissues, diseases, and pathways. The upgrade has a dataset crossing feature to identify gene modules that are shared across datasets. To further explain significantly high gene set overlap between dataset pairs, a large language model (LLM) composes a paragraph that speculates about the reasons behind the high overlap. The upgrade also adds more data formats and visualization options. Datasets are downloadable as knowledge graph (KG) assertions and visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored via a user interface that visualizes gene-attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets that are provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.
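To make the idea of "significantly high gene set overlap between dataset pairs" concrete, here is a minimal sketch of how the overlap between two gene sets could be scored. The abstract does not say which statistic Harmonizome 3.0 uses, so the Jaccard index and the hypergeometric tail probability below are assumptions chosen for illustration, and the gene sets and universe size are toy values.

```python
from math import comb


def jaccard(a: set, b: set) -> float:
    """Shared genes divided by all genes seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hypergeom_pvalue(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """P(X >= overlap) when size_b genes are drawn at random from a universe
    of `universe` genes, size_a of which belong to the first set."""
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy gene sets (hypothetical, not taken from any Harmonizome dataset).
set_a = {"TP53", "BRCA1", "EGFR", "MYC"}
set_b = {"TP53", "MYC", "KRAS"}
shared = set_a & set_b

score = jaccard(set_a, set_b)
# Assumed background of 10 genes, purely for the example.
p_value = hypergeom_pvalue(len(shared), len(set_a), len(set_b), universe=10)
print(sorted(shared), round(score, 3), round(p_value, 4))
```

With a realistic background of roughly 20,000 human genes, the same overlap would be far less likely by chance, which is why the universe size matters when interpreting the p-value.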

Nucleic Acids Res. 2025:53(D1) | 48 Citations (from Europe PMC, 2025-12-13)
The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. [PMID: 27374120]
Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A.

Genomics, epigenomics, transcriptomics, proteomics and metabolomics efforts rapidly generate a plethora of data on the activity and levels of biomolecules within mammalian cells. At the same time, curation projects that organize knowledge from the biomedical literature into online databases are expanding. Hence, there is a wealth of information about genes, proteins and their associations, with an urgent need for data integration to achieve better knowledge extraction and data reuse. For this purpose, we developed the Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins from over 70 major online resources. We extracted, abstracted and organized data into ~72 million functional associations between genes/proteins and their attributes. Such attributes could be physical relationships with other biomolecules, expression in cell lines and tissues, genetic associations with knockout mouse or human phenotypes, or changes in expression after drug treatment. We stored these associations in a relational database along with rich metadata for the genes/proteins, their attributes and the original resources. The freely available Harmonizome web portal provides a graphical user interface, a web service and a mobile app for querying, browsing and downloading all of the collected data. To demonstrate the utility of the Harmonizome, we computed and visualized gene-gene and attribute-attribute similarity networks, and through unsupervised clustering, identified many unexpected relationships by combining pairs of datasets such as the association between kinase perturbations and disease signatures. We also applied supervised machine learning methods to predict novel substrates for kinases, endogenous ligands for G-protein coupled receptors, mouse phenotypes for knockout genes, and classified unannotated transmembrane proteins for likelihood of being ion channels. The Harmonizome is a comprehensive resource of knowledge about genes and proteins, and as such, it enables researchers to discover novel relationships between biological entities, as well as form novel data-driven hypotheses for experimental validation. Database URL: http://amp.pharm.mssm.edu/Harmonizome. © The Author(s) 2016. Published by Oxford University Press.
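As a rough illustration of how a gene-gene similarity network can be derived from gene-attribute associations, the sketch below treats each gene as a set of attributes and links genes whose sets are similar. The abstract does not state the similarity measure or cutoff used, so cosine similarity on binary vectors, the 0.5 threshold, and the gene and attribute names are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def cosine_binary(a: set, b: set) -> float:
    """Cosine similarity of two binary attribute vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similarity_edges(associations: dict, threshold: float) -> list:
    """Return (gene1, gene2, score) for every gene pair scoring >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(associations), 2):
        score = cosine_binary(associations[g1], associations[g2])
        if score >= threshold:
            edges.append((g1, g2, score))
    return edges


# Hypothetical gene -> attribute associations (e.g. tissues of expression).
associations = {
    "GENE_A": {"liver", "kidney"},
    "GENE_B": {"liver", "kidney", "brain"},
    "GENE_C": {"heart"},
}

edges = similarity_edges(associations, threshold=0.5)
for g1, g2, score in edges:
    print(g1, g2, round(score, 4))
```

Swapping the roles of genes and attributes in the input dictionary gives the corresponding attribute-attribute network.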

Database (Oxford). 2016:2016() | 1126 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 134/6895 (98.071%)
Metadata: 15/719 (98.053%)
Total Rank: 134
Citations: 1,109
z-index: 123.222
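
The percentages shown next to each rank are not explained on this page. They appear consistent with the share of databases ranked at or below this entry, (total - rank + 1) / total. That formula is inferred from the two figures above, not from any documented method, so the sketch below should be read as an assumption.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Assumed formula: percentage of entries ranked at or below `rank`."""
    return round((total - rank + 1) / total * 100, 3)


all_databases = rank_percentile(134, 6895)
metadata_category = rank_percentile(15, 719)
print(all_databases, metadata_category)
```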

Community reviews

Not Rated

Record metadata

Created on: 2017-03-28
Curated by:
Shaosen Zhang [2025-07-10]
Lina Ma [2017-06-01]
Shixiang Sun [2017-03-28]