HGD: An Integrated Homologous Gene Database Across Multiple Species

NGDC  2022-11-25


Homologous genes, that is, genes derived from a common ancestor, are often used to decipher evolutionary processes and to infer potential gene functions, and they are of great value in evolutionary and comparative genomics research.

Recently, researchers from the National Genomics Data Center, Beijing Institute of Genomics of the Chinese Academy of Sciences (China National Center for Bioinformation) developed the Homologous Gene Database (HGD). The work was published online in Nucleic Acids Research under the title "HGD: an integrated homologous gene database across multiple species".

Existing homologous gene resources differ in their inference methods, in the homologous relationships they report and in the identifiers they use, which makes it difficult to choose among homology results or to map them from one resource to another. To address this problem, HGD incorporates multiple international homology datasets, integrating multi-species, multi-resource and multi-omics data to provide a public, one-stop data service that complements existing resources. In addition, HGD delivers functional annotation profiles of homologous genes for inter-species comparison, including Gene Ontology (GO) data and multi-omics annotations related to traits, variants and expression, providing a unified panel for comparative studies of homologous gene functions across species.

Currently, HGD houses 112,383,644 homologous pairs from 37 species (19 animals, 16 plants and 2 microorganisms), including 10 model organisms. HGD also integrates a variety of annotations from public resources, comprising 16,909 homologs with trait annotations, 276,670 homologs with variant annotations, 398,573 homologs with expression annotations and 536,852 homologs with GO annotations.

Users can search with various keywords, for example gene symbols, UniProt IDs, Ensembl protein IDs or gene descriptions, using either exact or fuzzy matching.
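As a rough illustration of the integration problem described above, where resources report homologous pairs under different identifiers, the following minimal Python sketch maps each resource's identifiers to a unified namespace and records which resources support each pair. It is not HGD's actual pipeline: the function name `merge_homolog_pairs`, the resource names and all identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def merge_homolog_pairs(sources, id_map):
    """Merge homolog pairs from several resources into one table.

    sources: dict mapping a resource name to an iterable of (gene_a, gene_b)
             pairs written in that resource's own identifiers.
    id_map:  dict mapping a resource-specific identifier to a unified one.
    Returns a dict mapping an unordered unified pair to the set of
    resources that report it.
    """
    merged = defaultdict(set)
    for resource, pairs in sources.items():
        for gene_a, gene_b in pairs:
            unified_a = id_map.get(gene_a)
            unified_b = id_map.get(gene_b)
            # Skip pairs that cannot be mapped, and self-pairs after mapping.
            if unified_a is None or unified_b is None or unified_a == unified_b:
                continue
            # Sort so that (x, y) and (y, x) collapse into one key.
            key = tuple(sorted((unified_a, unified_b)))
            merged[key].add(resource)
    return dict(merged)


# Hypothetical toy input: two resources with different identifier schemes.
id_map = {"A:1": "G1", "A:2": "G2", "B:x": "G1", "B:y": "G2", "B:z": "G3"}
sources = {
    "resource_A": [("A:1", "A:2")],
    "resource_B": [("B:y", "B:x"), ("B:x", "B:z"), ("B:q", "B:x")],
}
merged = merge_homolog_pairs(sources, id_map)
```

In this toy input, the pair reported by both resources ends up under a single key with two supporting resources, while the pair containing the unmapped identifier `B:q` is dropped. A real integration would also need to handle one-to-many identifier mappings and distinguish relationship types such as orthologs and paralogs.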

HGD also integrates information from several data resources of the National Genomics Data Center (NGDC), including the Genome Variation Map (GVM), Gene Expression Nebulas (GEN) and GWAS Atlas.

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, the National Key R&D Program of China, the Genomics Data Center Operation and Maintenance of the Chinese Academy of Sciences and the National Natural Science Foundation of China.

Contact:

Prof. ZHAO Wenming

Email: zhaowm@big.ac.cn