BIG Search

BIG Search is a scalable text search engine built on Elasticsearch (a highly scalable open-source full-text search and analytics engine based on Apache Lucene). It supports cross-domain search and gives users access to a wide range of biomedical data, not only from NGDC databases but also from partner databases throughout the world.
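
As a rough illustration of what a cross-database full-text query can look like in Elasticsearch's query DSL, the sketch below builds a `multi_match` request body and a multi-index search path. The index names (`bioproject`, `biosample`, `gsa`) and field names (`title`, `description`) are assumptions made for this example; they are not taken from BIG Search's actual configuration, and no request is sent.

```python
import json


def build_query(term, size=10):
    """Build an Elasticsearch query DSL body for a free-text term.

    The field names are hypothetical; the real BIG Search mapping is not
    described on this page.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": term,
                # Hypothetical fields; "^2" boosts matches in the title.
                "fields": ["title^2", "description"],
            }
        },
    }


# Hypothetical index names, one per source database.
indices = ["bioproject", "biosample", "gsa"]

# Elasticsearch accepts a comma-separated list of indices in the search path.
path = "/" + ",".join(indices) + "/_search"
body = build_query("tp53")

print(path)
print(json.dumps(body, indent=2))
```

Searching several indices in one request is one common way to get cross-domain results from a single query; whether BIG Search uses one index per database or a shared index is not stated here.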

e.g., PRJCA000126; SAMC000385; tp53; EGFR; human; KaKs_Calculator

145,859,298 records from 88 NGDC & Partner databases.

Database Records Number Description
GVM 16,739,725 Genome Variation Map
BioSample 8,958,670 Biological Sample Library
lncRNASNP2 4,443,771
Pancan-MNVQTLdb 1,800,202 A database to evaluate the effects of MNVs on multiple molecular phenotypes
GSA 1,624,119 Genome Sequence Archive
RMVar 1,615,252 RNA Modification associated variants database
Pancan-meQTL 1,567,926 A database to evaluate the effects of SNPs on methylation.
PancanQTL 1,412,031 A database to systematically identify cis-eQTLs and trans-eQTLs in 33 cancer types.
ncRNA-eQTL 1,288,527 A database to evaluate the effects of SNPs on ncRNA expression
circAtlas 610,406 circAtlas 2.0
EWAS Data Hub 597,253 A data hub of DNA methylation array data and metadata
LncBook 409,204 A curated knowledgebase of human long non-coding RNAs.
EWAS Atlas 262,089 A knowledgebase of epigenome-wide association studies
SNP2APA 147,072 A database to evaluate the effects of SNPs on APA events
BBCancer 137,210 BBCancer: an expression atlas of blood-based biomarkers in the early diagnosis of cancers
HemAtlas 133,914 a database resource for hematopoiesis
LncExpDB 101,293 Expression Database of Human Long non-coding RNAs
DMS_ProteinOntology 79,371 PRO provides an ontological representation of protein-related entities by explicitly defining them and showing the relationships between them. Each PRO term represents a distinct class of entities (including specific modified forms, orthologous isoforms, and protein complexes) ranging from the taxon-neutral to the taxon-specific (e.g. the entity representing all protein products of the human SMAD2 gene is described in PR:Q15796; one particular human SMAD2 protein form, phosphorylated on the last two serines of a conserved C-terminal SSxS motif is defined by PR:000025934).
BioProject 72,560 Biological Project Library
Gene Expression Nebulas 64,158 A data portal of transcriptomic profiles across multiple species
GenTree 63,151 GenTree, the time tree of genes along the evolutionary history
MethBank 4.0 61,408 a database of DNA methylation across a variety of species
MethBank SRMs 60,479 MethBank, Single-base Resolution Methylomes (SRMs)
MethBank CRMs 60,415 MethBank, Consensus Reference Methylomes (CRMs)
SEGreg 53,156 Database of specifically expressed genes and regulation
VCG 43,801 Virtual Chinese Genome Database is a dynamic genome database of the Chinese population.
HGD 42,901 Homologous Gene Database
CancerSEA 34,227 CancerSEA: a cancer single-cell state atlas
EPSD 30,679 Eukaryotic Phosphorylation Site Database
DEG 28,458 Database of Essential Genes
lnCAR 28,420 A comprehensive resource for lncRNAs from Cancer Arrays
DMS_MeSH 19,382 MeSH (Medical Subject Headings) is the NLM controlled vocabulary thesaurus used for indexing articles for PubMed.
dbPAF 18,792 database of Phospho-sites in Animals and Fungi
DMS_PMO 16,364 A standardized ontology for human precision medicine with consistent, reusable and sustainable descriptions of human disease terms, genomic and molecular features, phenotype characteristics and related medical vocabulary, developed through collaborative efforts of researchers at the Institute of Medical Information, Chinese Academy of Medical Sciences.
GWH 15,440 Genome Warehouse
AnimalTFDB 8,266 Animal Transcription Factor Database
BrainBase 7,598 Brain Disease Knowledgebase
ZCURVE_CoVdb 7,054 Database of Essential Genes
DMS_SnomedCT_US 4,506 The SNOMED CT United States (US) Edition is the official source of SNOMED CT for use in US healthcare systems. The US Edition is a standalone release that combines the content of both the US Extension and the International releases of SNOMED CT.
GSA for Human 3,851 Genome Sequence Archive for Human
Taxonomy 3,217 The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases. This currently represents about 10% of the described species of life on the planet.
LncRNAWiki 2.0 2,503 LncRNAWiki 2.0 is devoted to community curation of human long non-coding RNAs (lncRNAs) to provide a comprehensive and up-to-date resource of functionally annotated lncRNAs. It incorporates a comprehensive collection of experimentally studied lncRNAs and integrates a wealth of their annotations based on a standardized curation model, and improves curation quality through expert curator review and community error report.
DMS_SnomedCT_International 1,897 SNOMED International determines global standards for health terms, an essential part of improving the health of humankind. We are committed to maintaining and growing our leadership as the global experts in healthcare terminology, ensuring SNOMED CT is the global language for clinical terms.
Database Commons 840 a curated catalogue of biological databases.
BioCode 641 Archive Bioinformatics Codes for Open Source Projects
PTMD 594 A database of human disease-associated post-translational modifications
Brain Catalog 517 a One-Stop Shop for Brain-related Traits
CellMarker 467 CellMarker: a manually curated resource of cell markers in human and mouse.
NODE 405 The National Omics Data Encyclopedia
DMS_Chemical Entities of Biological Interest Ontology 264 Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
RhesusBase Genes 206
ASCancer Atlas 205 A comprehensive knowledgebase of alternative splicing in human cancers
OMIX 142 OMIX
EDK 110 Editome Disease Knowledgebase
GeneOntology 103 The Gene Ontology knowledgebase provides a computational representation of our current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria. It is widely used to support scientific research, and has been cited in tens of thousands of publications.
DMS_Swissprot 59 UniProtKB/Swiss-Prot is the expertly curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.
eLMSG 59 An eLibrary of Microbial Systematics and Genomics
VFDB 59 Virulence Factor Database
iPCD 54 database of PCD regulators
DMS_ICD-10-IN 49 International Classification of Diseases 10th Revision. The global standard for diagnostic health information.
lncRNASNP v3 45 lncRNASNP v3: a comprehensive resource for functional variants in long non-coding RNAs.
DMS_ICD-10-CM 44 An extended version of ICD-10-CM with selected ICD-9-CM (The International Classification of Diseases) diagnosis codes.
Cell Taxonomy 39 Cell Taxonomy is a curated repository of cell types with multifaceted characterization.
DMS_Drugbank 37 a comprehensive, free-to-access, online database containing information on drugs and drug targets.
DMS_Human Phenotype Ontology 29 The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains over 13,000 terms and over 156,000 annotations to hereditary diseases.
iEKPD 29 Integrated annotations for Eukaryotic protein Kinases, protein Phosphatases & phosphoprotein-binding Domains
ICG 27 internal control genes
PLMD 26 Protein Lysine Modifications Database
DMS_ICD-10-PCS 22 ICD-10-PCS is a totally new coding system designed to better accommodate the rapidly changing world of procedures. ICD-10-PCS provides a multi-axial design to the codes and is similar in design to Logical Observation Identifiers Names and Codes (LOINC).
DMS_ICD-10 17 International Classification of Diseases 10th Revision. The global standard for diagnostic health information.
hTFtarget 13 In this hTFtarget database, we collected comprehensive human TF ChIP-Seq data and customized an analysis workflow to identify reliable TF targets while taking epigenomic states into account.
CGGA 11 Chinese Glioma Genome Atlas
HGNC 7 The HGNC is responsible for approving unique symbols and names for human loci, including protein coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication.
DMS_Ensembl 6 Ensembl supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data.
CGDB 6 Circadian Gene Database
SequenceOntology 4 The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. SO includes different kinds of features which can be located on the sequence. Biological features are those which are defined by their disposition to be involved in a biological process.
TCOD 4 A multi-omics data platform for tropical crops
VarClear 3 Gene Variation Interpretation Database
KGCoV 3 KGCoV (Knowledge Graph of SARS-CoV-2) structures and matches COVID-19 epidemiological information and SARS-CoV-2 genomic data with combined curation methods, and integrates variation information generated by bioinformatic tools.
DoriC 2 Database of Replication Origins
iUUCD 2 integrated annotations for Ubiquitin and Ubiquitin-like Conjugation Database
DMS_eLMSG 1
GTDB 1 Glycosyltransferases Database
GWAS Atlas 1 GWAS Atlas is a curated resource of genome-wide variant-trait associations
OpenLB 4,703,798 Open Library of Bioscience
Genbase Nucleotide 42,724,795 a collection of nucleotide sequences from several sources
Genbase Protein 52,631,511 a collection of protein sequences from several sources
RCoV19 3,113,323 Resource for Coronavirus 2019

Powered by EBISearch

Powered by NCBI Entrez

Powered by EBI AlphaFold DB