Bioinformatics databases

There are thousands of online bioinformatics databases available on the Internet. The best way to find a database that you are interested in is to look through Database Commons at the Beijing Institute of Genomics, Chinese Academy of Sciences.

Sequence databases

  • UniProt – The main website of the international protein sequence database, which consists of the protein knowledgebase (UniProtKB), the sequence clusters (UniRef) and the sequence archive (UniParc).
  • neXtProt – The human proteome platform.
  • RefSeq – The Reference Sequence collection constructed by NCBI to provide a comprehensive, integrated, non-redundant set of DNA and RNA sequences and their protein products.
  • GenBank – The web portal to the NIH genetic sequence database maintained by NCBI, which is also part of the International Nucleotide Sequence Database Collaboration. Literature citations, release notes and an example record can be found on this page.
  • ENA – The web portal to the nucleotide sequence database maintained by EBI, which is also part of the International Nucleotide Sequence Database Collaboration. Documentation such as release notes, database statistics, a user guide, the feature table definition, a sample entry and FAQs is provided.
  • GSA – A data repository for omics data maintained by the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), serving as a primary archive of genome sequencing data.

Protein structure

  • RCSB – The main repository of macromolecular structures maintained by the Research Collaboratory for Structural Bioinformatics.
  • PDBe – The entry point for the EBI macromolecular structure database.
  • PDBsum – The PDB summary database maintained by EBI.
  • MMDB – The macromolecular database maintained by NCBI.
  • BMRB – The Biological Magnetic Resonance Data Bank maintained at the University of Wisconsin-Madison.
  • SBKB – The structural biology knowledgebase maintained by the Protein Structure Initiative.
  • SCOP – The Structural Classification of Proteins database developed and maintained by Cambridge University.
  • CATH – The database of Class, Architecture, Topology and Homologous superfamily developed and maintained by University College London.

Databases of protein domain, function and expression

  • InterPro – Classification of protein families maintained at the EBI.
  • CDD – A database of conserved protein domains created and maintained by the NCBI structure group.
  • ProDom – A comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases, developed and maintained by the University Claude Bernard, France.
  • Expression Atlas – An open EBI resource for finding information about gene and protein expression.
  • HPA – The website of the Human Protein Atlas, which uses immunohistochemistry images to show the expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines. It is developed and maintained by the Proteome Resource Center, Sweden.

Family databases

  • Pfam – The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
  • Rfam – The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
  • Dfam – The Dfam database is a collection of repetitive DNA element sequence alignments, hidden Markov models (HMMs) and match lists for complete eukaryotic genomes.
  • TreeFam – A database composed of phylogenetic trees inferred from animal genomes. It provides orthology/paralogy predictions as well as the evolutionary history of genes.

Pathway databases

  • REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
  • REACTOME Mirror – The REACTOME mirror at the National Center for Protein Science, Beijing (NCPSB).
  • Plant REACTOME – The Plant REACTOME at Gramene.
  • KEGG – Kyoto Encyclopedia of Genes and Genomes.

Genome databases and genome browsers

  • ENSEMBL – The web server of the European eukaryotic genome resource developed by EBI and the Sanger Institute.
  • UCSC Genome Information – The genome browser website containing the reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz (UCSC).
  • Phytozome – A tool for green plant comparative genomics, maintained by the Center for Integrative Genomics, Joint Genome Institute.
  • Gramene – A curated open-source data resource for plant genome analysis.
  • NCBI Genome Data Viewer – A genome browser supporting the exploration and analysis of eukaryotic RefSeq genome assemblies.
  • NCBI Genome – The entry portal to various NCBI genomic biology tools and resources.
  • NCBI Genome Information – The NCBI genomic information table lists the general information of genomes for all species.
  • VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
  • GOLD – Genomes Online Database, a comprehensive information resource for complete and ongoing genome sequencing projects with flowcharts and tables of statistical data.

Model organism databases

  • MGI – The international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
  • RGD – The Rat Genome Database at the Medical College of Wisconsin, which collects, consolidates, and integrates data generated from ongoing rat genetic and genomic research.
  • XenBase – A biology and genomics resource for the African clawed frog Xenopus laevis and for Xenopus tropicalis.
  • ZFIN – The Zebrafish Information Network.
  • FlyBase – A comprehensive database of Drosophila genes and genomes maintained by Indiana University.
  • WormBase – The biology and genome resource for Caenorhabditis elegans.
  • SGD – The Saccharomyces Genome Database.

Plant Databases

  • PlantTFDB – The database of plant transcription factors built by the Center for Bioinformatics, Peking University.
  • Crop Database – Crop databases at the Chinese Academy of Agricultural Sciences.
  • TAIR – The Arabidopsis Information Resource, maintained by Stanford University. It includes the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community.
  • AraPort – Araport is a one-stop-shop for Arabidopsis thaliana genomics. Araport offers gene and protein reports with orthology, expression, interactions and the latest annotation, plus analysis tools, community apps, and web services. Araport is 100% free and open-source. Registered members can save their analysis, publish science apps, and post announcements.
  • IC4R – A rice knowledgebase for data integration through community-contributed modules, integrating data from remote resources through web APIs and featuring collaborative integration of rice data from multiple committed modules and low costs for database update and maintenance.
  • Oryzabase – A comprehensive rice science database maintained by the National Institute of Genetics, Japan. It contains genetic resource stock information, a gene dictionary, chromosome maps, mutant images and fundamental knowledge of rice science.
  • MaizeDB – The community database for biological information about the crop plant Zea mays ssp. mays, with genetic, genomic, sequence, gene product, functional characterization and literature reference data.
  • SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
  • SGN – A collection of data resources for Solanaceae species, including tomato, potato, pepper, eggplant, petunia and Nicotiana.
  • ICuGI – The web portal for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
  • GDR – The genome database for Rosaceae, including apple, pear, peach, apricot, strawberry, rose, etc.

Bacterial Genome Databases

  • PATRIC – The Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community's work on bacterial infectious diseases by integrating vital pathogen information with rich data and analysis tools.

Virus Genome Databases

  • Viral Genomes – The main page of the NCBI viral genome information resource.
  • GISAID – Global Initiative on Sharing Avian Influenza Data.
  • OpenFlu – A database for human and animal influenza viruses.
  • NCBI Flu – NCBI Influenza Virus Resource with influenza genomic data and analysis tools.
  • Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Database Journals

  • NAR Database Issue – The journal Nucleic Acids Research publishes a Database Issue on 1 January each year.
  • JBDC – The online, open-access Journal of Biological Databases and Curation.