Database Commons

a catalog of worldwide biological databases

Help

About

Database Commons is a manually curated catalog of worldwide biological databases, which has been frequently updated and enriched since its inception in 2015. It aims to provide a full landscape of biological databases throughout the world and enable easy retrieval and access to a specific collection of databases of interest.

Database Commons integrates relevant information for all collected databases (including database name, URL, description, hosting institution, related publication(s), contact information, etc.) and catalogues each database by data type, species, subject and location, enabling people to easily find a specific collection of databases of interest. We rank databases by total citations as well as by the age-normalized z-index to highlight popular and high-quality databases. Database Commons also allows registered users to rate any database in terms of data quality & quantity, content organization & presentation, and system accessibility & reliability, which helps users locate appropriate databases efficiently.

Together, by cataloguing databases under different criteria and incorporating community ratings of database utility, Database Commons serves as a valuable resource for effective exploitation of all publicly available databases.

Classification and Labelling

Databases are classified by data type, data object and database category. In addition, major species and keywords are tagged to further indicate the specific fields each database relates to.

Data Object

A database may encompass multiple data objects. In Database Commons, there are a total of 6 data objects as detailed below.

  1. Animal
  2. Plant
  3. Fungi
  4. Bacteria
  5. Archaea
  6. Virus

Data Type

A database may encompass multiple data types. In Database Commons, there are a total of 3 data types as detailed below.

  1. DNA: gene/chromosome/genome sequence, DNA mutation/modification, DNA structure, DNA elements including probe, primer, motif, repeat sequence, etc.
  2. RNA: RNA sequence, coding & non-coding transcripts, alternative splicing, RNA editing/modification, RNA probe and primer, RNA motif and structure, RNA expression
  3. Protein: protein sequence, protein motif and domain, protein structure, protein modification, protein-protein interaction, protein expression

Database Category

A database may encompass multiple database categories. In Database Commons, there are a total of 13 database categories as detailed below.

  1. Raw bio-data: raw data of nucleic acid/protein sequencing and microarray, and image, digit, video, audio from biological and medical research
  2. Gene, genome and annotation: gene/genetic element annotation, gene structure/family/motif/domain annotation, genome annotation, comparative genome (metagenome, pan-genome) analysis and annotation
  3. Genotype, phenotype and variation: genotypes, phenotypes, multiple-scale variations (including SNP, INDEL, CNV, chromosomal rearrangement and other structural variation), genotype-phenotype associations
  4. Phylogeny and homology: phylogeny reconstruction of genes/species, evolutionary history/process/event among individuals/organisms, homology identification
  5. Expression: RNA/protein expression, expression abundance and pattern, RNA probe or primer used for gene expression detection, differential expression analysis
  6. Modification: DNA modification, post-transcriptional modification of mRNA and non-coding RNA, post-translational modification of protein, modification type/technology/function
  7. Structure: secondary, tertiary and quaternary structure of DNA/RNA/protein, chromatin structure
  8. Interaction: direct (physical) and indirect (functional) associations, including protein-protein interaction, RNA-protein interaction, DNA-protein interaction, gene regulatory interaction, biochemical reaction, antigen and antibody, and genetic interaction
  9. Pathway: biological pathways, including metabolic, signaling and gene regulatory pathways
  10. Health and medicine: disease variation/genotype-phenotype association, immune reaction, disease model, clinical biomarker, therapeutic target, drug & chemical compound, pharmacogenomics and pharmacodynamics, electronic health record
  11. Standard, ontology and nomenclature: standard, ontology and nomenclature for biological entities
  12. Literature: literature information, literature/text mining, textual annotation based on literature
  13. Metadata: metadata information for biological entities, e.g., project/sample/experiment/run/database/tool

Curation Model

For each database, four sections comprising 21 items are curated: “General Information”, “Classification and Labelling”, “Contact Information”, and “Publication”. “General Information” records basic information such as short name, full name, URL and availability. “Classification and Labelling” classifies the database by data type, data object and database category, and adds further labels that indicate its distinctive features. “Contact Information” identifies the people responsible for maintaining the database, who are encouraged to participate in its curation. All information provided for each database entry is manually curated by multiple curators.

To ensure curation quality, only registered users are allowed to edit, submit and score databases. We provide a curation handbook that introduces the standards and gives examples for each section and item.

Curation Rules

Curation Permission
  • To ensure content reliability, only registered users are allowed to edit/curate the database information.
  • Registered users can apply for curation permission by email. The Database Commons Team will review each applicant's qualifications but does not guarantee that the application will be approved.
  • The Database Commons Team performs curation from time to time, and also encourages database developers/curators/team-members to curate their own databases.
  • Database Commons reserves the right to suspend, limit or terminate your curation permission for any inappropriate or disruptive behavior on our website or relevant webpages.
  • If you post or send offensive, inappropriate or objectionable content anywhere on or to our websites or otherwise engage in any disruptive behavior on any of our services, we may use your personal information from our security logs to stop such behavior and terminate your account. Where we reasonably believe that you are or may be in breach of any applicable laws we may use your personal information to inform relevant third parties about the content and your behavior.
Database Meta-Information
  • In terms of accessibility, databases are classified as alive or dead: alive databases are available, whereas dead ones are temporarily or permanently unavailable for various reasons.
  • Database Commons collects not only active databases but also dead ones, because the meta-information of dead databases can still provide important history and insights for users.
  • For dead databases, meta-information is extracted from their related publications.
  • The "Year Founded" indicates the year in which the database was founded. Although this is debatable, Database Commons takes the year of its first publication as the "Year Founded".
  • Many databases display their last update on the homepage, and "Last Updated" is curated from this information.
  • The URL is automatically obtained from the publication and then manually curated by curators. It should direct users to the database homepage rather than to other pages. If the URL has changed, it should be updated to the newest one.
  • The "Accessibility" item has two options, "Accessible" and "Unaccessible", which are manually curated and checked by curators.
  • The "Description" is summarized by curators from publication abstracts and should be concise and clear (1-3 sentences).
  • Controlled vocabularies are used for three meta-information items, viz., "Data Type", "Data Object", and "Database Category".
  • The three data types are "DNA", "RNA", and "Protein". A database may encompass multiple data types. If none of these three data types applies, please select "Other".
  • There are a total of 6 data objects, viz., "Animal", "Plant", "Fungi", "Bacteria", "Archaea", and "Virus". A database may encompass multiple data objects. If no species information is available, input "NA".
  • There are a total of 13 database categories. A database may encompass multiple database categories.
  • For "Species", Latin names of the organisms are required and should be selected from the drop-down box. If the database covers a large number of organisms, users may enter the names of the major organisms.
  • The species list is obtained from the NCBI Taxonomy database, and some species may not be included in the present list.
  • Keywords are tagged to show the important features of databases. The singular form is preferred over the plural form, and all letters should be in lowercase.
  • The contact information is provided to facilitate the update of database information, and it is curated based on the contact details in the database or the related publications. To ensure effective contact with database owners/developers, we give priority to the contact details shown in the database.
  • For "University/Institution", the official full English name of the university/institution is required. If the university has multiple campuses, the campus name should be included, e.g., University of California Santa Cruz. If the institution is affiliated with an academy, the academy's name should also be listed, e.g., Beijing Institute of Genomics, Chinese Academy of Sciences.
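The controlled-vocabulary rules above lend themselves to automated checking before an entry is saved. The following is a minimal Python sketch, not the Database Commons implementation. The `Entry` record and the `validate` function are hypothetical, and the vocabularies are copied from this page. The rule that "Other" and "NA" should not be mixed with real terms is our reading of the curation rules, not a documented constraint. The singular-form preference for keywords is left to curators, because simple string rules cannot check it reliably.

```python
from dataclasses import dataclass, field

# Controlled vocabularies taken from this help page. "Other" and "NA" are the
# fallback values mentioned in the curation rules.
DATA_TYPES = {"DNA", "RNA", "Protein", "Other"}
DATA_OBJECTS = {"Animal", "Plant", "Fungi", "Bacteria", "Archaea", "Virus", "NA"}
DATABASE_CATEGORIES = {
    "Raw bio-data", "Gene, genome and annotation",
    "Genotype, phenotype and variation", "Phylogeny and homology",
    "Expression", "Modification", "Structure", "Interaction", "Pathway",
    "Health and medicine", "Standard, ontology and nomenclature",
    "Literature", "Metadata",
}


@dataclass
class Entry:
    """Hypothetical subset of a curated entry: only the controlled fields."""
    short_name: str
    data_types: list[str] = field(default_factory=list)
    data_objects: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


def validate(entry: Entry) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    checks = (
        ("Data Type", entry.data_types, DATA_TYPES),
        ("Data Object", entry.data_objects, DATA_OBJECTS),
        ("Database Category", entry.categories, DATABASE_CATEGORIES),
    )
    for label, values, vocabulary in checks:
        for value in values:
            if value not in vocabulary:
                problems.append(f"{label}: unknown term {value!r}")
    # Assumption: the fallback values stand alone and are not mixed with real terms.
    for label, values, fallback in (("Data Type", entry.data_types, "Other"),
                                    ("Data Object", entry.data_objects, "NA")):
        if fallback in values and len(values) > 1:
            problems.append(f"{label}: {fallback!r} cannot be combined with other terms")
    # Keywords must be all lowercase; the singular-form rule is left to curators.
    for keyword in entry.keywords:
        if keyword != keyword.lower():
            problems.append(f"Keyword {keyword!r} should be lowercase")
    return problems
```

Returning a list of messages instead of raising on the first error lets a curation form report every problem at once.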
Database Citation & Age
  • The "Citation" indicates the total citation count for a specific database, based on the summed citations (indexed by Europe PMC) over all its related publications.
  • Database age is counted from the year of its first publication.
  • z-index is calculated by dividing citation by database age. This index reduces the influence of database age and enables a relatively fair comparison between newly constructed databases and old, well-established ones.
  • Databases are ranked by z-index. Rank numbers among all databases and among specific database category/categories are listed in the database page.
  • For any given database, its related databases are classified into "Cited" and "Citing", where "Cited" represents databases that cite this database, while "Citing" represents databases that have been cited by this database.
  • Curation events are recorded by day. Curators may curate a specific database many times in one day, but these edits are registered as one record in "Record metadata".
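The citation, age and z-index definitions above can be sketched in Python as follows. This illustrates the stated assumptions and is not the site's code. The `Database` record and the helper names are hypothetical, and citation counts are supplied by the caller rather than fetched from Europe PMC. Age is assumed to count the founding year as year 1, because the page does not say how a database first published in the current year is handled. The last helper shows one way to collapse several edits on the same day into a single curation record.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Database:
    short_name: str
    publication_citations: list[int]  # one Europe PMC count per related publication
    first_publication_year: int       # used as "Year Founded"


def citation(db: Database) -> int:
    """Total citations summed over all related publications."""
    return sum(db.publication_citations)


def age(db: Database, current_year: int) -> int:
    # Assumption: the founding year itself counts as year 1, so age is never zero.
    return current_year - db.first_publication_year + 1


def z_index(db: Database, current_year: int) -> float:
    """Citations divided by database age."""
    return citation(db) / age(db, current_year)


def rank_by_z_index(dbs: list[Database], current_year: int) -> list[tuple[int, str, float]]:
    """Return (rank, short_name, z-index) tuples, highest z-index first."""
    ordered = sorted(dbs, key=lambda d: z_index(d, current_year), reverse=True)
    return [(rank, d.short_name, round(z_index(d, current_year), 2))
            for rank, d in enumerate(ordered, start=1)]


def daily_records(edit_times: list[datetime]) -> list[date]:
    """Collapse any number of edits into one curation record per calendar day."""
    return sorted({t.date() for t in edit_times})


# Hypothetical example databases with made-up citation counts.
old = Database("OldDB", [1200, 300], 2005)
new = Database("NewDB", [300], 2020)
print(rank_by_z_index([old, new], 2022))
# → [(1, 'NewDB', 100.0), (2, 'OldDB', 83.33)]
```

In this made-up example the newer database ranks first even though it has fewer total citations. This is the effect the z-index is meant to have.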

Evaluation System

Popular, high-quality biological and biomedical databases contribute greatly to biological and biomedical discoveries. Therefore, Database Commons incorporates an evaluation system to measure database quality and impact.

There are four rating items: “Citation”, “z-index”, “Accessibility”, and “Community reviews”. “Citation” of a database is the total number of citations (indexed by Europe PMC) of all its published papers, and high citation counts generally indicate popular and high-quality databases. “z-index” is calculated by dividing total citations by database age. This reduces the influence of database age and enables a relatively fair comparison between newly constructed databases and old, well-known ones. “Accessibility” represents the accessibility status of the homepage, combining the manually curated status with an analysis of HTTP status codes (listed below). “Community reviews” relies on community engagement and provides a comprehensive evaluation of data quality & quantity, content organization & presentation, and system accessibility & reliability. Of the four rating items, Citation and z-index are calculated automatically for all biological databases, and users can rank databases and refine search results by these two items.

Community Rating

Database Commons features community rating of database utility, taking into account the following three criteria.

Data quality & quantity: consider data integrity, accuracy, standardization, consistency and comprehensiveness
Content organization & presentation: consider whether content is organized in a way that makes it easy to read and understand, and is presented through a user-friendly web interface
System accessibility & reliability: consider whether the system is always accessible and works reliably

A database containing high-quality curated data is of little use if its data are poorly organized or presented.
A database containing high-quality curated data is of little use if it is inaccessible or does not work reliably.

HTTP Status Codes

Below is a list of HTTP status codes with brief explanations. Each code consists of three digits, and the codes fall into two classes.

Accessible
2xx Success: e.g., 200 OK, the standard response for successful HTTP requests.
3xx Redirection: e.g., 301 Moved Permanently
Unaccessible
4xx Client Error: e.g., 403 Forbidden, 404 Not Found
5xx Server Error: e.g., 500 Internal Server Error, 503 Service Unavailable

More information about HTTP status codes can be found at Wikipedia.
In addition, unexpected exceptions, such as timeouts and errors that occur when sending requests, are indicated by "-1".
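A homepage accessibility check could be sketched with Python's standard `urllib.request` module. The `fetch_status` and `classify` functions below are hypothetical. According to the evaluation-system description, the real "Accessibility" value also combines manual curation, which this sketch ignores. Note that `urlopen` follows redirects automatically, so a 3xx code usually appears as the status of the final page rather than as the redirect itself.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url, or -1 on any other failure."""
    try:
        # urlopen follows redirects, so a 301 normally surfaces as the final page's code.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError but still carry the code.
        return err.code
    except (OSError, ValueError):
        # Timeouts, DNS/connection errors (OSError subclasses) and malformed URLs.
        return -1


def classify(status: int) -> str:
    """Map a status code onto the two classes used on this page."""
    if 200 <= status < 400:
        return "Accessible"  # 2xx success, 3xx redirection
    # 4xx/5xx are "Unaccessible" per the page; treating -1 and other codes
    # (e.g. 1xx) the same way is an assumption of this sketch.
    return "Unaccessible"
```

The `HTTPError` clause must come before the broader `OSError` clause. `HTTPError` is a subclass of `URLError`, which is in turn a subclass of `OSError`, so reversing the order would turn every 4xx/5xx response into -1.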

Database Usage

1. How to browse the biological databases?

On the Browse page, all users can browse biological databases by ‘Country/Region’, ‘Institution (Top 30)’, ‘Database Category’, ‘Data type’ or ‘Data object’ by selecting a specific category from the drop-down boxes on the left of the page. Databases can also be ordered by ‘z-index’, ‘Citation’, ‘Short name’ and ‘Founded year’.

2. How to search the biological databases?

The home page provides global search for name, category, country, data type, etc. Search page allows both global search and advanced search, where users could quickly retrieve a specific group of databases of interest with customized filters.
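Browsing and searching both come down to filtering records on one or more fields and then sorting the result. A minimal in-memory sketch follows. The `Record` fields, the `search` function and its parameters are hypothetical stand-ins for the drop-down filters and sort options described above, not the site's actual search API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical catalog record with the fields used for filtering and sorting."""
    short_name: str
    country: str
    founded_year: int
    citation: int
    z_index: float
    data_types: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)


# Sort options mirroring 'z-index', 'Citation', 'Short name' and 'Founded year'.
SORT_KEYS = {
    "z-index": lambda r: r.z_index,
    "Citation": lambda r: r.citation,
    "Short name": lambda r: r.short_name.lower(),
    "Founded year": lambda r: r.founded_year,
}


def search(records, *, country=None, data_type=None, category=None,
           sort_by="z-index", descending=True):
    """Keep records matching every given filter (None means 'any'), then sort."""
    hits = [
        r for r in records
        if (country is None or r.country == country)
        and (data_type is None or data_type in r.data_types)
        and (category is None or category in r.categories)
    ]
    return sorted(hits, key=SORT_KEYS[sort_by], reverse=descending)
```

For example, `search(records, country="China", sort_by="Citation")` returns the databases hosted in China with the most-cited first. Unset filters impose no constraint, which mirrors leaving a drop-down box on its default.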

3. How to submit the biological databases?

Only registered users are allowed to submit new databases to Database Commons. If you would like to take part in the curation work, please email us first. Curators receive basic training in database curation, classification, and use of the curation platform. You can curate or edit entries after an administrator has upgraded your privileges. After logging in, click on ‘Submit’, and then enter the database information for the four sections of the structured curation model. The curation handbook details the curation rules for each item.

4. How to edit the biological databases?

To ensure curation quality, only registered users are allowed to edit. Users can edit a database by clicking the button next to the database name on its page and updating the information on the curation page. Don’t forget to click on ‘Save’ after making any changes.

5. How to score the biological databases?

To ensure curation quality, only registered users are allowed to score. Users can select the star number of ‘Data quality & quantity’, ‘Content organization & presentation’ and ‘System accessibility & reliability’, respectively, and then click on ‘Submit a review’.
More stars indicate higher quality.
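This page does not specify how individual star ratings are combined into a community score. One plausible approach is sketched below, purely as an assumption. Each review rates the three criteria on a 1-5 star scale (the maximum is assumed), each criterion is averaged over all reviews, and the overall score is the mean of the three criterion averages. `check_review` and `summarize` are hypothetical names.

```python
from statistics import mean

# The three community-rating criteria named on this page.
CRITERIA = (
    "Data quality & quantity",
    "Content organization & presentation",
    "System accessibility & reliability",
)
MAX_STARS = 5  # assumption: a 1-5 star scale


def check_review(review: dict[str, int]) -> None:
    """Raise ValueError unless every criterion has a star count in range."""
    for criterion in CRITERIA:
        stars = review.get(criterion)
        if stars is None:
            raise ValueError(f"missing rating for {criterion!r}")
        if not 1 <= stars <= MAX_STARS:
            raise ValueError(f"{criterion!r} must be 1-{MAX_STARS} stars, got {stars}")


def summarize(reviews: list[dict[str, int]]) -> tuple[dict[str, float], float]:
    """Average each criterion over all reviews, then average the three means."""
    if not reviews:
        raise ValueError("no reviews to summarize")
    for review in reviews:
        check_review(review)
    per_criterion = {c: mean(r[c] for r in reviews) for c in CRITERIA}
    return per_criterion, mean(per_criterion.values())
```

Keeping the per-criterion averages alongside the overall score preserves the distinction the criteria are meant to capture. For example, a database with excellent data but poor reliability would otherwise be hidden behind a middling overall number.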

6. How to cite Database Commons?

Database Commons: a curated catalogue of worldwide biological databases (in preparation)

Related Publications
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res, 2022. 50(D1): p. D27-D38. [PMID=34718731]
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Res, 2021. 49(D1): p. D18-D28. [PMID=33175170]
Database Resources of the National Genomics Data Center in 2020. Nucleic Acids Res, 2020. 48(D1): p. D24-D33. [PMID=31702008]
Database Resources of the BIG Data Center in 2019. Nucleic Acids Res, 2019. 47(D1): p. D8-D14. [PMID=30365034]

Contact Information

National Genomics Data Center,
Beijing Institute of Genomics,
Chinese Academy of Sciences and China National Center for Bioinformation,
Beijing 100101, China
Email: databasecommons(AT)big.ac.cn
Tel: +86 (10) 84097845