Database Commons

a catalog of worldwide biological databases

Database Profile

General information

Full name: DataBase for automated Carbohydrate-active enzyme ANnotation
Description: dbCAN is a web server and DataBase for automated Carbohydrate-active enzyme ANnotation.
Year founded: 2012
Last update: 2019/07/10
Version: v.2
Country/Region: United States

Classification and Labelling

Data type:
Data object:
Database category:
Major species:

Contact information

University/Institution: Northern Illinois University
Address: Northern Illinois University, DeKalb, IL, USA
Country/Region: United States
Contact name (PI/Team): Yanbin Yin
Contact email (PI/Helpdesk):


dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. [PMID: 29771380]
Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, Busk PK, Xu Y, Yin Y.

Complex carbohydrates of plants are the main food sources of animals and microbes, and serve as promising renewable feedstock for biofuel and biomaterial production. Carbohydrate-active enzymes (CAZymes) are the most important enzymes for complex carbohydrate metabolism. With an increasing number of plant and plant-associated microbial genomes and metagenomes being sequenced, there is an urgent need for automatic tools for genomic data mining of CAZymes. We developed the dbCAN web server in 2012 to provide a public service for automated CAZyme annotation of newly sequenced genomes. Here, dbCAN2 is presented as an updated meta server, which integrates three state-of-the-art tools for CAZome (all CAZymes of a genome) annotation: (i) HMMER search against the dbCAN HMM (hidden Markov model) database; (ii) DIAMOND search against the CAZy pre-annotated CAZyme sequence database; and (iii) Hotpep search against the conserved CAZyme short peptide database. Combining the three outputs and removing CAZymes found by only one tool can significantly improve CAZome annotation accuracy. In addition, dbCAN2 now accepts nucleotide sequence submissions and offers a service to predict physically linked CAZyme gene clusters (CGCs), which will be a very useful online tool for identifying putative polysaccharide utilization loci (PULs) in microbial genomes or metagenomes.
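The consensus rule described above (discard CAZymes found by only one of the three tools) can be sketched as a simple vote count. This is a minimal illustration, assuming each tool's output has already been reduced to a set of protein IDs; the function name `consensus_cazymes` and the example IDs are hypothetical, and the sketch does not check whether the tools agree on the CAZyme family.

```python
from collections import Counter


def consensus_cazymes(tool_hits: dict[str, set[str]], min_tools: int = 2) -> set[str]:
    """Return protein IDs predicted as CAZymes by at least `min_tools` tools.

    `tool_hits` maps a tool name (e.g. "HMMER") to the set of protein IDs it
    flagged. Because each value is a set, a tool casts at most one vote per protein.
    """
    votes = Counter()
    for hits in tool_hits.values():
        votes.update(hits)
    return {pid for pid, n in votes.items() if n >= min_tools}


if __name__ == "__main__":
    # Hypothetical protein IDs, for illustration only
    hits = {
        "HMMER": {"prot1", "prot2", "prot3"},
        "DIAMOND": {"prot1", "prot3", "prot4"},
        "Hotpep": {"prot1", "prot5"},
    }
    print(sorted(consensus_cazymes(hits)))  # ['prot1', 'prot3']
```

Raising `min_tools` to 3 would keep only proteins found by all three tools, trading sensitivity for precision.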

Nucleic Acids Res. 2018:46(W1) | 444 Citations (from Europe PMC, 2022-12-03)
dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation. [PMID: 30053267]
Huang L, Zhang H, Wu P, Entwistle S, Li X, Yohe T, Yi H, Yang Z, Yin Y.

Carbohydrate-active enzymes (CAZymes) are not only the most important enzymes for the bioenergy and agricultural industries but also very important for human health, in that human gut microbiota encode hundreds of CAZyme genes in their genomes for degrading various dietary and host carbohydrates. We have built an online database, dbCAN-seq, to provide pre-computed CAZyme sequence and annotation data for 5,349 bacterial genomes. Compared to other CAZyme resources, dbCAN-seq has the following new features: (i) a convenient download page that allows batch download of all the sequence and annotation data; (ii) an annotation page for every CAZyme that provides the most comprehensive annotation data; (iii) a metadata page that organizes the bacterial genomes according to species metadata such as disease, habitat, oxygen requirement, temperature and metabolism; (iv) a very fast tool to identify physically linked CAZyme gene clusters (CGCs); and (v) a powerful search function for fast and efficient data queries. With these unique utilities, dbCAN-seq will become a valuable web resource for CAZyme research, with a focus complementary to dbCAN (automated CAZyme annotation server) and CAZy (CAZyme family classification and reference database).
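The abstracts describe CGCs as physically linked CAZyme gene clusters but do not state the linkage rule. The sketch below assumes a simplified, hypothetical rule: each gene is labelled as a signature gene (`CAZyme`, transporter `TC`, transcription factor `TF`) or `other`; signature genes on the same contig may be separated by at most `max_gap` other genes; and a cluster must contain at least one CAZyme. The `Gene` class, the labels and the default thresholds are assumptions for illustration, not dbCAN-seq's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    contig: str
    position: int  # 1-based order along the contig, counting ALL genes
    label: str     # hypothetical labels: "CAZyme", "TC", "TF" or "other"


def find_gene_clusters(genes, max_gap=2, min_signature=2):
    """Group signature genes separated by at most `max_gap` non-signature genes.

    Keeps clusters with at least `min_signature` signature genes, at least one
    of which is a CAZyme. Returns lists of gene IDs in contig order.
    """
    by_contig = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)

    candidates = []
    for contig_genes in by_contig.values():
        current = []
        for g in sorted(contig_genes, key=lambda x: x.position):
            if g.label == "other":
                continue  # non-signature genes only count towards the gap
            # genes lying strictly between this gene and the previous signature gene
            if current and g.position - current[-1].position - 1 > max_gap:
                candidates.append(current)
                current = []
            current.append(g)
        if current:
            candidates.append(current)

    return [
        [g.gene_id for g in c]
        for c in candidates
        if len(c) >= min_signature and any(g.label == "CAZyme" for g in c)
    ]


if __name__ == "__main__":
    # Hypothetical contig: CAZyme, other, TC, other x3, TF, TC
    labels = ["CAZyme", "other", "TC", "other", "other", "other", "TF", "TC"]
    genes = [Gene(f"g{i}", "contig1", i, lab) for i, lab in enumerate(labels, start=1)]
    print(find_gene_clusters(genes))  # [['g1', 'g3']]
```

In the example, g7 and g8 form a linked pair but are dropped because neither is a CAZyme, and the three-gene gap separates them from g1 and g3.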

Nucleic Acids Res. 2018:46(D1) | 73 Citations (from Europe PMC, 2022-12-03)
dbCAN: a web resource for automated carbohydrate-active enzyme annotation. [PMID: 22645317]
Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y.

Carbohydrate-active enzymes (CAZymes) are very important to the biotech industry, particularly the emerging biofuel industry, because CAZymes are responsible for the synthesis, degradation and modification of all the carbohydrates on Earth. We have developed a web resource, dbCAN, to provide automated CAZyme signature domain-based annotation for any given protein data set (e.g. proteins from a newly sequenced genome) submitted to our server. To accomplish this, we have explicitly defined a signature domain for every CAZyme family, derived from CDD (Conserved Domain Database) searches and literature curation. We have also constructed a hidden Markov model (HMM) to represent the signature domain of each CAZyme family. These CAZyme family-specific HMMs are our key contribution and the foundation for automated CAZyme annotation.
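Signature-domain annotation with family-specific HMMs amounts to scanning each protein against the HMM collection and keeping hits that are strong enough. The sketch below assumes the hits have already been parsed (for example from HMMER's per-domain tabular output) into `DomainHit` records; the record layout, the coverage definition and the thresholds `max_evalue=1e-15` and `min_coverage=0.35` are illustrative assumptions, not dbCAN's documented cutoffs.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    protein: str   # query protein ID
    family: str    # CAZyme family whose HMM produced the hit, e.g. "GH5"
    evalue: float  # per-domain E-value
    hmm_from: int  # alignment start on the HMM (1-based, inclusive)
    hmm_to: int    # alignment end on the HMM (1-based, inclusive)
    hmm_len: int   # length of the family HMM

    @property
    def coverage(self) -> float:
        """Fraction of the family HMM (signature domain model) covered by the alignment."""
        return (self.hmm_to - self.hmm_from + 1) / self.hmm_len


def annotate(hits, max_evalue=1e-15, min_coverage=0.35):
    """Map each protein to the set of families whose hits pass both (assumed) thresholds."""
    families = {}
    for h in hits:
        if h.evalue <= max_evalue and h.coverage >= min_coverage:
            families.setdefault(h.protein, set()).add(h.family)
    return families


if __name__ == "__main__":
    hits = [
        DomainHit("prot1", "GH5", 1e-30, 1, 200, 300),  # strong, covers ~67% of the HMM
        DomainHit("prot1", "CBM1", 1e-20, 1, 10, 36),   # significant but covers ~28%
        DomainHit("prot2", "GT2", 1e-5, 1, 250, 260),   # E-value too weak
    ]
    print(annotate(hits))  # {'prot1': {'GH5'}}
```

Requiring a minimum HMM coverage, in addition to an E-value cutoff, is one way to avoid labelling a protein from a short partial match to a family's signature domain.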

Nucleic Acids Res. 2012:40(Web Server issue) | 784 Citations (from Europe PMC, 2022-12-03)


Total Rank

All databases: 82/5435 (98.51%)
Gene genome and annotation: 35/1456 (97.665%)
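The percentages next to each rank appear consistent with the share of databases ranked at or below dbCAN, i.e. (total - rank + 1) / total. This formula is inferred from the two listed values rather than documented on the page, and `rank_percentile` is a hypothetical helper.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Percentage of databases ranked at or below `rank` (inferred formula)."""
    return (total - rank + 1) / total * 100


print(f"All databases: {rank_percentile(82, 5435):.2f}%")               # 98.51%
print(f"Gene genome and annotation: {rank_percentile(35, 1456):.3f}%")  # 97.665%
```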

Community reviews

Not Rated


Record metadata

Created on: 2018-01-28
Curated by:
Shoaib Saleem [2019-11-25]
Shoaib Saleem [2019-11-19]
Rabail Raza [2018-12-26]
Pei Wang [2018-03-21]
Pei Wang [2018-03-11]
Pei Wang [2018-02-23]
Hao Zhang [2018-01-28]