Database Commons

a catalog of worldwide biological databases

Database Profile

dbCAN-seq

General information

URL: https://bcb.unl.edu/dbCAN_seq
Full name: a database of CAZyme sequence and annotation
Description: dbCAN-seq database includes ∼498 000 CAZymes and ∼169 000 CAZyme gene clusters (CGCs) from 9421 MAGs of four ecological (human gut, human oral, cow rumen, and marine) environments.
Year founded: 2018
Last update: 2022
Version: v2.0
Accessibility:
Accessible
Country/Region: United States

Classification & Tag

Data type:
DNA
Data object:
NA
Database category:
Major species:
NA
Keywords:

Contact information

University/Institution: University of Nebraska
Address: Nebraska Food for Health Center, Department of Food Science and Technology, University of Nebraska, Lincoln, NE 68588, USA.
City: Lincoln
Province/State: Nebraska
Country/Region: United States
Contact name (PI/Team): Yanbin Yin
Contact email (PI/Helpdesk): yanbin.yin@gmail.com

Publications

dbCAN-seq update: CAZyme gene clusters and substrates in microbiomes. [PMID: 36399503]
Jinfang Zheng, Boyang Hu, Xinpeng Zhang, Qiwei Ge, Yuchen Yan, Jerry Akresi, Ved Piyush, Le Huang, Yanbin Yin

Carbohydrate Active EnZymes (CAZymes) are significantly important for microbial communities to thrive in carbohydrate rich environments such as animal guts, agricultural soils, forest floors, and ocean sediments. Since 2017, microbiome sequencing and assembly have produced numerous metagenome assembled genomes (MAGs). We have updated our dbCAN-seq database (https://bcb.unl.edu/dbCAN_seq) to include the following new data and features: (i) ∼498 000 CAZymes and ∼169 000 CAZyme gene clusters (CGCs) from 9421 MAGs of four ecological (human gut, human oral, cow rumen, and marine) environments; (ii) Glycan substrates for 41 447 (24.54%) CGCs inferred by two novel approaches (dbCAN-PUL homology search and eCAMI subfamily majority voting) (the two approaches agreed on 4183 CGCs for substrate assignments); (iii) A redesigned CGC page to include the graphical display of CGC gene compositions, the alignment of query CGC and subject PUL (polysaccharide utilization loci) of dbCAN-PUL, and the eCAMI subfamily table to support the predicted substrates; (iv) A statistics page to organize all the data for easy CGC access according to substrates and taxonomic phyla; and (v) A batch download page. In summary, this updated dbCAN-seq database highlights glycan substrates predicted for CGCs from microbiomes. Future work will implement the substrate prediction function in our dbCAN2 web server.

Nucleic Acids Res. 2023:51(D1) | 53 Citations (from Europe PMC, 2025-12-13)

dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation. [PMID: 30053267]
Huang L, Zhang H, Wu P, Entwistle S, Li X, Yohe T, Yi H, Yang Z, Yin Y.

Carbohydrate-active enzymes (CAZymes) are not only the most important enzymes for bioenergy and agricultural industries, but also very important for human health, in that human gut microbiota encode hundreds of CAZyme genes in their genomes for degrading various dietary and host carbohydrates. We have built an online database dbCAN-seq (http://cys.bios.niu.edu/dbCAN_seq) to provide pre-computed CAZyme sequence and annotation data for 5,349 bacterial genomes. Compared to the other CAZyme resources, dbCAN-seq has the following new features: (i) a convenient download page to allow batch download of all the sequence and annotation data; (ii) an annotation page for every CAZyme to provide the most comprehensive annotation data; (iii) a metadata page to organize the bacterial genomes according to species metadata such as disease, habitat, oxygen requirement, temperature, metabolism; (iv) a very fast tool to identify physically linked CAZyme gene clusters (CGCs) and (v) a powerful search function to allow fast and efficient data query. With these unique utilities, dbCAN-seq will become a valuable web resource for CAZyme research, with a focus complementary to dbCAN (automated CAZyme annotation server) and CAZy (CAZyme family classification and reference database).

Nucleic Acids Res. 2018:46(D1) | 209 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 451/6895 (93.474%)
Gene genome and annotation: 158/2021 (92.232%)
Metadata: 46/719 (93.741%)
Total rank: 451
Citations: 244
z-index: 34.857
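
The ranking figures above can be reproduced with simple arithmetic. The sketch below is an inference from the listed numbers, not a documented Database Commons formula: it assumes the percentage is the share of databases ranked at or below this one, and that the z-index is total citations divided by years since founding, with 2025 taken as the current year from the Europe PMC citation date. The function names are hypothetical.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Percent of databases ranked at or below `rank` (rank 1 is best).

    Assumed formula, inferred from the listed values.
    """
    return round((total - rank + 1) / total * 100, 3)


def z_index(citations: int, year_founded: int, current_year: int) -> float:
    """Citations per year since founding (assumed definition)."""
    return round(citations / (current_year - year_founded), 3)


# Values copied from the profile; 2025 is an assumption based on the
# Europe PMC citation date shown in the Publications section.
all_databases = rank_percentile(451, 6895)
gene_genome_annotation = rank_percentile(158, 2021)
metadata = rank_percentile(46, 719)
citations_per_year = z_index(244, 2018, 2025)

print(all_databases, gene_genome_annotation, metadata, citations_per_year)
```

Under these assumptions the computed values match the listed 93.474%, 92.232%, 93.741%, and 34.857.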

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2023-08-23
Curated by:
Yuxin Qin [2023-09-12]
Xinyu Zhou [2023-09-08]
Yue Qi [2023-08-23]