Database Commons

a catalog of worldwide biological databases

Database Profile

SMC

General information

URL: https://smc.jgi.doe.gov
Full name: The Secondary Metabolism Collaboratory
Description: SMC aims to provide a comprehensive, tool-agnostic repository of BGC sequence data drawn from all publicly available and user-submitted bacterial and archaeal genome and contig sources. On the website, users are provided a searchable catalog of putative BGCs identified from each source, along with visualizations of gene and domain annotations derived from multiple sequence analysis tools. SMC's data is also available through publicly-accessible application programming interface (API) endpoints to facilitate programmatic access.
Year founded: 2024
Last update: 2025-01-09
Version: 1.3.0
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type: DNA
Data object:
Database category:
Major species: NA
Keywords:

Contact information

University/Institution: Lawrence Berkeley National Laboratory
Address:
City: Berkeley
Province/State: California
Country/Region: United States
Contact name (PI/Team): Daniel W Udwary
Contact email (PI/Helpdesk): dwudwary@lbl.gov

Publications

The secondary metabolism collaboratory: a database and web discussion portal for secondary metabolite biosynthetic gene clusters. [PMID: 39540430]
Daniel W Udwary, Drew T Doering, Bryce Foster, Tatyana Smirnova, Satria A Kautsar, Nigel J Mouncey

Secondary metabolites are small molecules produced by all corners of life, often with specialized bioactive functions with clinical and environmental relevance. Secondary metabolite biosynthetic gene clusters (BGCs) can often be identified within DNA sequences by various sequence similarity tools, but determining the exact functions of genes in the pathway and predicting their chemical products can often only be done by careful, manual comparative analysis. To facilitate this, we report the first release of the secondary metabolism collaboratory (SMC), which aims to provide a comprehensive, tool-agnostic repository of BGC sequence data drawn from all publicly available and user-submitted bacterial and archaeal genome and contig sources. On the website, users are provided a searchable catalog of putative BGCs identified from each source, along with visualizations of gene and domain annotations derived from multiple sequence analysis tools. SMC's data is also available through publicly-accessible application programming interface (API) endpoints to facilitate programmatic access. Users are encouraged to share their findings (and search for others') through comment posts on BGC and source pages. At the time of writing, SMC is the largest repository of BGC information, holding 13.1M BGC regions from 1.3M source sequences and growing, and can be found at https://smc.jgi.doe.gov.

Nucleic Acids Res. 2025:53(D1) | 16 Citations (from Europe PMC, 2026-05-09)

Ranking

All databases:
1269/6931 (81.705%)
Gene genome and annotation:
404/2040 (80.245%)
Total rank: 1269
Citations: 10
z-index: 10
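
The percentages shown beside each rank appear consistent with the share of databases ranked at or below this entry, computed as (total - rank + 1) / total. This formula is an assumption inferred from the two figures in this section, not a documented Database Commons definition, and the function name rank_percentile is hypothetical. A minimal sketch in Python:

```python
def rank_percentile(rank: int, total: int) -> float:
    """Return the percentage of entries ranked at or below `rank`.

    Assumed formula (inferred from the listed figures, not documented):
    (total - rank + 1) / total * 100
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank + 1) / total * 100


# Figures taken from the Ranking section of this record.
print(f"All databases: {rank_percentile(1269, 6931):.3f}%")
print(f"Gene genome and annotation: {rank_percentile(404, 2040):.3f}%")
```

Under this assumption, both calls reproduce the percentages listed for the two rankings when rounded to three decimals.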

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2025-07-01
Curated by:
Yuhao Zeng [2025-08-20]
Jinbiao Wang [2025-08-02]
Yiran Zhan [2025-07-01]