Database Commons

a catalog of worldwide biological databases

Database Profile

GMSC

General information

URL: https://gmsc.big-data-biology.org
Full name: Global Microbial smORFs Catalogue
Description: The Global Microbial smORFs Catalogue (GMSC) is an integrated, consistently processed catalogue of small open reading frames (smORFs) from the microbial world, combining publicly available metagenomes and high-quality isolate microbial genomes.
Year founded: 2024
Last update: 2024
Version: v1.0
Accessibility: Accessible
Country/Region: China

Funding support

  • 2021YFF0703703

Contact information

University/Institution: Fudan University
Address: Fudan University Zhangjiang Campus, No. 825 Zhangheng Road, Zhangjiang Hi-Tech Park, Pudong New Area, Shanghai, China
City: Shanghai
Province/State: Shanghai
Country/Region: China
Contact name (PI/Team): GMSC Team
Contact email (PI/Helpdesk): 20110850018@fudan.edu.cn

Publications

A catalog of small proteins from the global microbiome. [PMID: 39214983]
Duan Y, Santos-Júnior CD, Schmidt TS, Fullam A, de Almeida BLS, Zhu C, Kuhn M, Zhao XM, Bork P, Coelho LP.

Small open reading frames (smORFs) shorter than 100 codons are widespread and perform essential roles in microorganisms, where they encode proteins active in several cell functions, including signal pathways, stress response, and antibacterial activities. However, the ecology, distribution and role of small proteins in the global microbiome remain unknown. Here, we construct a global microbial smORFs catalog (GMSC) derived from 63,410 publicly available metagenomes across 75 distinct habitats and 87,920 high-quality isolate genomes. GMSC contains 965 million non-redundant smORFs with comprehensive annotations. We find that archaea harbor more smORFs proportionally than bacteria. We moreover provide a tool called GMSC-mapper to identify and annotate small proteins from microbial (meta)genomes. Overall, this publicly-available resource demonstrates the immense and underexplored diversity of small proteins.

Nat Commun. 2024:15(1) | 19 Citations (from Europe PMC, 2026-03-28)

Ranking

All databases: 1623/6932 (76.601%)
Gene genome and annotation: 514/2039 (74.841%)
Expression: 329/1361 (75.9%)
Total Rank: 1623
Citations: 15
z-index: 7.5
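
The percentages in the ranking above are consistent with the share of databases ranked at or below the given position, that is (total - rank + 1) / total. This formula is inferred from the three rows shown here and is an assumption, not a documented Database Commons definition; the function name `rank_percentile` is hypothetical. A minimal sketch in Python:

```python
def rank_percentile(rank: int, total: int) -> float:
    """Percentage of entries ranked at or below `rank`.

    Assumed formula, inferred from the figures on this page:
    (total - rank + 1) / total, shown to three decimal places.
    """
    return round((total - rank + 1) / total * 100, 3)


# Rank/total pairs copied from the ranking section of this profile.
rankings = {
    "All databases": (1623, 6932),
    "Gene genome and annotation": (514, 2039),
    "Expression": (329, 1361),
}

for category, (rank, total) in rankings.items():
    print(f"{category}: {rank}/{total} ({rank_percentile(rank, total)}%)")
```

Under this assumption the three computed values match the listed 76.601%, 74.841% and 75.9%.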

Community reviews

Not Rated (data quality & quantity; content organization & presentation; system accessibility & reliability)

Record metadata

Created on: 2025-03-19
Curated by:
Yiqian Duan [2025-03-25]
Lina Ma [2025-03-24]
Yiqian Duan [2025-03-19]