TPMC-S

A comprehensive genome and gene catalog of sediment microbiome in Tibetan Plateau (TPMC-S)

Datasets

TPMC-S metagenome data

We analyzed 248 sediment metagenome samples from four ecosystems: saline lakes (n=59), freshwater lakes (n=42), rivers (n=46), and wetlands (n=101). The Tibetan Plateau core region contributed 175 samples (42 freshwater lakes, 23 rivers, 21 saline lakes, 89 wetlands), while the Qilian margin provided 73 samples (38 saline lakes, 23 rivers, 12 wetlands). Sampling sites span from the ancient core region of the Tibetan Plateau to its northeastern margin, which was shaped by neotectonic movements, and cover key tectonic units such as the Qiangtang Plateau, Gangdise Mountains, Nyainqentanglha Mountains, and Qilian Mountains (28°N-38°N, 89°E-102°E), forming a three-dimensional (3D) observation system. Metadata for all 248 samples are available here. The metagenomes can be accessed through the Genome Sequence Archive (GSA) of the National Genomics Data Center (project accession number PRJCA036742).
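The per-region and per-ecosystem counts quoted above can be cross-checked with a few lines of Python. This is only a minimal sketch: the dictionary keys (`core`, `qilian_margin`, and the ecosystem labels) are names chosen here for illustration and are not column names from the TPMC-S metadata file.

```python
# Sample counts copied from the paragraph above.
# Key names are illustrative assumptions, not the metadata file's own labels.
SAMPLES = {
    "core": {"freshwater lake": 42, "river": 23, "saline lake": 21, "wetland": 89},
    "qilian_margin": {"saline lake": 38, "river": 23, "wetland": 12},
}


def totals_by_region(samples):
    """Sum the sample counts within each region."""
    return {region: sum(counts.values()) for region, counts in samples.items()}


def totals_by_ecosystem(samples):
    """Sum the sample counts for each ecosystem across all regions."""
    totals = {}
    for counts in samples.values():
        for ecosystem, n in counts.items():
            totals[ecosystem] = totals.get(ecosystem, 0) + n
    return totals


region_totals = totals_by_region(SAMPLES)
ecosystem_totals = totals_by_ecosystem(SAMPLES)
grand_total = sum(region_totals.values())

print(region_totals)
print(ecosystem_totals)
print(grand_total)
```

The regional totals, ecosystem totals, and grand total produced this way should match the figures stated in the text (175 and 73; 59, 42, 46, and 101; 248).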

TPMC-S genome catalog

The TPMC-S genome catalog comprises 13,696 medium- and high-quality metagenome-assembled genomes (MAGs), clustered into 6,233 representative genome-based species. These MAGs were reconstructed from 248 metagenomes derived from sediment samples spanning diverse ecosystems across the Tibetan Plateau, including saline lakes, freshwater lakes, rivers, and wetlands. All MAGs meet at least the medium-quality level of the MIMAG standard (mean completeness=75.9%, mean contamination=3.5%), and 788 of them are classified as high-quality, carrying the 23S, 16S, and 5S rRNA genes and at least 18 tRNAs. The entire TPMC-S genome catalog and associated data are available here.
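For readers who want to re-derive the quality tiers from per-genome statistics, the sketch below applies the published MIMAG draft-genome thresholds (high quality: completeness above 90%, contamination below 5%, all three rRNA genes, and at least 18 tRNAs; medium quality: completeness of at least 50% and contamination below 10%). The text above states only the rRNA and tRNA criteria, so the completeness and contamination cut-offs are an assumption about what was applied; the `MagStats` class and its field names are hypothetical and do not correspond to files in the catalog.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Per-genome quality statistics (hypothetical field names)."""
    completeness: float   # percent
    contamination: float  # percent
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int


def mimag_tier(mag: MagStats) -> str:
    """Return 'high', 'medium', or 'low' using MIMAG-style thresholds."""
    has_all_rrna = mag.has_23s and mag.has_16s and mag.has_5s
    if (mag.completeness > 90 and mag.contamination < 5
            and has_all_rrna and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    # Anything below the medium-quality thresholds is grouped as 'low' here.
    return "low"


# Made-up example genomes, for illustration only.
examples = [
    MagStats(95.2, 1.1, True, True, True, 20),
    MagStats(95.2, 1.1, True, False, True, 20),  # missing 16S rRNA gene
    MagStats(75.9, 3.5, False, False, False, 12),
    MagStats(42.0, 2.0, False, False, False, 5),
]
tiers = [mimag_tier(m) for m in examples]
print(tiers)
```

Note that a nearly complete genome lacking one rRNA gene falls back to the medium tier, which is why high-quality MAGs are a small subset of the catalog.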

To assess the biosynthetic capabilities of the TPMC-S, we ran antiSMASH on all 13,696 MAGs and predicted a total of 26,360 biosynthetic gene clusters (BGCs). These BGCs were categorized into eight groups using BiG-SCAPE: Terpene (n=7,441), RiPPs (n=6,037), Others (n=5,095), NRPS (n=3,711), PKSother (n=2,718), PKSI (n=843), PKS-NRP_Hybrids (n=514), and Saccharides (n=1). The BGC data of the TPMC-S are available here.
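The class counts can be turned into relative shares with a short calculation. The sketch below uses only the numbers reported above; the variable names are illustrative, and no BGC file from the catalog is read.

```python
# BGC counts per BiG-SCAPE class, copied from the paragraph above.
BGC_COUNTS = {
    "Terpene": 7441,
    "RiPPs": 6037,
    "Others": 5095,
    "NRPS": 3711,
    "PKSother": 2718,
    "PKSI": 843,
    "PKS-NRP_Hybrids": 514,
    "Saccharides": 1,
}

bgc_total = sum(BGC_COUNTS.values())

# Percentage of all predicted BGCs that falls into each class.
bgc_shares = {
    name: round(100 * count / bgc_total, 2)
    for name, count in BGC_COUNTS.items()
}

print(bgc_total)
for name, share in sorted(bgc_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share}%")
```

The summed class counts should reproduce the reported total of 26,360 BGCs.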

TPMC-S gene catalog

The TPMC-S gene catalog contains 511,056,752 non-redundant genes (unigenes) out of a total of 701,528,671 identified genes. It presents a richer gene pool than the largest Tibetan Plateau aquatic sample dataset while using fewer MAGs. The non-redundant gene catalog was taxonomically and functionally annotated using the NR and Swiss-Prot databases, with 82.98% and 38.26% of the unigenes annotated, respectively. The gene catalog was also functionally annotated using the COG, KEGG KO, KEGG Module, KEGG Pathway, CAZy, GO, CARD, and VFDB databases, with 78.03%, 44.85%, 27.58%, 17.72%, 1.47%, 5.39%, 0.01%, and 8.23% of the unigenes annotated, respectively. The entire TPMC-S gene catalog and associated data are available here.
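To put these percentages in absolute terms, the sketch below computes the non-redundant fraction of all genes and an approximate number of annotated unigenes per database. Because the published percentages are rounded to two decimals, the per-database counts are estimates rather than exact catalog figures; all variable names are illustrative.

```python
TOTAL_GENES = 701_528_671
UNIGENES = 511_056_752

# Percent of unigenes annotated per database, copied from the paragraph above.
ANNOTATED_PCT = {
    "NR": 82.98,
    "Swiss-Prot": 38.26,
    "COG": 78.03,
    "KEGG KO": 44.85,
    "KEGG Module": 27.58,
    "KEGG Pathway": 17.72,
    "CAZy": 1.47,
    "GO": 5.39,
    "CARD": 0.01,
    "VFDB": 8.23,
}

# Share of all identified genes that remain after redundancy removal.
nonredundant_pct = 100 * UNIGENES / TOTAL_GENES

# Approximate annotated-unigene counts (limited by the rounding of the percentages).
approx_annotated = {
    db: round(UNIGENES * pct / 100) for db, pct in ANNOTATED_PCT.items()
}

print(f"non-redundant fraction: {nonredundant_pct:.2f}%")
for db, n in approx_annotated.items():
    print(f"{db}: ~{n:,} unigenes")
```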

Results

By conducting metagenomic sequencing on 248 samples from four sedimentary ecosystems of the Tibetan Plateau (freshwater lake sediment, wetland sediment, river sediment, and saline lake sediment), we constructed a comprehensive catalog of the Tibetan Plateau sediment microbiota. Using a standardized cataloging pipeline consistent with the one used for the Tibetan Plateau aquatic microbiomes (TPMC-A), we reconstructed 13,696 metagenome-assembled genomes (MAGs) and identified 701,528,671 genes. Sediment microbiomes exhibit pronounced distance-decay relationships, and their community structure is strongly influenced by altitudinal gradients. Sediments act as "time capsules" for ancient microbial lineages; the discovery of novel Asgardarchaeota lineages (Asgard-Tibet-1 and Asgard-Qilian-1) is particularly noteworthy. The divergence of Asgardarchaeota predates the orogenic events of the Tibetan Plateau (25-20 million years ago) and the Qilian Mountains (10-8 million years ago), yet these archaea have adapted to drastic environmental changes through genetic mechanisms such as glycosylation pathways. Our findings enhance the understanding of microbial patterns in this extreme environment and their links to changes in the natural geographic environment, and emphasize the importance of preserving evolutionary legacy in microbial conservation.

Data Usage Policy

The TPMC-S catalogs are free to use. The underlying metagenomes are subject to the GSA data release and utilization policies, and we encourage users to contact us about any planned analyses or publications that may overlap with existing project goals.