TPMC

A comprehensive genome and gene catalog of the aquatic microbiome of the Tibetan Plateau.

Datasets

TPMC metagenome data

A total of 498 metagenomic samples were collected, sequenced, and analyzed in this study. They come from the central plateau (n=356) and the northeastern border region of the Qilian Mountains and Qinghai Lake (n=142) of the Tibetan Plateau (TP), and cover six water ecosystems: saline lake (n=104), freshwater lake (n=72), river (n=108), hot spring (n=76), wetland (n=132), and glacier (n=6). The metadata for all 498 samples is available here.

To examine TP microbiome biogeography along a 2,500-km transect that runs from the Tibetan Plateau to the east coast of China and has a ladder-like topography based on altitude, we collected an additional 109 samples from the lower steps of the ladder: freshwater samples from the second step (n=15) and the third step (n=19), river samples from the second step (n=40) and the third step (n=24), and wetland samples from the second step (n=11).

All 607 metagenomes are available in the Genome Sequence Archive (GSA) of the National Genomics Data Center (project accession number CRA011511).
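The sample counts above can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from this README, and the dictionary and variable names are illustrative, not part of any TPMC file or tool.

```python
# Sample counts as stated in this README; all names here are illustrative.
tp_by_ecosystem = {
    "saline lake": 104,
    "freshwater lake": 72,
    "river": 108,
    "hot spring": 76,
    "wetland": 132,
    "glacier": 6,
}
tp_by_region = {
    "central plateau": 356,
    "Qilian Mountains-Qinghai Lake": 142,
}
# Additional samples from the second and third steps of the ladder.
ladder_samples = {
    ("freshwater", "second step"): 15,
    ("freshwater", "third step"): 19,
    ("river", "second step"): 40,
    ("river", "third step"): 24,
    ("wetland", "second step"): 11,
}

tp_total = sum(tp_by_ecosystem.values())
ladder_total = sum(ladder_samples.values())
all_total = tp_total + ladder_total

# The ecosystem and region breakdowns must describe the same 498 samples.
assert tp_total == sum(tp_by_region.values())

print(f"TP samples: {tp_total}")
print(f"Ladder samples: {ladder_total}")
print(f"All metagenomes: {all_total}")
```

The same pattern can be applied to the downloaded metadata table to confirm that it matches the counts reported here.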

TPMC genome catalog

The TPMC genome catalog contains 32,355 medium- and high-quality metagenome-assembled genomes (MAGs), clustered into 10,723 representative genome-based species and constructed from the 498 metagenomes collected from diverse water ecosystems in the Tibetan Plateau. All MAGs meet at least the medium-quality level of the MIMAG standard (mean completeness=78.1%, mean contamination=2.6%), and 2,024 of them are classified as high-quality, with the 23S, 16S, and 5S rRNA genes present and at least 18 tRNAs. The entire TPMC genome catalog and associated data are available here.
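For readers who want to re-derive the quality tiers from their own MAG statistics, the following is a minimal sketch of a MIMAG-style classifier. The completeness and contamination cutoffs (>90% and <5% for high quality, >=50% and <10% for medium quality) are taken from the published MIMAG standard, not from this README, and the `MagStats` class and `mimag_tier` function are hypothetical names; the exact procedure used to build the TPMC may differ.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Per-MAG summary statistics (hypothetical container for this sketch)."""
    completeness: float   # percent
    contamination: float  # percent
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int


def mimag_tier(mag: MagStats) -> str:
    """Return 'high', 'medium', or 'low' using MIMAG-style thresholds.

    Assumed cutoffs: high quality needs >90% completeness, <5% contamination,
    all three rRNA genes, and at least 18 tRNAs; medium quality needs >=50%
    completeness and <10% contamination.
    """
    has_all_rrna = mag.has_23s and mag.has_16s and mag.has_5s
    if (
        mag.completeness > 90
        and mag.contamination < 5
        and has_all_rrna
        and mag.trna_count >= 18
    ):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    return "low"


# A near-complete MAG without a 5S rRNA gene is still only medium quality.
example = MagStats(95.0, 1.2, True, True, False, 20)
print(mimag_tier(example))
```

Note that a MAG can exceed the high-quality completeness and contamination cutoffs and still be classed as medium quality when an rRNA gene is missing, which is common for short-read assemblies.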

To assess the biosynthetic capabilities of the TPMC, we ran antiSMASH on all 32,355 MAGs and predicted a total of 73,864 biosynthetic gene clusters (BGCs). These BGCs were categorized into eight classes using BiG-SCAPE: Terpene (n=31,734), RiPPs (n=11,772), Others (n=11,513), PKSother (n=7,865), NRPS (n=7,044), PKSI (n=2,043), PKS-NRP_Hybrids (n=1,859), and Saccharides (n=34). The TPMC BGC data are available here.
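As a quick consistency check, the per-class BGC counts listed above can be summed and converted to shares of the total. This is a sketch using only the numbers in this README; the variable names are illustrative.

```python
# BGC counts per BiG-SCAPE class, as listed in this README.
bgc_counts = {
    "Terpene": 31_734,
    "RiPPs": 11_772,
    "Others": 11_513,
    "PKSother": 7_865,
    "NRPS": 7_044,
    "PKSI": 2_043,
    "PKS-NRP_Hybrids": 1_859,
    "Saccharides": 34,
}

bgc_total = sum(bgc_counts.values())

# Share of each class as a percentage of all predicted BGCs.
bgc_share = {name: 100 * n / bgc_total for name, n in bgc_counts.items()}

for name, n in sorted(bgc_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {n:>7,} {bgc_share[name]:6.2f}%")
print(f"{'Total':<16} {bgc_total:>7,}")
```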

TPMC gene catalog

The TPMC gene catalog contains 296,289,678 non-redundant genes (unigenes) and represents the largest and most comprehensive resource to date for capturing the genomic and gene diversity of the TP's water ecosystems. The non-redundant gene catalog was taxonomically and functionally annotated using the NR, UniRef50, and Swiss-Prot databases, with 82.4%, 79.8%, and 35.1% of the unigenes annotated, respectively. The gene catalog was also functionally annotated using the COG, KEGG, CAZy, GO, CARD, and VFDB databases, with 66.9%, 46.1%, 1.7%, 17.2%, 0.01%, and 8.9% of the unigenes annotated, respectively. The entire TPMC gene catalog and associated data are available here.

For the biogeography study, we also collected 109 samples from the other steps of China's ladder topography and constructed a non-redundant gene catalog of 329,568,659 unigenes for the "three-step ladder topography" in China (TLGC). The entire TLGC gene catalog and associated data are available here.

Results

The Tibetan Plateau supplies water to nearly 2 billion people in Asia, but climate change poses threats to its aquatic microbial resources. Here, we construct the Tibetan Plateau Microbial Catalog by sequencing 498 metagenomes from six water ecosystems (saline lakes, freshwater lakes, rivers, hot springs, wetlands and glaciers). Our catalog expands knowledge of regional genomic diversity by presenting 32,355 metagenome-assembled genomes that were de-replicated into 10,723 representative genome-based species, of which 88% were unannotated. The catalog contains nearly 300 million non-redundant gene clusters, of which 15% are novel, and 73,864 biosynthetic gene clusters, of which 50% are novel, thus expanding known functional diversity. Using these data, we investigate the biogeography of the Tibetan Plateau aquatic microbiome across a distance of 2,500 km and an altitude range of >5 km. Microbial compositional similarity and the number of genes shared with the Tibetan Plateau microbiome both decline with increasing distance and altitude difference, suggesting a dispersal pattern. The Tibetan Plateau Microbial Catalog stands as a substantial repository of high-altitude aquatic microbiome resources, providing potential for discovering novel lineages and functions and for bridging knowledge gaps in microbiome biogeography.

Data Usage Policy

The TPMC catalogs are free to use. The underlying metagenomes are protected by the GSA data release and utilization policies, and we encourage you to contact us about any planned analyses or publications that may overlap with existing project goals.