Datasets
Sequencing data
The raw sequencing reads are available from the National Genomics Data Center (NGDC; https://ngdc.cncb.ac.cn) under accession numbers PRJCA026178 and PRJCA026179. All rumen metagenomic sequencing data obtained from the 100 dairy cows in this study have been deposited in NCBI (https://www.ncbi.nlm.nih.gov/) under BioProject accession numbers PRJNA1283621, PRJNA1283622, and PRJNA1283623.
RCG catalog
The rumen ciliate genome (RCG) catalog was constructed from three complementary sources. First, we manually isolated ciliates by morphology from the rumen of cattle, sheep, goats, and deer and performed single-cell next-generation and HiFi sequencing, generating 436 single-amplified genomes (SAGs). Second, we compiled 2,007 metagenomic datasets from 50 studies covering 16 ruminant species worldwide and performed co-assembly and binning to obtain 108 rumen ciliate metagenome-assembled genomes (MAGs). Third, we incorporated 69 previously published single-cell sequencing datasets of rumen ciliates and the Entodinium caudatum genome. Genomic data from rumen ciliates often contain substantial contamination from prokaryotes, plant material, fungi, and endo- or ectosymbionts, so the main challenge in obtaining ciliate genomes is separating ciliate sequences from the abundant contaminating sequences. We therefore developed iGDP (https://github.com/CodeFeiX/iGDP-rc), a genome decontamination pipeline optimized for purifying rumen ciliate genomes. On average, 91.9% of the assembled sequences were identified as contaminants and removed, while the BUSCO completeness of the rumen ciliate genomes decreased by only 11.2%, indicating that iGDP removes contaminants efficiently while retaining most ciliate gene content. In total, we obtained 450 genomes that met medium-quality standards (completeness > 50%), including 233 high-quality genomes (completeness > 80%; 215 SAGs, 17 MAGs, and 1 genome from a ciliate monoculture). Notably, 87% of these genomes are novel, showing how much of rumen ciliate diversity was previously unrepresented. Genome evaluation showed no discordant GC-content peaks in these genomes, and across all genomes an average of 97.4% of contigs either contained rumen ciliate-specific telomeres or were supported by telomere sequences.
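The quality tiers and the telomere-based contig check described above can be illustrated with a minimal Python sketch. It is not part of iGDP: the completeness cutoffs come from the text, but `quality_tier`, `has_terminal_repeat`, and `telomere_fraction` are hypothetical helpers, and the motif, copy number, and end window are placeholder parameters rather than the rumen ciliate telomere sequence or the criteria actually used in this study.

```python
# Completeness cutoffs (percent) taken from the text; both are strict inequalities.
MEDIUM_QUALITY_MIN = 50.0
HIGH_QUALITY_MIN = 80.0

# Translation table used to reverse-complement DNA motifs.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quality_tier(completeness: float) -> str:
    """Map a completeness percentage to 'high', 'medium', or 'low'."""
    if completeness > HIGH_QUALITY_MIN:
        return "high"
    if completeness > MEDIUM_QUALITY_MIN:
        return "medium"
    return "low"


def has_terminal_repeat(seq: str, motif: str, min_copies: int = 2, window: int = 100) -> bool:
    """True if `min_copies` tandem copies of `motif` (or its reverse complement)
    occur within `window` bp of either contig end. The motif is caller-supplied."""
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = motif.translate(_COMPLEMENT)[::-1]
    ends = (seq[:window], seq[-window:])
    for m in (motif, rc_motif):
        block = m * min_copies
        if any(block in end for end in ends):
            return True
    return False


def telomere_fraction(contigs: list[str], motif: str) -> float:
    """Fraction of contigs carrying a telomere-like repeat at one or both ends."""
    if not contigs:
        return 0.0
    capped = sum(has_terminal_repeat(c, motif) for c in contigs)
    return capped / len(contigs)


if __name__ == "__main__":
    # TTTTGGGG is an illustrative motif only, not a claim about rumen ciliate telomeres.
    contigs = ["ACGT" * 10 + "TTTTGGGG" * 2, "ACGTACGTACGT"]
    print(quality_tier(85.0), quality_tier(60.0), quality_tier(45.0))  # high medium low
    print(telomere_fraction(contigs, "TTTTGGGG"))  # 0.5
```

Both the motif and its reverse complement are searched at both ends because contig orientation in an assembly is arbitrary.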
Together, these constitute the largest collection of high-purity rumen ciliate genomes (RCGs) reported to date, providing new resources and guidance for studying the rumen microbiome. Gene prediction data for the RCGs are available at https://doi.org/10.6084/m9.figshare.27229761. The entire RCG catalog and associated information are available here.
RBAG catalog
Our comprehensive catalog of high-quality ciliate genomes allowed us to distinguish prokaryotic from ciliate sequences in rumen microbial genomes. We therefore compared prokaryotic MAGs reconstructed from rumen metagenomes with all telomere-supported sequences from the RCGs to identify contaminating ciliate sequences. Surprisingly, ciliate contamination in 1,931 previously reported prokaryotic MAGs ranged from 0.04% to 70.3% of total contig length, with an average contamination rate of 1%, indicating that existing prokaryotic MAGs can carry varying degrees of ciliate contamination. We then compiled 25,115 prokaryotic genomes from ruminants and removed 34.4 Mb of intermixed ciliate sequences to obtain a refined, non-redundant set of 12,557 bacterial and 158 archaeal genomes (RBAGs), enabling a more comprehensive exploration of the community structure and functions of rumen ciliates. The entire RBAG catalog and associated information are available here.
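The per-MAG contamination figures above are expressed as a fraction of contig length. A minimal Python sketch of that arithmetic and of the removal step follows, assuming the ciliate contigs have already been flagged by comparing MAG contigs against telomere-supported RCG sequences (that comparison is not shown). The function names and toy contig lengths are hypothetical and do not reproduce the study's code or data.

```python
def contamination_rate(contig_lengths: dict[str, int], ciliate_ids: set[str]) -> float:
    """Fraction of a MAG's total contig length contributed by flagged ciliate contigs."""
    total = sum(contig_lengths.values())
    if total == 0:
        raise ValueError("MAG has no sequence")
    flagged = sum(n for cid, n in contig_lengths.items() if cid in ciliate_ids)
    return flagged / total


def remove_ciliate_contigs(
    contig_lengths: dict[str, int], ciliate_ids: set[str]
) -> tuple[dict[str, int], int]:
    """Drop flagged contigs; return the kept contig lengths and the removed base pairs."""
    kept = {cid: n for cid, n in contig_lengths.items() if cid not in ciliate_ids}
    removed_bp = sum(contig_lengths.values()) - sum(kept.values())
    return kept, removed_bp


if __name__ == "__main__":
    # Toy MAG with three contigs (lengths in bp); "c3" is assumed to match an RCG contig.
    mag = {"c1": 600_000, "c2": 390_000, "c3": 10_000}
    flagged = {"c3"}
    print(f"{contamination_rate(mag, flagged):.2%}")  # 1.00%
    kept, removed = remove_ciliate_contigs(mag, flagged)
    print(sorted(kept), removed)  # ['c1', 'c2'] 10000
```

Weighting by length rather than by contig count keeps the estimate tied to how much sequence is affected; counting contigs would overstate contamination when the flagged contigs are short.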
Data Usage Policy
The RCG and RBAG catalogs are free to use. Data generated by this project are subject to the GSA (Genome Sequence Archive) data release and usage policies, and we encourage researchers to contact us about any planned analyses or publications that may overlap with existing project goals.

