Tools
To address the challenges that the repetitive nature of TEs poses for single-cell quantification, we developed scTEfinder, a robust and standardized pipeline for accurate TE subfamily quantification from raw scRNA-seq data. Built on the scTE toolkit, scTEfinder takes FASTQ files as input and processes them through three core modules: read mapping, quality control (QC), and TE quantification. Specifically, it aligns reads to the reference genome, retains multi-mapped reads so that reads originating from repetitive TE copies are not discarded, filters the BAM files to cells that pass QC, and counts TE reads at the subfamily level. scTEfinder outputs a combined gene-TE count matrix that can be used directly in standard downstream analyses with Seurat (R) or Scanpy (Python).
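The subfamily-level counting step can be illustrated with a minimal Python sketch. This is not the scTEfinder or scTE implementation; it only shows why collapsing alignments to the subfamily level makes multi-mapped reads usable. The function name, the input layout (cell barcode, UMI, and one subfamily label per alignment), and the UMI-based de-duplication are all assumptions made for illustration, as is the choice to count a read once for each distinct subfamily it hits.

```python
from collections import defaultdict


def count_te_subfamilies(reads, valid_cells):
    """Count TE subfamilies per cell (hypothetical helper, not scTEfinder code).

    reads: iterable of (cell_barcode, umi, subfamilies), where `subfamilies`
           holds one TE subfamily label per alignment of that read.
    valid_cells: set of barcodes that passed cell QC.
    """
    seen = set()  # (barcode, umi, subfamily) triples already counted
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, umi, subfamilies in reads:
        if barcode not in valid_cells:
            continue  # drop reads from cells that failed QC
        # A read aligned to many copies of one subfamily collapses to a
        # single label here, so the ambiguity between copies disappears.
        for subfamily in set(subfamilies):
            key = (barcode, umi, subfamily)
            if key in seen:
                continue  # assumed UMI de-duplication
            seen.add(key)
            counts[barcode][subfamily] += 1
    return {cell: dict(per_cell) for cell, per_cell in counts.items()}


# Toy input: the first read aligns to three copies of the same subfamily.
example_reads = [
    ("AAAC", "u1", ["L1HS", "L1HS", "L1HS"]),
    ("AAAC", "u1", ["L1HS"]),  # same UMI, counted only once
    ("AAAC", "u2", ["AluY"]),
    ("TTTG", "u1", ["L1HS"]),
    ("GGGG", "u1", ["AluY"]),  # barcode not in the QC-passing set
]
example_counts = count_te_subfamilies(example_reads, {"AAAC", "TTTG"})
```

In this toy input, the read with three alignments contributes a single L1HS count to cell AAAC, and the read from the barcode that failed QC is ignored.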

To evaluate its performance, we benchmarked scTEfinder against TEtranscripts, one of the most widely used tools for TE analysis in bulk RNA-seq. Even without pseudobulk aggregation, scTEfinder showed high concordance with TEtranscripts in its estimates of TE subfamily expression, with a Pearson correlation close to 1. These results support the robustness and accuracy of scTEfinder in quantifying TE expression at the single-cell level.
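A concordance check of this kind can be sketched as follows. This is a minimal illustration, not the benchmarking code used here: the function names are hypothetical, the inputs are assumed to be dictionaries mapping TE subfamily names to one expression value per tool, and the log1p transform before correlation is an assumption chosen to limit the influence of a few highly expressed subfamilies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def compare_subfamily_expression(expr_a, expr_b):
    """Correlate log1p expression over subfamilies reported by both tools.

    expr_a, expr_b: dicts of {subfamily name: non-negative expression value}
    (hypothetical layout; adapt to the actual output files of each tool).
    """
    shared = sorted(set(expr_a) & set(expr_b))
    x = [math.log1p(expr_a[name]) for name in shared]
    y = [math.log1p(expr_b[name]) for name in shared]
    return pearson(x, y)
```

Restricting the comparison to subfamilies reported by both tools avoids treating a missing entry as zero expression, which would otherwise distort the correlation.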
Code repository: https://github.com/synnimeng/scTEfinder
