Zhang, W., Li, X.J., Liu, F. et al. Fungen: clustering and correcting long-read metatranscriptomic data for exploring eukaryotic microorganisms. Sci. China Life Sci. (2025). https://doi.org/10.1007/s11427-024-2853-x
1. FunGen
FunGen (Fungal Gene) is a reference-free tool to construct accurate transcripts using long-read metatranscriptomic data through read clustering and error correction. It consists of two main stages: (1) clustering error-prone long reads that potentially originate from the same gene and (2) performing error correction within each cluster. The local version of the software can be installed from https://github.com/gyjames/Fungen

2. Online software usage
Data:
·Either upload the metatranscriptome FASTQ file (uncompressed or .gz),
·Or paste raw sequences directly into the text box.

Parameters:
·Threads: Number of CPU threads to use for read clustering. More threads speed up clustering at the cost of higher CPU usage. Default 10.
·Minimum length: Reads shorter than this (in bases, measured after homopolymer compression) are discarded before clustering. Default 30.
·Minimizer size: Length of the k-mer used for computing minimizers. A minimizer is the lexicographically smallest k-mer within a sliding window of the read. Default 11.
·Window size: Length of the sliding window (in compressed-read bases) from which one minimizer is selected. Larger windows yield fewer minimizers per read. Default 20.
·Chunk: Number of reads per chunk when splitting the input for batch-wise clustering. Smaller chunks reduce peak memory but may increase total runtime. Default 10000.

Output:
·cluster_results.txt: The final clustering results
·cluster_represents.fasta: Reference sequences for gene clusters
·corrected_seqs.fastq: Corrected reads
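
To make the Minimum length, Minimizer size, Window size, and Chunk parameters more concrete, the following Python sketch shows one way the described preprocessing could work: homopolymer compression, length filtering, minimizer selection, and splitting reads into chunks. It is an illustration only and is not taken from the FunGen source code. The function names (`homopolymer_compress`, `minimizers`, `sketch_read`, `chunked`) are hypothetical, and the sketch assumes that the window is measured in bases (so a window of w bases contains w - k + 1 k-mers) and that ties are broken by taking the leftmost k-mer; FunGen itself may define these details differently.

```python
from itertools import groupby, islice


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases to one base (e.g. AAACC -> AC)."""
    return "".join(base for base, _ in groupby(seq))


def minimizers(seq: str, k: int = 11, w: int = 20) -> list[tuple[int, str]]:
    """Return (position, k-mer) minimizers of seq.

    Assumption: the window spans w bases, so it holds w - k + 1 k-mers.
    For each window the lexicographically smallest k-mer is chosen
    (leftmost on ties). A k-mer occurrence chosen by several consecutive
    windows is reported only once.
    """
    if w < k or len(seq) < w:
        return []
    picked: list[tuple[int, str]] = []
    for start in range(len(seq) - w + 1):
        # All k-mers fully contained in the current window, with positions.
        candidates = [
            (seq[i:i + k], i) for i in range(start, start + w - k + 1)
        ]
        kmer, pos = min(candidates)
        if not picked or picked[-1] != (pos, kmer):
            picked.append((pos, kmer))
    return picked


def sketch_read(seq: str, min_len: int = 30, k: int = 11, w: int = 20):
    """Compress a read, drop it if too short, otherwise return its minimizers."""
    compressed = homopolymer_compress(seq.upper())
    if len(compressed) < min_len:
        return None  # read is discarded before clustering
    return minimizers(compressed, k, w)


def chunked(reads, size: int = 10000):
    """Yield successive lists of at most `size` reads (batch-wise processing)."""
    it = iter(reads)
    while batch := list(islice(it, size)):
        yield batch
```

With these defaults, a read whose compressed length is below 30 bases is dropped, and each 20-base window of a retained read contributes the smallest of its ten 11-mers. Increasing the window size in this sketch reduces the number of minimizers per read, which matches the behaviour described for the Window size parameter.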