NGDC Cloud

Zhang, W., Li, X.J., Liu, F. et al. Fungen: clustering and correcting long-read metatranscriptomic data for exploring eukaryotic microorganisms. Sci. China Life Sci. (2025). https://doi.org/10.1007/s11427-024-2853-x

FunGen is a reference-free tool to construct accurate transcripts using long-read metatranscriptomic data through read clustering and error correction.

1. FunGen:
FunGen (Fungal Gene) is a reference-free tool to construct accurate transcripts using long-read metatranscriptomic data through read clustering and error correction. It consists of two main stages: (1) clustering error-prone long reads that likely originate from the same gene and (2) performing error correction within each cluster.
A local version of the software can be installed from https://github.com/gyjames/Fungen
2. Online software usage:
Data:
·Either upload a metatranscriptomic FASTQ file (uncompressed or gzip-compressed .gz),
·Or paste the raw sequences directly into the text box.
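If you want to inspect or subset reads locally before uploading, a FASTQ file in either accepted form (plain or gzip-compressed), or text you intend to paste, can be read with a few lines of Python. This is a minimal sketch assuming standard 4-line FASTQ records; the names open_maybe_gzip, parse_fastq, read_fastq and FastqRecord are hypothetical helpers, not part of FunGen.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # read ID: header text after "@" up to the first space
    seq: str
    qual: str


def open_maybe_gzip(path: str) -> TextIO:
    """Open a FASTQ file as text, detecting gzip by its magic bytes (1f 8b)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a text stream; assumes 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break  # end of input (or a trailing blank line)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].partition(" ")[0], seq, qual)


def read_fastq(path: str) -> Iterator[FastqRecord]:
    """Read a plain or .gz FASTQ file from disk."""
    with open_maybe_gzip(path) as fh:
        yield from parse_fastq(fh)


if __name__ == "__main__":
    # Pasted text can be parsed the same way by wrapping it in a StringIO.
    pasted = "@read1\nACGTTGCA\n+\nIIIIIIII\n"
    for rec in parse_fastq(io.StringIO(pasted)):
        print(rec.name, len(rec.seq))
```

Detecting compression from the file content rather than the .gz extension means a mislabelled file is still read correctly.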
Parameters:
·Threads: Number of CPU threads used for read clustering. More threads speed up clustering at the cost of higher CPU usage. Default: 10.
·Minimum length: After homopolymer compression, reads shorter than this length (in bases) are discarded before clustering. Default: 30.
·Minimizer size: Length of the k-mers used to compute minimizers. A minimizer is the lexicographically smallest k-mer within a sliding window of the read. Default: 11.
·Window size: Length of the sliding window (in bases of the compressed read) from which one minimizer is selected. Larger windows yield fewer minimizers per read. Default: 20.
·Chunk: Number of reads per chunk when the input is split for batch-wise clustering. Smaller chunks reduce peak memory but may increase total runtime. Default: 10000.
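The preprocessing parameters above (Minimum length, Minimizer size, Window size, Chunk) can be illustrated with a short Python sketch of homopolymer compression, length filtering, minimizer selection and chunking. It is not FunGen's implementation: the helper names are hypothetical, k-mers are taken from the forward strand only (no canonical or reverse-complement k-mers), and, following the wording of the Window size parameter, a window is assumed to span w bases and therefore hold w - k + 1 consecutive k-mers. Plain lexicographic order is used, as the Minimizer size description states, rather than the hash-based ordering many tools use.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator, List, Set


def homopolymer_compress(seq: str) -> str:
    """Uppercase seq and collapse each run of identical bases, e.g. AAACCG -> ACG."""
    return "".join(base for base, _ in groupby(seq.upper()))


def preprocess(reads: Iterable[str], min_length: int = 30) -> Iterator[str]:
    """Compress homopolymers, then drop reads shorter than min_length."""
    for read in reads:
        compressed = homopolymer_compress(read)
        if len(compressed) >= min_length:
            yield compressed


def minimizers(seq: str, k: int = 11, w: int = 20) -> Set[str]:
    """Return the set of lexicographically smallest k-mers over all windows.

    Assumption: a window covers w bases, i.e. w - k + 1 consecutive k-mers.
    Forward strand only.
    """
    if w < k:
        raise ValueError("window size must be >= minimizer size")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    per_window = w - k + 1
    if len(kmers) < per_window:
        # Read shorter than one full window: keep its single smallest k-mer.
        return {min(kmers)} if kmers else set()
    return {
        min(kmers[i:i + per_window])
        for i in range(len(kmers) - per_window + 1)
    }


def chunks(reads: Iterable[str], size: int = 10000) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` reads."""
    it = iter(reads)
    while batch := list(islice(it, size)):
        yield batch


if __name__ == "__main__":
    raw = ["AAAA", "ACGT" * 10, "GGGTTTAAACCC" * 5]
    for batch in chunks(preprocess(raw, min_length=30), size=2):
        for read in batch:
            print(len(read), sorted(minimizers(read, k=11, w=20)))
```

With the defaults (k = 11, w = 20) each window holds 10 k-mers, so a minimizer is kept at least once every 10 positions. Reads that share many minimizers are natural candidates for the same cluster, but how FunGen scores read similarity is not described on this page.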
Output:
·cluster_results.txt: The final clustering results
·cluster_represents.fasta: Representative (reference) sequences for the gene clusters
·corrected_seqs.fastq: Corrected reads
