HiTE: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation.
Kang Hu, Peng Ni, Minghua Xu, You Zou, Jianye Chang, Xin Gao, Yaohang Li, Jue Ruan, Bin Hu, Jianxin Wang
Author Information
Kang Hu: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
Peng Ni: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
Minghua Xu: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
You Zou: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
Jianye Chang: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China.
Xin Gao: Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Yaohang Li: Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA.
Jue Ruan: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China.
Bin Hu: Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education (Beijing Institute of Technology), Beijing, P. R. China. bh@bit.edu.cn.
Jianxin Wang: School of Computer Science and Engineering, Central South University, Changsha, 410083, China. jxwang@mail.csu.edu.cn.
Recent advances in genome assembly have greatly improved the prospects for comprehensive annotation of transposable elements (TEs). However, existing methods for TE annotation from genome assemblies suffer from limited accuracy and robustness and require extensive manual editing. In addition, the currently available gold-standard TE databases are incomplete, even for extensively studied species, which highlights the need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are inserted within crucial genes and directly alter gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.