HiTE: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation.

Kang Hu, Peng Ni, Minghua Xu, You Zou, Jianye Chang, Xin Gao, Yaohang Li, Jue Ruan, Bin Hu, Jianxin Wang
Author Information
  1. Kang Hu: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
  2. Peng Ni: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
  3. Minghua Xu: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
  4. You Zou: School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
  5. Jianye Chang: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China.
  6. Xin Gao: Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
  7. Yaohang Li: Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA.
  8. Jue Ruan: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China.
  9. Bin Hu: Key Laboratory of Brain Health Intelligent Evaluation and Intervention, Ministry of Education (Beijing Institute of Technology), Beijing, P. R. China. bh@bit.edu.cn.
  10. Jianxin Wang: School of Computer Science and Engineering, Central South University, Changsha, 410083, China. jxwang@mail.csu.edu.cn.

Abstract

Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are directly inserted within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.

Grants

  1. 62350004/National Natural Science Foundation of China (National Science Foundation of China)
  2. 62332020/National Natural Science Foundation of China (National Science Foundation of China)

MeSH Terms

DNA Transposable Elements
Molecular Sequence Annotation
Animals
Software
Humans
Reproducibility of Results
Computational Biology
Databases, Genetic
Algorithms
Genome

Chemicals

DNA Transposable Elements
