Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation.

Xuanzeng Liu, Lina Zhao, Muhammad Majid, Yuan Huang
Author Information
  1. Xuanzeng Liu: College of Life Sciences, Shaanxi Normal University, Xi'an, China.
  2. Lina Zhao: College of Life Sciences, Shaanxi Normal University, Xi'an, China.
  3. Muhammad Majid: College of Life Sciences, Shaanxi Normal University, Xi'an, China.
  4. Yuan Huang: College of Life Sciences, Shaanxi Normal University, Xi'an, China. yuanh@snnu.edu.cn.

Abstract

Transposable elements (TEs) are a major component of eukaryotic genomes and are present in almost all eukaryotic organisms. TEs are highly dynamic between and within species, which significantly limits the general applicability of TE databases. Orthoptera is the only known group in the class Insecta with significantly enlarged genomes (0.93-21.48 Gb). When these large genomes are analyzed with existing public TE databases, the efficiency of TE annotation is unsatisfactory. As more insect genomes become publicly available, it is therefore imperative to continually update the available TE resource libraries and to build an Orthoptera-specific library. Here, we used the complete genome data of 12 Orthoptera species to annotate TEs de novo and then manually re-annotated the unclassified TEs to construct a non-redundant Orthoptera-specific TE library: Orthoptera-TElib. Orthoptera-TElib contains 24,021 TE entries, including the re-annotated results of 13,964 unknown TEs. TE entries in Orthoptera-TElib follow the same naming convention as RepeatMasker and Dfam and are encoded in the three-level form "level1/level2-level3". Orthoptera-TElib can be used directly as an input reference database and is compatible with mainstream repetitive sequence analysis software such as RepeatMasker and dnaPipeTE. For Orthoptera species, Orthoptera-TElib yields better TE annotation than Dfam and Repbase, whether low-coverage sequencing data or genome assembly data are used. The largest improvement is for Angaracris rhodopa, whose annotated TE content increased from 7.89% to 53.28% of the genome. Finally, Orthoptera-TElib is stored in SQLite3 for the convenience of data updates and user access.
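As an illustration of the three-level "level1/level2-level3" naming and the SQLite3 storage mentioned in the abstract, the following minimal Python sketch parses classification labels and loads them into an in-memory database. It assumes RepeatMasker-style library headers of the form ">name#level1/level2-level3". The entry names (te_0001, ...), the table name te_entry, and its columns are hypothetical and are not taken from the actual Orthoptera-TElib schema.

```python
import sqlite3
from typing import Iterable, NamedTuple, Optional, Tuple


class TEClass(NamedTuple):
    """Three-level classification; level2 and level3 may be absent."""
    level1: str
    level2: Optional[str]
    level3: Optional[str]


def parse_te_header(header: str) -> Tuple[str, TEClass]:
    """Split '>name#level1/level2-level3' into the entry name and its levels.

    Assumes a RepeatMasker-style header; anything after the first
    whitespace is treated as free-text description and ignored.
    """
    token = header.lstrip(">").strip().split(maxsplit=1)[0]
    name, _, label = token.partition("#")
    level1, _, rest = label.partition("/")
    level2, _, level3 = rest.partition("-")
    return name, TEClass(level1 or "Unknown", level2 or None, level3 or None)


def build_db(headers: Iterable[str]) -> sqlite3.Connection:
    """Load parsed entries into an in-memory SQLite database.

    The table layout is a hypothetical example, not the published schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE te_entry ("
        "name TEXT PRIMARY KEY, level1 TEXT NOT NULL, level2 TEXT, level3 TEXT)"
    )
    rows = []
    for header in headers:
        name, cls = parse_te_header(header)
        rows.append((name, cls.level1, cls.level2, cls.level3))
    conn.executemany("INSERT INTO te_entry VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Made-up example headers using classification labels in the three-level form.
EXAMPLE_HEADERS = [
    ">te_0001#DNA/TcMar-Tc1",
    ">te_0002#LINE/RTE-BovB",
    ">te_0003#LTR/Gypsy",
    ">te_0004#DNA/TcMar-Mariner",
]

conn = build_db(EXAMPLE_HEADERS)
counts = conn.execute(
    "SELECT level1, COUNT(*) FROM te_entry GROUP BY level1 ORDER BY level1"
).fetchall()
print(counts)  # [('DNA', 2), ('LINE', 1), ('LTR', 1)]
```

Storing each level in its own column lets entries be counted or filtered at any level of the classification with a plain SQL query, which is one plausible reason a relational store is convenient for updates and user access.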

Grants

  1. 31872217/National Natural Science Foundation of China
  2. GK202206021, GK202101003/Fundamental Research Funds for the Central Universities
