ToxCodAn-Genome: an automated pipeline for toxin-gene annotation in genome assembly of venomous lineages.

Pedro G Nachtigall, Alan M Durham, Darin R Rokyta, Inácio L M Junqueira-de-Azevedo
Author Information
  1. Pedro G Nachtigall: Laboratório de Toxinologia Aplicada, CeTICS, Instituto Butantan, São Paulo, 05503-900 SP, Brazil.
  2. Alan M Durham: Departamento de Ciência da Computação, Instituto de Matemática e Estatística, Universidade de São Paulo (USP), São Paulo, 05508-090 SP, Brazil.
  3. Darin R Rokyta: Department of Biological Science, Florida State University, Tallahassee, 32306-4295 FL, USA.
  4. Inácio L M Junqueira-de-Azevedo: Laboratório de Toxinologia Aplicada, CeTICS, Instituto Butantan, São Paulo, 05503-900 SP, Brazil.

Abstract

BACKGROUND: The rapid development of sequencing technologies has led to a wide expansion of genomic studies of venomous lineages. This has facilitated research on the evolution of adaptive traits and the search for novel compounds that can be applied in agriculture and medicine. However, toxin annotation of genomes is laborious and time-consuming, and no consensus pipeline is currently available. No computational tool currently exists to address the challenges specific to toxin annotation and to ensure the reproducibility of the process.
RESULTS: Here, we present ToxCodAn-Genome, the first software designed to perform automated toxin annotation in genomes of venomous lineages. The pipeline was designed to retrieve the full-length coding sequences of toxins and to allow the detection of novel truncated paralogs and pseudogenes. We tested ToxCodAn-Genome on 12 genomes of venomous lineages and achieved high performance in recovering their current toxin annotations. The tool can be easily customized to improve the final toxin annotation set and can be extended to virtually any venomous lineage. ToxCodAn-Genome is fast enough to run on any personal computer, and it can also be executed in multicore mode to take advantage of large high-performance servers. In addition, we provide a guide to direct future research in the venomics field and to ensure a confident toxin annotation of the genome being studied. As a case study, we sequenced and annotated the toxin repertoire of Bothrops alternatus, which may facilitate future evolutionary and biomedical studies using vipers as models.
CONCLUSIONS: ToxCodAn-Genome is suitable to perform toxin annotation in the genome of venomous species and may help to improve the reproducibility of further studies. ToxCodAn-Genome and the guide are freely available at https://github.com/pedronachtigall/ToxCodAn-Genome.

Grants

  1. 2013/07467-1/Fundação de Amparo à Pesquisa do Estado de São Paulo
  2. 1638902/National Science Foundation

MeSH Term

Molecular Sequence Annotation
Venoms
Reproducibility of Results
Genome
Software
Venomous Snakes
Bothrops

Chemicals

Venoms
