ICAnnoLncRNA: A Snakemake Pipeline for a Long Non-Coding-RNA Search and Annotation in Transcriptomic Sequences.

Artem Yu Pronozin, Dmitry A Afonnikov
Author Information
  1. Artem Yu Pronozin: Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia.
  2. Dmitry A Afonnikov: Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia.

Abstract

Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have shown the diversity and importance of lncRNA functions in plants. To extend this knowledge to other species, computational pipelines have recently been actively developed that run standardised data-processing steps without user intervention until the final result is produced. These advances broaden the functionality available for lncRNA identification and analysis. In the present work, we propose the ICAnnoLncRNA pipeline for the automatic identification, classification and annotation of plant lncRNAs in assembled transcriptomic sequences. It uses the LncFinder software to identify lncRNAs and allows the recognition parameters to be adjusted using genomic data for which lncRNA annotation is available. The pipeline supports the prediction of lncRNA candidates, alignment of lncRNA sequences to the reference genome, filtering of erroneous/noise transcripts and probable transposable elements, classification of lncRNAs by genome location, comparison with sequences from external databases and analysis of lncRNA structural features and expression. To demonstrate the pipeline, we applied it to transcriptomic sequences from 15 maize libraries assembled with Trinity and Hisat2/StringTie.
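The length criterion in the definition above (longer than 200 nucleotides) can be illustrated with a minimal Python sketch. This is not the ICAnnoLncRNA implementation, and it does not cover the coding-potential step that the pipeline delegates to LncFinder. The function names `parse_fasta` and `length_filter`, the constant `MIN_LNCRNA_LENGTH` and the example transcripts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, Tuple

# Assumption: the threshold is taken from the definition in the abstract
# (lncRNAs are longer than 200 nucleotides); the name is hypothetical.
MIN_LNCRNA_LENGTH = 200


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_filter(records: Dict[str, str],
                  min_length: int = MIN_LNCRNA_LENGTH) -> Dict[str, str]:
    """Keep transcripts strictly longer than min_length nucleotides."""
    return {name: seq for name, seq in records.items() if len(seq) > min_length}


# Hypothetical assembled transcripts: 250 nt (split over two lines),
# exactly 200 nt, and 120 nt.
EXAMPLE_FASTA = (
    ">tx1 example transcript\n" + "A" * 150 + "\n" + "A" * 100 + "\n"
    ">tx2\n" + "C" * 200 + "\n"
    ">tx3\n" + "G" * 120 + "\n"
)

records = dict(parse_fasta(EXAMPLE_FASTA))
candidates = length_filter(records)
print(sorted(candidates))  # ['tx1']
```

Only `tx1` passes, because the comparison is strict: a transcript of exactly 200 nucleotides is not "longer than 200". In a real workflow such a filter would only be a pre-screening step before coding-potential assessment, genome alignment and the other stages listed in the abstract.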

MeSH Term

RNA, Long Noncoding
Gene Expression Profiling
Transcriptome
Zea mays
Software

Chemicals

RNA, Long Noncoding
