Long-read RNA sequencing reveals widespread sex-specific alternative splicing in threespine stickleback fish.

Alice S Naftaly, Shana Pau, Michael A White
Author Information
  1. Alice S Naftaly: Department of Genetics, University of Georgia, Athens, Georgia 30602, USA.
  2. Shana Pau: Department of Genetics, University of Georgia, Athens, Georgia 30602, USA.
  3. Michael A White: Department of Genetics, University of Georgia, Athens, Georgia 30602, USA.

Abstract

Alternate isoforms are important contributors to phenotypic diversity across eukaryotes. Although short-read RNA sequencing has increased our understanding of isoform diversity, it is challenging to accurately detect full-length transcripts with short reads, preventing the identification of many alternate isoforms. Long-read sequencing technologies have made it possible to sequence full-length alternative transcripts, accurately characterizing alternative splicing events, alternate transcription start and end sites, and differences in untranslated regions (UTRs). Here, we use Pacific Biosciences (PacBio) long-read RNA sequencing (Iso-Seq) to examine the transcriptomes of five organs in threespine stickleback fish (Gasterosteus aculeatus), a widely used genetic model species. The threespine stickleback fish has a refined genome assembly in which gene annotations are based on short-read RNA sequencing and predictions from the coding sequences of other species. This suggests that some of the existing annotations may be inaccurate or that alternative transcripts may not be fully characterized. Using Iso-Seq, we detected thousands of novel isoforms, indicating that many isoforms are absent from the current Ensembl gene annotations. In addition, we refined many of the existing annotations within the genome. We noted many improperly positioned transcription start sites that were refined with long-read sequencing. The Iso-Seq-predicted transcription start sites were more accurate, as verified through ATAC-seq. We also detected many alternative splicing events between sexes and across organs. We found a substantial number of genes in both somatic and gonadal samples that had sex-specific isoforms. Our study highlights the power of long-read sequencing to study the complexity of transcriptomes, greatly improving genomic resources for the threespine stickleback fish.
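The comparison of Iso-Seq transcripts against an existing gene annotation, and the ATAC-seq check on transcription start sites, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Transcript` class, the `classify`, `overlaps`, and `tss_supported` functions, and the category labels are hypothetical, loosely in the spirit of the splice-match categories used by isoform-classification tools such as SQANTI. It assumes long reads have already been collapsed and aligned to the genome, exon coordinates are 1-based and inclusive, and ATAC-seq peaks are supplied as (start, end) intervals per chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript model. Exons are (start, end) pairs,
    sorted by start, in 1-based inclusive genomic coordinates."""
    tid: str
    chrom: str
    strand: str
    exons: tuple

    def intron_chain(self):
        # Each intron spans the gap between two consecutive exons.
        return tuple((prev_end + 1, next_start - 1)
                     for (_, prev_end), (next_start, _) in zip(self.exons, self.exons[1:]))

    def span(self):
        return self.exons[0][0], self.exons[-1][1]

    def tss(self):
        # The 5' end is the leftmost base on "+" and the rightmost base on "-".
        return self.exons[0][0] if self.strand == "+" else self.exons[-1][1]


def overlaps(a, b):
    """True if both transcripts share chromosome and strand and their spans overlap."""
    (a_start, a_end), (b_start, b_end) = a.span(), b.span()
    return (a.chrom == b.chrom and a.strand == b.strand
            and a_start <= b_end and b_start <= a_end)


def classify(read_tx, reference):
    """Label a long-read transcript relative to overlapping annotated transcripts."""
    ref_chains = {r.intron_chain() for r in reference if overlaps(read_tx, r)}
    if not ref_chains:
        return "intergenic"            # no annotated transcript overlaps this read
    chain = read_tx.intron_chain()
    if chain in ref_chains:
        return "known"                 # identical intron chain to an annotated isoform
    if not chain:
        return "novel_unspliced"       # single-exon read overlapping only spliced annotations
    known_introns = {intron for c in ref_chains for intron in c}
    if all(intron in known_introns for intron in chain):
        return "novel_combination"     # only annotated junctions, in a new combination
    return "novel_junction"            # uses at least one unannotated junction


def tss_supported(tx, peaks, window=0):
    """True if tx's TSS lies within `window` bp of an ATAC-seq peak.
    `peaks` maps chromosome -> list of (start, end) intervals (1-based, inclusive)."""
    pos = tx.tss()
    return any(start - window <= pos <= end + window
               for start, end in peaks.get(tx.chrom, []))


if __name__ == "__main__":
    # Toy annotation: two hypothetical isoforms of one gene.
    reference = [
        Transcript("refA", "chr1", "+", ((100, 200), (300, 400), (500, 600))),
        Transcript("refB", "chr1", "+", ((100, 200), (500, 600), (700, 800))),
    ]
    read = Transcript("read1", "chr1", "+", ((100, 200), (300, 400), (500, 600), (700, 800)))
    print(read.tid, classify(read, reference))                # prints: read1 novel_combination
    print(read.tid, tss_supported(read, {"chr1": [(80, 130)]}))  # prints: read1 True
```

In this simplification, a read whose introns are a truncated subset of an annotated chain (for example a 5'-degraded fragment) is also labeled `novel_combination`; dedicated classifiers treat such partial matches separately and typically compare individual splice donors and acceptors rather than whole junctions.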

Grants

  1. T32 GM007103/NIGMS NIH HHS

MeSH Terms

Alternative Splicing
Animals
Gene Expression Profiling
High-Throughput Nucleotide Sequencing
Sequence Analysis, RNA
Smegmamorpha
Transcriptome
