Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data.

H Beiki, H Liu, J Huang, N Manchanda, D Nonneman, T P L Smith, J M Reecy, C K Tuggle
Author Information
  1. H Beiki: Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA.
  2. H Liu: Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA.
  3. J Huang: Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA.
  4. N Manchanda: Department of Ecology, Evolution, and Organismal Biology, Iowa State University, 819 Wallace Road, Ames, IA, 50011, USA.
  5. D Nonneman: USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA.
  6. T P L Smith: USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA.
  7. J M Reecy: Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA.
  8. C K Tuggle: Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA. cktuggle@iastate.edu.

Abstract

BACKGROUND: Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A)-selected single-molecule long-read isoform sequencing (Iso-Seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig.
RESULTS: Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were identified, an increase over current EBI (1.9 tpg) and NCBI (2.9 tpg) annotations and closer to the number reported in the human genome (4.2 tpg). Our new pig genome annotation extended more than 6000 known gene borders (5' end extension, 3' end extension, or both) compared to EBI or NCBI annotations. We validated a large proportion of these extensions using independent pig poly(A)-selected 3'-RNA-seq data or human FANTOM5 Cap Analysis of Gene Expression data. Further, we detected 10,465 novel genes (81% non-coding) not reported in current pig genome annotations. More than 80% of these novel genes had transcripts detected in more than one tissue. In addition, more than 80% of novel intergenic genes with at least one transcript detected in liver tissue had H3K4me3 or H3K36me3 peaks mapping to their promoter and gene body, respectively, in independent liver chromatin immunoprecipitation data.
CONCLUSIONS: These validated results show significant improvement over current pig genome annotations.
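
The transcripts-per-gene (tpg) figure quoted in the results is a simple average: the number of distinct transcripts assigned to each gene, averaged over genes. A minimal sketch of that calculation follows, assuming the annotation has already been reduced to (gene ID, transcript ID) pairs. The function name `transcripts_per_gene` and the toy identifiers are hypothetical and are not taken from the study; this is not the authors' pipeline.

```python
from collections import defaultdict


def transcripts_per_gene(records):
    """Return the mean number of distinct transcripts per gene.

    `records` is an iterable of (gene_id, transcript_id) pairs, for
    example parsed from the transcript lines of a GTF annotation.
    """
    by_gene = defaultdict(set)
    for gene_id, transcript_id in records:
        # A set ignores repeated transcript IDs for the same gene.
        by_gene[gene_id].add(transcript_id)
    if not by_gene:
        raise ValueError("no records supplied")
    return sum(len(ids) for ids in by_gene.values()) / len(by_gene)


# Toy input with hypothetical identifiers (not data from the study).
toy = [
    ("geneA", "tA.1"), ("geneA", "tA.2"), ("geneA", "tA.3"),
    ("geneB", "tB.1"),
    ("geneC", "tC.1"), ("geneC", "tC.2"),
    ("geneC", "tC.2"),  # repeated pair, counted once
]
mean_tpg = transcripts_per_gene(toy)
print(f"{mean_tpg:.1f} tpg")
```

On the toy input the three genes carry 3, 1 and 2 distinct transcripts, so the mean is 2.0 tpg. Restricting the input to known genes, as the abstract does, would be a filtering step applied before calling the function.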

Grants

  1. 3040-31000-099-00D/Agricultural Research Service (US)
  2. 3040-31000-100-00D/Agricultural Research Service
  3. none/NRSP-8 Swine Genome Coordination

MeSH Terms

Alternative Splicing
Animals
Chromatin Immunoprecipitation
Computational Biology
Genome
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation
Sus scrofa
