Global Analysis of Transcription Start Sites in the New Ovine Reference Genome ().

Mazdak Salavati, Alex Caulton, Richard Clark, Iveta Gazova, Timothy P L Smith, Kim C Worley, Noelle E Cockett, Alan L Archibald, Shannon M Clarke, Brenda M Murdoch, Emily L Clark
Author Information
  1. Mazdak Salavati: The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom.
  2. Alex Caulton: AgResearch, Invermay Agricultural Centre, Mosgiel, New Zealand.
  3. Richard Clark: Genetics Core, Edinburgh Clinical Research Facility, The University of Edinburgh, Edinburgh, United Kingdom.
  4. Iveta Gazova: The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom.
  5. Timothy P L Smith: USDA, Agricultural Research Service, U.S. Meat Animal Research Center, Clay Center, NE, United States.
  6. Kim C Worley: Baylor College of Medicine, Houston, TX, United States.
  7. Noelle E Cockett: Department of Animal, Dairy and Veterinary Sciences, Utah State University, Logan, UT, United States.
  8. Alan L Archibald: The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom.
  9. Shannon M Clarke: AgResearch, Invermay Agricultural Centre, Mosgiel, New Zealand.
  10. Brenda M Murdoch: Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, United States.
  11. Emily L Clark: The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom.

Abstract

The overall aim of the Ovine FAANG project is to provide a comprehensive annotation of the new highly contiguous sheep reference genome sequence (). Mapping of transcription start sites (TSS) is a key first step in understanding transcript regulation and diversity. Using 56 tissue samples collected from the reference ewe Benz2616, we have performed a global analysis of TSS and TSS-Enhancer clusters using Cap Analysis Gene Expression (CAGE) sequencing. CAGE measures RNA expression by 5' cap-trapping and has been specifically designed to allow the characterization of TSS within promoters to single-nucleotide resolution. We have adapted an analysis pipeline that uses TagDust2 for clean-up and trimming, Bowtie2 for mapping, CAGEfightR for clustering, and the Integrative Genomics Viewer (IGV) for visualization. Mapping of CAGE tags indicated that the expression levels of CAGE tag clusters varied across tissues. Expression profiles across tissues were validated using corresponding polyA+ mRNA-Seq data from the same samples. After removal of CAGE tags with <10 read counts, 39.3% of TSS overlapped with the 5' ends of 31,113 transcripts previously annotated by NCBI (out of a total of 56,308 in the NCBI annotation). For the remaining 25,195 transcripts previously annotated by NCBI, no TSS meeting the stringent criteria were identified. A further 14.7% of TSS mapped to within 50 bp of annotated promoter regions. Intersecting the predicted TSS regions with annotated promoter regions (±50 bp) revealed that 46% of the predicted TSS were "novel" and previously un-annotated. Using whole-genome bisulfite sequencing data from the same tissues, we were able to determine that a proportion of these "novel" TSS (32.2%) were hypo-methylated, indicating that they are likely to be reproducible rather than "noise". This global analysis of TSS in sheep will significantly enhance the annotation of gene models in the new ovine reference assembly. Our analyses provide one of the highest resolution annotations of transcript regulation and diversity in a livestock species to date.
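
The filtering and classification steps described in the abstract can be illustrated with a minimal Python sketch. This is a toy illustration, not the authors' pipeline, which uses CAGEfightR clusters rather than single positions. The names `Tss`, `Transcript`, `classify` and `summarise` are hypothetical, and treating "overlapped with 5' ends" as an exact position match is a simplifying assumption. Only the two thresholds (fewer than 10 read counts, ±50 bp) come from the abstract.

```python
from dataclasses import dataclass

MIN_COUNT = 10  # tags with <10 read counts are removed (threshold from the abstract)
WINDOW_BP = 50  # promoter window of +/-50 bp (from the abstract)


@dataclass(frozen=True)
class Tss:
    """A candidate transcription start site (hypothetical toy record)."""
    chrom: str
    pos: int
    strand: str
    count: int


@dataclass(frozen=True)
class Transcript:
    """An annotated transcript, reduced to its 5' end (hypothetical toy record)."""
    chrom: str
    five_prime: int
    strand: str


def classify(tss, transcripts, window=WINDOW_BP):
    """Label one TSS as 'annotated', 'near_promoter' or 'novel'.

    Assumption: 'annotated' means an exact match to an annotated 5' end;
    the real analysis intersects clusters, not single positions.
    """
    distances = [
        abs(tss.pos - t.five_prime)
        for t in transcripts
        if t.chrom == tss.chrom and t.strand == tss.strand
    ]
    if not distances:
        return "novel"
    nearest = min(distances)
    if nearest == 0:
        return "annotated"
    if nearest <= window:
        return "near_promoter"
    return "novel"


def summarise(tss_list, transcripts):
    """Drop low-count TSS, then count how many fall in each class."""
    kept = [t for t in tss_list if t.count >= MIN_COUNT]
    counts = {"annotated": 0, "near_promoter": 0, "novel": 0}
    for t in kept:
        counts[classify(t, transcripts)] += 1
    return counts


# Toy data, invented purely for illustration.
toy_tss = [
    Tss("chr1", 100, "+", 25),  # exact match to the annotated 5' end
    Tss("chr1", 130, "+", 12),  # 30 bp from the annotated 5' end
    Tss("chr1", 500, "+", 40),  # far from any annotation
    Tss("chr1", 100, "+", 3),   # removed: fewer than 10 reads
    Tss("chr2", 100, "+", 50),  # no annotation on this chromosome
]
toy_transcripts = [Transcript("chr1", 100, "+")]
toy_counts = summarise(toy_tss, toy_transcripts)
```

In the abstract, these three classes correspond to the reported 39.3%, 14.7% and 46% of TSS, which together account for all retained TSS.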

Grants

  1. /Wellcome Trust
