Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data.

Hirak Sarkar, Avi Srivastava, Héctor Corrada Bravo, Michael I Love, Rob Patro
Author Information
  1. Hirak Sarkar: Department of Computer Science, University of Maryland, College Park, MD 20742, USA.
  2. Avi Srivastava: Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
  3. Héctor Corrada Bravo: Department of Computer Science, University of Maryland, College Park, MD 20742, USA.
  4. Michael I Love: Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA.
  5. Rob Patro: Department of Computer Science, University of Maryland, College Park, MD 20742, USA.

Abstract

MOTIVATION: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects.
RESULTS: We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool Terminus, groups together transcripts in a data-driven manner, allowing transcript-level analysis where it can be confidently supported and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result.
AVAILABILITY AND IMPLEMENTATION: Terminus is implemented in Rust, and is freely available and open source. It can be obtained from https://github.com/COMBINE-lab/Terminus.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
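
The grouping idea described under RESULTS can be illustrated with a small sketch. It is not the Terminus implementation (which is written in Rust) and does not reproduce its actual grouping criterion. The uncertainty score (variance divided by mean), the function names `relative_variance` and `should_group`, the `fraction` threshold, and the sample values are all assumptions made for illustration. The sketch shows only that when two transcripts trade ambiguously-mapping fragments back and forth across posterior samples, their summed abundance can be far more stable than either abundance alone.

```python
from statistics import mean, pvariance


def relative_variance(samples):
    """Variance divided by mean: a simple stand-in uncertainty score (assumption)."""
    m = mean(samples)
    return pvariance(samples) / m if m > 0 else 0.0


def should_group(samples_a, samples_b, fraction=0.5):
    """Hypothetical rule: group two transcripts when the uncertainty of their
    summed abundance falls below `fraction` of the smaller individual score."""
    summed = [a + b for a, b in zip(samples_a, samples_b)]
    smallest_individual = min(relative_variance(samples_a),
                              relative_variance(samples_b))
    return relative_variance(summed) < fraction * smallest_individual


# Made-up posterior abundance samples for two transcripts that share
# ambiguously-mapping fragments: when one goes up, the other goes down.
tx_a = [10, 90, 30, 70, 50]
tx_b = [92, 8, 71, 33, 49]
group = [a + b for a, b in zip(tx_a, tx_b)]

rv_a = relative_variance(tx_a)
rv_b = relative_variance(tx_b)
rv_group = relative_variance(group)
grouped = should_group(tx_a, tx_b)

print(f"transcript A: {rv_a:.2f}")
print(f"transcript B: {rv_b:.2f}")
print(f"group A+B:    {rv_group:.2f}")
print(f"group them?   {grouped}")
```

In this toy setting the individual scores are large while the group score is close to zero, so the hypothetical rule merges the pair. Two transcripts whose samples vary independently would not show the same drop and would be left for transcript-level analysis.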

Grants

  1. R01 HG009937/NHGRI NIH HHS
  2. R01 MH118349/NIMH NIH HHS
  3. P01 CA142538/NCI NIH HHS
  4. R01 GM114267/NIGMS NIH HHS
  5. R24 MH114815/NIMH NIH HHS

MeSH Terms

Algorithms
Gene Expression Profiling
RNA-Seq
Sequence Analysis, RNA
Software
