RNAIndel: discovering somatic coding indels from tumor RNA-Seq data.

Kohei Hagiwara, Liang Ding, Michael N Edmonson, Stephen V Rice, Scott Newman, John Easton, Juncheng Dai, Soheil Meshinchi, Rhonda E Ries, Michael Rusch, Jinghui Zhang
Author Information
  1. Kohei Hagiwara: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  2. Liang Ding: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  3. Michael N Edmonson: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  4. Stephen V Rice: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  5. Scott Newman: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  6. John Easton: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  7. Juncheng Dai: Department of Epidemiology, Nanjing Medical University School of Public Health, Jiangning District, Nanjing, 211166, People's Republic of China.
  8. Soheil Meshinchi: Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  9. Rhonda E Ries: Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
  10. Michael Rusch: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
  11. Jinghui Zhang: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.

Abstract

MOTIVATION: Reliable identification of expressed somatic insertions/deletions (indels) remains an unmet need. Artifacts generated in PCR-based RNA-Seq library preparation and the lack of normal RNA-Seq data present analytical challenges for discovering somatic indels in the tumor transcriptome.
RESULTS: We present RNAIndel, a tool for predicting somatic, germline and artifact indels from tumor RNA-Seq data. RNAIndel leverages features derived from indel sequence context and biological effect in a machine-learning framework. Except for tumor samples with microsatellite instability, RNAIndel robustly predicts 88-100% of somatic indels in five diverse test datasets of pediatric and adult cancers, even recovering subclonal driver indels (variant allele frequency, VAF, range 0.01-0.15) missed by targeted deep-sequencing. It outperforms the current best practice for RNA-Seq variant calling, which had 57% sensitivity and 14 times more false positives.
AVAILABILITY AND IMPLEMENTATION: RNAIndel is freely available at https://github.com/stjude/RNAIndel.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Grants

  1. P50 GM115279/NIGMS NIH HHS

MeSH Terms

Child
High-Throughput Nucleotide Sequencing
Humans
INDEL Mutation
Neoplasms
RNA-Seq
Software
Exome Sequencing
