RNAIndel: discovering somatic coding indels from tumor RNA-Seq data.
Kohei Hagiwara, Liang Ding, Michael N Edmonson, Stephen V Rice, Scott Newman, John Easton, Juncheng Dai, Soheil Meshinchi, Rhonda E Ries, Michael Rusch, Jinghui Zhang
Author Information
Kohei Hagiwara: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Liang Ding: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Michael N Edmonson: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Stephen V Rice: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Scott Newman: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
John Easton: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Juncheng Dai: Department of Epidemiology, Nanjing Medical University School of Public Health, Jiangning District, Nanjing, 211166, People's Republic of China.
Soheil Meshinchi: Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Rhonda E Ries: Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Michael Rusch: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Jinghui Zhang: Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
MOTIVATION: Reliable identification of expressed somatic insertions/deletions (indels) remains an unmet need. Artifacts generated during PCR-based RNA-Seq library preparation and the lack of matched normal RNA-Seq data make it analytically challenging to discover somatic indels in the tumor transcriptome.
RESULTS: We present RNAIndel, a tool for predicting somatic, germline and artifact indels from tumor RNA-Seq data. RNAIndel uses features derived from indel sequence context and biological effect in a machine-learning framework. Except for tumor samples with microsatellite instability, RNAIndel robustly predicts 88-100% of somatic indels in five diverse test datasets of pediatric and adult cancers. It even recovers subclonal driver indels (variant allele fraction (VAF) range 0.01-0.15) missed by targeted deep sequencing. RNAIndel outperforms the current best practice for RNA-Seq variant calling, which had 57% sensitivity and 14 times more false positives.
AVAILABILITY AND IMPLEMENTATION: RNAIndel is freely available at https://github.com/stjude/RNAIndel.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.