A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes.

Naoki Nariai, Kaname Kojima, Takahiro Mimori, Yosuke Kawai, Masao Nagasaki
Author Information
  1. Naoki Nariai: Present address: Institute for Genomic Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, 92093, California, USA. nnariai@ucsd.edu.
  2. Kaname Kojima: Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan. kojima@megabank.tohoku.ac.jp.
  3. Takahiro Mimori: Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan. mimori@megabank.tohoku.ac.jp.
  4. Yosuke Kawai: Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan. kawai@megabank.tohoku.ac.jp.
  5. Masao Nagasaki: Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan. nagasaki@megabank.tohoku.ac.jp.

Abstract

BACKGROUND: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurately estimating allele-specific expression (ASE) from alignments of reads to the reference genome is challenging, because the reference contains only one allele per locus on a mosaic haploid genome. Even when diploid genome sequences are available, aligning reads precisely to the correct allele is difficult because of the high similarity between the corresponding allele sequences.
RESULTS: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through analysis of simulated data, we demonstrate the effectiveness of the proposed approach in identifying ASE compared to the existing approach. We also show that our approach quantifies isoform expression levels better than the existing methods TIGAR2, RSEM and Cufflinks. In the analysis of real data from the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation was detected.
CONCLUSIONS: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar .
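
The idea of treating the haploid choice as a hidden variable and inferring it together with expression levels can be illustrated with a toy model. The sketch below is not the ASE-TIGAR implementation: it assumes a plain Dirichlet mixture in which each component is one (isoform, haplotype) pair, each read comes with a precomputed likelihood under every component, and the abundances are updated by standard mean-field variational Bayes. The function names `digamma` and `vb_allele_abundance`, the prior value `alpha0`, and the example read counts are all hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def vb_allele_abundance(lik, alpha0=1.0, n_iter=200):
    """Toy variational Bayes for a Dirichlet mixture over components.

    lik[n][k] is the likelihood of read n under component k, where a
    component stands for one (isoform, haplotype) pair. Every read is
    assumed to have at least one non-zero likelihood.
    Returns the posterior mean abundance of each component.
    """
    n_comp = len(lik[0])
    alpha = [alpha0 + len(lik) / n_comp] * n_comp
    for _ in range(n_iter):
        # Expected log abundance under the current Dirichlet posterior.
        total = digamma(sum(alpha))
        weight = [math.exp(digamma(a) - total) for a in alpha]
        # Soft assignment of each read to a component (the hidden variable).
        counts = [0.0] * n_comp
        for row in lik:
            unnorm = [l * w for l, w in zip(row, weight)]
            z = sum(unnorm)
            for k, u in enumerate(unnorm):
                counts[k] += u / z
        # Update the Dirichlet parameters from the expected read counts.
        alpha = [alpha0 + c for c in counts]
    s = sum(alpha)
    return [a / s for a in alpha]


# Hypothetical example: one isoform with a paternal and a maternal copy.
# 30 reads support the paternal allele at a heterozygous site, 10 support
# the maternal allele, and 60 reads overlap no variant and are ambiguous.
reads = [[0.99, 0.01]] * 30 + [[0.01, 0.99]] * 10 + [[1.0, 1.0]] * 60
theta = vb_allele_abundance(reads)
print("paternal fraction: %.3f, maternal fraction: %.3f" % tuple(theta))
```

In this toy setting the ambiguous reads are shared between the two haplotypes in proportion to the current abundance estimates, so the allelic imbalance seen at the heterozygous site propagates to reads that carry no variant.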

MeSH Terms

Algorithms
Alleles
Bayes Theorem
Cell Line, Tumor
Diploidy
Gene Expression Regulation
Genome, Human
Humans
Protein Isoforms
Proteins
RNA
Sequence Analysis, RNA

Chemicals

Protein Isoforms
Proteins
RNA
