High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis.

Weitong Cui, Huaru Xue, Lei Wei, Jinghua Jin, Xuewen Tian, Qinglu Wang
Author Information
  1. Weitong Cui: Key Laboratory of Biomedical Engineering & Technology of Shandong High School, Qilu Medical University, Zibo, 255300, China.
  2. Huaru Xue: Key Laboratory of Biomedical Engineering & Technology of Shandong High School, Qilu Medical University, Zibo, 255300, China.
  3. Lei Wei: Key Laboratory of Biomedical Engineering & Technology of Shandong High School, Qilu Medical University, Zibo, 255300, China.
  4. Jinghua Jin: Environmental Protection Research Institute of Light Industry, Beijing, 100089, China.
  5. Xuewen Tian: Shandong Sport University, Jinan, 250102, China.
  6. Qinglu Wang: Key Laboratory of Biomedical Engineering & Technology of Shandong High School, Qilu Medical University, Zibo, 255300, China. wql_zcq@126.com.

Abstract

BACKGROUND: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, whether the high variation in gene expression levels caused by tumor heterogeneity affects the reproducibility of differential expression (DE) results is an emerging problem that has rarely been studied. Here, we investigated the reproducibility of DE results for each number of biological replicates from 3 to 24 and explored why so many differentially expressed genes (DEGs) were not reproducible.
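The reproducibility question can be made concrete with a small resampling sketch. It is not the authors' pipeline. It assumes a count table held as a Python dict mapping each gene to a list of per-sample counts. It calls DEGs with a deliberately crude rule: an absolute difference of at least 1 in mean log2(count + 1), standing in for a real tool such as DESeq2 or edgeR. It then scores reproducibility as the Jaccard overlap between the DEG sets obtained from two disjoint draws of n tumor and n normal samples. The names `call_degs`, `jaccard` and `reproducibility`, and the toy genes, are hypothetical.

```python
import math
import random
import statistics


def call_degs(expr, group_a, group_b, lfc_threshold=1.0):
    """Hypothetical, deliberately crude DE call: a gene is 'DE' when the mean
    log2(count + 1) of the two groups differs by at least lfc_threshold.
    No library-size normalization, dispersion model or p-values."""
    degs = set()
    for gene, counts in expr.items():
        mean_a = statistics.fmean(math.log2(counts[i] + 1) for i in group_a)
        mean_b = statistics.fmean(math.log2(counts[i] + 1) for i in group_b)
        if abs(mean_a - mean_b) >= lfc_threshold:
            degs.add(gene)
    return degs


def jaccard(s1, s2):
    """Overlap of two DEG sets; two empty sets count as identical (1.0)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def reproducibility(expr, tumor_ids, normal_ids, n, rounds=20, seed=0):
    """Mean Jaccard overlap between DEG sets from two disjoint n-vs-n draws."""
    if 2 * n > min(len(tumor_ids), len(normal_ids)):
        raise ValueError("need at least 2*n samples per condition")
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        tumor = rng.sample(tumor_ids, 2 * n)    # 2n distinct tumor samples
        normal = rng.sample(normal_ids, 2 * n)  # 2n distinct normal samples
        first = call_degs(expr, tumor[:n], normal[:n])
        second = call_degs(expr, tumor[n:], normal[n:])
        scores.append(jaccard(first, second))
    return statistics.fmean(scores)


# Toy counts (hypothetical): samples 0-9 are tumor, 10-19 are normal.
expr = {
    "g_up": [1000] * 10 + [10] * 10,      # consistently higher in tumor
    "g_flat": [50] * 20,                  # never different
    "g_het": [1000, 10] * 5 + [10] * 10,  # high in only half of the tumors
}
tumor_ids, normal_ids = list(range(10)), list(range(10, 20))
print(reproducibility(expr, tumor_ids, normal_ids, n=3))
```

Because the two draws share no samples, a gene such as `g_het`, called in one draw but not the other, behaves like the sample-specific DEGs described in the abstract. Repeating the call for each n would show how the overlap changes with replicate number under these toy assumptions.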
RESULTS: Our findings demonstrate that DE results are poorly reproducible not only with small sample sizes but also with relatively large ones. Many of the DEGs detected are specific to the samples used rather than genuinely differentially expressed between conditions. Poor reproducibility of DE results is mainly caused by high variation in the expression level of the same gene across samples. Although biological variation may account for much of this variation, the effect of outlier count data also needs to be taken seriously, because outliers severely interfere with DE analysis.
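To illustrate how a single outlier count can distort one gene, the sketch below flags extreme counts with the modified z-score of Iglewicz and Hoaglin. It computes the score on log2(count + 1) and uses the common 3.5 cutoff. This is an assumed, illustrative rule, not the procedure used in the study. DESeq2, for example, flags outliers with Cook's distance instead. The function name `flag_outliers` and the counts are hypothetical.

```python
import math
import statistics


def flag_outliers(counts, threshold=3.5):
    """Return indices of samples whose count is an outlier for one gene.

    Modified z-score (Iglewicz and Hoaglin) on x = log2(count + 1):
        M_i = 0.6745 * (x_i - median(x)) / MAD(x)
    A sample is flagged when |M_i| > threshold.
    """
    x = [math.log2(c + 1) for c in counts]
    med = statistics.median(x)
    mad = statistics.median(abs(v - med) for v in x)
    if mad == 0:
        # No spread to scale against; this sketch flags nothing in that case,
        # whereas a real pipeline would need a fallback rule.
        return []
    return [i for i, v in enumerate(x) if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical counts for one gene across seven samples; the last one is extreme.
counts = [100, 120, 90, 110, 105, 95, 5000]
print(flag_outliers(counts))  # [6]

# The single outlier pulls the mean far from the bulk of the data,
# which is how one sample can push a gene into (or out of) a DEG list.
print(round(statistics.fmean(counts)), statistics.median(counts))  # 803 105
```

Working on the log scale and using the median and MAD keeps the outlier itself from inflating the spread estimate that is used to judge it.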
CONCLUSIONS: High heterogeneity exists not only in the tumor tissue samples of each cancer type studied but also in the normal samples. This heterogeneity leads to poor reproducibility of DEGs, undermining the generalization of DE results. Therefore, RNA-Seq experimental designs should use large sample sizes (at least 10, if possible) to reduce the impact of biological variability, and DE results should be interpreted cautiously unless they have been soundly validated.

MeSH Term

Gene Expression Profiling
Gene Expression Regulation, Neoplastic
Genetic Heterogeneity
Humans
Neoplasm Proteins
Neoplasms
RNA-Seq
Software
Transcriptome

Chemicals

Neoplasm Proteins
