Gene set analysis methods: a systematic comparison.

Ravi Mathur, Daniel Rotroff, Jun Ma, Ali Shojaie, Alison Motsinger-Reif
Author Information
  1. Ravi Mathur: Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
  2. Daniel Rotroff: Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
  3. Jun Ma: Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
  4. Ali Shojaie: Department of Biostatistics, University of Washington, Seattle, WA, USA.
  5. Alison Motsinger-Reif: Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.

Abstract

BACKGROUND: Gene set analysis is a valuable tool for summarizing high-dimensional gene expression data in terms of biologically relevant sets. It is an active area of research, and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope.
METHODS: We present a semi-synthetic simulation study, built on real datasets, to test and compare commonly used methods.
RESULTS: A software pipeline, Flexible Algorithm for Novel Gene set Simulation (FANGS), generates simulated data based on a prostate cancer dataset in which the KRAS and TGF-β pathways were differentially expressed. The FANGS software is compatible with other datasets and pathways. Comparisons are presented for four gene set analysis methods: Gene Set Enrichment Analysis (GSEA), Significance Analysis of Function and Expression (SAFE), sigPathway, and Correlation Adjusted Mean RAnk (CAMERA). All methods are tested using gene sets from the MSigDB knowledge base, and the false positive rate and power of each are estimated for comparison. Recommendations are made regarding the utility of each method's default settings and its sensitivity to various effect sizes.
CONCLUSIONS: The results of this study provide empirical guidance to users of gene set analysis methods. The FANGS software is available to researchers for continued methods comparisons.
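
The abstract reports estimated false positive rate and power for each method. As a rough illustration of how such estimates can be computed from simulated replicates, here is a minimal Python sketch. It is not the FANGS implementation: the function name `fpr_and_power`, the input layout, and the 0.05 threshold are assumptions made for this example, and the p-values are made-up toy numbers rather than results from the study.

```python
from typing import Sequence, Tuple


def fpr_and_power(
    p_values: Sequence[Sequence[float]],
    truly_enriched: Sequence[bool],
    alpha: float = 0.05,  # assumed significance threshold
) -> Tuple[float, float]:
    """Estimate false positive rate and power from simulated replicates.

    p_values[r][s] is the p-value a method reported for gene set s in
    simulated replicate r; truly_enriched[s] records whether set s was
    given a differential-expression signal in the simulation.
    """
    false_pos = null_calls = true_pos = alt_calls = 0
    for replicate in p_values:
        if len(replicate) != len(truly_enriched):
            raise ValueError("each replicate needs one p-value per gene set")
        for p, enriched in zip(replicate, truly_enriched):
            significant = p < alpha
            if enriched:
                alt_calls += 1
                true_pos += significant
            else:
                null_calls += 1
                false_pos += significant
    if null_calls == 0 or alt_calls == 0:
        raise ValueError("need at least one null and one enriched gene set")
    # FPR: share of null-set calls flagged; power: share of enriched-set calls flagged.
    return false_pos / null_calls, true_pos / alt_calls


# Toy input (hypothetical numbers): 3 replicates, 4 gene sets,
# only the first set carries a simulated signal.
toy_p = [
    [0.001, 0.20, 0.03, 0.70],
    [0.040, 0.60, 0.50, 0.90],
    [0.300, 0.01, 0.80, 0.45],
]
toy_truth = [True, False, False, False]
fpr, power = fpr_and_power(toy_p, toy_truth)
print(f"FPR={fpr:.3f}, power={power:.3f}")
```

In a semi-synthetic design, the truth labels come from the simulation itself (which sets were altered), so the same function could be applied to the output of each method and each effect size in turn.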

Grants

  1. R01 HL110380/NHLBI NIH HHS
  2. K01 HL124050/NHLBI NIH HHS
  3. R01 CA161608/NCI NIH HHS
  4. P01 CA142538/NCI NIH HHS
  5. R01 GM114029/NIGMS NIH HHS
