Statistical approaches for the analysis of DNA methylation microarray data.

Kimberly D Siegmund
Author Information
  1. Kimberly D Siegmund: Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, CA 90089, USA. kims@usc.edu

Abstract

Following the rapid development and adoption of DNA methylation microarray assays, we are now experiencing growth in the number of statistical tools to analyze the resulting large-scale data sets. As is the case for other microarray applications, biases caused by technical issues are of concern. Some of these issues are old (e.g., two-color dye bias and probe- and array-specific effects), while others are new (e.g., fragment length bias and bisulfite conversion efficiency). Here, I highlight characteristics of DNA methylation that suggest standard statistical tools developed for other data types may not be directly suitable. I then describe the microarray technologies most commonly in use, along with the methods used for preprocessing and obtaining a summary measure. I finish with a section describing downstream analyses of the data, focusing on methods that model percentage DNA methylation as the outcome, and methods for integrating DNA methylation with gene expression or genotype data.

Grants

  1. P30 ES007048/NIEHS NIH HHS
  2. R01 CA097346/NCI NIH HHS
  3. R01 CA097346-06/NCI NIH HHS
  4. P30 ES07048/NIEHS NIH HHS

MeSH Terms

DNA Methylation
Humans
Oligonucleotide Array Sequence Analysis
