From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data.

Aaron T L Lun, Gordon K Smyth
Author Information
  1. Aaron T L Lun: The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Medical Biology, The University of Melbourne, Melbourne, Australia.
  2. Gordon K Smyth: The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia.

Abstract

Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify the genomic binding sites for a protein of interest. Most conventional approaches to ChIP-seq data analysis involve detecting the absolute presence (or absence) of a binding site. An alternative strategy is to identify changes in binding intensity between two biological conditions, i.e., differential binding (DB). This may yield more relevant results than conventional analyses, as changes in binding can be associated with the biological difference being investigated. The aim of this article is to facilitate the implementation of DB analyses by comprehensively describing a computational workflow for the detection of DB regions from ChIP-seq data. The workflow is based primarily on R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, from alignment of read sequences to interpretation and visualization of putative DB regions. In particular, DB regions will be detected using read counts for sliding windows, computed with the csaw package, with statistical modelling performed using methods in the edgeR package. Analyses will be demonstrated on real histone mark and transcription factor data sets, providing readers with practical usage examples that can be applied in their own studies.
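The core idea of counting reads in sliding windows and comparing binding intensity between two conditions can be illustrated with a minimal sketch. This is not the csaw or edgeR API (the workflow itself is in R). It assumes reads are represented only by their start positions on a single chromosome, and it uses hypothetical helpers `window_counts` and `log2_fold_change`. The fold change is a simple library-size-scaled ratio with a prior count. It is not edgeR's statistical model, which fits generalized linear models to counts across replicates.

```python
from bisect import bisect_left
import math
from typing import List, Sequence, Tuple


def window_counts(read_starts: Sequence[int], chrom_length: int,
                  width: int = 150, spacing: int = 50) -> List[Tuple[int, int, int]]:
    """Count reads whose start position lies in each sliding window.

    Hypothetical illustration: windows are half-open [start, end) intervals
    whose starts are 0, spacing, 2*spacing, ..., up to the last window that
    fits within the chromosome. Returns (window_start, window_end, count).
    """
    if width <= 0 or spacing <= 0:
        raise ValueError("width and spacing must be positive")
    sorted_starts = sorted(read_starts)
    results = []
    for start in range(0, max(chrom_length - width, 0) + 1, spacing):
        end = min(start + width, chrom_length)
        # Binary search gives the number of read starts in [start, end).
        n = bisect_left(sorted_starts, end) - bisect_left(sorted_starts, start)
        results.append((start, end, n))
    return results


def log2_fold_change(count_a: float, count_b: float, lib_a: float, lib_b: float,
                     prior: float = 0.5) -> float:
    """Simple log2 ratio of library-size-scaled counts (condition B over A).

    The prior count avoids taking the log of zero. This is an assumption for
    illustration only and does not reproduce edgeR's estimates.
    """
    rate_a = (count_a + prior) / lib_a
    rate_b = (count_b + prior) / lib_b
    return math.log2(rate_b / rate_a)
```

For example, `window_counts([10, 20, 160, 170, 300], chrom_length=400)` yields six overlapping 150 bp windows spaced 50 bp apart. Counts for the same windows in two conditions could then be passed to `log2_fold_change` to gauge a change in binding. In the actual workflow, such changes are assessed statistically across biological replicates rather than by a raw ratio.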
