csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows.

Aaron T L Lun, Gordon K Smyth
Author Information
  1. Aaron T L Lun: The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia.
  2. Gordon K Smyth: The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia; Department of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia. Email: smyth@wehi.edu.au

Abstract

Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding (DB). This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project.
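
The window-to-region steps described above can be illustrated with a small sketch. This is not the csaw API and is not written in R; it is a plain Python illustration under stated assumptions. All function names here are hypothetical. The per-window p-values are taken as given (in practice they would come from the existing statistical software the abstract refers to). The sketch assumes Simes' method for combining window p-values within a region and the Benjamini-Hochberg procedure for region-level FDR control; the abstract itself does not name either.

```python
from typing import List, Sequence, Tuple


def window_counts(read_starts: Sequence[int], genome_length: int,
                  width: int, spacing: int) -> List[Tuple[int, int, int]]:
    """Count reads starting inside each sliding window [start, start + width)."""
    starts = range(0, genome_length - width + 1, spacing)
    return [(s, s + width, sum(1 for r in read_starts if s <= r < s + width))
            for s in starts]


def merge_windows(windows: Sequence[Tuple[int, int]], tol: int) -> List[int]:
    """Assign a cluster id to each window (sorted by start).

    A window joins the current cluster when the gap between its start and
    the cluster's current end is at most `tol`; otherwise it opens a new one.
    """
    ids: List[int] = []
    current, last_end = -1, None
    for start, end in windows:
        if last_end is None or start - last_end > tol:
            current += 1
            last_end = end
        else:
            last_end = max(last_end, end)
        ids.append(current)
    return ids


def simes(pvalues: Sequence[float]) -> float:
    """Simes combined p-value: min over ranks i of n * p_(i) / i."""
    n = len(pvalues)
    return min(p * n / (i + 1) for i, p in enumerate(sorted(pvalues)))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def region_fdr(windows: Sequence[Tuple[int, int]], pvalues: Sequence[float],
               tol: int) -> Tuple[List[int], List[float], List[float]]:
    """Cluster windows into regions, combine p-values per region, adjust."""
    ids = merge_windows(windows, tol)
    n_regions = ids[-1] + 1 if ids else 0
    grouped: List[List[float]] = [[] for _ in range(n_regions)]
    for cid, p in zip(ids, pvalues):
        grouped[cid].append(p)
    combined = [simes(g) for g in grouped]
    return ids, combined, benjamini_hochberg(combined)
```

The point of the design is that multiple-testing correction is applied to one combined p-value per region rather than to individual windows, so the error rate being controlled refers to the regions that are actually reported.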

MeSH Terms

Animals
Binding Sites
Chromatin Immunoprecipitation
Datasets as Topic
Genome
Genomics
High-Throughput Nucleotide Sequencing
Histones
Mice
Mouse Embryonic Stem Cells
Protein Binding
Software
Transcription Factors

Chemicals

Histones
Transcription Factors
