Identification of transcription factor binding sites from ChIP-seq data at high resolution.

Anaïs F Bardet, Jonas Steinmann, Sangeeta Bafna, Juergen A Knoblich, Julia Zeitlinger, Alexander Stark
Author Information
  1. Anaïs F Bardet: Research Institute of Molecular Pathology (IMP), Institute of Molecular Biotechnology (IMBA), Vienna, Austria and Stowers Institute for Medical Research, Kansas City, MO, USA.

Abstract

MOTIVATION: Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-seq) is widely used to study the in vivo binding sites of transcription factors (TFs) and their regulatory targets. Recent improvements to ChIP-seq, such as increased resolution, promise deeper insights into transcriptional regulation, yet require novel computational tools to fully leverage their advantages.
RESULTS: To this end, we have developed peakzilla, which identifies TF binding sites at high resolution, i.e. it resolves individual binding sites even when they are closely spaced. We demonstrate this using semisynthetic datasets, ChIP-seq for the TF Twist in Drosophila embryos performed with different experimental fragment sizes, and ChIP-exo datasets. We show that the increased resolution reached by peakzilla is highly relevant: closely spaced Twist binding sites are strongly enriched in transcriptional enhancers, suggesting a signature that discriminates functional TF binding from abundant non-functional or neutral binding. Peakzilla is easy to use, as it estimates all the necessary parameters from the data, and it is freely available.
AVAILABILITY AND IMPLEMENTATION: The peakzilla program is available from https://github.com/steinmann/peakzilla or http://www.starklab.org/data/peakzilla/.
CONTACT: stark@starklab.org.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
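The abstract states that peakzilla resolves closely spaced binding sites and estimates its parameters from the data; one parameter commonly estimated this way in ChIP-seq peak calling is the fragment size. As a rough illustration only, the Python sketch below combines two generic techniques: strand cross-correlation to estimate the fragment size, and shifting read ends by half that size before picking separated summits. It is not peakzilla's implementation, and the function names, parameters (`max_shift`, `min_dist`, `half_window`, `min_score`) and toy read positions are hypothetical.

```python
from collections import Counter

# Hypothetical sketch of generic ChIP-seq steps; NOT peakzilla's algorithm.
# Inputs are 5' read-end coordinates on the plus and minus strands.


def estimate_fragment_size(plus, minus, max_shift=300):
    """Return the shift d (1..max_shift) maximising the strand
    cross-correlation sum_x P[x] * M[x + d], where P and M count
    5' read ends per position on each strand."""
    p_counts, m_counts = Counter(plus), Counter(minus)

    def score(d):
        return sum(n * m_counts.get(x + d, 0) for x, n in p_counts.items())

    # max() returns the first shift reaching the highest score
    return max(range(1, max_shift + 1), key=score)


def call_summits(plus, minus, frag_size, min_dist=30, half_window=2, min_score=3):
    """Shift each read end by half the fragment size towards the fragment
    centre, score positions by a small sliding window, and greedily keep
    the best-scoring positions that are at least min_dist apart."""
    half = frag_size // 2
    centres = Counter([p + half for p in plus] + [m - half for m in minus])

    def smoothed(x):
        # number of shifted read ends within +/- half_window bp of x
        return sum(centres.get(x + k, 0) for k in range(-half_window, half_window + 1))

    candidates = sorted(centres, key=lambda x: (-smoothed(x), x))
    summits = []
    for x in candidates:
        if smoothed(x) < min_score:
            break
        if all(abs(x - s) >= min_dist for s in summits):
            summits.append(x)
    return sorted(summits)


# Toy data: two hypothetical binding sites 80 bp apart, fragment size 100 bp.
jitter = (-2, -1, 0, 0, 1, 2)
sites, frag = (1000, 1080), 100
plus = [c - frag // 2 + j for c in sites for j in jitter]
minus = [c + frag // 2 + j for c in sites for j in jitter]

d = estimate_fragment_size(plus, minus, max_shift=150)
print(d)                                          # 100
print(call_summits(plus, minus, d))               # [1000, 1080]
print(call_summits(plus, minus, d, min_dist=100)) # [1000]
```

With `min_dist=30` the two toy sites are reported separately; raising `min_dist` above their 80 bp spacing merges them into a single summit, which is the kind of resolution loss the abstract is concerned with. Real data would additionally need a control sample, exclusion of the read-length ("phantom") cross-correlation peak, and a statistical significance model, none of which this sketch attempts.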


Grants

  1. DP2 OD004561/NIH HHS
  2. Z 153/Austrian Science Fund FWF

MeSH Term

Algorithms
Animals
Binding Sites
Chromatin Immunoprecipitation
Drosophila
Drosophila Proteins
Enhancer Elements, Genetic
High-Throughput Nucleotide Sequencing
Humans
Mice
Sequence Analysis, DNA
Transcription Factors
Twist-Related Protein 1

Chemicals

Drosophila Proteins
Transcription Factors
Twi protein, Drosophila
Twist-Related Protein 1
