PolyaPeak: detecting transcription factor binding sites from ChIP-seq using peak shape information.

Hao Wu, Hongkai Ji
Author Information
  1. Hao Wu: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America.
  2. Hongkai Ji: Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America.

Abstract

ChIP-seq is a powerful technology for detecting genomic regions where a protein of interest interacts with DNA. ChIP-seq data for mapping transcription factor binding sites (TFBSs) have a characteristic pattern: around each binding site, sequence reads aligned to the forward and reverse strands of the reference genome form two separate peaks shifted away from each other, and the true binding site lies between these two peaks. Although modeling this pattern has previously been shown to improve the accuracy and resolution of binding site detection, efficient methods that fully exploit this information in the TFBS detection procedure have been lacking. We present PolyaPeak, a new method that improves TFBS detection by incorporating peak shape information. PolyaPeak describes peak shapes using a flexible Pólya model. The shapes are automatically learnt from the data using a Minorization-Maximization (MM) algorithm and then integrated with the read count information via a hierarchical model to distinguish true binding sites from background noise. Extensive real data analyses show that PolyaPeak robustly improves TFBS detection compared with existing methods. An R package is freely available.
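
To make the shape-learning step more concrete, the following is a minimal Python sketch, not the authors' R implementation. It assumes that reads around each candidate peak are binned by position, so that each peak yields a vector of bin counts, and that a Pólya (Dirichlet-multinomial) distribution is fitted to those vectors with a standard MM fixed-point update. The function names (`dm_loglik`, `mm_step`, `fit_polya`), the bin layout, and the toy counts are all hypothetical and chosen only for illustration.

```python
import math

def dm_loglik(counts, alpha):
    """Dirichlet-multinomial (Polya) log-likelihood of binned count vectors.

    The multinomial coefficient is dropped because it does not depend on alpha.
    """
    a0 = sum(alpha)
    total = 0.0
    for x in counts:
        m = sum(x)
        total += math.lgamma(a0) - math.lgamma(a0 + m)
        total += sum(math.lgamma(a + xj) - math.lgamma(a)
                     for a, xj in zip(alpha, x))
    return total

def mm_step(counts, alpha):
    """One MM update of alpha (each step cannot decrease the log-likelihood)."""
    a0 = sum(alpha)
    # Denominator is shared by all bins: sum over peaks of sum_k 1/(a0 + k).
    denom = sum(1.0 / (a0 + k) for x in counts for k in range(sum(x)))
    new_alpha = []
    for j, a in enumerate(alpha):
        numer = sum(1.0 / (a + k) for x in counts for k in range(x[j]))
        new_alpha.append(a * numer / denom)
    return new_alpha

def fit_polya(counts, n_iter=200):
    """Run a fixed number of MM steps from alpha = 1 and record the log-likelihood.

    Assumes every bin has at least one read in at least one peak; otherwise
    that bin's alpha would collapse to zero.
    """
    alpha = [1.0] * len(counts[0])
    trace = [dm_loglik(counts, alpha)]
    for _ in range(n_iter):
        alpha = mm_step(counts, alpha)
        trace.append(dm_loglik(counts, alpha))
    return alpha, trace

# Hypothetical toy data: 4 peaks, 6 position bins each (illustration only).
toy_counts = [
    [2, 9, 4, 1, 0, 1],
    [1, 12, 6, 2, 1, 0],
    [3, 7, 5, 0, 1, 1],
    [0, 10, 3, 2, 0, 2],
]
alpha_hat, loglik_trace = fit_polya(toy_counts)
# Normalised alpha gives the estimated average peak shape across bins.
shape = [a / sum(alpha_hat) for a in alpha_hat]
```

In this sketch the normalised `alpha_hat` plays the role of a learnt peak shape, and the sum of `alpha_hat` controls how tightly individual peaks are expected to follow it. How PolyaPeak handles the two strands and combines the fitted shape with read counts in its hierarchical model is not described in the abstract and is not attempted here.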

Grants

  1. R01 HG006282/NHGRI NIH HHS

MeSH Terms

Algorithms
Binding Sites
Chromatin Immunoprecipitation
Models, Theoretical
Transcription Factors

Chemicals

Transcription Factors
