An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq.

Eloi Mercier, Arnaud Droit, Leping Li, Gordon Robertson, Xuekui Zhang, Raphael Gottardo
Author Information
  1. Eloi Mercier: Computational Biology Unit, Institut de Recherche Clinique de Montreal, Montreal, Canada.

Abstract

ChIP-Seq has become the standard method for genome-wide profiling of the DNA association of transcription factors. To simplify analyzing and interpreting ChIP-Seq data, which typically involves using multiple applications, we describe an integrated, open source, R-based analysis pipeline. The pipeline addresses data input, peak detection, sequence and motif analysis, visualization, and data export, and can readily be extended via other R and Bioconductor packages. On a standard multicore computer, it can handle datasets consisting of tens of thousands of enriched regions. We demonstrate its effectiveness on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, where it detected co-occurring motifs that were consistent with the literature but not detected by other methods. Our pipeline provides the first complete set of Bioconductor tools for sequence and motif analysis of ChIP-Seq and ChIP-chip data.

Grants

  1. R01-HG005692/NHGRI NIH HHS
  2. Z01 ES101765/Intramural NIH HHS
  3. R01 HG005692-01/NHGRI NIH HHS
  4. R01 HG005692/NHGRI NIH HHS
  5. ES101765-05/NIEHS NIH HHS
  6. R01 HG005692-02/NHGRI NIH HHS

MeSH Terms

Algorithms
Base Sequence
Binding Sites
CCCTC-Binding Factor
Chromatin
Chromatin Immunoprecipitation
Chromosome Mapping
HeLa Cells
Hepatocyte Nuclear Factor 3-alpha
Humans
Molecular Sequence Data
Protein Binding
Repressor Proteins
STAT1 Transcription Factor
Sequence Analysis, DNA
Sequence Homology
Systems Integration
Transcription Factors
Tumor Cells, Cultured

Chemicals

CCCTC-Binding Factor
CTCF protein, human
Chromatin
FOXA1 protein, human
Hepatocyte Nuclear Factor 3-alpha
Repressor Proteins
STAT1 Transcription Factor
Transcription Factors
