ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data.

Lihua J Zhu, Claude Gazin, Nathan D Lawson, Hervé Pagès, Simon M Lin, David S Lapointe, Michael R Green
Author Information
  1. Lihua J Zhu: Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA. julie.zhu@umassmed.edu

Abstract

BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) and ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there is a large number of binding sites or the binding sites are distributed widely across the genome.
RESULTS: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table or a pie chart, or plotted as a histogram showing the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or between binding sites of transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes.
CONCLUSIONS: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Users can pass their own annotation data, such as a different ChIP preparation or a dataset from the literature, or use existing annotation packages such as GenomicFeatures and BSgenome, which provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
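
As an illustration of two ideas mentioned in the abstract, annotating each peak with its nearest gene and assessing the significance of overlap between two peak sets, a minimal sketch follows. It is written in plain Python and is not ChIPpeakAnno's code or API. The function names (annotate_nearest_tss, overlap_pvalue), the tuple layouts, the use of the peak midpoint, the choice to ignore strand, and the use of a hypergeometric tail probability for the overlap test are all assumptions made for this example.

```python
from bisect import bisect_left
from math import comb


def annotate_nearest_tss(peaks, genes):
    """Annotate each peak with the nearest transcription start site (TSS).

    peaks: list of (chrom, start, end)
    genes: list of (chrom, tss, name)
    Returns a list of (chrom, start, end, gene_name, distance), where
    distance = peak midpoint - TSS. Strand is ignored (an assumption).
    Peaks on a chromosome with no genes get (…, None, None).
    """
    # Group TSS positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, tss, name in genes:
        by_chrom.setdefault(chrom, []).append((tss, name))
    for entries in by_chrom.values():
        entries.sort()

    annotated = []
    for chrom, start, end in peaks:
        entries = by_chrom.get(chrom)
        if not entries:
            annotated.append((chrom, start, end, None, None))
            continue
        mid = (start + end) // 2
        # Locate the insertion point, then compare the neighbours on each side.
        i = bisect_left(entries, (mid,))
        candidates = entries[max(i - 1, 0): i + 1]
        tss, name = min(candidates, key=lambda e: abs(e[0] - mid))
        annotated.append((chrom, start, end, name, mid - tss))
    return annotated


def overlap_pvalue(total_sites, n_a, n_b, n_overlap):
    """Hypergeometric upper-tail probability P(X >= n_overlap).

    total_sites: assumed number of candidate binding sites in the genome
    n_a, n_b:    sizes of the two peak sets
    n_overlap:   number of peaks shared by both sets
    """
    denom = comb(total_sites, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(total_sites - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_peaks = [("chr1", 100, 200), ("chr1", 900, 1000), ("chr2", 10, 20)]
    toy_genes = [("chr1", 120, "geneA"), ("chr1", 1000, "geneB")]
    for row in annotate_nearest_tss(toy_peaks, toy_genes):
        print(row)
    print(overlap_pvalue(total_sites=10, n_a=5, n_b=5, n_overlap=5))
```

The distances returned by the first function are the kind of values one would tabulate or plot as a histogram. The result of the overlap test depends strongly on the assumed total number of candidate sites, so that parameter should be chosen and reported with care.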

Grants

  1. R01 GM033977/NIGMS NIH HHS
  2. R01 HL093467/NHLBI NIH HHS
  3. R01 HL093467-03/NHLBI NIH HHS
  4. R01 HL093766/NHLBI NIH HHS

MeSH Term

Binding Sites
Chromatin Immunoprecipitation
Genome
Oligonucleotide Array Sequence Analysis
Software
