MEPP: more transparent motif enrichment by profiling positional correlations.

Nathaniel P Delos Santos, Sascha Duttke, Sven Heinz, Christopher Benner
Author Information
  1. Nathaniel P Delos Santos: Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA.
  2. Sascha Duttke: School of Molecular Biosciences, College of Veterinary Medicine, Washington State University, Pullman, WA, USA.
  3. Sven Heinz: Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA.
  4. Christopher Benner: Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA.

Abstract

Score-based motif enrichment analysis (MEA) is typically applied to regulatory DNA to infer transcription factors (TFs) that may modulate transcription and chromatin state in different conditions. Most MEA methods determine motif enrichment independently of motif position within a sequence, even when those sequences harbor anchor points with which motifs and their bound TFs may functionally interact in a distance-dependent fashion, such as other TF binding motifs, transcription start sites (TSSs), sequencing assay cleavage sites, or other biologically meaningful features. We developed motif enrichment positional profiling (MEPP), a novel MEA method that outputs a positional enrichment profile of a given TF's binding motif relative to key anchor points (e.g., TSSs or other motifs) within the analyzed sequences while accounting for lower-order nucleotide bias. Using transcription initiation and TF binding as test cases, we demonstrate MEPP's utility in determining the sequence positions where motif presence correlates with measures of biological activity, and thereby in inferring positional dependencies of binding site function. We also show how MEPP can be applied to interpret, and generate hypotheses from, experiments that quantify transcription initiation, chromatin structure, or TF binding. MEPP is available for download from https://github.com/npdeloss/mepp.
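
The idea of a positional enrichment profile can be illustrated with a minimal sketch. This is not MEPP's implementation or API; it only assumes that all sequences have equal length and are aligned on the same anchor point, so that a given index corresponds to the same distance from the anchor in every sequence. The function names (one_hot, motif_scores, positional_profile), the toy +1/-1 scoring matrix, and the toy data are all hypothetical. The sketch uses a plain Pearson correlation and does not correct for lower-order nucleotide bias, which the abstract says MEPP accounts for.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; non-ACGT bases become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def motif_scores(seq, pwm):
    """Slide a (width, 4) scoring matrix along seq; return one score per start position."""
    x = one_hot(seq)
    width = pwm.shape[0]
    return np.array([np.sum(x[i:i + width] * pwm)
                     for i in range(len(seq) - width + 1)])

def positional_profile(seqs, activity, pwm):
    """Pearson correlation, per position, between motif score and activity across sequences."""
    scores = np.array([motif_scores(s, pwm) for s in seqs])  # (n_seqs, n_positions)
    activity = np.asarray(activity, dtype=float)
    centered_scores = scores - scores.mean(axis=0)
    centered_activity = activity - activity.mean()
    numerator = (centered_scores * centered_activity[:, None]).sum(axis=0)
    denominator = np.sqrt((centered_scores ** 2).sum(axis=0)
                          * (centered_activity ** 2).sum())
    # Positions with no score variance give 0/0; report them as 0 correlation.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = numerator / denominator
    return np.nan_to_num(r)

# Toy data (hypothetical): a TATA match at index 5 accompanies high activity.
motif_seqs = ["CCCCC" + "TATA" + "C" * 11] * 4
plain_seqs = ["C" * 20] * 4
pwm = np.where(one_hot("TATA") == 1, 1.0, -1.0)  # +1 for a match, -1 otherwise
profile = positional_profile(motif_seqs + plain_seqs, [10.0] * 4 + [1.0] * 4, pwm)
print(np.round(profile, 2))
```

Each entry of the resulting profile is the correlation at one motif start position. Subtracting the anchor's index from the position index would express the profile as distance from the anchor. Windows that only partially overlap the planted motif can also correlate with activity in this toy setup, so nearby positions should not be read as independent signals.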

Grants

  1. R00 GM135515/NIGMS NIH HHS
  2. R01 GM129523/NIGMS NIH HHS
  3. R01 GM134366/NIGMS NIH HHS
