DNAcycP: a deep learning tool for DNA cyclizability prediction.

Keren Li, Matthew Carroll, Reza Vafabakhsh, Xiaozhong A Wang, Ji-Ping Wang
Author Information
  1. Keren Li: Department of Statistics, Northwestern University, 633 Clark Street, Evanston, IL 60208, USA.
  2. Matthew Carroll: Weinberg College IT Solutions (WITS), Northwestern University, 633 Clark Street, Evanston, IL 60208, USA.
  3. Reza Vafabakhsh: Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA.
  4. Xiaozhong A Wang: Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA.
  5. Ji-Ping Wang: Department of Statistics, Northwestern University, 633 Clark Street, Evanston, IL 60208, USA.

Abstract

DNA mechanical properties play a critical role in every aspect of DNA-dependent biological processes. Recently, a high-throughput assay named loop-seq was developed to quantify the intrinsic bendability of a massive number of DNA fragments simultaneously. Using the loop-seq data, we developed a software tool, DNAcycP, based on a deep-learning approach for predicting intrinsic DNA cyclizability. We demonstrate that DNAcycP predicts intrinsic DNA cyclizability with high fidelity to the experimental data. Using an independent dataset from in vitro selection for enrichment of loopable sequences, we further verified that the predicted cyclizability score, termed the C-score, distinguishes DNA fragments with different loopability. We applied DNAcycP to multiple species and compared the C-scores with available high-resolution chemical nucleosome maps. Our analyses showed that the yeast and mouse genomes share a conserved feature of high DNA bendability spanning nucleosome dyads. Additionally, we extended our analysis to transcription factor binding sites and, surprisingly, found that cyclizability is substantially elevated at CTCF binding sites in the mouse genome. We further demonstrate that this distinct mechanical property is conserved across mammalian species and is inherent to the CTCF-binding DNA motif.

MeSH Terms

Animals
Binding Sites
Chromatin
Cyclization
DNA
Deep Learning
Mammals
Mice
Nucleosomes
Saccharomyces cerevisiae
Software

Chemicals

Chromatin
Nucleosomes
DNA
