iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations.
Junru Jin, Yingying Yu, Ruheng Wang, Xin Zeng, Chao Pang, Yi Jiang, Zhongshen Li, Yutong Dai, Ran Su, Quan Zou, Kenta Nakai, Leyi Wei
Author Information
Junru Jin: School of Software, Shandong University, Jinan, 250101, China.
Yingying Yu: School of Software, Shandong University, Jinan, 250101, China.
Ruheng Wang: School of Software, Shandong University, Jinan, 250101, China.
Xin Zeng: Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
Chao Pang: School of Software, Shandong University, Jinan, 250101, China.
Yi Jiang: School of Software, Shandong University, Jinan, 250101, China.
Zhongshen Li: School of Software, Shandong University, Jinan, 250101, China.
Yutong Dai: Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
Ran Su: College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.
Kenta Nakai: Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan. knakai@ims.u-tokyo.ac.jp.
Leyi Wei: School of Software, Shandong University, Jinan, 250101, China. weileyi@sdu.edu.cn.
In this study, we propose iDNA-ABF, a multi-scale deep biological language learning model that enables interpretable prediction of DNA methylation from genomic sequences alone. Benchmarking comparisons show that iDNA-ABF outperforms state-of-the-art methods for different types of methylation prediction. Importantly, we demonstrate the power of deep language learning in capturing both sequential and functional semantic information from background genomes. Moreover, by integrating an interpretable analysis mechanism, we explain what the model learns, which helps map the discovery of important sequential determinants to in-depth analysis of their biological functions.
DNA Methylation
Genomics
Language
Models, Biological