RiboDiffusion: tertiary structure-based RNA inverse folding with generative diffusion models.

Han Huang, Ziqian Lin, Dongchen He, Liang Hong, Yu Li
Author Information
  1. Han Huang: Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China.
  2. Ziqian Lin: Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China.
  3. Dongchen He: Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China.
  4. Liang Hong: Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China.
  5. Yu Li: Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China.

Abstract

MOTIVATION: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the nonunique structure-sequence mapping, and the flexibility of RNA conformation.
RESULTS: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which together iteratively transform random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/ml4bio/RiboDiffusion.
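
The abstract evaluates designs by sequence recovery and describes a trade-off against diversity. As a rough illustration only, the sketch below computes recovery as the fraction of positions where a designed sequence matches the native one. The diversity measure (mean pairwise mismatch fraction among samples), the function names, and the toy sequences are assumptions made for this example; they are not taken from the RiboDiffusion code base, and the paper's exact definitions may differ.

```python
from itertools import combinations


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def mean_pairwise_diversity(samples: list[str]) -> float:
    """Average fraction of differing positions over all pairs of samples.

    Assumed, illustrative definition of diversity; not the paper's.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - sequence_recovery(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): one native sequence and three sampled designs.
native = "GGCUAGCC"
samples = ["GGCUAGCC", "GGCAAGCC", "GACUAGCU"]

recoveries = [sequence_recovery(s, native) for s in samples]
diversity = mean_pairwise_diversity(samples)
print(recoveries, diversity)
```

Under these assumed definitions, a sampler tuned toward recovery would tend to push the per-sample recovery values up while the pairwise diversity shrinks, which is the trade-off the abstract refers to.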

Grants

  1. 4937025/Chinese University of Hong Kong
  2. CUHK 24204023/Research Grants Council of the Hong Kong Special Administrative Region
  3. GHP/065/21SZ/Innovation and Technology Commission of the Hong Kong Special Administrative Region
  4. /RMGS
  5. 8601603/CUHK

MeSH Terms

RNA
Nucleic Acid Conformation
RNA Folding
Computational Biology
Algorithms
Software
Neural Networks, Computer
Sequence Analysis, RNA

Chemicals

RNA
