Scaling up DNA digital data storage by efficiently predicting DNA hybridisation using deep learning.

David Buterez
Author Information
  1. David Buterez: Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. db804@cam.ac.uk.

Abstract

Deoxyribonucleic acid (DNA) has shown great promise in enabling computational applications, most notably in the fields of DNA digital data storage and DNA computing. Information is encoded as DNA strands, which will naturally bind in solution, thus enabling search and pattern-matching capabilities. Being able to control and predict the process of DNA hybridisation is crucial for the ambitious future of Hybrid Molecular-Electronic Computing. Current tools are, however, limited in terms of throughput and applicability to large-scale problems. We present the first comprehensive study of machine learning methods applied to the task of predicting DNA hybridisation. For this purpose, we introduce an in silico-generated hybridisation dataset of over 2.5 million data points, enabling the use of deep learning. Depending on hardware, we achieve a reduction in inference time ranging from one to over two orders of magnitude compared to the state of the art, while retaining high fidelity. We then discuss the integration of our methods in modern, scalable workflows.

MeSH Term

Computers, Molecular
Deep Learning
Information Storage and Retrieval
Nucleic Acid Hybridization
