baseLess: lightweight detection of sequences in raw MinION data.

Ben Noordijk, Reindert Nijland, Victor J Carrion, Jos M Raaijmakers, Dick de Ridder, Carlos de Lannoy
Author Information
  1. Ben Noordijk: Bioinformatics Group, Wageningen University, Wageningen 6700AH, The Netherlands.
  2. Reindert Nijland: Marine Animal Ecology, Wageningen University, Wageningen 6700AP, The Netherlands.
  3. Victor J Carrion: Institute of Biology, Leiden University, Leiden 2300RA, The Netherlands.
  4. Jos M Raaijmakers: Institute of Biology, Leiden University, Leiden 2300RA, The Netherlands.
  5. Dick de Ridder: Bioinformatics Group, Wageningen University, Wageningen 6700AH, The Netherlands.
  6. Carlos de Lannoy: Bioinformatics Group, Wageningen University, Wageningen 6700AH, The Netherlands.

Abstract

Summary: With its candybar form factor and low initial investment cost, the MinION brought affordable portable nucleic acid analysis within reach. However, translating the electrical signal it outputs into a sequence of bases still requires mid-tier computer hardware, which remains an obstacle when deploying many devices at once or working in remote areas. For applications focusing on detection of a target sequence, such as infectious disease monitoring or species identification, the computational cost of analysis may be reduced by directly detecting the target sequence in the electrical signal instead. Here, we present baseLess, a computational tool that enables such target-detection-only analysis. baseLess uses an array of small neural networks, each of which efficiently detects a fixed-size subsequence of the target sequence directly from the electrical signal. We show that baseLess can accurately distinguish reads from three closely related fish species and can classify sequences in mixtures of 20 bacterial species, on an inexpensive single-board computer.
Availability and implementation: baseLess and all code used in data preparation and validation are available on GitHub at https://github.com/cvdelannoy/baseLess, under an MIT license. The validation data and scripts used can be found at https://doi.org/10.4121/20261392, under an MIT license.
Supplementary information: Supplementary data are available online.
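
Illustrative sketch: the summary describes an array of small detectors, each responsible for one fixed-size subsequence of the target, applied directly to the raw signal. The Python below is a minimal sketch of that arrangement only, not the baseLess implementation. Everything in it is an assumption made for illustration: the toy template-matching detector stands in for a trained neural network, and the names make_toy_detector and read_contains_target, the tolerance parameter, and the "fraction of detectors that fire" decision rule are all hypothetical.

```python
from typing import Callable, Dict, Sequence

# One detector per fixed-size subsequence (k-mer) of the target.
# Each takes a raw signal and reports whether its k-mer appears to be present.
Detector = Callable[[Sequence[float]], bool]


def make_toy_detector(template: Sequence[float], tolerance: float) -> Detector:
    """Hypothetical stand-in for one small neural network.

    Fires when any window of the signal matches the template
    sample-by-sample within the given tolerance.
    """
    def detect(signal: Sequence[float]) -> bool:
        n = len(template)
        for start in range(len(signal) - n + 1):
            window = signal[start:start + n]
            if all(abs(s - t) <= tolerance for s, t in zip(window, template)):
                return True
        return False
    return detect


def read_contains_target(
    signal: Sequence[float],
    detectors: Dict[str, Detector],
    min_fraction: float,
) -> bool:
    """Assumed decision rule: call the read a target hit when at least
    min_fraction of the k-mer detectors fire on its signal."""
    if not detectors:
        raise ValueError("at least one detector is required")
    hits = sum(1 for detect in detectors.values() if detect(signal))
    return hits / len(detectors) >= min_fraction


# Toy example: three k-mers with made-up signal templates.
example_detectors = {
    "ACGTA": make_toy_detector([1.0, 2.0, 3.0], tolerance=0.1),
    "CGTAC": make_toy_detector([5.0, 4.0], tolerance=0.1),
    "TTGCA": make_toy_detector([9.0, 9.0], tolerance=0.1),
}
example_signal = [0.0, 1.05, 2.0, 2.95, 5.0, 4.0, 7.0]
is_target = read_contains_target(example_signal, example_detectors, min_fraction=0.5)
```

In this toy example two of the three detectors fire, so the read passes a 0.5 threshold but would fail a stricter one. Because no base sequence is ever reconstructed, the cost per read scales with the number and size of the detectors rather than with a full basecalling model.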
