Efficient Minimum Flow Decomposition via Integer Linear Programming.

Fernando H C Dias, Lucia Williams, Brendan Mumey, Alexandru I Tomescu
Author Information
  1. Fernando H C Dias: Department of Computer Science, University of Helsinki, Helsinki, Finland.
  2. Lucia Williams: School of Computing, Montana State University, Bozeman, Montana, USA.
  3. Brendan Mumey: School of Computing, Montana State University, Bozeman, Montana, USA.
  4. Alexandru I Tomescu: Department of Computer Science, University of Helsinki, Helsinki, Finland. ORCID

Abstract

Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of the exponentially many solution paths using only a number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves instance in <13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on Github.

Keywords

References

  1. Nat Biotechnol. 2011 May 15;29(7):644-52 [PMID: 21572440]
  2. Nature. 2006 Jan 19;439(7074):344-8 [PMID: 16327776]
  3. J Comput Biol. 2013 Feb;20(2):113-23 [PMID: 23383997]
  4. BMC Bioinformatics. 2019 Apr 16;20(1):190 [PMID: 30991937]
  5. IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):48-56 [PMID: 34033544]
  6. Nat Biotechnol. 2020 Jun;38(6):708-714 [PMID: 32518404]
  7. Nature. 2012 Apr 04;486(7403):395-9 [PMID: 22495314]
  8. Bioinformatics. 2014 Sep 1;30(17):2447-55 [PMID: 24813214]
  9. IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1345-54 [PMID: 26671806]
  10. J Comput Biol. 2022 Feb;29(2):121-139 [PMID: 35041494]
  11. Genome Biol. 2016 Jan 30;17:16 [PMID: 26831908]
  12. Gene. 2005 Jan 3;344:1-20 [PMID: 15656968]
  13. Nat Rev Genet. 2013 Mar;14(3):157-67 [PMID: 23358380]
  14. IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):658-670 [PMID: 29990201]
  15. Genome Res. 2004 Mar;14(3):426-41 [PMID: 14962984]
  16. PLoS One. 2020 Jun 2;15(6):e0232946 [PMID: 32484809]
  17. Nat Biotechnol. 2010 May;28(5):511-5 [PMID: 20436464]
  18. Bioinformatics. 2018 Sep 1;34(17):2927-2935 [PMID: 29617936]
  19. BMC Bioinformatics. 2011 Apr 26;12:119 [PMID: 21521499]
  20. Genome Biol. 2021 Jan 22;22(1):44 [PMID: 33482911]
  21. Bioinformatics. 2019 Dec 15;35(24):5086-5094 [PMID: 31147688]
  22. Nat Biotechnol. 2019 Aug;37(8):907-915 [PMID: 31375807]
  23. Genome Biol. 2020 Feb 7;21(1):30 [PMID: 32033565]
  24. Genomics. 2013 Nov-Dec;102(5-6):507-14 [PMID: 24161398]
  25. Nat Biotechnol. 2017 Dec;35(12):1167-1169 [PMID: 29131147]
  26. Nature. 2008 Nov 27;456(7221):470-6 [PMID: 18978772]
  27. BMC Bioinformatics. 2013;14 Suppl 5:S15 [PMID: 23734627]
  28. Nat Biotechnol. 2015 Mar;33(3):290-5 [PMID: 25690850]
  29. Nat Commun. 2021 Nov 18;12(1):6728 [PMID: 34795232]
  30. Genome Res. 2008 Dec;18(12):1865-74 [PMID: 18842824]
  31. Genome Biol. 2019 Dec 16;20(1):278 [PMID: 31842956]
  32. J Comput Biol. 2011 Nov;18(11):1693-707 [PMID: 21951053]
  33. Genome Biol. 2014;15(10):501 [PMID: 25367074]
  34. Proc Natl Acad Sci U S A. 2011 Dec 13;108(50):19867-72 [PMID: 22135461]
  35. Bioinformatics. 2021 Jul 19;37(12):1673-1680 [PMID: 33471068]
  36. Bioinformatics. 2012 Apr 15;28(8):1086-92 [PMID: 22368243]

MeSH Term

Programming, Linear
Computational Biology
Algorithms
RNA

Chemicals

RNA

Word Cloud

Created with Highcharts 10.0.0flowmultiassemblyRNAassemblypracticalMinimumdecompositionMFDproblemdecomposenetworkpathstoolsIntegerLinearProgrammingILPapproachmanyNP-hardaskingminimumsettogetherassociatedweightsVariantspowerfulmodelsproblemsBioinformaticsOwinghardnesseitheruseheuristicssolvesimplerpolynomialtime-solvableversionsmayyieldsolutionsminimalperfectlyprovidefirstfastexactsolveracyclicnetworksbasedKeyencodingexponentiallysolutionusingnumbervariablesalsoextendformulationvariantsincorporatinglongerpaired-endreadsminimizingerrorssimulatedreal-flowsplicinggraphssolvesinstance<13secondshopeformulationscanliecorefutureimplementationsfreelyavailableGithubEfficientFlowDecompositionviaintegerlinearprogramming

Similar Articles

Cited By