Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.

C S Elder, Minh Hoang, Mohsen Ferdosi, Carl Kingsford
Author Information
  1. C S Elder: Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh Pennsylvania, USA. ORCID
  2. Minh Hoang: Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA. ORCID
  3. Mohsen Ferdosi: Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh Pennsylvania, USA.
  4. Carl Kingsford: Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh Pennsylvania, USA. ORCID

Abstract

The Turnpike problem aims to reconstruct a set of one-dimensional points from their unordered pairwise distances. Turnpike arises in biological applications such as molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes. Under noisy observation of the distances, the Turnpike problem is NP-hard and can take exponential time and space to solve when using traditional algorithms. To address this, we reframe the noisy Turnpike problem through the lens of optimization, seeking to simultaneously find the unknown point set and a permutation that maximizes similarity to the input distances. Our core contribution is a suite of algorithms that robustly solve this new objective. This includes a bilevel optimization framework that can efficiently solve Turnpike instances with up to 100,000 points. We show that this framework can be extended to scenarios with domain-specific constraints that include duplicated, missing, and partially labeled distances. Using these, we also extend our algorithms to work for points distributed on a circle (the Beltway problem). For small-scale applications that require global optimality, we formulate an integer linear program (ILP) that (i) accepts an objective from a generic family of convex functions and (ii) uses an extended formulation to reduce the number of binary variables. On synthetic and real partial digest data, our bilevel algorithms achieved state-of-the-art scalability across challenging scenarios with performance that matches or exceeds competing baselines. On small-scale instances, our ILP efficiently recovered ground-truth assignments and produced reconstructions that match or exceed our alternating algorithms. Our implementations are available at https://github.com/Kingsford-Group/turnpikesolvermm.

Keywords

References

  1. J Comput Biol. 1995 Summer;2(2):159-84 [PMID: 7497125]
  2. J Comput Biol. 2019 Jan;26(1):68-75 [PMID: 30117753]
  3. Proc Int Conf Intell Syst Mol Biol. 1993;1:362-70 [PMID: 7584358]
  4. J Comput Biol. 1994 Fall;1(3):235-9 [PMID: 8790468]
  5. IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):668-80 [PMID: 17975277]
  6. J Bioinform Comput Biol. 2015 Feb;13(1):1540008 [PMID: 25666654]
  7. J Comput Biol. 2011 Nov;18(11):1371-81 [PMID: 22035290]
  8. BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):510 [PMID: 28155644]
  9. Nucleic Acids Res. 1976 Sep;3(9):2387-98 [PMID: 787937]

Grants

  1. R01 HG012470/NHGRI NIH HHS

MeSH Term

Algorithms
Computational Biology
Humans

Word Cloud

Created with Highcharts 10.0.0TurnpikealgorithmsproblemdistancespointscansolveBeltwaysetapplicationsmolecularnoisyoptimizationobjectivebilevelframeworkefficientlyinstancesextendedscenariossmall-scaleILPOptimizationProblemaimsreconstructone-dimensionalunorderedpairwisearisesbiologicalstructuredeterminationgenomicsequencingtandemmassspectrometryerror-correctingcodesobservationNP-hardtakeexponentialtimespaceusingtraditionaladdressreframelensseekingsimultaneouslyfindunknownpointpermutationmaximizessimilarityinputcorecontributionsuiterobustlynewincludes100000showdomain-specificconstraintsincludeduplicatedmissingpartiallylabeledUsingalsoextendworkdistributedcirclerequireglobaloptimalityformulateintegerlinearprogramacceptsgenericfamilyconvexfunctionsiiusesformulationreducenumberbinaryvariablessyntheticrealpartialdigestdataachievedstate-of-the-artscalabilityacrosschallengingperformancematchesexceedscompetingbaselinesrecoveredground-truthassignmentsproducedreconstructionsmatchexceedalternatingimplementationsavailablehttps://githubcom/Kingsford-Group/turnpikesolvermmApproximateExactAlgorithmsProblemsDuplicatedMissingPartiallyLabeledUncertainMeasurementsDistanceInverse

Similar Articles

Cited By