Simulation-based inference for non-parametric statistical comparison of biomolecule dynamics.

Hippolyte Verdier, François Laurent, Alhassan Cassé, Christian L Vestergaard, Christian G Specht, Jean-Baptiste Masson
Author Information
  1. Hippolyte Verdier: Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France.
  2. François Laurent: Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France.
  3. Alhassan Cassé: Histopathology and Bio-Imaging Group, Sanofi, R&D, Vitry-Sur-Seine, France.
  4. Christian L Vestergaard: Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France.
  5. Christian G Specht: Diseases and Hormones of the Nervous System (DHNS), Inserm U1195, Université Paris-Saclay, Paris, France.
  6. Jean-Baptiste Masson: Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France.

Abstract

Numerous models have been developed to account for the complex properties of the random walks of biomolecules. However, when analysing experimental data, the conditions required for model identification are rarely met. The dynamics may simultaneously be influenced by spatial and temporal heterogeneities of the environment, out-of-equilibrium fluxes and conformational changes of the tracked molecules. Recorded trajectories are often too short to reliably discern such multi-scale dynamics, which precludes unambiguous assessment of the type of random walk and its parameters. Furthermore, the motion of biomolecules may not be well described by a single, canonical random walk model. Here, we develop a two-step statistical testing scheme for comparing biomolecule dynamics observed in different experimental conditions without having to identify, or make strong prior assumptions about, the model generating the recorded random walks. We first train a graph neural network to perform simulation-based inference and thus learn a rich vector of summary statistics describing individual trajectories. We then compare trajectories obtained in different biological conditions by applying a non-parametric maximum mean discrepancy (MMD) statistical test to these summary statistics. This procedure allows us to characterise sets of random walks regardless of their generating models, without resorting to model-specific physical quantities or estimators. We first validate our approach on numerically simulated trajectories. This demonstrates both the statistical power of the MMD test and the descriptive power of the learnt summary statistics compared to estimates of physical quantities.
We then illustrate the ability of our framework to detect changes in α-synuclein dynamics at synapses in cultured cortical neurons in response to membrane depolarisation, and show that the detected differences are largely driven by increased protein mobility in the depolarised state, in agreement with previous findings. The method also provides a means of interpreting the differences it detects in terms of single-trajectory characteristics. Finally, we emphasise the value of performing comparisons at several levels of granularity (e.g., biological replicates, fields of view, and organelles) to probe the heterogeneity of experimentally acquired datasets.
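
The second step of the scheme, a non-parametric MMD test between two sets of summary-statistics vectors, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The Gaussian kernel, the median-distance bandwidth heuristic, the permutation-based p-value, and all function names are assumptions made for this example. The synthetic vectors at the end stand in for the summary statistics that the trained graph neural network would produce.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def median_bandwidth(z):
    """Median pairwise Euclidean distance (a common heuristic, assumed here)."""
    diffs = z[:, None, :] - z[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    upper = np.triu_indices(len(z), k=1)
    return float(np.median(dists[upper]))


def mmd_permutation_test(x, y, n_permutations=500, seed=0):
    """Return the biased squared-MMD estimate and a permutation p-value."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n = len(x)
    # Compute the kernel matrix once on the pooled sample and reuse it.
    k = gaussian_kernel(z, z, median_bandwidth(z))

    def statistic(order):
        a, b = order[:n], order[n:]
        return (
            k[np.ix_(a, a)].mean()
            + k[np.ix_(b, b)].mean()
            - 2.0 * k[np.ix_(a, b)].mean()
        )

    observed = statistic(np.arange(len(z)))
    # Null distribution: the statistic after randomly reassigning group labels.
    null = np.array(
        [statistic(rng.permutation(len(z))) for _ in range(n_permutations)]
    )
    p_value = (1.0 + np.sum(null >= observed)) / (1.0 + n_permutations)
    return float(observed), float(p_value)


# Synthetic stand-ins for two sets of per-trajectory summary statistics.
demo_rng = np.random.default_rng(42)
condition_a = demo_rng.normal(loc=0.0, size=(80, 8))
condition_b = demo_rng.normal(loc=0.5, size=(80, 8))
mmd2, p = mmd_permutation_test(condition_a, condition_b)
print(f"squared MMD = {mmd2:.4f}, permutation p-value = {p:.4f}")
```

Because the p-value comes from label permutations, the test makes no assumption about the random walk model that generated the trajectories. Only the choice of kernel and bandwidth affects its sensitivity.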

MeSH Term

Computer Simulation
Motion
Neural Networks, Computer
Proteins

Chemicals

Proteins
