Missing data in amortized simulation-based neural posterior estimation.

Zijian Wang, Jan Hasenauer, Yannik Schälte
Author Information
  1. Zijian Wang: University of Bonn, Life and Medical Sciences Institute, Bonn, Germany.
  2. Jan Hasenauer: University of Bonn, Life and Medical Sciences Institute, Bonn, Germany.
  3. Yannik Schälte: University of Bonn, Life and Medical Sciences Institute, Bonn, Germany.

Abstract

Amortized simulation-based neural posterior estimation provides a novel machine-learning-based approach for solving parameter estimation problems. It has been shown to be computationally efficient and able to handle complex models and data sets. Yet the available approach cannot handle missing data, a case that is ubiquitous in experimental studies, and might therefore provide incorrect posterior estimates. In this work, we discuss various ways of encoding missing data and integrate them into the training and inference process. We implement the approaches in the BayesFlow methodology, an amortized estimation framework based on invertible neural networks, and evaluate their performance on multiple test problems. We find that an approach in which the data vector is augmented with binary indicators of the presence or absence of values performs most robustly. Indeed, it also improved performance for the simpler problem of data sets with variable length. Accordingly, we demonstrate that amortized simulation-based inference approaches are applicable even with missing data, and we provide a guideline for handling such data, which is relevant for a broad spectrum of applications.
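
The binary-indicator encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from the BayesFlow API. The function names `augment_with_mask` and `pad_and_mask`, the use of NaN to mark missing entries, and the fill value of 0.0 are all assumptions made for illustration. Each observation is paired with an indicator that is 1 where a value is present and 0 where it is absent, so a downstream network could in principle distinguish a true measurement equal to the fill value from a missing one.

```python
import numpy as np


def augment_with_mask(x, fill_value=0.0):
    """Hypothetical helper: replace NaNs with fill_value and append a presence indicator.

    Assumes missing entries are marked as NaN. Returns an array with one extra
    trailing axis of size 2: [filled value, indicator (1 = present, 0 = missing)].
    """
    x = np.asarray(x, dtype=float)
    present = ~np.isnan(x)
    filled = np.where(present, x, fill_value)
    return np.stack([filled, present.astype(float)], axis=-1)


def pad_and_mask(x, max_len, fill_value=0.0):
    """Hypothetical helper: treat a short series as one with missing trailing entries.

    Pads x with NaN up to max_len, then applies the same indicator encoding,
    so variable-length data sets share one fixed input shape.
    """
    x = np.asarray(x, dtype=float)
    padded = np.full(max_len, np.nan)
    padded[: len(x)] = x
    return augment_with_mask(padded, fill_value)


# A time series with two missing measurements.
y = np.array([0.8, np.nan, 1.3, np.nan])
encoded = augment_with_mask(y)

# A shorter series padded to the same length.
short = pad_and_mask([0.5, 0.7], max_len=4)
```

Here `encoded` has shape `(4, 2)`: the first column holds the filled values and the second the presence indicators. The same encoding covers variable-length data by treating the padded positions as missing, which is one way to read the abstract's remark about data sets with variable length.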

MeSH Term

Computer Simulation
Neural Networks, Computer
Computational Biology
Humans
Machine Learning
Bayes Theorem
Algorithms
Data Interpretation, Statistical
