Building test data from real outbreaks for evaluating detection algorithms.

Gaetan Texier, Michael L Jackson, Leonel Siwe, Jean-Baptiste Meynard, Xavier Deparis, Herve Chaudet
Author Information
  1. Gaetan Texier: Pasteur Center in Cameroon, Yaoundé, Cameroon.
  2. Michael L Jackson: Group Health Research Institute, Seattle, United States of America.
  3. Leonel Siwe: Sub-Regional Institute of Statistics and Applied Economics (ISSEA), Yaoundé, Cameroon.
  4. Jean-Baptiste Meynard: French Armed Forces Center for Epidemiology and Public Health (CESPA), Camp de Sainte Marthe, Marseille, France.
  5. Xavier Deparis: French Armed Forces Center for Epidemiology and Public Health (CESPA), Camp de Sainte Marthe, Marseille, France.
  6. Herve Chaudet: UMR 912 / SESSTIM - INSERM/IRD/Aix-Marseille University / Faculty of Medicine - 27, Bd Jean Moulin, Marseille, France.

Abstract

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes, and durations, is known to be very difficult. The generated dataset of outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by one of six resampling processes: Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, and Hybrid Gibbs Sampler. We carried out an analysis to identify the input parameters that matter most for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that the type of algorithm and the simulation parameters (number of days, number of cases, outbreak shape, and overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms, and metrics chosen for the evaluation, simulation quality decreased as the number of simulated days increased and improved as the number of simulated cases increased. Simulating outbreaks with fewer cases than days of duration (i.e., an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
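
The two-step idea in the abstract (rescale a historical curve, then resample it) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the homothetic transformation can be approximated by linear interpolation of the historical daily counts onto a new duration, and it shows only the ITSM variant of the resampling step. The function names `rescale_curve` and `sample_outbreak_itsm`, and the example counts, are hypothetical.

```python
import numpy as np


def rescale_curve(historical_counts, target_days):
    """Stretch or shrink a daily epidemic curve to target_days, then normalise.

    Assumption: the time-axis rescaling is approximated by linear interpolation.
    """
    counts = np.asarray(historical_counts, dtype=float)
    # Put the source and target curves on a common [0, 1] time axis.
    src = np.linspace(0.0, 1.0, counts.size)
    dst = np.linspace(0.0, 1.0, target_days)
    stretched = np.interp(dst, src, counts)
    # Normalise so the curve is a probability distribution over days.
    return stretched / stretched.sum()


def sample_outbreak_itsm(probabilities, n_cases, rng):
    """Draw n_cases onset days by inverse transform sampling; return daily counts."""
    cdf = np.cumsum(probabilities)
    cdf[-1] = 1.0  # guard against floating-point drift in the last bin
    u = rng.random(n_cases)  # uniform draws in [0, 1)
    # Each draw maps to the first day whose cumulative probability exceeds it.
    days = np.searchsorted(cdf, u, side="right")
    return np.bincount(days, minlength=cdf.size)


# Hypothetical 8-day historical outbreak, resimulated as 50 cases over 14 days.
historical = [1, 3, 8, 12, 7, 4, 2, 1]
shape = rescale_curve(historical, target_days=14)
simulated = sample_outbreak_itsm(shape, n_cases=50, rng=np.random.default_rng(0))
print(simulated)
```

In this sketch the overall scale factor described in the abstract corresponds to `n_cases / target_days`. Choosing fewer cases than days leaves most days empty, which is one way to see why the shape of the historical curve is poorly preserved in that regime.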

MeSH Terms

Algorithms
Computer Simulation
Disease Outbreaks
Humans
Models, Statistical
Population Surveillance
Probability
Reproducibility of Results
Statistics as Topic
Treatment Outcome
