Data-driven approach for creating synthetic electronic medical records.

Anna L Buczak, Steven Babin, Linda Moniz
Author Information
  1. Anna L Buczak: Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD 20723-6099, USA. Anna.Buczak@jhuapl.edu

Abstract

BACKGROUND: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed.
METHODS: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population.
RESULTS: We generated EMRs, including visit records, clinical activity, laboratory orders/results, and radiology orders/results, for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4- to 11-year age group. Validation of those records by a medical expert revealed problems in fewer than 3% of the background patient EMRs, and these errors were subsequently rectified.
CONCLUSIONS: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, and prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were for the 4- to 11-year-old age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.
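
The three steps named in METHODS can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify data structures, matching criteria, or adaptation rules, so every name here (SyntheticPatient, find_care_patterns, adapt_pattern, TOY_RECORDS) is hypothetical, the "similar health problem" test is reduced to an exact match on a complaint label, and adaptation is reduced to shifting day offsets onto a synthetic visit day.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SyntheticPatient:
    patient_id: int
    age: int
    sex: str
    complaint: str
    care_events: list = field(default_factory=list)


# Invented toy stand-in for real EMR data: each record carries a complaint
# label and a care pattern of (day_offset, event_type, detail) tuples.
TOY_RECORDS = [
    {"complaint": "fever", "events": [(0, "visit", "clinic"), (0, "lab", "CBC")]},
    {"complaint": "fever", "events": [(0, "visit", "ER"), (1, "radiology", "chest x-ray")]},
    {"complaint": "injury", "events": [(0, "visit", "ER")]},
]


def generate_patients(n, complaint, age_range, rng):
    """Step 1: create synthetic identities and basic information."""
    low, high = age_range
    return [
        SyntheticPatient(i, rng.randint(low, high), rng.choice("MF"), complaint)
        for i in range(n)
    ]


def find_care_patterns(real_records, complaint):
    """Step 2: collect care patterns from records with a similar problem.

    Assumption: similarity is an exact match on the complaint label.
    """
    return [r["events"] for r in real_records if r["complaint"] == complaint]


def adapt_pattern(pattern, visit_day):
    """Step 3: re-anchor a pattern's day offsets on the synthetic visit day."""
    return [(visit_day + offset, kind, detail) for offset, kind, detail in pattern]


def build_synthetic_emrs(real_records, n, complaint, age_range, seed=0):
    """Run the three steps and return patients with adapted care events."""
    rng = random.Random(seed)
    patterns = find_care_patterns(real_records, complaint)
    if not patterns:
        raise ValueError(f"no care patterns found for complaint {complaint!r}")
    patients = generate_patients(n, complaint, age_range, rng)
    for patient in patients:
        visit_day = rng.randint(0, 30)  # assumed 31-day simulation window
        patient.care_events = adapt_pattern(rng.choice(patterns), visit_day)
    return patients
```

Calling build_synthetic_emrs(TOY_RECORDS, 5, "fever", (4, 11)) yields five synthetic patients aged 4 to 11, each carrying one of the two toy "fever" care patterns shifted to that patient's visit day. A real pipeline would still need the expert validation pass described in RESULTS.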

Grants

  1. P01-HK000028-02/PHITPO CDC HHS

MeSH Terms

Algorithms
Child
Child, Preschool
Databases as Topic
Disease Outbreaks
Electronic Health Records
Humans
Information Storage and Retrieval
Models, Theoretical
Patient Care
Population Surveillance
Tularemia
