Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials.

Yohan Kim, Scott Kelly, Deepu Krishnan, Jay Falletta, Kerryn Wilmot
Author Information
  1. Yohan Kim: Institute for Sustainable Futures, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia.
  2. Scott Kelly: Institute for Sustainable Futures, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia.
  3. Deepu Krishnan: Institute for Sustainable Futures, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia.
  4. Jay Falletta: Institute for Sustainable Futures, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia.
  5. Kerryn Wilmot: Institute for Sustainable Futures, University of Technology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia.

Abstract

Time series data collected in clinical trials can have varying degrees of missingness, which complicates statistical analysis. Missing data pose an additional challenge in randomized controlled trials (RCTs), where researchers must remain blinded to whether participants belong to the intervention or control group. This restriction severely limits the applicability of conventional imputation methods that draw on other participants' data to improve performance. This paper explores and compares methods for imputing high-resolution temperature logger data in RCT settings. In addition to conventional non-parametric approaches, we propose a spline regression (SR) approach that captures each participant's unique indoor temperature dynamics by time of day. We investigate how including external temperature and energy use can improve model performance. Results show that SR imputation yields a 16% lower root mean squared error (RMSE) than conventional imputation methods, with the gap widening to 22% when more than half of the data are missing. The SR method is particularly useful when missingness occurs simultaneously for multiple participants, such as concurrent battery failures. We demonstrate how proper modelling of periodic dynamics can lead to significantly improved imputation performance, even with limited data.
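
As a rough illustration of the idea described in the abstract, the sketch below fits a per-participant regression spline of indoor temperature on time of day and uses it to fill gaps, relying only on that participant's own readings. It is not the authors' implementation: the cubic truncated-power basis, the knot positions, the function names (spline_basis, impute_spline, rmse) and the synthetic sinusoidal data are all assumptions made for this example, and the external temperature and energy use covariates mentioned in the abstract are left out.

```python
import numpy as np

def spline_basis(hours, knots):
    """Cubic truncated-power spline basis in time of day (assumed basis choice)."""
    h = np.asarray(hours, dtype=float) / 24.0   # rescale to [0, 1) for conditioning
    k = np.asarray(knots, dtype=float) / 24.0
    cols = [np.ones_like(h), h, h**2, h**3]
    cols += [np.clip(h - kj, 0.0, None) ** 3 for kj in k]
    return np.column_stack(cols)

def impute_spline(hours, temps, knots=(4, 8, 12, 16, 20)):
    """Fill NaNs in one participant's series from a spline fitted to their own data."""
    temps = np.asarray(temps, dtype=float)
    observed = ~np.isnan(temps)
    X = spline_basis(hours, knots)
    # Ordinary least squares on the observed readings only.
    coef, *_ = np.linalg.lstsq(X[observed], temps[observed], rcond=None)
    filled = temps.copy()
    filled[~observed] = X[~observed] @ coef
    return filled

def rmse(a, b):
    """Root mean squared error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic single-participant example (hypothetical data, not from the study):
# 14 days of 30-minute readings with a daily cycle plus noise.
rng = np.random.default_rng(0)
n = 14 * 48
hours = (np.arange(n) * 0.5) % 24
truth = 22 + 3 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.3, n)

mask = rng.random(n) < 0.5                  # about half of the readings go missing
temps = np.where(mask, np.nan, truth)

sr_filled = impute_spline(hours, temps)
mean_filled = np.where(mask, np.nanmean(temps), temps)  # simple baseline

rmse_sr = rmse(sr_filled[mask], truth[mask])
rmse_mean = rmse(mean_filled[mask], truth[mask])
print(f"spline RMSE: {rmse_sr:.2f}  mean-fill RMSE: {rmse_mean:.2f}")
```

Because the model is fitted separately for each participant, it needs no information about other participants or their group allocation, which is the constraint the abstract highlights. A periodic basis that forces the curve to match at midnight would probably be a more faithful choice than the unconstrained basis assumed here.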

MeSH Term

Humans
Randomized Controlled Trials as Topic
Research Design
Time Factors
