R Package imputeTestbench to Compare Imputation Methods for Univariate Time Series.

Marcus W Beck, Neeraj Bokde, Gualberto Asencio-Cortés, Kishore Kulat
Author Information
  1. Marcus W Beck: USEPA National Health and Environmental Effects Research Laboratory, Gulf Ecology Division, 1 Sabine Island Drive, Gulf Breeze, FL 32651, USA.
  2. Neeraj Bokde: Visvesvaraya National Institute of Technology, Nagpur, North Ambazari Road, Nagpur, India.
  3. Gualberto Asencio-Cortés: Universidad Pablo de Olavide, ES-41013, Sevilla, Spain.
  4. Kishore Kulat: Visvesvaraya National Institute of Technology, Nagpur, North Ambazari Road, Nagpur, India.

Abstract

Missing observations are common in time series data, and several methods are available to impute these values prior to analysis. Variation in the statistical characteristics of univariate time series can have a profound effect on the characteristics of missing observations and, therefore, on the accuracy of different imputation methods. The imputeTestbench package can be used to compare the prediction accuracy of different methods in relation to the amount and type of missing data for a user-supplied dataset. Missing data are simulated by removing observations completely at random or in blocks of different sizes, depending on characteristics of the data. The package includes several imputation algorithms that range from simple replacement with means to more complex interpolation methods. The testbench is not limited to the default functions, and users can add or remove methods as needed. Plotting functions also allow comparative visualization of the behavior and effectiveness of different algorithms. We present example applications that demonstrate how the package can be used to understand differences in prediction accuracy between methods as affected by characteristics of a dataset and the nature of missing data.
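
The workflow described in the abstract (remove observations, impute them, score the result against the known values) can be illustrated with a minimal sketch. This is not the imputeTestbench API, which is an R package; it is a standalone Python illustration in which every function name, the sine-wave test series, the 30% missing fraction, and the use of root-mean-square error as the accuracy measure are assumptions made for the example.

```python
import math
import random


def remove_mcar(series, frac, rng):
    """Set a fraction of observations to None, chosen completely at random."""
    n_missing = int(round(frac * len(series)))
    # Assumption: keep the first and last points so interpolation has anchors.
    candidates = range(1, len(series) - 1)
    missing = set(rng.sample(candidates, n_missing))
    return [None if i in missing else x for i, x in enumerate(series)]


def remove_block(series, frac, rng):
    """Set one contiguous block of observations to None."""
    n_missing = int(round(frac * len(series)))
    start = rng.randint(1, len(series) - 1 - n_missing)
    return [None if start <= i < start + n_missing else x
            for i, x in enumerate(series)]


def impute_mean(series):
    """Replace each missing value with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in series]


def impute_linear(series):
    """Fill gaps by linear interpolation between neighbouring observed values.

    Assumes the first and last values are observed.
    """
    filled = list(series)
    known = [i for i, x in enumerate(series) if x is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate):
    """Root-mean-square error over the whole series."""
    squared = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    return math.sqrt(squared / len(truth))


def compare(series, frac=0.3, seed=1):
    """Score each imputation method under each missing-data pattern."""
    rng = random.Random(seed)
    results = {}
    for pattern, remover in (("mcar", remove_mcar), ("block", remove_block)):
        gappy = remover(series, frac, rng)
        for method, imputer in (("mean", impute_mean), ("linear", impute_linear)):
            results[(pattern, method)] = rmse(series, imputer(gappy))
    return results


# Hypothetical test series: a smooth periodic signal with 200 observations.
series = [math.sin(2 * math.pi * t / 50) for t in range(200)]
errors = compare(series)
for (pattern, method), value in sorted(errors.items()):
    print(f"{pattern:5s} {method:6s} rmse={value:.4f}")
```

Printing the errors for the same series under both removal patterns shows how much the ranking of methods depends on whether values are lost at random or in a contiguous block, which is the comparison the package is designed to support.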

Grants

  1. EPA999999/Intramural EPA

