Multiple imputation for longitudinal data using Bayesian lasso imputation model.

Yusuke Yamaguchi, Satoshi Yoshida, Toshihiro Misumi, Kazushi Maruo
Author Information
  1. Yusuke Yamaguchi: Data Science, Development, Astellas Pharma Inc., Tokyo, Japan.
  2. Satoshi Yoshida: Data Science, Development, Astellas Pharma Inc., Tokyo, Japan.
  3. Toshihiro Misumi: Department of Biostatistics, School of Medicine, Yokohama City University, Yokohama, Japan.
  4. Kazushi Maruo: Department of Biostatistics, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan.

Abstract

Multiple imputation is a promising approach to handling missing data and is widely used in the analysis of longitudinal clinical studies. A key consideration in implementing multiple imputation is to obtain accurate imputed values by specifying an imputation model that incorporates auxiliary variables potentially associated with the missing variables. Informative auxiliary variables are known to make the missing at random assumption more plausible and to reduce the uncertainty of the imputations; however, in many cases it is not straightforward to pre-specify them. We propose a data-driven specification of the imputation model using the Bayesian lasso in the context of longitudinal clinical studies, and develop a built-in function for the Bayesian lasso imputation model that runs within the framework of multiple imputation using chained equations. A simulation study suggested that the Bayesian lasso imputation model worked well in a variety of longitudinal study settings, providing unbiased treatment effect estimates with well-controlled type I error rates and confidence interval coverage probabilities; in contrast, ignoring the informative auxiliary variables led to serious bias and inflated type I error rates. Moreover, the Bayesian lasso imputation model offered higher statistical power than conventional imputation methods. In our simulation study, the gains in power were notable when the sample size was small relative to the number of auxiliary variables. An illustration using a real data example also suggested that the Bayesian lasso imputation model could give smaller standard errors for the treatment effect estimate.
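
To make the idea of a Bayesian lasso imputation model concrete, the following is a minimal sketch of how one incomplete variable might be imputed in a single chained-equations step. It is not the authors' built-in function. It assumes the standard Bayesian lasso Gibbs sampler of Park and Casella with a fixed penalty `lam`, fully observed auxiliary variables, and more observed rows than auxiliary variables; the function name `bayesian_lasso_impute` and all of its parameters are hypothetical.

```python
import numpy as np


def bayesian_lasso_impute(X, y, lam=1.0, n_iter=200, seed=0):
    """Fill NaNs in y with one posterior predictive draw from a Bayesian lasso.

    X: (n, p) fully observed auxiliary variables; y: (n,) float array with NaNs.
    lam is a fixed lasso penalty (an assumption; it could be given a hyperprior).
    """
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    Xo, yo = X[~miss], y[~miss]

    # Centre on the observed rows so the intercept can be drawn separately.
    x_mean, y_mean = Xo.mean(axis=0), yo.mean()
    Xc, yc = Xo - x_mean, yo - y_mean
    n, p = Xc.shape
    XtX, Xty = Xc.T @ Xc, Xc.T @ yc

    beta, sigma2, inv_tau2 = np.zeros(p), yc.var(), np.ones(p)
    for _ in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1 / tau_j^2)
        A_inv = np.linalg.inv(XtX + np.diag(inv_tau2))
        A_inv = (A_inv + A_inv.T) / 2  # remove round-off asymmetry
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)

        # sigma2 | rest ~ Inverse-Gamma(shape, rate), drawn as rate / Gamma(shape, 1)
        resid = yc - Xc @ beta
        shape = (n - 1 + p) / 2
        rate = (resid @ resid + beta @ (inv_tau2 * beta)) / 2
        sigma2 = rate / rng.gamma(shape)

        # 1 / tau_j^2 | rest ~ Inverse-Gaussian(mean, shape = lam^2)
        mean = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-8))
        inv_tau2 = rng.wald(mean, lam**2)

    # Intercept under a flat prior, then predictive draws for the missing rows.
    mu = rng.normal(y_mean, np.sqrt(sigma2 / n))
    noise = rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    y_imp = y.copy()
    y_imp[miss] = mu + (X[miss] - x_mean) @ beta + noise
    return y_imp
```

Calling the function several times with different `seed` values would give several completed data sets, each based on its own posterior draw of the regression coefficients and residual variance. The lasso prior shrinks the coefficients of uninformative auxiliary variables toward zero, which is how the data, rather than the analyst, determine which auxiliary variables drive the imputations.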

MeSH Term

Bayes Theorem
Bias
Computer Simulation
Data Interpretation, Statistical
Humans
Longitudinal Studies
Models, Statistical
