Prognostic adjustment with efficient estimators to unbiasedly leverage historical data in randomized trials.

Lauren D Liao, Emilie Højbjerre-Frandsen, Alan E Hubbard, Alejandro Schuler
Author Information
  1. Lauren D Liao: Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.
  2. Emilie Højbjerre-Frandsen: Biostatistics Methods and Outreach, Novo Nordisk A/S, Bagsvaerd, Denmark.
  3. Alan E Hubbard: Division of Biostatistics, University of California, Berkeley, CA, USA.
  4. Alejandro Schuler: Division of Biostatistics, University of California, Berkeley, CA, USA.

Abstract

Although randomized controlled trials (RCTs) are a cornerstone of comparative effectiveness, they typically have much smaller sample sizes than observational studies because of financial and ethical considerations. There is therefore interest in using plentiful historical data (either observational data or prior trials) to reduce trial sizes. Previous estimators developed for this purpose rely on unrealistic assumptions, without which the added data can bias the treatment effect estimate. Recent work proposed an alternative method, prognostic covariate adjustment, that imposes no additional assumptions and increases efficiency in trial analyses. The idea is to use historical data to learn a prognostic model: a regression of the outcome onto the covariates. The predictions from this model, generated from the RCT subjects' baseline variables, are then used as a covariate in a linear regression analysis of the trial data. In this work, we extend prognostic adjustment to trial analyses with nonparametric efficient estimators, which are more powerful than linear regression. We provide theory that explains why prognostic adjustment improves small-sample point estimation and inference without any possibility of bias. Simulations corroborate the theory: when the trial is small, efficient estimators with prognostic adjustment provide greater power (i.e., smaller standard errors) than the same estimators without it. Population shifts between historical and trial data attenuate the benefits but do not introduce bias. We showcase our estimator using clinical trial data provided by Novo Nordisk A/S that evaluate insulin therapy for individuals with type 2 diabetes.
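
The workflow described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the simulated data, the true effect of 1.0, the binned-mean prognostic model (a stand-in for a machine-learning regression), and all function names are hypothetical. The trial analysis uses an augmented inverse probability weighted (AIPW) estimator with a known randomization probability of 0.5 and, for simplicity, per-arm linear outcome regressions on the prognostic score alone.

```python
import random
from statistics import mean, variance

TRUE_EFFECT = 1.0  # assumed treatment effect in this toy simulation


def simulate(n, rng, randomize):
    """Toy data: covariate x, outcome y = x^2 + effect * arm + noise."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Historical data are treated as controls only (an assumption here).
        arm = int(rng.random() < 0.5) if randomize else 0
        y = x * x + TRUE_EFFECT * arm + rng.gauss(0.0, 0.5)
        rows.append((x, arm, y))
    return rows


def fit_binned_mean(xs, ys, lo=-3.0, hi=3.0, width=0.25):
    """Hypothetical prognostic model: mean historical outcome per covariate bin."""
    n_bins = int((hi - lo) / width)
    sums, counts = [0.0] * n_bins, [0] * n_bins

    def bin_of(x):
        # Values outside [lo, hi) are clamped to the edge bins.
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    for x, y in zip(xs, ys):
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    overall = mean(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def difference_in_means(arms, ys):
    """Unadjusted estimate and standard error."""
    y1 = [y for y, a in zip(ys, arms) if a == 1]
    y0 = [y for y, a in zip(ys, arms) if a == 0]
    se = (variance(y1) / len(y1) + variance(y0) / len(y0)) ** 0.5
    return mean(y1) - mean(y0), se


def aipw(scores, arms, ys, pi=0.5):
    """AIPW estimate of the average treatment effect, adjusting for the score."""
    fits = {}
    for arm in (0, 1):
        s_arm = [s for s, a in zip(scores, arms) if a == arm]
        y_arm = [y for y, a in zip(ys, arms) if a == arm]
        fits[arm] = fit_line(s_arm, y_arm)
    terms = []
    for s, a, y in zip(scores, arms, ys):
        m0 = fits[0][0] + fits[0][1] * s
        m1 = fits[1][0] + fits[1][1] * s
        terms.append(m1 - m0 + a / pi * (y - m1) - (1 - a) / (1 - pi) * (y - m0))
    # Standard error from the empirical variance of the influence-function terms.
    return mean(terms), (variance(terms) / len(terms)) ** 0.5


def run_demo(seed=0, n_hist=5000, n_trial=200):
    rng = random.Random(seed)
    hist = simulate(n_hist, rng, randomize=False)
    trial = simulate(n_trial, rng, randomize=True)
    # Step 1: learn the prognostic model from historical data only.
    prognostic = fit_binned_mean([x for x, _, _ in hist], [y for _, _, y in hist])
    # Step 2: score trial subjects from their baseline covariate.
    scores = [prognostic(x) for x, _, _ in trial]
    arms = [a for _, a, _ in trial]
    ys = [y for _, _, y in trial]
    # Step 3: analyze the trial with and without the prognostic score.
    return {
        "unadjusted": difference_in_means(arms, ys),
        "prognostic_aipw": aipw(scores, arms, ys),
    }


if __name__ == "__main__":
    for name, (est, se) in run_demo().items():
        print(f"{name}: estimate={est:.3f}, se={se:.3f}")
```

Because the historical data enter only through a covariate computed from baseline variables, randomization still protects the trial estimate; a poorly fitting prognostic model would be expected to cost efficiency rather than introduce bias. In practice the score would typically be added alongside the raw covariates and the outcome regressions fitted with flexible learners.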
