Estimation of Heterogeneous Restricted Mean Survival Time Using Random Forest.

Mingyang Liu, Hongzhe Li
Author Information
  1. Mingyang Liu: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
  2. Hongzhe Li: Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.

Abstract

Estimation and prediction of heterogeneous restricted mean survival time (hRMST) are of great clinical importance: hRMST provides an easily interpretable and clinically meaningful summary of the survival function in the presence of censoring and individual covariates. Existing methods for modeling hRMST rely on proportional hazards or other parametric assumptions about the survival distribution. In this paper, we propose a random forest based estimator of hRMST for right-censored survival data with covariates and prove a central limit theorem for the resulting estimator. In addition, we present a computationally efficient construction of confidence intervals for hRMST. Our simulations show that the resulting confidence intervals have the correct coverage probability for hRMST, and that the random forest based estimate of hRMST has smaller prediction errors than parametric models when those models are misspecified. We apply the method to the ovarian cancer data set from The Cancer Genome Atlas (TCGA) project to predict hRMST and show improved prediction performance over existing methods. A software implementation, srf, written in R and C++, is available at https://github.com/lmy1019/SRF.
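
As a minimal illustration of the quantity being estimated (and not of the random forest method proposed in the paper), the restricted mean survival time up to a horizon tau is the area under the survival curve on [0, tau]. The sketch below computes that area from a Kaplan-Meier curve in plain Python, ignoring covariates entirely. The function name km_rmst, its arguments, and the toy data in the usage note are hypothetical and are not part of the srf package.

```python
from typing import Sequence


def km_rmst(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Area under the Kaplan-Meier survival curve on [0, tau].

    times  : observed follow-up times (event or censoring time)
    events : 1 if the event was observed, 0 if right-censored
    tau    : restriction horizon (assumed positive)

    Illustrative sketch only: no covariates, no random forest.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # current Kaplan-Meier survival estimate
    prev_t = 0.0    # left end of the current step
    area = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group tied observations at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        # The curve is flat on [prev_t, t), so add a rectangle.
        area += surv * (t - prev_t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
        # Events and censored subjects both leave the risk set.
        n_at_risk -= removed
        prev_t = t
    # Last rectangle, from the final time at or before tau up to tau.
    area += surv * (tau - prev_t)
    return area
```

For example, with no censoring the result reduces to the sample mean of min(T, tau): km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=2.5) averages 1, 2, 2.5, and 2.5. A heterogeneous version would replace the single pooled curve with a covariate-specific one, which is the role the random forest plays in the paper.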

Grants

  1. R01 GM123056/NIGMS NIH HHS
  2. R01 GM129781/NIGMS NIH HHS