Adaptive Sampling for Weighted Log-Rank Survival Trees Boosting
Iulii Vasilev, Mikhail Petrovskiy, Igor Mashechkin
Abstract
Survival analysis is devoted to predicting the probability and timing of an event; the central problem is estimating the event probability over time, with applications in healthcare, credit scoring, and other domains. The most widely used method for assessing covariate effects on survival is the Cox proportional hazards model. However, its assumption of non-overlapping survival functions usually does not hold on real data, and the linear dependence on features limits the quality of the method. Tree-based machine learning methods address these problems; to evaluate the difference between samples at a split, they typically use the log-rank test. The resulting survival decision tree models are highly interpretable and can estimate the importance of predictors, but they demonstrate inferior performance compared to Cox proportional hazards models. To overcome these issues, this paper proposes a new boosting model over survival decision trees that uses adaptive sampling and weighted log-rank split criteria. The model iteratively corrects the error of the ensemble: each decision tree is trained on a sample drawn according to observation weights, which are then adjusted to change the probabilities of entering the next sample. We present an experimental comparison of the proposed adaptive boosting method against the Cox proportional hazards model and widely used survival trees and their ensembles: random survival forest and gradient boosting. Experiments on healthcare datasets show that our model outperforms state-of-the-art survival models in terms of the following metrics: the concordance index, the integrated Brier score, and the integrated AUC.
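The adaptive sampling loop described above can be illustrated with a minimal sketch. This assumes an AdaBoost-style multiplicative weight update; the paper's actual update rule and its weighted log-rank split criterion are not specified in the abstract, so the function names (`update_weights`, `adaptive_sample`) and the update formula here are hypothetical stand-ins for the general mechanism:

```python
import random

def update_weights(weights, poorly_predicted, learner_error):
    """AdaBoost-style weight update (a sketch, not the paper's exact rule).

    Observations the current tree predicts poorly keep their weight,
    while well-predicted ones are shrunk by beta < 1, so after
    renormalization the hard observations are more likely to appear
    in the sample used to train the next tree.
    """
    beta = learner_error / (1.0 - learner_error)  # shrink factor, < 1 when error < 0.5
    new_w = [w if bad else w * beta
             for w, bad in zip(weights, poorly_predicted)]
    total = sum(new_w)
    return [w / total for w in new_w]

def adaptive_sample(observations, weights, rng):
    """Draw a bootstrap sample with inclusion probabilities given by the weights."""
    return rng.choices(observations, weights=weights, k=len(observations))

# Toy run: 4 observations, the first is predicted poorly by the current tree.
w = [0.25, 0.25, 0.25, 0.25]
w = update_weights(w, [True, False, False, False], learner_error=0.25)
sample = adaptive_sample(["a", "b", "c", "d"], w, random.Random(0))
```

Each boosting iteration would then fit a survival tree (with the weighted log-rank split criterion) on `sample`, estimate its error, and call `update_weights` again.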