Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis.

Dong T.; Sinha S.; Zhai B.; Fudulu D.; Chan J.; Narayan P.; Judge A.; Caputo M.; Dimagli A.; Benedetto U.; Angelini GD.

BACKGROUND: The Society of Thoracic Surgeons and European System for Cardiac Operative Risk Evaluation (EuroSCORE) II risk scores are the most commonly used risk prediction models for in-hospital mortality after adult cardiac surgery. However, they are prone to miscalibration over time and poor generalization across data sets; thus, their use remains controversial. Despite increased interest, a gap in understanding the effect of data set drift on the performance of machine learning (ML) over time remains a barrier to its wider use in clinical practice. Data set drift occurs when an ML system underperforms because of a mismatch between the data it was developed from and the data on which it is deployed.

OBJECTIVE: In this study, we analyzed the extent of performance drift using models built on a large UK cardiac surgery database. The objectives were to (1) rank and assess the extent of performance drift in cardiac surgery risk ML models over time and (2) investigate any potential influence of data set drift and variable importance drift on performance drift.

METHODS: We conducted a retrospective analysis of prospectively, routinely gathered data on adult patients undergoing cardiac surgery in the United Kingdom between 2012 and 2019. We temporally split the data 70:30 into a training and validation set and a holdout set. Five novel ML mortality prediction models were developed and assessed, along with EuroSCORE II, for relationships between and within variable importance drift, performance drift, and actual data set drift. Performance was assessed using a consensus metric.

RESULTS: A total of 227,087 adults underwent cardiac surgery during the study period, with a mortality rate of 2.76% (n=6258). There was strong evidence of a decrease in overall performance across all models (P

Original publication

DOI

10.2196/45973

Type

Journal article

Journal

JMIRx Med

Publication Date

12/06/2024

Volume

Keywords

United Kingdom, adult, artificial intelligence, cardiac, cardiac surgery, cardiology, data, data set drift, heart, machine learning, model, mortality, national data set, operative mortality, performance, performance drift, prediction, risk, risk prediction, surgery
