Is Fine-Tuning Useful in EHR-Based Prediction Models? A Use Case on Mortality Prediction with Longitudinal Data from Spanish (SIDIAP) and UK (CPRD) Populations Aged Over 65 Years
Carrasco-Ribelles LA., Cabrera-Bean M., Prats-Uribe A., Khalid S., Violán C.
Transfer learning enables the reuse of models trained on large datasets, reducing data collection, computation time, and costs. While widely used in computer vision, its application to models based on electronic health records (EHRs) remains limited. This study evaluates whether fine-tuning an EHR-based model from one country to another outperforms training a model from scratch. EHRs from the SIDIAP (Spain) and CPRD (UK) databases were used, defining a cohort in each country of individuals aged 65+ followed between 2010 and 2019. A prediction model was trained and validated internally for each country to predict 1-year mortality, then externally validated and fine-tuned with the other country's population (recalibrated model). The models were based on ARIADNEhr, a previously validated architecture. Performance metrics, decision curve analysis, and attention maps were compared. Participants included 1,456,052 from SIDIAP and 1,507,736 from CPRD, with similar demographics. Performance on the external cohort varied between -10.9% and +39.5%. Fine-tuning consistently improved external performance (1.8% to 15.5%), enhanced model calibration and clinical utility, and maintained key contributing variables. However, the fine-tuned models did not reach the performance of the country-specific models, showing a performance drop between 14% and 20%. Fine-tuning may be useful in other fields but is still insufficient for tabular EHR-based prediction models in health applications.