The study, 'Evaluating large language models for clinical note processing: local fine-tuning and internal–external validation using electronic health records from South Asia', has been published in BMC Medical Informatics. A collaboration between Associate Professor Sara Khalid at NDORMS and Dr Faisal Sultan from the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) in Pakistan, the project was supported by a Global Grand Challenges award funded by the Bill & Melinda Gates Foundation.
South Asia is home to a quarter of the world's population. Its healthcare systems are under unprecedented pressure, largely over-subscribed and under-resourced. Up to 80% of the information required for time-critical decision-making is buried in full-text patient notes, which may include key patient-specific information relating to family history and the social, behavioural, or environmental determinants of health.
Artificial intelligence is increasingly used in healthcare to analyse structured data such as imaging or laboratory results, but modelling free-text clinical notes in real-world settings is more complex.
The study evaluated open-source LLMs using a database of approximately 250,000 patient records, including cancer and COVID-19 cases. It assessed whether these models could identify and prioritise clinically relevant information from free-text notes to support timely decision-making.
Dr Faisal Sultan said: 'We found that open-source large language models (LLMs) could be applied to real-world databases in South Asia. However, the models trained on global data created bias in the results and did not consistently perform in our local clinical context.'
The team found that by adapting and fine-tuning the models using local data, with the help of clinicians familiar with the local context, the models' accuracy and relevance improved significantly.
Sara said: 'This is an excellent example of both opportunities and limitations of digital technologies and current AI systems. Our findings underline the importance of co-development with local clinicians and data scientists to ensure that tools are safe, contextually appropriate and do not inadvertently widen existing health inequalities.'