A Global Grand Challenges case study reveals the potential of large language models (LLMs) to close health gaps in South Asia, but only when they're adapted and fine-tuned using local data and expertise.

The study, 'Evaluating large language models for clinical note processing: local fine-tuning and internal–external validation using electronic health records from South Asia', has been published in BMC Medical Informatics. A collaboration between Associate Professor Sara Khalid at NDORMS and Dr Faisal Sultan from the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) in Pakistan, the project was supported by a Global Grand Challenges award funded by the Bill & Melinda Gates Foundation.

South Asia is home to a quarter of the world's population. Its healthcare systems are under unprecedented pressure, largely over-subscribed and under-resourced. Up to 80% of the information required for time-critical decision-making is buried in free-text patient notes, which may include key patient-specific details such as family history and the social, behavioural, or environmental determinants of health.

Artificial intelligence is increasingly used in healthcare to analyse structured data such as imaging or laboratory results, but modelling free-text clinical notes in real-world settings is more complex.

The study evaluated open-source LLMs using a database of approximately 250,000 patient records, including cancer and COVID-19 cases. It assessed whether these models could identify and prioritise clinically relevant information from free-text notes to support timely decision-making.
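The study's own code and models are not reproduced here, but the kind of task it describes, prompting an open-source LLM to surface clinically relevant details from a free-text note, can be sketched roughly as follows. The model name, example note, and prompt are illustrative assumptions, not those used in the study.

```python
# Minimal sketch: prompting an open-source LLM to flag clinically relevant
# details in a free-text clinical note. Model, note, and prompt are
# placeholders chosen for illustration, not the study's actual setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed open-source model
)

# Hypothetical de-identified note for demonstration only.
note = (
    "62F, presents with 3-week cough and weight loss. "
    "Father died of lung cancer. Lives in a high-pollution area; non-smoker."
)

prompt = (
    "Extract the family history and any social, behavioural, or environmental "
    f"determinants of health from this clinical note:\n{note}\nAnswer:"
)

result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```

In a real pipeline, outputs like these would then be checked against clinician-assigned labels, which is how a study of this kind can measure whether the model's extractions are accurate in the local context.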

Dr Faisal Sultan said: 'We found that open-source large language models (LLMs) could be applied to real-world databases in South Asia. However, the models trained on global data created bias in the results and did not consistently perform in our local clinical context.'

The team found that adapting and fine-tuning the models using local data, with the help of clinicians familiar with the local context, significantly improved their accuracy and relevance.
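As a rough illustration of what local fine-tuning can look like in practice, the sketch below uses LoRA adapters via the Hugging Face peft library on a hypothetical file of de-identified local notes. The model choice, dataset file, and hyperparameters are all assumptions for illustration; none come from the study itself.

```python
# Minimal sketch of local fine-tuning with LoRA adapters (peft).
# All names and settings are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach small trainable adapters instead of updating all model weights.
model = get_peft_model(
    model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
)

# Hypothetical local dataset: one de-identified note per JSON line,
# e.g. {"text": "..."} in local_notes.jsonl.
data = load_dataset("json", data_files="local_notes.jsonl")["train"]
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    # mlm=False makes the collator build causal-LM labels from the inputs.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Fine-tuning only small adapter layers rather than the full model keeps the compute cost low, which matters in the resource-constrained settings the study describes.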

Sara said: 'This is an excellent example of both opportunities and limitations of digital technologies and current AI systems. Our findings underline the importance of co-development with local clinicians and data scientists to ensure that tools are safe, contextually appropriate and do not inadvertently widen existing health inequalities.'