
  • Project No: #OxKEN-2023/10
  • Intake: OxKEN 2023



project overview

The COVID-19 pandemic has highlighted inequalities in health systems around the world. However, inequity is not limited to the pandemic: it is a long-standing and multifaceted issue. In addition to socio-economic complexities, imbalances in healthcare technologies can worsen existing biases.

An example is the artificial intelligence technology behind clinical prediction models. If there are imbalances in the data used to train the models, or algorithmic biases within the analytical pipeline, the resulting models can be biased and can mis-estimate patients' health risks in real time. This in turn can lead to some groups of patients being under- or over-prioritised.
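The effect of imbalanced training data can be illustrated with a minimal sketch. It uses purely synthetic numbers (not project data) and a deliberately naive, hypothetical "model" that assigns every patient the pooled event rate from the training set. The helper name `observed_expected_ratio` and the groups "A" and "B" are assumptions made for this illustration only.

```python
from collections import defaultdict


def observed_expected_ratio(records):
    """Return the observed/expected event ratio for each group.

    records: iterable of (group, predicted_risk, outcome) tuples,
    where outcome is 1 if the event occurred and 0 otherwise.
    A ratio above 1 means the model underestimates risk in that group;
    a ratio below 1 means it overestimates risk.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for group, predicted_risk, outcome in records:
        observed[group] += outcome
        expected[group] += predicted_risk
    return {group: observed[group] / expected[group] for group in observed}


# Synthetic, illustrative data only (hypothetical groups, not real patients).
# Group A dominates the data: 900 patients, 90 events (10% event rate).
# Group B is under-represented: 100 patients, 30 events (30% event rate).
training = (
    [("A", 1)] * 90 + [("A", 0)] * 810
    + [("B", 1)] * 30 + [("B", 0)] * 70
)

# Naive "model": every patient receives the pooled event rate,
# which is driven mostly by the majority group.
pooled_risk = sum(outcome for _, outcome in training) / len(training)

records = [(group, pooled_risk, outcome) for group, outcome in training]
ratios = observed_expected_ratio(records)

for group, ratio in sorted(ratios.items()):
    print(f"group {group}: observed/expected = {ratio:.2f}")
```

In this toy setting the pooled risk sits close to the majority group's event rate, so the observed/expected ratio for the under-represented group is well above 1: its risk is underestimated, which is the kind of under-prioritisation described above. Real prediction models and fairness assessments are considerably more involved than this sketch.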

This research will develop prediction models that are based on bias-minimisation guidelines (developed by the UK EQUATOR Centre, housed in our department) and that are tailored to specific patient groups, including patients with different ethnic backgrounds, patients with rare conditions and patients with disabilities. By addressing sources of bias in the data and in the analytical pipelines, prediction models can be made more targeted and equitable.

The project will use routinely collected data from the UK Clinical Practice Research Datalink (CPRD), Hospital Episode Statistics (HES), and the Office for National Statistics (ONS), as well as international data representing >500 million patients and 5 billion clinical records from across 5 continents. The project will have access to the OHDSI analytics pipeline for standardized, rapid, and reproducible artificial intelligence.

Patient and public engagement and involvement will be an important element of this research.


Personalised medicine, Big Data, Health Equity, Machine Learning, Observational Research

training opportunities

The Botnar Research Centre plays host to the University of Oxford's Institute of Musculoskeletal Sciences and Centre for Statistics in Medicine.

Training will be provided in relevant research methodology, including the handling and analysis of large health datasets and advanced statistical and machine learning techniques, as well as in patient and public engagement for research. Attendance at formal training courses will be encouraged, and will include the "Real world epidemiology" Oxford summer school and the "Big Data and Machine Learning for Healthcare" modules.

In addition, courses from the University's Centre for Teaching and Learning, Department of Computer Science, and the Medical Sciences Division Skills Team on key skills for the completion of a successful PhD thesis will be available. Additional in-the-field training opportunities will arise, and the supervisors will encourage the student to pursue such opportunities.

Further, the Observational Health Data Sciences and Informatics (OHDSI) global community of 300+ researchers will provide training and opportunities for international collaboration stretching beyond the project.
A core curriculum of lectures organized departmentally will be taken in the first term to provide a solid foundation in a broad range of subjects including epidemiology, machine learning, and statistics.

Students will attend weekly seminars within the department and those relevant in the wider University.
Students will be expected to present data regularly to the department and the research group, and to attend external conferences to present their research globally.

key publications

  1. A. K. Clift et al., “Living risk prediction algorithm (QCOVID) for risk of hospital admission and mortality from coronavirus 19 in adults: national derivation and validation cohort study,” BMJ, vol. 371, p. m3731, Oct. 2020, doi: 10.1136/bmj.m3731.
  2. S. Khalid et al., “A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data,” medRxiv, 2021.
  3. S. Khalid and D. Prieto-Alhambra, “Machine Learning for Feature Selection and Cluster Analysis in Drug Utilisation Research,” Curr. Epidemiol. Reports, vol. 6, no. 3, pp. 364–372, 2019.
  4. A. Delmestri and D. Prieto-Alhambra, “CPRD GOLD and linked ONS mortality records: Reconciling guidelines,” Int. J. Med. Inform., vol. 136, p. 104038, 2020.
  5. G. S. Collins et al., “External validation of multivariable prediction models: a systematic review of methodological conduct and reporting,” BMC Med. Res. Methodol., vol. 14, no. 1, pp. 1–11, 2014.

contact information for all supervisors

Prof Daniel Prieto-Alhambra has published extensively in the field of pharmaco-epidemiology, and is recognised internationally as an authority on the use of routine data for musculoskeletal pharmaco- and device epidemiology.

Professor Laura Coates is an Associate Professor and honorary consultant rheumatologist with an interest in outcome measures, clinical trial design and patient and public involvement in research.

Dr Sara Khalid is a machine learning lead in the Centre for Statistics in Medicine, Oxford. She holds an Oxford DPhil in Engineering Science and has an excellent track record in the use of big data methods, including machine learning.

Dr Antonella Delmestri is an international expert in automation of data engineering, data mining and advanced curation of real-world health data routinely collected by doctors in primary and secondary care.

Prof Gary Collins’ research focuses on methodological aspects of the development and validation of multivariable prediction models, and he has published extensively in this area. He is particularly interested in the role that big data and machine learning have in developing and evaluating prediction models.