  • Project No: NDORMS 2022/5
  • Intake: 2022

Project overview 

The COVID-19 pandemic has highlighted inequalities in health systems around the world. However, inequity is not limited to the pandemic; it is a long-standing and multifaceted issue. In addition to socio-economic complexities, imbalances in healthcare technologies can worsen existing biases.

One example is the artificial intelligence technology behind clinical prediction models. If the data used to train these models are imbalanced, or if there are algorithmic biases within the analytical pipeline, the resulting models can be biased and mis-estimate patients' health risks in real time. This in turn can lead to some groups of patients being under- or over-prioritised.

This research will develop prediction models that follow bias-minimisation guidelines (developed by the UK EQUATOR Centre, housed in our department) and that are tailored to specific patient groups, including patients from different ethnic backgrounds, patients with rare conditions and patients with disabilities. By addressing sources of bias in both the data and the analytical pipelines, prediction models can be made more targeted and equitable.

The project will use routinely collected data from the UK Clinical Practice Research Datalink (CPRD), Hospital Episode Statistics (HES) and the Office for National Statistics (ONS), as well as international data representing more than 500 million patients and 5 billion clinical records from five continents. The project will have access to the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for standardised, rapid and reproducible artificial intelligence.

Patient and public engagement and involvement will be an important element of this research.


The PhD project will be jointly supervised by Prof Daniel Prieto-Alhambra, Dr Sara Khalid, Dr Antonella Delmestri and Professor Laura Coates from the Centre for Statistics in Medicine and the Pharmaco- and Device Epidemiology group at the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS). Project advisors will include Prof Gary Collins, Professor of Medical Statistics and Interim Director of the Oxford Centre for Statistics in Medicine, and Professor Irene Petersen, Professor of Epidemiology at University College London.

Dr Sara Khalid is a machine learning lead in the Centre for Statistics in Medicine, Oxford. She holds an Oxford DPhil in Engineering Science and has an excellent track record in the use of big data methods, including machine learning.

Prof Daniel Prieto-Alhambra has published extensively in the field of pharmacoepidemiology and is recognised internationally as an authority on the use of routine data for musculoskeletal pharmaco- and device epidemiology.

Professor Laura Coates is an Associate Professor and honorary consultant rheumatologist with an interest in outcome measures, clinical trial design and patient and public involvement in research.

Dr Antonella Delmestri is an international expert in automation of data engineering, data mining and advanced curation of real-world health data routinely collected by doctors in primary and secondary care.


Prof Gary Collins' research focuses on methodological aspects of the development and validation of multivariable prediction models, and he has published extensively in this area. He is particularly interested in the role that big data and machine learning play in developing and evaluating prediction models.

Current DPhil Students within the research group: 10

Current Postdocs within the research group: 7


The Botnar Research Centre plays host to the University of Oxford's Institute of Musculoskeletal Sciences and Centre for Statistics in Medicine. 

Training will be provided in relevant research methodology, including the handling and analysis of large health datasets, advanced statistical and machine learning techniques, and patient and public engagement in research. Attendance at formal training courses will be encouraged; these include the "Real World Epidemiology" Oxford summer school and the "Big Data and Machine Learning for Healthcare" modules.

In addition, courses from the University's Centre for Teaching and Learning, the Department of Computer Science and the Medical Sciences Division Skills Team on key skills for completing a successful PhD thesis will be available. Additional training opportunities in the field will arise, and the supervisors will encourage the student to pursue them.

Further, the Observational Health Data Sciences and Informatics global community of 300+ researchers will provide training and opportunities for international collaboration stretching beyond the project.

A core curriculum of departmental lectures will be taken in the first term to provide a solid foundation in a broad range of subjects, including epidemiology, machine learning and statistics.

Students will attend weekly seminars within the department, as well as relevant seminars in the wider University.

Students will be expected to present their data regularly to the department and the research group, and to attend external conferences to present their research internationally.


For further information regarding the project, please contact Dr Sara Khalid. For queries related to the application, please contact our Graduate Studies Officer, Sam Burnell.



How to Apply

The Department accepts applications throughout the year, but it is recommended that, in the first instance, you contact the relevant supervisor(s) or the Graduate Studies Officer, Sam Burnell, who will be able to advise you of the essential requirements.

Interested applicants should have, or expect to obtain, a first or upper second-class BSc degree or equivalent in a relevant subject, and will also need to provide evidence of English language competence (where applicable). The application guide and form are available online, and the DPhil or MSc by research will commence in October 2022.

Applications should be made to one of the following programmes using the specified course code:

DPhil in Musculoskeletal Sciences (course code: RD_ML2)

MSc by research in Musculoskeletal Sciences (course code: RM_ML2)


For further information, please visit the University Graduate Study page.