  • Project No: NDORMS-2021/3
  • Intake: 2021

Project outline

With the advent of real-world data in healthcare, there is growing interest in the use of artificial intelligence (AI) for observational research, and traditional machine learning has already seen significant uptake. However, the scope of the most recent advances in AI, and their transportability to clinical analytics, are not well explored or understood, particularly from a validation standpoint.

Today, real-world data for healthcare research are available from a wide variety of sources, including electronic medical records and wearable devices. This is rapidly generating a wide variety of large, coded, longitudinal, multimodal and multidimensional datasets. In principle, these data lend themselves well to analysis by data-driven artificial intelligence methods that have evolved and matured in other fields, such as computer vision.

This project will explore specific recent advances in deep learning and artificial intelligence and will investigate their applicability to, and validation on, real-world, large-scale, routinely collected clinical datasets.

The specific focus areas will be:

1) development of clinician-ready equations from shallow and deep neural networks

2) use of AI methods for patient- and population-level temporal and spatial trajectory analysis

3) application of AI methods that have not yet been used in mining electronic medical record datasets

4) contribution to the development of predictive analytics tools in OHDSI via 1), 2) and 3).

This DPhil studentship presents an exciting opportunity to join the international Observational Health Data Sciences and Informatics (OHDSI) project and to collaborate with a community of 300+ epidemiologists, data scientists, statisticians, software developers, and clinical experts from 30 countries.

This DPhil project will be suitable for a candidate who wishes to develop analytical tools for real-world medical decision making, and who has a strong interest and some experience in artificial intelligence and machine learning. 


  • Hold, or be about to obtain, a first or upper second class BSc degree or a Master's degree (or equivalent) in a subject relevant to computer science, engineering, statistics, maths or data science.
  • Be proficient in R and/or Python programming, and in data visualisation tools.
  • Have a good theoretical understanding and applied experience of one or more of a) traditional machine learning, b) deep learning, c) reinforcement learning, d) computer vision techniques, e) time-series analysis.


  • Have a commitment to research in the applied health sciences.
  • Be a good team player as well as able to work independently.
  • Experience of developing statistical methods/software tools in R/Python would be an advantage.
  • An understanding of electronic healthcare data would be an advantage.


This project will be jointly supervised by Prof Prieto-Alhambra (Professor of Pharmaco- and Device Epidemiology and Theme Lead for Observational Research), Dr Sara Khalid (Machine Learning subtheme Lead) and Dr Antonella Delmestri (Real-World Data Processing subtheme Lead), all from the Centre for Statistics in Medicine, NDORMS, University of Oxford, and by Professor Peter Rijnbeek, Erasmus Medical Centre, Netherlands. The research will be conducted in the Pharmaco- and Device Epidemiology Research Group, at the premises of the Botnar Research Centre, in Oxford, UK. Regular supervision meetings with Professor Rijnbeek will be organised. The DPhil candidate will also work closely with other OHDSI collaborators and with researchers at the Centre for Statistics in Medicine, providing the candidate with a cutting-edge environment in which to develop their career.

Prof Daniel Prieto-Alhambra has published extensively in the field of pharmaco-epidemiology, and is recognised internationally as an authority on the use of routine data for pharmaco- and device epidemiology and related methods. He is a core member of OHDSI. He will be the primary supervisor and will oversee the overall direction of the DPhil.

Dr Sara Khalid is a senior research associate in biomedical data science. She has extensive expertise in the use, development, and validation of machine learning methods for the analysis of routinely collected data, for predictive and exploratory modelling. She is a core member of the OHDSI predictive analytics team. She will provide close supervision on machine learning and AI.

Prof Peter Rijnbeek (Erasmus Medical Centre Netherlands) is a professor of health data science, and the creator and co-founder of the OHDSI movement for standardised predictive analytics in routinely collected health data. He will provide expert guidance for this DPhil project.

Dr Antonella Delmestri is a senior health data scientist with a background in computer science and software engineering, and extensive experience with real-world clinical data. She is a key member of OHDSI, providing expert guidance on the standardisation of several clinical datasets. She will provide support in understanding real-world data and data-driven methods.

Current DPhil Students within the pharmaco-epidemiology research group: 6

Current Postdocs within the pharmaco-epidemiology research group: 7


The Botnar Research Centre plays host to the University of Oxford's Institute of Musculoskeletal Sciences, which enables and encourages research and education into the causes of musculoskeletal disease and their treatment. Training will be provided in relevant research methodology, including deep learning analytics, handling and analysis of large datasets, OHDSI analytical tools and the OMOP-CDM structure.

A core curriculum of departmentally organised lectures will be taken in the first term to provide a solid foundation in a broad range of subjects, including epidemiology, health economics, and data analysis. These include, but are not limited to, a workshop on "Machine Learning for Healthcare", an 8-week module on "Epidemiology, Machine Learning and Health Economics", and regular courses throughout the year at the Centre for Statistics in Medicine.

Students will be expected to present data regularly at Departmental seminars and at meetings of the Pharmaco- and Device Epidemiology Research Group and the Centre for Statistics in Medicine, and to attend external conferences to present their research internationally, with limited financial support from the Department. Students will also be required to attend regular seminars within the Department and relevant seminars in the wider University.

Students will also have the opportunity to work closely with the OHDSI team. Students will have access to various courses run by the Medical Sciences Division Skills Training Team and other Departments. All students are required to attend a 2-day Statistical and Experimental Design course at NDORMS, as well as courses run by the IT department (information will be provided once accepted to the programme).

Attendance at formal training courses will be encouraged, and will include the "Real world epidemiology Oxford summer school" directed by Prof Prieto-Alhambra, a machine learning summer school, and the pre-conference course(s) offered by the International Society for Pharmacoepidemiology, amongst others.





The Department accepts applications throughout the year, but it is recommended that, in the first instance, you contact the relevant supervisor(s) or the Graduate Studies Officer, Sam Burnell, who will be able to advise you on the essential requirements.

Interested applicants should have, or expect to obtain, a first or upper second-class BSc degree or equivalent in a relevant subject and will also need to provide evidence of English language competence (where applicable). The application guide and form are available online, and the DPhil or MSc by research will commence in October 2021.

For further information, please visit

and/or contact Dr Sara Khalid and Prof Prieto-Alhambra.