  • Project No: Botnar-2025-12
  • Intake: 2026

Project overview

Real-world data (RWD), routinely collected across various healthcare settings, are increasingly used in observational studies, which are now recognised as valuable complements to traditional randomised controlled trials for informing patient care, healthcare delivery, and policy-making.

The real-world evidence (RWE) derived from RWD is strengthened when institutions collaborate through network studies, which help ensure that analyses are consistent, comparable, and generalisable. Network studies, whether adopting a centralised or federated model, depend on Common Data Models (CDMs): frameworks that standardise the organisation, structure, and representation of healthcare data across diverse sources.

Our team has extensive expertise in the Observational Medical Outcomes Partnership (OMOP) CDM, one of the most widely used CDMs, and in the extract, transform, and load (ETL) processes used to map source data into this model. We are active data partners in international federated networks and have access to multiple clinical RWD sources.

A growing body of research and tools exists to validate and assess the quality of RWD once transformed into the OMOP CDM (1–4). However, no established methods currently measure the similarity or distance between source RWD and their transformed versions. This project aims to address this gap by strengthening the transformation process and enhancing the validity of the resulting data through:

  • Developing novel measures of similarity and distance between RWD and their transformed OMOP CDM versions.
  • Creating tools to detect incorrect or incomplete data transformations.

This is a 3-year DPhil project.

Key References

  1. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc. 2012;19(1):54–60.
  2. Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw ST, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. eGEMs (Wash DC). 2016;4(1):1244. doi: 10.13063/2327-9214
  3. Blacketer C, DeFalco FJ, Ryan PB, Rijnbeek PR. Increasing trust in real-world evidence through evaluation of observational data quality. J Am Med Inform Assoc. 2021;28(10):2251–7.
  4. Blacketer C, Voss EA, DeFalco F, Hughes N, Schuemie MJ, Moinat M, et al. Using the Data Quality Dashboard to improve the EHDEN network. Appl Sci (Basel). 2021;11(24):11920.

Keywords

Real-world data, health data sciences, data standardisation, data harmonisation

The Health Data Sciences team

The Health Data Sciences (HDS) team at the Botnar Institute is a multidisciplinary group of over 40 people, including research staff, postdoctoral researchers, and 8 PhD students. Our colleagues come from diverse backgrounds and geographies and bring the complementary areas of knowledge needed to complete research studies, from design to reporting. We have extensive expertise in health data sciences, epidemiology, pharmacoepidemiology, pharmacogenomics, and machine learning.

Training

Alongside the departmental training opportunities listed below, we will provide hands-on training in real-world data using medical records and genetic data from the HDS group. The student will work on their own project within an experienced and collaborative supervisory team. The student will also be involved with our international European Health Data & Evidence Network (EHDEN) and Observational Health Data Sciences and Informatics (OHDSI) networks, which offer additional guidance, training and support. The student will be supported to attend relevant conferences to enrich their studies, with financial support available for travel.

The Botnar Research Centre plays host to the University of Oxford's Institute of Musculoskeletal Sciences, which enables and encourages research and education into the causes and treatment of musculoskeletal disease. Training will be provided in a variety of techniques and methods, including data sciences, applied artificial intelligence, and real-world evidence.

A core curriculum of lectures will be taken in the first term to provide a solid foundation in a broad range of subjects, including musculoskeletal biology, inflammation, epigenetics, translational immunology, data analysis and the microbiome. Students will also be required to attend regular seminars within the Department and relevant seminars in the wider University.

Students will be expected to present data regularly in Departmental seminars and fortnightly Health Data Science meetings, and to attend external conferences to present their research globally, with limited financial support from the Department.

Students will also have the opportunity to work closely with our wide range of collaborators in the Observational Health Data Sciences and Informatics (OHDSI), European Health Data and Evidence Network (EHDEN), and related open data science communities.

Students will have access to various courses run by the Medical Sciences Division Skills Training Team and other Departments. All students are required to attend a 2-day Statistical and Experimental Design course at NDORMS (information will be provided once accepted to the programme).

How to Apply

Please contact the relevant supervisor(s) to register your interest in the project and, if required, the departmental Education Team (graduate.studies@ndorms.ox.ac.uk), who can advise you on the essential requirements for the programme and provide further information on how to make an official application.

Interested applicants should have, or expect to obtain, a first or upper second-class BSc degree or equivalent in a relevant subject and will also need to provide evidence of English language competence (where applicable). The application guide and form are found online, and the DPhil programme will commence in October 2026.

Applications should be made to one of the following programmes using the specified course code.

DPhil in Clinical Epidemiology and Medical Statistics (course code: RD_NNRA1)

For further information, please visit http://www.ox.ac.uk/admissions/graduate/applying-to-oxford.