Methods for the development and validation of computable phenotypes based on biobanks and linked real world data
- Project No: Botnar-2025-13
- Intake: 2026
PROJECT OVERVIEW
Background
Ongoing and new biobanks and linked data provide a unique opportunity for the study of pharmacogenomics, proteomics, and disease determinants at scale. Examples include UK Biobank (UKB) and Our Future Health.
However, incomplete data linkage and follow-up, limited representativeness, and diverse data models make the phenotyping process complex and difficult to scale.
This project will:
- Explore methods for the definition of cohorts and computable phenotypes across data sources
- Improve phenotyping methods based on exposure/outcome ascertainment using self-reported variables, questionnaires, and linked real world data (RWD)
- Analyse the representativeness of the available biobanks and data assets
- Study the use of biobanks in rare disease research using all data available, including surveys/questionnaires, genomics/proteomics, and linked RWD
- Explore the use of large language models to increase the efficiency of computable phenotyping in biobanks and linked RWD
- Increase our understanding of the use of the OMOP common data model for the definition and validation of computable phenotypes in biobanks
Key References
1) Macfarlane GJ, Beasley M, Smith BH, Jones GT, Macfarlane TV. Can large surveys conducted on highly selected populations provide valid information on the epidemiology of common health conditions? An analysis of UK Biobank data on musculoskeletal pain. British Journal of Pain. 2015 Jan 28. doi: 10.1177/2049463715569806. PMID: 26526341.
2) Cook MB, Adams N, Adjetey A, Arathimos R, Balabanovic M, Blackwood R, Booth A, Cairns BJ, Connell A, Ellis S, Elsworth B, Evans K, Forman A, Gradovich E, Gretton C, Grimm F, Hunter DJ, Lipinski K, Lord J, Luff J, Maleady-Crowe F, Moran R, North S, Peel A, van der Plaat D, Purves K, Reddington F, Roddam A, Sanderson SC, Sprosen T, Steventon A, Turnbull I, Vestesson E, Ali R. Cohort Profile: Our Future Health. Int J Epidemiol. 2025 Oct 14;54(6):dyaf171. doi: 10.1093/ije/dyaf171. PMID: 41092131; PMCID: PMC12527338.
KEYWORDS
Real world evidence, epidemiology, health data sciences
The Health Data Sciences team
The Health Data Sciences team at the Botnar Research Centre is a multidisciplinary group of over 40 people, including research staff, postdoctoral researchers, and 8 PhD students. Our team brings together colleagues from diverse backgrounds and geographies, with the complementary areas of knowledge needed to complete research studies from design to reporting. We have extensive expertise in health data sciences, epidemiology, and pharmacoepidemiology.
Training
Alongside the departmental training opportunities listed below, the Health Data Sciences section at the Botnar Research Centre (University of Oxford) will provide hands-on training in real world data analysis using medical records and genetic data.
The Botnar Research Centre hosts the University of Oxford's NDORMS Health Data Sciences and Real World Evidence section, which enables and encourages research and education on the use of large routinely collected health data for the study and improvement of human health. Training will be provided in techniques and methods including epidemiology, pharmacoepidemiology, data sciences, applied artificial intelligence, causal inference, and real world evidence.
A core curriculum of lectures will be taken in the first term to provide a solid multidisciplinary foundation in a broad range of subjects including biology, inflammation, epigenetics, translational immunology, microbiome, and data sciences. Students will also be required to attend regular seminars within the Department and those relevant in the wider University.
Students will be expected to present data regularly at Departmental seminars and fortnightly Health Data Science meetings, and to attend external conferences to present their research globally, with limited financial support from the Department.
Students will also have the opportunity to work closely with our wide range of collaborators in the Observational Health Data Sciences and Informatics (OHDSI), European Health Data and Evidence Network (EHDEN), and related open data science communities.
Students will have access to various courses run by the Medical Sciences Division Skills Training Team and other Departments. All students are required to attend a 2-day Statistical and Experimental Design course at NDORMS (information will be provided once accepted to the programme).
How to Apply
Please contact the relevant supervisor(s) to register your interest in the project and, if required, the departmental Education Team (graduate.studies@ndorms.ox.ac.uk), who can advise you on the essential requirements for the programme and provide further information on how to make an official application.
Interested applicants should have, or expect to obtain, a first or upper second-class BSc degree or equivalent in a relevant subject, and will also need to provide evidence of English language competence (where applicable). The application guide and form are available online. The DPhil programme will commence in October 2026.
Applications should be made to the following programme using the specified course code:
- DPhil in Clinical Epidemiology and Medical Statistics (course code: RD_NNRA1)
For further information, please visit http://www.ox.ac.uk/admissions/graduate/applying-to-oxford