
NDORMS DPhil & MSc by Research

Project Outline

Missing data are ubiquitous in longitudinal cohort studies with repeated follow-up timepoints. Missingness is "intermittent" when a missing value is followed by an observed value at a later timepoint, and "monotone" when no values are observed at any subsequent timepoint. Monotone missingness can be caused by early withdrawal or death. In observational studies using routinely collected health data, it also arises when an individual-specific follow-up window for the outcome ends, e.g. when patients are censored on stopping or switching treatment. Unlike missing predictors/covariates, which are often handled by imputation, missing data in repeatedly measured outcomes are typically handled with statistical approaches within the longitudinal analysis itself [1,2,3]. Little attention has been given to handling a mixture of missing data mechanisms.
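The two definitions above can be illustrated with a minimal Python sketch. It is only an illustration of the definitions, not a method proposed by the project: the function names `label_missing_timepoints` and `summarise_pattern` are hypothetical, each participant's outcome series is assumed to be ordered by timepoint, and `None` is assumed to mark a missing measurement.

```python
from typing import List, Optional, Sequence


def label_missing_timepoints(values: Sequence[Optional[float]]) -> List[str]:
    """Label each timepoint as 'observed', 'intermittent' or 'monotone'.

    Assumption: `values` is ordered by timepoint and None means missing.
    A missing value is 'intermittent' if any later value is observed,
    and 'monotone' if nothing is observed afterwards.
    """
    labels = []
    seen_later_observation = False
    # Walk backwards so we always know whether a later value was observed.
    for value in reversed(values):
        if value is not None:
            labels.append("observed")
            seen_later_observation = True
        elif seen_later_observation:
            labels.append("intermittent")
        else:
            labels.append("monotone")
    labels.reverse()
    return labels


def summarise_pattern(values: Sequence[Optional[float]]) -> str:
    """Summarise one participant's series as complete, intermittent,
    monotone, or mixed (both kinds of missingness are present)."""
    kinds = set(label_missing_timepoints(values)) - {"observed"}
    if not kinds:
        return "complete"
    if len(kinds) == 2:
        return "mixed"
    return kinds.pop()


if __name__ == "__main__":
    # Hypothetical series: one gap mid-follow-up, then dropout.
    series = [7.1, None, 6.4, None, None]
    print(label_missing_timepoints(series))
    print(summarise_pattern(series))
```

A participant with a gap in the middle of follow-up and a later dropout is summarised as "mixed", which is the situation the project highlights as receiving little attention.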

In observational studies using routinely collected health data, inter-individual differences in intra-individual change in repeatedly measured outcomes are the norm rather than the exception. Methods that accommodate such heterogeneity struggle to cope with different types of missing data concurrently [4].

With advances in data collection, particularly through digital devices, intensive longitudinal data are becoming increasingly common. The tenability and utility of statistical approaches for handling missing data in intensive longitudinal outcomes have not been fully tested [5]. There is an opportunity to explore the use of machine learning for handling missing data in intensive longitudinal outcomes.

This DPhil project will aim to address missing data issues in studies aiming to understand heterogeneity based on routinely collected health data. The student will gain experience in systematic reviews, simulation studies, and application of different methods to deal with missing data in clinical case studies.


1. Curran et al (2010). Twelve Frequently Asked Questions About Growth Curve Modeling. J Cog Dev, 11(2), 121–136.
2. Daza et al (2017). Estimating inverse-probability weights for longitudinal data with dropout or truncation: The xtrccipw command. The Stata Journal, 17(2), 253–278.
3. Kurland et al (2009). Longitudinal Data with Follow-up Truncated by Death: Match the Analysis Method to Research Aims. Stat Sci, 24(2), 211.
4. Muthén et al (2011). Growth Modeling with Non-Ignorable Dropout: Alternative Analyses of STAR*D Antidepressant Trial. Psychol Methods, 16(1), 17–33.
5. Ji (2018). Handling Missing Data in the Modeling of Intensive Longitudinal Data. Struct Equ Modeling.

Details of the research group

Prof Daniel Prieto-Alhambra is recognised internationally as an authority on the use of routinely collected health data for musculoskeletal pharmaco- and device epidemiology. He will be the primary supervisor and will provide overall guidance to the DPhil student.

Dr Victoria Strauss has extensive experience in longitudinal data analysis using routinely collected health data. She will provide close supervision of the DPhil project under the guidance of Prof Prieto-Alhambra.

Dr Sara Khalid is an expert in machine learning with applications in health informatics such as patient monitoring and telehealth. She will provide expert support in applying machine learning to the intensive longitudinal data part of the project.

Prof Irene Petersen (University College London) is a highly experienced researcher in the field of handling missing data in routinely collected health data. She will provide expert guidance for this DPhil project.


The Botnar Research Centre plays host to the University of Oxford's Institute of Musculoskeletal Sciences, which enables and encourages research and education into the causes of musculoskeletal disease and their treatment. The proposed project would be part of the work of the pharmaco- and device epidemiology group, Centre for Statistics in Medicine (CSM), NDORMS. CSM has more than 20 years’ experience in medical statistics, and has teams of statisticians, epidemiologists, methodologists and systematic review specialists. The pharmaco- and device epidemiology group is led by Prof Daniel Prieto-Alhambra and has seven doctoral researchers, two post-doctoral researchers and four senior post-doctoral researchers.

Training will be provided in techniques including simulations, longitudinal data analysis, and the handling and analysis of large datasets. Attendance at formal training courses will be encouraged, and will include the "Real world epidemiology Oxford summer school" and advanced statistics courses. In addition, courses from the Oxford Learning Institute and the Oxford University Computer Sciences on key skills for the completion of a successful DPhil thesis will be available. Additional on-the-job training opportunities will arise, and the supervisors will encourage the student to pursue them.

A core curriculum of lectures will be taken in the first term to provide a solid foundation in a broad range of subjects including musculoskeletal biology, inflammation, epigenetics, translational immunology, data analysis and the microbiome. Students will attend regular seminars within the department and those relevant in the wider University.

Students will be expected to present data regularly at the departmental PGR seminars and at meetings of the pharmaco- and device epidemiology group, and to attend external conferences to present their research globally.

Students will also have the opportunity to work closely with the pharmaco- and device epidemiology group and Prof Irene Petersen. Students will have access to various courses run by the Medical Sciences Division Skills Training Team and other departments. All students are required to attend a 2-day Statistical and Experimental Design course at NDORMS.

How to Apply

The department accepts applications throughout the year, but it is recommended that, in the first instance, you contact the relevant supervisors or the Graduate Studies Officer, Sam Burnell, who will be able to advise you of the essential requirements.

Interested applicants should have or expect to obtain a first or upper second class BSc degree or equivalent, and will also need to provide evidence of English language competence. The application guide and form are found online and the DPhil will commence in October 2019.

For further information, please contact Dr Victoria Strauss.

External supervisor

Prof Irene Petersen, University College London

Project reference number #NDORMS-2019/1

