Validation Study
Validating OMOP-mapped CPRD data for health economic analyses in the UK
Status
Ongoing; dissemination of preliminary results in progress
Our team
- Rafael Pinedo Villanueva, Associate Professor
- Gianluca Fabiano, Senior Researcher in Health Economics
- Njoki Njuki, Data Analyst
- Xihang Chen, Research Assistant in Health Data Sciences
- Antonella Delmestri, Lead Health Data Scientist
Funded by
HEOR Group
What problem are we trying to address?
Every day, the NHS collects vast amounts of information from GP visits, hospital stays, outpatient appointments, and emergency care. These data are invaluable for understanding which treatments work best, how healthcare resources are used, and where services can improve.
However, health records are often stored in different formats and coding systems, making them hard to combine or compare. To overcome this, researchers are transforming data into common data models — standardised structures that enable consistent analysis across institutions, regions, and countries.
One of the most widely adopted frameworks is the OMOP Common Data Model, used by international initiatives such as EHDEN, DARWIN EU, and the NHS Research Secure Data Environment. This approach has already transformed research in drug safety and pharmacoepidemiology.
Yet, while OMOP and other common data models have helped to standardise clinical outcomes and diagnoses, very little is known about whether mapped data can be reliably used to inform health economic studies by providing estimates of healthcare resource use and costs.
What was our main objective?
To assess whether healthcare resource use and costs estimated from OMOP-mapped data are equivalent to those estimated from the source CPRD and HES data, using a cohort of postmenopausal women with fragility fractures in the UK.
Why is this project important?
Healthcare budgets are under pressure worldwide, and the NHS is no exception. Making good decisions about how resources are allocated requires reliable evidence on both the effectiveness of treatments and the costs of providing care. Standardised data models offer a powerful way to generate such evidence quickly and consistently, allowing the same analytic code to run in federated or distributed environments without needing to transfer sensitive patient-level data between institutions or countries. Previous research has shown that using standardised data can reduce analysis time by up to 80% compared to source data analysis.
However, trust is key. Policymakers, regulators, and industry partners must be confident that analyses conducted on mapped datasets reflect the reality captured in the original health records. Without this assurance, valuable opportunities to use these new tools in health economics could be missed.
By validating OMOP-mapped CPRD-HES data against its source counterpart, this project will help answer a simple but crucial question: can we rely on these standardised datasets to inform decision making on healthcare resource allocation?
What data did we use?
Our study focuses on a well-defined patient group: women in the UK who have experienced a fragility fracture after the menopause. Fragility fractures, whether of the hip, spine or other sites, are common in postmenopausal women and carry a significant personal, clinical, and economic burden. We used patient data from 2010 to 2018 and measured resource use and cost outcomes over a 730-day follow-up period after each patient's first-ever fragility fracture.
Which methods did we use?
We analysed two versions of the same data:
- The source data: taken directly from the Clinical Practice Research Datalink (CPRD) and linked hospital records from Hospital Episode Statistics (HES), reflecting how the information is originally recorded in NHS systems.
- The OMOP-mapped data: the same CPRD and HES records transformed into the OMOP Common Data Model, which standardises the structure and terminology.
Our analysis covers:
- Primary care: number of encounters, including by healthcare professional specialty, and their associated costs.
- Hospital emergency care: number of visits and associated costs.
- Hospital inpatient care: number of admissions, length of stay, diagnoses, procedures, and associated costs.
- Hospital outpatient care: number of appointments, including by specialty-specific activity, and associated costs.
We estimated average resource use and costs per patient over the two-year follow-up period, as well as for each year after fracture. To generate confidence intervals around these estimates, we applied bootstrap sampling, which allowed us to compare the estimates from the source and OMOP-mapped datasets and assess their equivalence.
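For readers who want to see the idea in practice, the sketch below illustrates one way a bootstrap comparison of this kind could be set up. It is not the study's analysis code. It assumes that each patient has one value (for example, number of encounters or total cost) in each version of the data, listed in the same patient order, and it resamples patients as pairs. The function names, the number of resamples, the percentile interval and the equivalence margin are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def bootstrap_mean_difference(source, mapped, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(mapped) - mean(source).

    `source` and `mapped` hold one value per patient, in the same patient
    order (an assumption of this sketch), so patients are resampled as pairs.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(source) != len(mapped) or not source:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(source)
    diffs = []
    for _ in range(n_boot):
        # Draw n patients with replacement and recompute both means.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(mean(mapped[i] for i in idx) - mean(source[i] for i in idx))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = mean(mapped) - mean(source)
    return point, lower, upper


def is_equivalent(lower, upper, margin):
    """Hypothetical rule: equivalent if the whole interval lies within +/- margin."""
    return -margin <= lower and upper <= margin


if __name__ == "__main__":
    # Made-up encounter counts for six patients, for illustration only.
    source_counts = [12, 30, 8, 21, 17, 25]
    mapped_counts = [12, 29, 8, 21, 16, 25]
    point, lower, upper = bootstrap_mean_difference(source_counts, mapped_counts)
    print(point, lower, upper, is_equivalent(lower, upper, margin=2.0))
```

In a sketch like this, the margin that counts as "equivalent" has to be chosen in advance on clinical or economic grounds; the value used in the example is arbitrary.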
What have we found?
Primary care: OMOP estimates were equivalent to those from the source data. Resource use was lower on average by 1.8% and costs by 2.3%, and neither difference was statistically significant. These differences can be further minimised through refinements to mappings and vocabularies. See Figure 1 below.
Hospital emergency care: ongoing, results being generated.
Hospital inpatient care: ongoing, results being generated.
Hospital outpatient care: ongoing, results being generated.
Looking ahead
Although this project is focused on postmenopausal fractures, the methods and lessons learnt will apply far more broadly. Once tested and refined, the approach could be applied to study the economic burden of a wide range of conditions and treatments across diverse datasets and countries, paving the way for federated health economics analyses that support informed decision-making in the UK and internationally.
Figure 1: Mean number of primary care encounters and associated costs estimated using OMOP-mapped and source CPRD data
Key research question
Are healthcare resource use and costs estimated using OMOP-mapped data equivalent to those that would be obtained from using the original primary and hospital care data?
Headline summary of findings
For our cohort of postmenopausal women with fragility fractures in the UK, primary care resource use and costs estimated using OMOP-mapped data are nearly identical to those estimated using source CPRD data. The work on hospital care is ongoing.