Objective

This research aims to establish an efficient, systematic, reproducible, and transparent solution for advanced curation of real-world data, which are highly complex and represent an invaluable source of information for academia and industry.

Materials and methods

We propose a novel software solution that splits the statistical analytical pipeline into two phases. The first phase is implemented through Curator, which performs data engineering and data modelling on deidentified real-world data to achieve advanced curation and provides selected information ready to be analyzed in the second phase by statistical packages. Curator is made of a suite of Python programs and uses MySQL as its database management system. Curator has been utilised with several UK primary and secondary care data sources.

Results

Curator has been used in 25 completed clinical and health economics research studies. Their output has been published in 2 NIHR-funded reports and 33 prestigious international peer-reviewed journals and presented at 38 global conferences. Curator has consistently reduced research time and costs by over 36% and made research more reproducible and transparent.

Discussion

Curator fits in well with recent UK governmental guidelines that recognise health data curation as a complex standalone technical challenge. Curator has been used extensively on UK real-world data and can handle several linked datasets. However, for Curator to be accessed by a wider audience, it needs to become more user-friendly.

Conclusion

Curator has proven to be a cost-effective and trustworthy data curation tool, which should be developed further and made available to third parties.

More information

DOI

10.1016/j.imu.2023.101291

Type

Journal article

Publisher

Elsevier

Publication Date

2023-06-07

Volume

40

Keywords

data wrangling, real-world data, EHR, electronic health records, RWD