Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

The increasing availability of large combined datasets (or big data), such as those from electronic health records and from individual participant data meta-analyses, provides new opportunities and challenges for researchers developing and validating (including updating) prediction models. These datasets typically include individuals from multiple clusters (such as multiple centres, geographical locations, or different studies). Accounting for clustering is important to avoid misleading conclusions and enables researchers to explore heterogeneity in prediction model performance across multiple centres, regions, or countries, to better tailor or match them to these different clusters, and thus to develop prediction models that are more generalisable. However, this requires prediction model researchers to adopt more specific design, analysis, and reporting methods than standard prediction model studies that do not have any inherent substantial clustering. Therefore, prediction model studies based on clustered data need to be reported differently so that readers can appraise the study methods and findings, further increasing the use and implementation of such prediction models developed or validated from clustered datasets.

Original publication




Journal article



Publication Date





Humans, Checklist, Prognosis, Models, Statistical