In 2019 we published a pair of articles in Statistics in Medicine that describe how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, or with a binary or time-to-event outcome. As with any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a prediction model with a binary outcome, the user must specify the outcome proportion and a conservative estimate of the overall fit of the developed model, as measured by the Cox-Snell R² (proportion of variance explained). This raises the question of how to identify a plausible value for R² in advance of model development. Our articles suggest that researchers identify R² from closely related models already published in their field. In this letter, we present details on how to derive R² from the reported C statistic (AUROC) of such existing prediction models with a binary outcome. The C statistic is commonly reported, so our approach allows researchers to obtain R² for subsequent sample size calculations for new models. Stata and R code are provided, along with a small simulation study.
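The abstract does not spell out the derivation, so the following Python sketch is only one plausible way to go from a reported C statistic to an approximate Cox-Snell R², and it is not the Stata or R code supplied with the letter. It assumes that the model's linear predictor is normally distributed with equal variance in participants with and without the outcome, in which case the mean separation is sqrt(2) times the standard normal quantile of C. The function name `approximate_r2_cs`, its parameters, and the simulation size are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def approximate_r2_cs(c_statistic, outcome_proportion, n=100_000, seed=1):
    """Approximate the Cox-Snell R-squared implied by a C statistic.

    Assumption (not taken from the source): the linear predictor is
    N(0, 1) in non-events and N(mu, 1) in events, so that
    C = Phi(mu / sqrt(2)).
    """
    rng = random.Random(seed)
    # Mean separation between events and non-events implied by C.
    mu = math.sqrt(2) * NormalDist().inv_cdf(c_statistic)
    # Under the assumption above, the true log-odds are linear in x:
    # log-odds = logit(phi) - mu^2 / 2 + mu * x.
    logit_phi = math.log(outcome_proportion / (1 - outcome_proportion))
    intercept = logit_phi - mu ** 2 / 2

    ll_model = 0.0  # log-likelihood using the true log-odds
    ll_null = 0.0   # log-likelihood of the intercept-only model
    for _ in range(n):
        y = rng.random() < outcome_proportion
        x = rng.gauss(mu if y else 0.0, 1.0)
        log_odds = intercept + mu * x
        if y:
            ll_model -= math.log1p(math.exp(-log_odds))
            ll_null += math.log(outcome_proportion)
        else:
            ll_model -= math.log1p(math.exp(log_odds))
            ll_null += math.log(1 - outcome_proportion)

    # Cox-Snell R-squared = 1 - exp(-LR / n), LR = likelihood ratio statistic.
    likelihood_ratio = 2 * (ll_model - ll_null)
    return 1 - math.exp(-likelihood_ratio / n)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the letter.
    print(approximate_r2_cs(c_statistic=0.8, outcome_proportion=0.3))
```

For simplicity the sketch evaluates the likelihood at the known data-generating log-odds rather than refitting a logistic regression to the simulated data; with a large simulated sample the two should be close. The result depends on the outcome proportion as well as on C, so both need to be taken from the existing model's report.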

Original publication

Journal article

Stat Med

Publication Date

Pages: 859 - 864

Keywords: C statistic (AUROC), R squared, clinical prediction model, sample size, Computer Simulation, Humans