A note on estimating the Cox-Snell R2 from a reported C statistic (AUROC) to inform sample size calculations for developing a prediction model with a binary outcome.
Riley RD, Van Calster B, Collins GS.
In 2019 we published a pair of articles in Statistics in Medicine that describe how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, or with a binary or time-to-event outcome. As with any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a prediction model with a binary outcome, the user must specify the outcome proportion and a conservative estimate of the overall fit of the developed model, as measured by the Cox-Snell R2 (proportion of variance explained). This raises the question of how to identify a plausible value for R2 in advance of model development. Our articles suggest that researchers should take R2 from closely related models already published in their field. In this letter, we present details on how to derive R2 from the reported C statistic (AUROC) of such existing prediction models with a binary outcome. The C statistic is commonly reported, so our approach allows researchers to obtain R2 for subsequent sample size calculations for new models. Stata and R code are provided, along with a small simulation study.
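To make the idea of converting a C statistic into a Cox-Snell R2 concrete, the following is a minimal Python sketch and not the Stata or R code referred to above. It rests on an assumption that is ours for illustration: the linear predictor is normally distributed with unit variance in both non-events and events, so that C = Phi(mu / sqrt(2)), where mu is the difference in means. Under that assumption the logistic model in the linear predictor is known exactly, so the sketch evaluates its log-likelihood on simulated data rather than fitting a model. The function names `max_cox_snell_r2` and `approx_cox_snell_r2`, the simulation size and the seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def max_cox_snell_r2(prev):
    """Largest Cox-Snell R2 attainable for a binary outcome with proportion prev."""
    ll_null = prev * math.log(prev) + (1 - prev) * math.log(1 - prev)
    return 1 - math.exp(2 * ll_null)


def approx_cox_snell_r2(c_stat, prev, n=200_000, seed=1):
    """Approximate the Cox-Snell R2 from a C statistic and outcome proportion.

    Assumption of this sketch: the linear predictor (LP) is N(0, 1) in
    non-events and N(mu, 1) in events, which implies C = Phi(mu / sqrt(2)).
    """
    rng = random.Random(seed)
    mu = math.sqrt(2) * NormalDist().inv_cdf(c_stat)
    n_events = round(prev * n)
    # Under the binormal assumption, logit(p) = intercept + mu * LP exactly;
    # this known model stands in for a fitted logistic regression.
    intercept = math.log(prev / (1 - prev)) - mu ** 2 / 2
    ll_model = 0.0
    for i in range(n):
        y = 1 if i < n_events else 0
        lp = rng.gauss(mu if y else 0.0, 1.0)
        eta = intercept + mu * lp
        # Bernoulli log-likelihood contribution: y * eta - log(1 + exp(eta))
        ll_model += y * eta - math.log1p(math.exp(eta))
    p = n_events / n
    ll_null = n * (p * math.log(p) + (1 - p) * math.log(1 - p))
    lr = 2 * (ll_model - ll_null)  # likelihood ratio statistic
    return 1 - math.exp(-lr / n)   # Cox-Snell R2


if __name__ == "__main__":
    # Illustrative inputs only: a reported C statistic of 0.80 and an
    # outcome proportion of 0.10.
    print(approx_cox_snell_r2(0.80, 0.10))
    print(max_cox_snell_r2(0.10))
```

The second function is included because the Cox-Snell R2 for a binary outcome cannot reach 1; its upper bound depends only on the outcome proportion, which gives a quick plausibility check on any value carried into a sample size calculation.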