BACKGROUND: A sufficient sample size is crucial when developing a clinical prediction model. We reviewed how sample size was handled in oncology studies that developed prediction models for binary outcomes using machine learning (ML) methods, and compared the sample size used to develop each model with the minimum sample size required to develop a regression-based model (Nmin).

METHODS: We searched the Medline (via OVID) database for studies developing a prediction model using ML methods published in December 2022. We reviewed how sample size was justified. We calculated Nmin for each study and compared it with the sample size that was used to develop the models.

RESULTS: Only one of 36 included studies justified its sample size. We were able to calculate Nmin for 17 (47%) studies. Five of the 17 studies met Nmin, meaning their sample size was sufficient both to precisely estimate the overall risk and to minimise overfitting. There was a median deficit of 302 participants with the event (n = 17; range: -21331 to 2298) when developing the ML models. An additional three of the 17 studies met the sample size required to precisely estimate the overall risk only.

CONCLUSION: Studies developing a prediction model using ML in oncology seldom justified their sample size, and sample sizes were often smaller than Nmin. As ML models almost certainly require a larger sample size than regression models, the true deficit is likely larger. We recommend that researchers consider and report their sample size and, at a minimum, meet the sample size required when developing a regression-based model.
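The abstract does not state the formulas behind Nmin. As a rough illustration only, the sketch below assumes two criteria commonly used for regression-based models with a binary outcome: a sample size large enough to estimate the overall risk within a chosen margin of error, and one large enough to keep expected shrinkage of predictor effects at or above a target (minimising overfitting). The function names, the 0.05 margin, the 0.9 shrinkage target and the example inputs are all assumptions made for illustration, not values taken from the study.

```python
import math


def n_for_overall_risk(prevalence: float, margin: float = 0.05) -> int:
    """Participants needed so the 95% CI for the overall risk
    has a half-width of at most `margin` (assumed default: 0.05)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))


def n_for_shrinkage(n_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Participants needed so the expected uniform shrinkage factor is at
    least `shrinkage` (assumed default: 0.9), given the number of candidate
    predictor parameters and the anticipated Cox-Snell R-squared."""
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def n_min(prevalence: float, n_params: int, r2_cs: float) -> int:
    """Illustrative Nmin: the larger of the two criteria above."""
    return max(n_for_overall_risk(prevalence), n_for_shrinkage(n_params, r2_cs))


def event_deficit(required_n: int, actual_n: int, prevalence: float) -> int:
    """Shortfall in participants with the event; negative values mean
    the study had more events than the illustrative minimum requires."""
    return math.ceil(required_n * prevalence) - round(actual_n * prevalence)


if __name__ == "__main__":
    # Hypothetical study: 20% outcome prevalence, 10 parameters, R2_cs = 0.1
    required = n_min(prevalence=0.2, n_params=10, r2_cs=0.1)
    print("Illustrative Nmin:", required)
    print("Event deficit for 400 participants:", event_deficit(required, 400, 0.2))
```

A study meeting only the first criterion would correspond to the abstract's category of studies able to precisely estimate the overall risk only; meeting both corresponds to meeting Nmin in this simplified sketch.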

Original publication

DOI

10.1016/j.jclinepi.2025.111675

Type

Journal

J Clin Epidemiol

Publication Date

13/01/2025

Keywords

machine learning, methodology, oncology, prediction model, sample size, systematic review