OBJECTIVE: We sought to summarize the study design, modelling strategies, and performance measures reported in studies of clinical prediction models developed using machine learning techniques.

STUDY DESIGN AND SETTING: We searched PubMed for articles published between 01/01/2018 and 31/12/2019 that described the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were placed on study design, data source, or predicted patient-related health outcome.

RESULTS: We included 152 studies: 58 (38.2% [95%CI 30.8-46.1]) were diagnostic and 94 (61.8% [95%CI 53.9-69.2]) were prognostic. Most studies reported only the development of prediction models (n=133, 87.5% [95%CI 81.3-91.8]), focused on binary outcomes (n=131, 86.2% [95%CI 79.8-90.8]), and did not report a sample size calculation (n=125, 82.2% [95%CI 75.4-87.5]). The most commonly used algorithms were support vector machine (n=86/522, 16.5% [95%CI 13.5-19.9]) and random forest (n=73/522, 14.0% [95%CI 11.3-17.2]). Values of the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were frequently not reported (n=494/522, 94.6% [95%CI 92.4-96.3]).

CONCLUSIONS: Our review revealed that more attention is needed to the handling of missing values, the methods used for internal validation, and the reporting of calibration in order to improve the methodological conduct of studies on machine learning-based prediction models.

SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.

Original publication

Journal article

J Clin Epidemiol

Keywords: development, diagnosis, predictive algorithm, prognosis, risk prediction, validation