Early prediction of bronchopulmonary dysplasia: comparison of modelling methods, development and validation studies.
Torchin H., Dhiman P., Ancel P-Y., Durrmeyer X., Jarreau P-H., Nuytten A., Truffert P., Zeitlin J., Collins GS.
BACKGROUND: Machine-learning methods are gaining popularity for predicting medical events, but their added value over other methods remains to be determined. We compared the performance of clinical prediction models for bronchopulmonary dysplasia (BPD) or death in very preterm infants using logistic regression and random forest methods. METHODS: Two population-based cohorts of very preterm infants were used: EPIPAGE-2 (France, 2011) for development and internal validation and EPICE (Europe, 2011) for external validation. Eligible infants were born before 30 weeks' gestation and admitted to neonatal units. BPD was defined as any respiratory support at 36 weeks' postmenstrual age. Candidate predictors were available shortly after birth or at day 3. The performance of the logistic regression and random forest models was assessed in terms of discrimination (c-statistic) and calibration (calibration plots). RESULTS: Prevalence of BPD/death was 32.1% (668/1923) in EPIPAGE-2 and 41.0% (1368/3335) in EPICE. At both time points, logistic regression and random forest models showed similar performance during internal validation. At birth, external validation in EPICE showed good discrimination (logistic regression: c-statistic 0.81, 95% CI 0.80-0.83; random forest: c-statistic 0.80, 95% CI 0.79-0.81), but both models underestimated the probability of BPD/death. Model performance was heterogeneous across European regions. CONCLUSIONS: Both modelling methods performed similarly in predicting BPD/death shortly after birth in very preterm infants. IMPACT: Whether machine-learning methods predict short-term respiratory outcomes in very preterm infants better than logistic regression models is debated. Random forest-based prediction models did not perform better than logistic regression in predicting bronchopulmonary dysplasia or death shortly after birth in very preterm infants. Calibration performance varied among European countries.
While offering similar performance, regression models are easier to understand, disseminate and apply to different populations.
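The two performance measures named in the abstract, discrimination (c-statistic) and calibration, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `c_statistic` and `observed_vs_expected` are hypothetical, the toy outcomes and predicted probabilities are invented for illustration and do not come from EPIPAGE-2 or EPICE, and the calibration check is a simple comparison of mean observed and mean predicted risk, not a full calibration plot.

```python
def c_statistic(y_true, y_prob):
    """Proportion of (event, non-event) pairs in which the event
    received the higher predicted probability; ties count as 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_vs_expected(y_true, y_prob):
    """Return (observed event rate, mean predicted probability).
    Observed > expected suggests the model underestimates risk overall."""
    observed = sum(y_true) / len(y_true)
    expected = sum(y_prob) / len(y_prob)
    return observed, expected


# Invented toy data (1 = BPD/death, 0 = survival without BPD).
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

c = c_statistic(y_true, y_prob)
observed, expected = observed_vs_expected(y_true, y_prob)
print(f"c-statistic: {c:.2f}")
print(f"observed rate: {observed:.3f}, mean predicted: {expected:.3f}")
```

In this toy example the observed event rate exceeds the mean predicted probability, which is the pattern the abstract describes as underestimation at external validation. A model can therefore rank infants well (high c-statistic) and still be miscalibrated in a new population.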