Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis.
Kuo RYL., Harrison C., Curran T-A., Jones B., Freethy A., Cussons D., Stewart M., Collins GS., Furniss D.
Background Patients with fractures are a common emergency presentation and may be misdiagnosed at radiologic imaging. An increasing number of studies apply artificial intelligence (AI) techniques to fracture detection as an adjunct to clinician diagnosis. Purpose To perform a systematic review and meta-analysis comparing the diagnostic performance in fracture detection between AI and clinicians in peer-reviewed publications and the gray literature (ie, articles published on preprint repositories). Materials and Methods A search of multiple electronic databases between January 2018 and July 2020 (updated June 2021) was performed that included any primary research studies that developed and/or validated AI for the purposes of fracture detection at any imaging modality and excluded studies that evaluated image segmentation algorithms. Meta-analysis with a hierarchical model to calculate pooled sensitivity and specificity was used. Risk of bias was assessed by using a modified Prediction Model Study Risk of Bias Assessment Tool, or PROBAST, checklist. Results Included for analysis were 42 studies, with 115 contingency tables extracted from 32 studies (55 061 images). Thirty-seven studies identified fractures on radiographs and five studies identified fractures on CT images. For internal validation test sets, the pooled sensitivity was 92% (95% CI: 88, 93) for AI and 91% (95% CI: 85, 95) for clinicians, and the pooled specificity was 91% (95% CI: 88, 93) for AI and 92% (95% CI: 89, 92) for clinicians. For external validation test sets, the pooled sensitivity was 91% (95% CI: 84, 95) for AI and 94% (95% CI: 90, 96) for clinicians, and the pooled specificity was 91% (95% CI: 81, 95) for AI and 94% (95% CI: 91, 95) for clinicians. There were no statistically significant differences between clinician and AI performance. There were 22 of 42 (52%) studies that were judged to have high risk of bias. Meta-regression identified multiple sources of heterogeneity in the data, including risk of bias and fracture type. Conclusion Artificial intelligence (AI) and clinicians had comparable reported diagnostic performance in fracture detection, suggesting that AI technology holds promise as a diagnostic adjunct in future clinical practice. Clinical trial registration no. CRD42020186641 © RSNA, 2022 Online supplemental material is available for this article. See also the editorial by Cohen and McInnes in this issue.