Statistical heterogeneity in systematic reviews of clinical trials: a critical appraisal of guidelines and practice.
Higgins J., Thompson S., Deeks J., Altman D.
OBJECTIVE: Heterogeneity between study results can be a problem in any systematic review or meta-analysis of clinical trials. Identifying its presence, investigating its cause and correctly accounting for it in analyses all involve difficult decisions for the researcher. Our objectives were to collate recommendations on dealing with heterogeneity in systematic reviews of clinical trials, to investigate current practice in addressing heterogeneity in Cochrane reviews, and to compare current practice with those recommendations.

METHODS: We reviewed guidelines for those undertaking systematic reviews and examined how heterogeneity is addressed in practice in a sample of systematic reviews, and their protocols, from the Cochrane Database of Systematic Reviews.

RESULTS: Advice to reviewers is, on the whole, consistent and sensible. However, examination of a sample of Cochrane protocols and reviews shows that the advice is difficult to follow, given the small numbers of studies identified in many systematic reviews, the difficulty of pre-specifying important effect modifiers for subgroup analysis or meta-regression, and the unresolved debate over fixed-effect versus random-effects meta-analysis. Reviews often disagreed with their protocols, either in the choice of important potential effect modifiers or because too few studies were identified to perform the planned analyses.

CONCLUSION: Guidelines that address practical issues are required to reduce the risk of spurious findings from investigations of heterogeneity. Unless a large number of studies is available, this may involve discouraging statistical investigations such as subgroup analyses and meta-regression, rather than simply adopting a cautious approach to their interpretation. The notion of a priori specification of potential effect modifiers for a retrospective review of studies is ill-defined, and the appropriateness of using a statistical test for heterogeneity to decide between analysis strategies is suspect.
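To make concrete the practice questioned above, using a statistical test for heterogeneity to choose between fixed-effect and random-effects analysis, here is a minimal Python sketch. It assumes inverse-variance weighting, Cochran's Q test (chi-square with k - 1 degrees of freedom), the I^2 statistic and the DerSimonian-Laird estimator of between-study variance; none of these choices is specified in the abstract. The names `meta_analyse`, `chi2_sf`, `MetaResult` and the threshold `alpha = 0.10` are hypothetical. The code illustrates the test-then-choose rule under criticism, not a recommended workflow.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with integer df >= 1 (stdlib-only closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt x) * sum_j x^((2j-1)/2) / (1*3*...*(2j-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at sqrt(x)
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total


@dataclass
class MetaResult:
    fixed: float    # inverse-variance fixed-effect estimate
    random: float   # DerSimonian-Laird random-effects estimate
    q: float        # Cochran's Q statistic
    p_value: float  # heterogeneity test p-value (chi-square, k - 1 df)
    i2: float       # I^2: share of variability beyond chance, in [0, 1]
    tau2: float     # between-study variance (DerSimonian-Laird)
    strategy: str   # model picked by the test-based rule: "fixed" or "random"


def meta_analyse(effects: Sequence[float], variances: Sequence[float],
                 alpha: float = 0.10) -> MetaResult:
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    p = chi2_sf(q, df)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of tau^2, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    random = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    # Hypothetical test-based rule: use random effects only if Q is "significant"
    strategy = "random" if p < alpha else "fixed"
    return MetaResult(fixed, random, q, p, i2, tau2, strategy)


if __name__ == "__main__":
    # Four hypothetical studies with equal within-study variances
    print(meta_analyse([0.0, 0.2, 0.4, 0.6], [0.04] * 4))
```

With only a handful of studies, Q has few degrees of freedom and the test has low power, so a non-significant p-value is weak evidence of homogeneity. In the four-study example, I^2 and tau^2 are both above zero, yet the rule still selects the fixed-effect model. This is one reason a test-based switch between models is open to question.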