Causal Forests Versus Inverse Probability of Treatment Weighting to Adjust for Cluster-Level Confounding: A Parametric and Plasmode Simulation Study Based on US Hospital Electronic Health Record Data.

Du M., Johnston S., Coplan PM., Strauss VY., Khalid S., Prieto-Alhambra D.

BACKGROUND: Rapid innovation and new regulations increase the need for post-marketing surveillance of implantable devices. However, complex multi-level confounding from patient-level covariates and surgeon- or hospital-level covariates hampers observational studies of risks and benefits. We conducted two simulation studies to compare the performance of Causal Forests (CF) versus Inverse Probability of Treatment Weighting (IPTW) in reducing confounding bias in the presence of strong surgeon impact on treatment allocation.

METHODS: Two Monte Carlo simulation studies were carried out: (1) parametric simulations with patients nested in clusters (ratio 10:1, 50:1, 100:1, 200:1, 500:1) and sample size n = 10 000, with patient- and cluster-level confounders; (2) plasmode simulations generated from a cohort of 9981 patients admitted for pancreatectomy between 2015 and 2019, drawn from the US PINC AI hospital research database. Different CF algorithms and IPTW were used to estimate binary treatment effects.

RESULTS: Performance varied with the strength of cluster-level confounding. Under weak to moderate surgeon influence, CF and IPTW performed similarly. When confounding was strong (OR = 2.5), CF reduced bias compared with IPTW: in parametric simulations, relative bias averaged 11.2% for CF versus 19.9% for IPTW, with similar advantages observed in plasmode simulations.

CONCLUSIONS: CF shows promise as a method for estimating treatment effects in scenarios where cluster-level confounding strongly impacts treatment allocation. More research is needed to guide its use.
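
As a rough illustration of the parametric design described under METHODS, the sketch below simulates patients nested in clusters and estimates a treatment effect with IPTW. It is not the authors' code. The continuous outcome, the coefficient values, the single patient-level and single cluster-level confounder, and all function names are assumptions made for illustration; only the sample size, the 100:1 ratio and the OR of 2.5 come from the abstract. Causal Forests are not implemented here, since that would require a dedicated library.

```python
import numpy as np


def simulate(n=10_000, ratio=100, cluster_or=2.5, tau=1.0, seed=0):
    """Simulate patients nested in clusters (hypothetical data-generating model)."""
    rng = np.random.default_rng(seed)
    n_clusters = n // ratio
    cluster = np.repeat(np.arange(n_clusters), ratio)
    u = rng.normal(size=n_clusters)[cluster]  # cluster-level confounder
    x = rng.normal(size=n)                    # patient-level confounder
    # Treatment allocation depends on both confounders; the cluster effect
    # has odds ratio `cluster_or` per unit of u (assumed scale).
    logit = 0.5 * x + np.log(cluster_or) * u
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    # Continuous outcome with constant treatment effect tau (an assumption).
    y = tau * t + x + u + rng.normal(size=n)
    return x, u, t, y


def fit_logistic(X, t, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; X must include an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def iptw_ate(X, t, y):
    """Normalised (Hajek) IPTW estimate of the average treatment effect."""
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))
    w = t / ps + (1 - t) / (1 - ps)
    mu1 = np.sum(w * t * y) / np.sum(w * t)
    mu0 = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
    return mu1 - mu0


def relative_bias_demo(tau=1.0, **kwargs):
    """Relative bias (%) of IPTW with and without the cluster-level confounder."""
    x, u, t, y = simulate(tau=tau, **kwargs)
    ones = np.ones_like(x)
    designs = {
        "patient_only": np.column_stack([ones, x]),
        "patient_and_cluster": np.column_stack([ones, x, u]),
    }
    return {
        name: 100.0 * (iptw_ate(X, t, y) - tau) / tau
        for name, X in designs.items()
    }


if __name__ == "__main__":
    for name, bias in relative_bias_demo().items():
        print(f"{name}: relative bias {bias:.1f}%")
```

In this toy setting, leaving the cluster-level confounder out of the propensity model leaves substantial bias, which is the kind of problem the study examines. A full replication would repeat the simulation over many Monte Carlo draws and over each cluster ratio.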

DOI

10.1002/pds.70257

Type

Journal article

Publication Date

2025-11-01

Volume

34

Keywords

causal forests; causal inference; clustered data; machine learning; propensity score; simulation study; Humans; Electronic Health Records; Monte Carlo Method; United States; Confounding Factors, Epidemiologic; Computer Simulation; Cluster Analysis; Probability; Databases, Factual; Product Surveillance, Postmarketing; Algorithms; Bias
