Use of Machine Learning to Compare Disease Risk Scores and Propensity Scores Across Complex Confounding Scenarios: A Simulation Study.
Guo Y., Strauss VY., Khalid S., Prieto-Alhambra D.
PURPOSE: Studies of the surge of COVID-19 treatments in the second quarter of 2020 were characterized by low treatment prevalence and high outcome risk. Motivated by this setting, we conducted a simulation study comparing disease risk scores (DRS) and propensity scores (PS) across a range of scenarios with different treatment prevalences and outcome risks.

METHOD: Four methods were used to estimate PS and DRS: logistic regression (reference method), least absolute shrinkage and selection operator (LASSO), multilayer perceptron (MLP), and XGBoost. Monte Carlo simulations generated data across 25 scenarios varying in treatment prevalence, outcome risk, data complexity, and sample size. Average treatment effects were calculated after matching, and relative bias and average absolute standardized mean difference (ASMD) were reported.

RESULT: Estimation bias increased as treatment prevalence decreased. DRS showed lower bias than PS when treatment prevalence was below 0.1, especially in nonlinear data; however, DRS did not outperform PS in linear or small-sample data. PS had comparable or lower bias than DRS when treatment prevalence was 0.1-0.5. The three machine learning (ML) methods performed similarly, with LASSO and XGBoost outperforming the reference method in some nonlinear scenarios. ASMD results indicated that DRS was less affected by decreasing treatment prevalence than PS.

CONCLUSION: Under nonlinear data, DRS reduced bias compared with PS in scenarios with low treatment prevalence, whereas PS was preferable when treatment prevalence exceeded 0.1, regardless of outcome risk. ML methods can outperform logistic regression for PS and DRS estimation. Both decreasing sample size and adding nonlinearity and nonadditivity to the data increased bias for all methods tested.
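The pipeline described in METHOD — estimate a score from confounders, match treated to untreated units on that score, then assess covariate balance via ASMD — can be illustrated with a minimal sketch. This is not the authors' code: the data-generating coefficients, sample size, and greedy 1:1 nearest-neighbor matching are illustrative assumptions, and only the logistic-regression (reference) PS estimator is shown.

```python
# Minimal sketch of PS estimation, 1:1 matching, and ASMD (illustrative
# assumptions throughout; not the simulation code used in the study).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # confounders (assumed)
logit_t = -2.0 + X @ np.array([0.5, -0.4, 0.3, 0.2, -0.1])
treat = rng.binomial(1, 1 / (1 + np.exp(-logit_t)))  # low treatment prevalence

# Propensity score: estimated P(treatment | confounders)
ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]

# Greedy 1:1 nearest-neighbor matching on the PS, without replacement
treated = np.where(treat == 1)[0]
controls = set(np.where(treat == 0)[0])
pairs = []
for i in treated:
    j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
    pairs.append((i, j))
    controls.remove(j)

matched = np.array([idx for pair in pairs for idx in pair])

def asmd(x, t):
    """Absolute standardized mean difference for one covariate."""
    x1, x0 = x[t == 1], x[t == 0]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return abs(x1.mean() - x0.mean()) / pooled_sd

balance = [asmd(X[matched, k], treat[matched]) for k in range(X.shape[1])]
print(f"mean ASMD after matching: {np.mean(balance):.3f}")
```

A DRS-based analysis would follow the same matching and ASMD steps but replace the score with a model of the outcome risk among the untreated; the ML variants in the study (LASSO, MLP, XGBoost) would substitute for the logistic regression in the estimation step.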