Selection Bias from Data Processing in N3C
Haghighathoseini A., Qodrati M., Min H., Leslie T., Frankenfeld C., Menon NM., Wojtusiak J., Wilcox AB., Lee AM., Graves A., Anzalone A., Manna A., Saha A., Olex A., Zhou A., Williams AE., Southerland A., Girvin AT., Walden A., Sharathkumar AA., Amor B., Bates B., Hendricks B., Patel B., Alexander C., Bramante C., Ward-Caviness C., Madlock-Brown C., Suver C., Chute C., Dillon C., Wu C., Schmitt C., Takemoto C., Housman D., Gabriel D., Eichmann DA., Mazzotti D., Brown D., Boudreau E., Hill E., Zampino E., Marti EC., Pfaff ER., French E., Koraishy FM., Mariona F., Prior F., Sokos G., Martin G., Lehmann H., Spratt H., Mehta H., Liu H., Sidky H., Awori Hayanga JW., Pincavitch J., Clark J., Harper JR., Islam J., Ge J., Gagnier J., Saltz JH., Saltz J., Loomba J., Buse J., Mathew J., Rutter JL., McMurry JA., Guinney J., Starren J., Crowley K., Bradwell KR., Walters KM., Wilkins K., Gersing KR., Cato KD., Murray K., Kostka K., Northington L., Pyles LA., Misquitta L., Cottrell L., Portilla L., Deacy M., Bissell MM., Clark M., Emmett M., Saltz MM., Palchuk MB., Haendel MA., Adams M., Temple-O'Connor M., Kurilla MG., Morris M., Qureshi N., Safdar N., Garbarini N., Sharafeldin N., Sadan O.
This study investigates potential selection bias in outcome prediction within the National COVID Cohort Collaborative (N3C) that results from arbitrary data-processing decisions. When health data are processed, decisions about cohort criteria and variable selection are often made arbitrarily, which can introduce selection bias. This work explores whether such decisions affect the results of data analysis and the conclusions drawn from research studies. An experiment is conducted in which four arbitrary decisions are made and the resulting datasets are compared. The results demonstrate significant differences among the obtained datasets and indicate a high potential for bias depending on inclusion or exclusion decisions. The findings contribute to informed healthcare policies, better decision-making, and improved patient outcomes, and they emphasize the necessity of testing assumptions and decisions in ongoing research that uses clinical data.