Item Response Theory Validation of the Forgotten Joint Score for Persons Undergoing Total Knee Replacement
Khatri C., Harrison CJ., Clement ND., Scott CEH., MacDonald D., Metcalfe AJ., Rodrigues JN.
Background: The Forgotten Joint Score (FJS), a commonly used patient-reported outcome measure, was developed without fully confirming key assumptions: unidimensionality (all items reflect 1 underlying factor), appropriate weighting of each item in scoring, absence of differential item functioning (in which different groups, e.g., men and women, respond differently despite having the same level of the underlying trait), absence of local dependence (in which pairs of items are related to each other beyond what the underlying factor explains), and monotonicity (persons with higher function have a higher score). We applied item response theory (IRT) to validate the FJS according to contemporary standards and thus support its ongoing use. We aimed to confirm that the FJS reflects a single latent trait and to determine whether an IRT model could be fitted to the FJS.

Methods: Participants undergoing primary total knee replacement provided responses to the FJS items preoperatively and at 6 and 12 months postoperatively. Exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and Mokken analysis were conducted, and a graded response model (GRM) was fitted to the data.

Results: A total of 1,774 patient responses were analyzed. EFA indicated a 1-factor model (all 12 items reflecting 1 underlying trait), and CFA demonstrated excellent model fit. Items did not have equal weighting. The FJS demonstrated good monotonicity and no differential item functioning by sex, age, or body mass index. GRM parameters are reported in this paper.

Conclusions: The FJS meets key validity assumptions, supporting its use in clinical practice and research. The IRT-adapted FJS has potential advantages over the traditional FJS: it provides continuous measurements with finer granularity between health states, provides an individual measurement error for each score, and can compute scores despite greater amounts of missing data (only 1 response is required to estimate a score).
It can be applied retrospectively to existing data sets or used to deliver individualized computerized adaptive tests.

Level of Evidence: Prognostic Level II. See Instructions for Authors for a complete description of levels of evidence.
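The scoring advantages described in the Conclusions (continuous scores, an individual measurement error, and a score from as little as 1 answered item) follow from how IRT scoring works in general. The Python sketch below illustrates this under stated assumptions and is not the authors' implementation. It uses Samejima's graded response model, in which the probability of responding in category k or higher on an item is 1 / (1 + exp(-a(theta - b_k))) for a discrimination a and ordered thresholds b_k, and it estimates theta by expected a posteriori (EAP) scoring over a grid with a standard normal prior. The item names (`item1` to `item3`), their parameter values, the 0 to 4 response coding, the prior, and the grid are hypothetical choices made for illustration. They are not the GRM parameters reported in the paper, and the abstract does not state which scoring method the authors used.

```python
import math
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL item parameters for illustration only; these are NOT the GRM
# estimates reported in the paper. Each entry is (discrimination a, ordered
# thresholds b_1 < ... < b_4) for an item with 5 response categories scored 0-4.
ITEMS: Dict[str, Tuple[float, List[float]]] = {
    "item1": (2.0, [-1.5, -0.5, 0.5, 1.5]),
    "item2": (1.2, [-1.0, 0.0, 1.0, 2.0]),
    "item3": (2.8, [-2.0, -1.0, 0.0, 1.0]),
}


def category_probs(theta: float, a: float, b: List[float]) -> List[float]:
    """Graded response model: P(X = k | theta) for k = 0 .. len(b)."""
    # Cumulative curves P(X >= k): 1 for k = 0, logistic for k = 1..K, 0 beyond K.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    # Each category probability is the gap between adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def eap_score(
    responses: Dict[str, Optional[int]],
    n_points: int = 81,
    lo: float = -4.0,
    hi: float = 4.0,
) -> Tuple[float, float]:
    """EAP estimate of theta and its posterior SD (the individual standard error).

    Missing responses (None) are skipped, so any subset of answered items works.
    """
    answered = {item: x for item, x in responses.items() if x is not None}
    if not answered:
        raise ValueError("at least 1 answered item is required to estimate a score")

    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    # Unnormalized standard normal prior (an assumption) on the quadrature grid.
    weights = [math.exp(-0.5 * t * t) for t in grid]
    # Multiply in the likelihood of each observed response.
    for item, x in answered.items():
        a, b = ITEMS[item]
        weights = [w * category_probs(t, a, b)[x] for w, t in zip(weights, grid)]

    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, grid)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, grid)) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # One answered item is enough for a score; more items usually narrow the SE.
    print(eap_score({"item1": 4, "item2": None, "item3": None}))
    print(eap_score({"item1": 4, "item2": 3, "item3": 4}))
```

In this sketch, unanswered items are skipped rather than imputed, unequal item weighting enters through each item's discrimination a, and the posterior standard deviation serves as the score's individual standard error.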