The completely patient-reported version of the American Orthopaedic Foot and Ankle Society (AOFAS) score: A valid and reliable measurement for ankle osteoarthritis

Background: The American Orthopaedic Foot and Ankle Society (AOFAS) score is an outcome measure for ankle and hindfoot conditions that requires scoring by both the patient and the physician. A completely patient-reported version has been developed and used before, but its measurement properties are unknown. Our goal was to determine the measurement properties and the minimally important change (MIC) of a completely patient-reported AOFAS (PR-AOFAS) in patients with ankle osteoarthritis. Additionally, the MIC of both the PR-AOFAS and the AOFAS was estimated, which had not previously been done. Materials and methods: The PR-AOFAS of 112 patients was evaluated for reliability, construct validity (using the AOFAS, Foot and Ankle Outcome Score, Ankle Osteoarthritis Score, Visual Analogue Scale, and Short Form-36), and responsiveness. The MIC was estimated using the optimal cut-off point of the receiver operating characteristic curve. This was a substudy of a randomized clinical trial on the efficacy of platelet-rich plasma injections for ankle osteoarthritis (OA). Results: The PR-AOFAS had sufficient construct validity, internal consistency, test-retest reliability, and responsiveness. The smallest detectable change at group level was 2.34. The MIC was 6.5 points (95% confidence interval: 0.6 to 14.4). Conclusions: The measurement properties of the Dutch PR-AOFAS were sufficient in patients with ankle osteoarthritis who are willing to participate in a trial on injection therapy. The minimally important change of the PR-AOFAS is smaller than its smallest detectable change, making it more suitable for use in groups of patients, such as in a research setting. Level of clinical evidence: 1.


Introduction
Patient-reported outcome measures (PROMs) enable quantification of outcomes such as pain and function and their impact on quality of life. To adequately interpret treatment effects, an outcome score must be reliable, valid, and responsive to change [1]. The American Orthopaedic Foot and Ankle Society (AOFAS) score is one of the most frequently used scores for foot and ankle conditions [2]. It combines five patient-reported items concerning pain and function with four physician-determined items concerning function and alignment, on a 0- to 100-point scale [3].
The measurement properties of the AOFAS score were assessed in two studies (n = 133 and n = 117) with patients with end-stage ankle OA [2,3] and in two studies (n = 9 and n = 8) that included patients with ankle OA amongst other ankle and hindfoot conditions [4,5]. The available data on ankle OA found the patient-reported part of the AOFAS score to be valid and responsive [2-9]. However, the minimally important change (MIC) of the AOFAS score has not been determined for ankle osteoarthritis.
Completely patient-reported outcome scores are easier for the patient, are less time-consuming, and result in better patient compliance, leading to higher response rates and thereby reducing the risk of bias and increasing generalizability [2,10]. In a prospective multicenter randomized controlled trial in the Netherlands, which compared routine with on-demand removal of syndesmotic screws in 152 patients, the physician-determined items were modified to patient-reported items, allowing for a completely patient-reported version of the AOFAS score [11]. However, the measurement properties of this completely patient-reported AOFAS score had not been determined [11].
Our goal was to determine the measurement properties and the MIC of a Dutch completely patient-reported AOFAS (PR-AOFAS) score in patients with ankle osteoarthritis. Additionally, the MIC of both the PR-AOFAS and the AOFAS scores was estimated, which had not previously been done.

Study design
This is a substudy of the Platelet-Rich plasma Injections in the Management of Ankle OA (PRIMA) trial, a randomized, double-blind, placebo-controlled, multicentre prospective study designed to determine the efficacy of platelet-rich plasma injections in the management of ankle OA [12,13]. The PRIMA trial is approved by the Medical Ethics Review Committee Amsterdam Medical Center, the Netherlands (ABR 2018-042, approved 23 July 2018) and is registered in the Netherlands trial register (NTR7261). This substudy was sponsored by the Marti-Keuning Eckhardt Foundation, a nonprofit patient organization.

Study population
Patients with ankle OA in six hospitals in the Netherlands (2 University Medical Centres, 2 teaching hospitals, a general hospital, and a focus clinic) were informed of the study. All participants signed an informed consent form before participating in the study. Patients were eligible for inclusion if they had ankle OA pain of ≥40 mm on a Visual Analogue Scale (VAS, 0-100 mm) during daily activities, had radiographs (anteroposterior and lateral view) indicating ≥ grade 2 talocrural OA on the van Dijk classification (joint space narrowing, with or without osteophytes) [14], and were ≥18 years of age. Patients were excluded if they had received injection therapy for ankle OA in the previous 6 months, did not want to receive one of the two therapies, had clinical signs of concomitant OA of one or more other major joints of the lower extremities that negatively affected their daily activity level, or had undergone ankle surgery for OA or osteochondral defects less than 1 year earlier (not including surgery for an ankle fracture in the past).

Study procedures
This study was performed using the COnsensus-based standards for the selection of health Measurement INstruments (COSMIN) checklist as a guideline [15]. Following inclusion, as far as relevant for this study, patients completed questionnaires at baseline and at 6, 12, 26, and 52 weeks [12]. The questionnaires included Dutch versions of the PR-AOFAS score, Foot and Ankle Outcome Score (FAOS), Ankle Osteoarthritis Score (AOS), VAS, and Medical Outcomes Study Short Form 36 (SF-36) [12]. To assess test-retest reliability, an additional PR-AOFAS questionnaire was sent two weeks after a previous one, at a minimum follow-up of 26 weeks (when no change was assumed), together with an additional question to assess change in symptoms. A Dutch version of the AOFAS score was taken during outpatient visits at baseline, 6 weeks, and 26 weeks. To minimise bias, all patients were seen by the same coordinating research physician of the PRIMA study.

The original patient-physician-determined American Orthopaedic Foot and Ankle Society score
The AOFAS score measures pain, function, and alignment and was translated and validated in Dutch for ankle fractures [8]. The physician-determined items were assessed by the same coordinating research physician. The score consists of 9 items and 3 subscales (pain: 1 item; function: 7 items; alignment: 1 item), of which 3 items from the function subscale and the alignment item are physician-determined. The maximum scores on the three subscales are 40, 50, and 10 points, respectively [16].

A completely patient-reported version of the American Orthopaedic Foot and Ankle Society score
For the PR-AOFAS score, the questionnaire from the RODEO trial was used [11]. Here, the physician-determined items were made more comprehensible for the patients [11]. These changes are presented in the Supplement. Where previously the physician determined sagittal and hindfoot motion and ankle-hindfoot stability and alignment, the patient was now asked in patient-friendly terms how they would rate these items. For instance, "How would you rate the mobility of your ankle compared to the other side, or compared to when you did not have symptoms." Similar to the AOFAS score, the PR-AOFAS score consists of 9 items and 3 subscales (pain: 1 item; function: 7 items; alignment: 1 item).

The Foot and Ankle Outcome Score
The FAOS is used for functional assessment of ankle and hindfoot conditions, consists of five subscales (pain, other symptoms, activities of daily living [ADL], sport and recreation, and foot- and ankle-related quality of life), and has been shown to have sufficient reliability and validity in patients with ankle and hindfoot symptoms [17]. There are a total of 42 items, and each question is assigned 0-4 points based on the answer given [17]. The scale runs from 0 (extreme symptoms) to 100 points (no symptoms).

Medical outcomes study short form 36
The SF-36 is a generic outcome measure of quality of life and has 36 items, grouped into 8 subscales (0-100 points). Of these subscales, 4 make up the Physical Component Summary (PCS) score, and 4 make up the Mental Component Summary (MCS) score (0-100 points) [18]. The higher the score, the higher the quality of life.

The visual analog score for pain
The VAS score (VAS 0-100 mm) is measured during activities of daily living, with 0 mm being no pain and 100 mm the worst pain imaginable.

The Ankle Osteoarthritis Score
The AOS measures pain (9 items) and disability (9 items) with a total of 18 items on a VAS (0-100 mm) and has been shown to be valid, reliable, and responsive to change in patients with ankle OA [2]. Higher scores indicate worse pain and disability.

Subjective patient satisfaction
Subjective patient satisfaction was rated on a 4-point Likert scale (poor, fair, good, and excellent) and served as an anchor to determine the responsiveness of the PR-AOFAS score and to calculate minimal clinically important change values.

Statistical analysis
Both PR-AOFAS and AOFAS scores were assessed for reliability, construct validity, responsiveness, and interpretability (floor and ceiling effects) [19-21]. Due to the absence of a retest of the AOFAS score, only internal consistency was evaluated as a reliability measure for that score. Reliability assessment of the PR-AOFAS score comprised internal consistency, test-retest reliability, standard error of measurement (SEM), and smallest detectable change (SDC). Additionally, the Minimal clinically Important Change (MIC) of both scales was calculated [20,21].

Reliability
Reliability shows the degree to which the scale is free of measurement error and was assessed through internal consistency, test-retest reliability, and measurement error. Internal consistency demonstrates the inter-relatedness among items of an outcome measure or its subscales. A Cronbach's α between 0.70 and 0.95 demonstrates satisfactory internal consistency for both PR-AOFAS and AOFAS scores as long as the scale is unidimensional [21,22]. Test-retest reliability is the ability to yield the same result when patients with unchanged complaints take the questionnaire twice. Patients were only included in the test-retest reliability analysis of the PR-AOFAS score if they had confirmed that no change had occurred. The test-retest reliability was determined using the intraclass correlation coefficient (ICC) alongside the SEM and was deemed sufficient if the ICC was ≥0.70 [23,24]. The SEM was calculated as the square root of the within-subject variance (i.e., the sum of the between-measures variance and the residual variance) [25]. The SDC (SDC = 1.96 × √2 × SEM) provides the magnitude of change needed to have confidence that the change is not a consequence of measurement error [26].
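As an illustration of the reliability measures described above, the following is a minimal Python sketch, not the analysis code used in this study. The function names and the example data are hypothetical. The group-level conversion (dividing the individual-level SDC by √n) is an assumption taken from common clinimetric practice and is not stated in the text above.

```python
import math
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the summed (total) score
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def smallest_detectable_change(sem, n=1):
    """SDC = 1.96 * sqrt(2) * SEM; n > 1 gives the assumed group-level value."""
    return 1.96 * math.sqrt(2) * sem / math.sqrt(n)


# Hypothetical data: three respondents, three perfectly related items
alpha = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
sdc_individual = smallest_detectable_change(sem=1.0)
sdc_group = smallest_detectable_change(sem=1.0, n=4)
```

An α computed this way would then be compared against the 0.70 to 0.95 range mentioned above.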

Construct validity
Construct validity is the degree to which the intended construct is measured [19]. This was assessed by determining the correlation of the PR-AOFAS and the AOFAS scores (and their subscales) with the FAOS subscales, AOS, and VAS and the PCS and MCS scores of the SF-36 at baseline using Spearman's correlation coefficient (r_s).
In total, 21 hypotheses were formulated for the PR-AOFAS score and 21 for the AOFAS score. A priori, at least moderate correlations were expected between the total scores and the FAOS subscales, the AOS total score, the PCS score of the SF-36, and the VAS. For the pain subscale of the PR-AOFAS and AOFAS scores, high correlations were expected with the pain subscales of the FAOS, the AOS, and the VAS, and moderate correlations with the PCS and the symptoms subscale of the FAOS. For the function subscale of the PR-AOFAS and AOFAS scores, high correlations were expected with the ADL and sport and recreation subscales of the FAOS, the AOS disability subscale, and the PCS. Finally, all domains of both outcomes were expected to be unrelated (r < 0.3) to the MCS. Confirmation of at least 75% of the aforementioned hypotheses indicates sufficient construct validity [21].
A separate evaluation was performed to assess whether the PR-AOFAS and AOFAS assessments measure identical constructs. For this purpose, correlation coefficients of >0.8 were expected a priori between the PR-AOFAS and the AOFAS scores and between their identical subscales (pain, function, and alignment).
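The correlations underlying these hypotheses can be illustrated with a minimal pure-Python sketch of Spearman's rank correlation (Pearson's correlation applied to average ranks). This is not the software used in the study; the function names and the example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores: a reversed scale yields a negative correlation
r_same = spearman([10, 20, 30, 40], [1, 2, 3, 4])
r_reversed = spearman([10, 20, 30, 40], [4, 3, 2, 1])
```

A coefficient obtained this way would be compared with the hypothesized magnitude, for example r < 0.3 for the MCS or >0.8 between identical subscales.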

Responsiveness
Responsiveness of a PROM is defined as its ability to detect change over time [19,27]. It was determined by comparing both the PR-AOFAS and AOFAS scores at baseline and at 26 weeks. Change from baseline to 26 weeks in the 4-point Likert-scaled subjective patient satisfaction was used as an anchor. Changes in the anchor were defined as worsened, unchanged, improved, and greatly improved [28]. Patients who improved by at least 1 category were considered to have had a clinically important improvement [28]. The effect sizes (ESs) and standardized response means (SRMs) were calculated for each category of change in the anchor [29,30]. Based on Cohen's standardized effect size, the following was hypothesized [31]:
- ES and SRM <0.2 for patients who reported to be unchanged
- ES and SRM ≥0.2 for patients who reported to be improved
- ES and SRM ≥0.5 for patients who reported to be greatly improved
Responsiveness was further evaluated using receiver operating characteristic (ROC) curve analyses. ROC curves give information on the true-positive rate (sensitivity) and false-positive rate (1 - specificity) for cut-off points in the change score [32]. The area under an ROC curve (AUC) indicates the probability that a patient is correctly classified as improved or unimproved; a value of at least 0.7 was deemed sufficient [15,21,32].
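The two responsiveness statistics above can be sketched as follows. The text does not spell out the formulas, so this sketch assumes the usual definitions: ES is the mean change divided by the standard deviation of the baseline scores, and SRM is the mean change divided by the standard deviation of the change scores. Function names and data are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores (assumed definition)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores (assumed definition)."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for one anchor category
baseline = [50, 60, 70, 80]
week_26 = [55, 70, 75, 90]
es = effect_size(baseline, week_26)
srm = standardized_response_mean(baseline, week_26)
```

Each value would be computed per anchor category and compared with the thresholds listed above.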

Interpretability
Interpretability is the meaning that can be assigned to a score or change in score [19]. In this study, interpretability was measured by determining floor and/or ceiling effects. If a large number of patients score the maximum or minimum score, the instrument can fail to detect clinical improvement or deterioration. A floor and/or ceiling effect is present if more than 15% of the study participants had the lowest or highest possible score [20,21].
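The 15% criterion can be expressed as a short check. This is a minimal sketch with a hypothetical function name and hypothetical data, not the study's analysis code.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Flag a floor/ceiling effect when more than `threshold` of patients
    score the lowest/highest possible value."""
    n = len(scores)
    floor_fraction = sum(s == min_score for s in scores) / n
    ceiling_fraction = sum(s == max_score for s in scores) / n
    return {
        "floor_fraction": floor_fraction,
        "ceiling_fraction": ceiling_fraction,
        "floor": floor_fraction > threshold,
        "ceiling": ceiling_fraction > threshold,
    }


# Hypothetical subscale with a 0-10 range: 2 of 10 patients score the maximum
result = floor_ceiling([10, 10, 5, 5, 5, 5, 5, 5, 5, 5], min_score=0, max_score=10)
```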

Minimal clinically important change
The Minimal Important Change (MIC) is the minimal amount of change that is perceived by the patient as "important" [20]. Using the previously described anchor, the patients were categorized as either improved (or more) or unchanged (patients who were worse were excluded) [33]. By use of ROC analysis, the MIC was calculated as the cut-off point that indicated the least amount of misclassification [33]. A bootstrapping procedure (with 1000 bootstraps) was performed to estimate the standard error and determine the 95% confidence interval (CI).
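The ROC-based procedure described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: candidate cut-offs are taken as midpoints between adjacent observed change scores, misclassification is taken as (1 - sensitivity) + (1 - specificity), and the confidence interval uses the percentile method. All names and data are hypothetical.

```python
import random


def roc_mic(changes, improved):
    """Cut-off minimising (1 - sensitivity) + (1 - specificity);
    a change >= cut-off is classified as improved."""
    pos = [c for c, i in zip(changes, improved) if i]
    neg = [c for c, i in zip(changes, improved) if not i]
    values = sorted(set(changes))
    # Assumption: candidate cut-offs are midpoints between observed values
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cut, best_err = None, float("inf")
    for cut in candidates:
        sensitivity = sum(c >= cut for c in pos) / len(pos)
        specificity = sum(c < cut for c in neg) / len(neg)
        err = (1 - sensitivity) + (1 - specificity)
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut


def bootstrap_ci(changes, improved, n_boot=1000, seed=0):
    """Percentile 95% CI of the MIC from resampling patients with replacement."""
    rng = random.Random(seed)
    data = list(zip(changes, improved))
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        ch, im = zip(*sample)
        # Skip degenerate resamples (one anchor group only, or no variation)
        if all(im) or not any(im) or len(set(ch)) < 2:
            continue
        estimates.append(roc_mic(ch, im))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Hypothetical change scores; True marks patients improved on the anchor
changes = [0, 2, 4, 10, 12, 14]
improved = [False, False, False, True, True, True]
mic = roc_mic(changes, improved)
lower, upper = bootstrap_ci(changes, improved, n_boot=200)
```

With a small anchor group, the bootstrap distribution of the cut-off can be wide, which is one way a broad confidence interval around the MIC can arise.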

Agreement of the PR-AOFAS and the AOFAS scores
Agreement between the PR-AOFAS and the AOFAS scores was determined using a Bland-Altman plot with limits of agreement [20,34]. Additionally, ICCs were calculated between the PR-AOFAS and the AOFAS scores as well as between the changes from baseline of both outcome measures.
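The quantities read from a Bland-Altman plot can be sketched as follows, assuming the conventional limits of agreement of the mean difference ± 1.96 standard deviations of the differences. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return the systematic difference (bias) and the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired scores from two versions of a questionnaire
bias, lower_loa, upper_loa = bland_altman([10, 20, 30], [8, 19, 27])
```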

Inclusion
In total, 112 patients were enrolled in the PRIMA study to receive either an intra-articular PRP or placebo (saline) injection and completed all baseline questionnaires and physical examinations (Table 1). In total, 63 (56%) were male and 49 (44%) female. The mean age, duration of symptoms, and body mass index were 55 years (SD 14 years), 9 years (SD 9 years), and 26.5 kg/m² (SD 3.8 kg/m²), respectively. In total, 95 patients completed the second PR-AOFAS assessment (sent within 2 weeks of a previous one), of whom 36 patients (38%) reported a change in complaints. Therefore, 59 patients (62%) were included in the test-retest analysis.

Construct validity
Spearman's correlation coefficients were calculated between both PR-AOFAS and AOFAS scores and the PCS and MCS scores of the SF-36, VAS pain, AOS and its subscales, and the FAOS scales (Table 2). Of the a priori-formulated hypotheses, 95% were confirmed for the PR-AOFAS assessment and 71% for the AOFAS assessment. For the PR-AOFAS assessment, all correlations were as expected except for the correlation between the pain subscale of the PR-AOFAS and the FAOS symptoms subscale. For the AOFAS subscales, six hypotheses had to be rejected as correlations were not as expected (Table 2).
Additionally, all hypotheses concerning the association between the two AOFAS versions were rejected as all correlation coefficients were smaller than expected.
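The percentages above follow from the hypothesis counts: with 21 hypotheses per score, one rejection for the PR-AOFAS leaves 20 confirmed and six rejections for the AOFAS leave 15 confirmed. A minimal sketch of this tally against the 75% criterion (the function name is hypothetical):

```python
def construct_validity(confirmed, total, threshold=0.75):
    """Return the share of confirmed hypotheses and whether it meets the threshold."""
    share = confirmed / total
    return share, share >= threshold


pr_share, pr_sufficient = construct_validity(confirmed=20, total=21)
aofas_share, aofas_sufficient = construct_validity(confirmed=15, total=21)
print(f"PR-AOFAS: {pr_share:.0%}, sufficient: {pr_sufficient}")
print(f"AOFAS: {aofas_share:.0%}, sufficient: {aofas_sufficient}")
```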

Responsiveness
In total, 8 effect sizes and standardized response means were calculated for both the PR-AOFAS and the AOFAS scores, of which 100% and 75%, respectively, agreed with the predefined hypotheses (Table 3). For the AOFAS score, the outcome in the category of unchanged patients disagreed with the hypothesis, exceeding the predefined threshold by 0.2. The AUC of the PR-AOFAS and the FAOS were 0.77 (95% CI: 0.68 to 0.87) and 0.66 (95% CI: 0.54 to 0.76), respectively, implying that only the PR-AOFAS assessment was considered to meet the requirements of proper responsiveness.

Interpretability
Clear ceiling effects can be seen for the alignment subscales of both the PR-AOFAS and the AOFAS (Table 1). A floor effect was observed for the AOFAS pain subscale (Table 1).

MIC
The ROC-based calculation of the MIC was 6.5 points (95% CI: 0.6 to 14.4) for the PR-AOFAS score and 17.5 points (95% CI: 2.5 to 32.5) for the AOFAS score. The accompanying sensitivity and specificity of the PR-AOFAS score were 0.42 and 0.80, respectively. The sensitivity and specificity of the AOFAS score were 0.36 and 0.89, respectively.

Agreement of the PR-AOFAS and the AOFAS scores
The ICC of the PR-AOFAS and the AOFAS scores was 0.70 (95% CI: 0.57 to 0.79) at the baseline and 0.56 (95% CI: 0.41 to 0.69) for the change from baseline. Using the Bland-Altman analysis, a systematic difference of 4.6 points (95% CI: 2.3 to 6.8) was found with limits of agreement of −18.4 and 27.6 points, in favour of the AOFAS score (Fig. 1).

Table 1
Descriptives and percentages of highest and lowest scores of the completely patient-reported American Orthopaedic Foot and Ankle Society score, the original patient-physician-determined American Orthopaedic Foot and Ankle Society score, short form 36 subscales, Visual Analogue Scale pain, Ankle Osteoarthritis Score, and Foot and Ankle Outcome Score at the baseline. PR-AOFAS: patient-reported American Orthopaedic Foot and Ankle Society; AOFAS: the original patient-physician-determined AOFAS; PROM: patient-reported outcome measure; n: number; CI: confidence interval.

Discussion
Our key finding is that the Dutch version of the PR-AOFAS score has sufficient construct validity, internal consistency, test-retest reliability, and responsiveness in patients with ankle osteoarthritis who are willing to participate in a trial on injection therapy. The MIC was 6.5 points (95% CI: 0.6 to 14.4) for the PR-AOFAS score. The AOFAS score has insufficient construct validity, internal consistency, and responsiveness to change in the same population. The MIC was 17.5 points (95% CI: 2.5 to 32.5) for the AOFAS score. Although both outcome measures aim to measure identical constructs, correlation coefficients and agreement measures did not support this.
In contrast to the AOFAS score, the PR-AOFAS score has sufficient construct validity and had a higher correlation with other PROMs (Physical and Mental Component Summary scores of the SF-36, VAS pain, AOS, FAOS pain) than the AOFAS score in this study or as reported in the literature [3,4,8,9]. The moderate correlations of the AOFAS score with all scales of the FAOS and the PCS score of the SF-36 were in line with other studies [3,4,8,9].
The test-retest reliability of the PR-AOFAS score (ICC: 0.89) was comparable to that previously reported (ICC: 0.89) for the patient-reported items of the AOFAS assessment (evaluating 5 of the 9 items) [3]. Although responsiveness was calculated differently in the literature, the PR-AOFAS score was found to be sufficiently responsive and the AOFAS score insufficiently responsive to change [4,5,8,9]. Similar to previous studies, both the PR-AOFAS and the AOFAS scores had ceiling effects at the baseline in the alignment subscale, but neither score had ceiling or floor effects in the total score [8,9]. One study found ceiling effects in the total score 7.5 months after an ankle fracture, likely because a large group of patients had fully recovered [8].
Sufficient internal consistency was found for the function subscale (Cronbach's α 0.71) and insufficient internal consistency for the total score (Cronbach's α 0.68) of the PR-AOFAS score. Insufficient internal consistency was found for both the function subscale (Cronbach's α 0.40) and total score (Cronbach's α 0.47) of the AOFAS score. A Cronbach's α was determined for items of the function scale in two studies, which reported values of 0.927 and 0.947 [8,9]. However, these studies concerned patients with ankle fractures. It must also be taken into consideration that too high a Cronbach's α may be an indication that different questions are capturing the same symptoms or limitations [21].
The minimal important change of the AOFAS score had not previously been calculated for patients with ankle osteoarthritis. In this study, the MIC of both the PR-AOFAS and the AOFAS scores was found to be smaller than the corresponding SDC at the individual level. As a result, these scores are less suitable for follow-up of individual patients in a clinical setting and more suitable for use in groups of patients, such as in a research setting. The 95% CIs of the MIC values of both outcome measures are wide, indicating that these estimates should be interpreted with caution.
Although the PR-AOFAS score was expected to be similar in construct, it cannot replace the AOFAS score due to the systematic difference between the outcome measures and the low level of agreement. There may be several explanations for this poor agreement. Firstly, the lower correlation could be explained by the two instruments measuring different things. The insufficient construct validity of the AOFAS score when compared with other PROMs may be a possible explanation. However, disability weights assigned by patients have also been found to be higher than the weights based on the judgement of the physician in osteoarthritis and other musculoskeletal conditions [35]. This is further supported by the fact that the PR-AOFAS score has a higher correlation with the other PROMs, as they all measure the patient's perspective. One study also found that the AOFAS had a similar responsiveness to other PROMs once the objective items were removed [2]. Secondly, the scores on identical patient-reported items also differed between the two versions. Answers to the same patient-reported items could therefore have been affected by the presence of a physician or by being in a hospital or clinical setting. Finally, the ceiling effect in the alignment subscale of both the PR-AOFAS and the AOFAS scores and the floor effect in the pain subscale of the AOFAS score may result in restricted variation with a subsequent lower correlation.
Strengths of this study include the structured and elaborate evaluation of a new ankle-specific PROM with sufficient measurement properties and a moderate to high correlation with a large range of ankle-specific and generic measuring instruments, a large sample size, and the calculation of the minimal important change in a homogeneous population with the same diagnosis. Limitations include a limited anchor with a 4-point Likert scale and small samples for both instruments in the worsened and greatly improved categories for the evaluation of the responsiveness. Also, the content validity of neither the PR-AOFAS nor the AOFAS score was assessed. Content validity includes relevance and comprehensibility of the items. Assessment of relevance and comprehensibility, usually conducted through cognitive interviews with a small sample of patients, measures whether the patient understands the item as it is intended (i.e., does the patient understand the difference between ankle and foot mobility) and whether the item is deemed relevant to the construct. Content validity should be evaluated in future research.

Conclusion
The Dutch version of the PR-AOFAS score has sufficient construct validity, internal consistency, test-retest reliability, and responsiveness in patients with ankle osteoarthritis who are willing to participate in a trial on injection therapy. The minimally important change of the PR-AOFAS score is smaller than its SDC, making it suitable for use in groups of patients, such as in a research setting.

Fig. 1
Bland-Altman plot for the completely patient-reported AOFAS and the original patient-physician-determined AOFAS. AOFAS: the original patient-physician-determined American Orthopaedic Foot and Ankle Society score; PR-AOFAS: the completely patient-reported American Orthopaedic Foot and Ankle Society score; PR: patient-reported; CI: confidence interval.

Table 2
Construct validity using Spearman's correlation coefficients (r_s) between the patient-reported American Orthopaedic Foot and Ankle Society and American Orthopaedic Foot and Ankle Society scores, Physical and Mental Component Summary scores of the short form 36, Visual Analogue Scale pain, Ankle Osteoarthritis Score, and the Foot and Ankle Outcome Score scales. Bold illustrates a priori defined hypothesized correlations. Underlining confirms the a priori defined hypotheses. Negative correlations are due to the reversed scale for these measures. PR-AOFAS: patient-reported American Orthopaedic Foot and Ankle Society; PROM: patient-reported outcome measure; AOFAS: original patient-physician-determined AOFAS; PCS: Physical Component Summary; MCS: Mental Component Summary; VAS: Visual Analogue Scale; AOS: Ankle Osteoarthritis Score; FAOS: Foot and Ankle Outcome Score; ADL: Activities of Daily Living; S & R: Sport and Recreation; QoL: quality of life.

Table 3
Descriptives, effect sizes, and standardized response means of the completely patient-reported American Orthopaedic Foot and Ankle Society and the original patient-physician-determined American Orthopaedic Foot and Ankle Society to change.