To determine whether concentric evertor muscle weakness was associated with functional ankle instability (FAI).
We conducted an electronic search through November 2007, limited to English, and using PubMed, Pre-CINAHL, CINAHL, and SPORTDiscus. A forward search was conducted using the Science Citation Index on studies from the electronic search. Finally, we conducted a hand search of all selected studies and contacted the respective authors to identify additional studies. We included peer-reviewed manuscripts, dissertations, and theses.
We evaluated the titles and abstracts of studies identified by the electronic searches. Studies were selected by consensus and reviewed only if they included participants with FAI or chronic ankle instability and strength outcomes. Studies were included in the analysis if means and SDs (or other relevant statistical information, such as P values or t values and group n's) were reported for FAI and stable groups (or ankles).
Data were extracted by the authors independently, cross-checked for accuracy, and limited to outcomes of concentric eversion strength. We rated each study for quality. Outcomes were coded as either fast or slow velocity (ie, equal to or greater than 110°/s or less than 110°/s, respectively).
Data included the means, SDs, and group sample sizes (or other appropriate statistical information) for the FAI and uninjured groups (or ankles). The standard difference in the means (SDM) for each outcome was calculated using the pooled SD. We tested individual and overall SDMs using the Z statistic and comparisons between fast and slow velocities using the Q statistic. Our analysis revealed that ankles with FAI were weaker than stable ankles (SDM = 0.224, Z = 4.0, P < .001, 95% confidence interval = 0.115, 0.333). We found no difference between the fast- and slow-velocity SDMs (SDMFast = 0.189, SDMSlow = 0.244, Q = 29.9, df = 24, P = .187). Because of the small SDM, this method of measuring ankle strength in the clinical setting may need to be reevaluated.
Our meta-analysis showed that participants with functional ankle instability had weaker ankles than participants with stable ankles.
No difference between fast-velocity and slow-velocity strength testing was noted.
Functional ankle instability (FAI) remains a significant problem for both the general and athletic populations. Ankle injuries account for approximately 14% of all injuries in collegiate sports,1 and 20% to 40% of all ankle sprains result in FAI.2–4 Functional ankle instability is clinically important because it prevents approximately 6% of patients from returning to their occupations,5 and 13% to 15% of patients remain occupationally handicapped for at least 9 months and up to 6.5 years after injury.5,6 Originally defined by Freeman et al,7 FAI is a condition that results in the sense of giving way after an ankle sprain. A variety of mechanisms have been proposed as the cause of FAI, including delayed neuromuscular response,8 proprioceptive deficits,9,10 impaired balance,11 and ankle evertor weakness.12
Examination of the FAI literature indicates that the relationship between ankle muscle strength (especially evertor strength) and FAI is equivocal. Several groups13–19 have failed to find ankle strength differences in participants with FAI. However, others20–25 have reported decrements in FAI ankle concentric evertor isokinetic strength. The reason for these discrepancies is unclear. However, we suspect that one potential reason is the variation in velocities used to test ankle strength. The slowest testing reported was an isometric contraction: 0°/s.22 Alternatively, isokinetic testing has been used and can theoretically range from just above isometric to concentric speeds approaching 300°/s: values ranged from 30°/s14–17,25–29 to 240°/s.26 This wide range of velocities inherently increases the variability of results and may explain the literature's lack of clarity.
Whether evertor weakness is associated with FAI is unclear based on the current literature. Thus, our primary objective was to combine study outcomes to determine the association between FAI and evertor weakness. Because of the variety of velocities used to test evertor strength, velocity was a possible confounding factor. Therefore, we also sought to determine whether strength differences were affected by testing speed by categorizing the outcomes in each study into slow and fast velocities to create a velocity moderator variable.
We hypothesized that when studies were combined, ankles with FAI would be weaker than uninjured ankles. We also hypothesized that slow and fast testing velocities would produce different effect sizes but that both the fast and slow effect sizes would indicate that FAI ankles were weaker than uninjured ankles.
Published Study Selection
Our search strategy was conducted in 4 stages (Figure 1). For stage 1, we performed an initial electronic search and evaluated studies for inclusion. In stage 2, we used the previously selected studies and performed a forward search. Stage 3 consisted of a hand search of the reference lists of articles selected in stages 1 and 2. The final stage consisted of our contacting the corresponding authors of the previously selected studies to solicit any additional references. Stages 2 and 3 were iterative until no further articles were identified.
Our electronic literature search was restricted to English-language publications found in the following databases through November 2007: PubMed (National Library of Medicine, Bethesda, MD), Pre-CINAHL, CINAHL, and SPORTDiscus. We searched the latter 3 databases simultaneously using EBSCOhost (EBSCO Industries, Inc, Birmingham, AL). The strategy and search results are presented in Table 1. Our senior research team member (B.L.A.) directed the search. He has 14 years of research experience and expertise in the area of FAI and was assisted by 2 doctoral students specializing in FAI research.
Article Inclusion and Exclusion Criteria
Once we completed the initial search, we reviewed the titles and abstracts of each article retrieved by the search. After reading the titles and abstracts, we included articles based on a group consensus. For us to include an article, it had to either have reported means and SDs for an injured group (or ankles) and an uninjured comparison group (or ankles) or have statistics reported in enough detail that we could calculate effect sizes. We also required each study to use an inclusion criterion (or FAI definition) of giving way or frequent sprains or to have described the target condition as FAI.
In addition to journal articles, we also reviewed any theses and dissertations retrieved as part of the initial search. Both types of manuscripts were included in the analysis if they met our inclusion criteria. Abstracts from conferences were not reviewed for inclusion because of their limited availability in the electronic databases.
Forward and Hand Search
We conducted our forward search using the Science Citation Index (The Thomson Corp, New York, NY). This search was conducted on each of the previously included articles. Specifically, we used each article as a search item and searched from its publication date forward to November 2007. The search identified other articles that cited the initial article. Our review and inclusion or exclusion of articles identified in the forward search was identical to that described above. The references from the articles of the initial and forward searches were then hand searched to identify other potential articles.
Contact With Authors
To complete the search, we contacted the corresponding author of each article by letter or e-mail. In the correspondence, we listed which articles of theirs we included and asked that they identify additional articles for us to consider. We then applied our inclusion criteria to determine whether to include the additional studies.
For each study, 3 investigators independently extracted means and SDs or other appropriate measures, such as participant n and t values, for the FAI and stable ankle groups (or ankles in studies with contralateral comparisons). We resolved any discrepancies by reexamining the data and agreeing by consensus on the final data to be included. For studies using treatments, we selected only the pretreatment data for inclusion. If pretreatment data were not available, the study was excluded.
The outcomes we selected for this study were limited to measures of evertor muscle strength. Within this restriction, any mode of strength assessment (eg, isokinetic, isometric, isotonic) was eligible for inclusion. However, after we applied the inclusion criteria, only isokinetic studies with speeds ranging from 30°/s to 240°/s were included. The outcomes for each study are presented in Table 2.
Our analysis was not restricted to any particular research design or question. To be included, each study had to have at least 1 measure of evertor strength and provide a comparison between injured and uninjured participants or ankles. The typical study selected for review was a case-control study, but randomized controlled trials and ex post facto designs were eligible provided pretreatment data were included for both groups. Study populations were not restricted. Table 3 includes the participant demographics for each study.
To assess quality, we developed a 20-item questionnaire30 to identify possible threats to construct, internal, and external validity as identified by Cook and Campbell.31 Based on validity threats, we tailored our questions to ankle instability research. Each item was scored as yes or no, with 3 items potentially scored as not applicable. Thus, the overall score was calculated as the percentage of items scored as yes after any not applicable items were removed. Three reviewers independently scored each study. The mean of the 3 reviewers' scores served as the quality score. Meta-regression was then used to establish the relationship between study quality and the standard difference in the means (SDM).
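The scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the example ratings are hypothetical, and the questionnaire items themselves are not reproduced here.

```python
from statistics import mean

def quality_score(answers):
    """Percentage of applicable items scored 'yes'.

    answers: list of 'yes', 'no', or 'na' (not applicable), one per item.
    Items scored 'na' are removed before the percentage is computed.
    """
    applicable = [a for a in answers if a != "na"]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100.0 * sum(a == "yes" for a in applicable) / len(applicable)

def study_quality(ratings_by_reviewer):
    """Mean of the reviewers' individual percentage scores for one study."""
    return mean(quality_score(r) for r in ratings_by_reviewer)

# Hypothetical ratings of one study on a 20-item questionnaire.
reviewer_1 = ["yes"] * 5 + ["no"] * 13 + ["na"] * 2   # 5 of 18 applicable
reviewer_2 = ["yes"] * 6 + ["no"] * 14                # 6 of 20
reviewer_3 = ["yes"] * 4 + ["no"] * 16                # 4 of 20
example_score = study_quality([reviewer_1, reviewer_2, reviewer_3])
```

Removing the not-applicable items from the denominator keeps a study from being penalized for items that could not apply to its design.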
We completed the statistical analysis using Comprehensive Meta-Analysis (version 2.2.034; BioStat International, Inc, Tampa, FL). Depending on the information presented in each study, we entered data either as means and SDs (n = 11) or as paired groups using sample size and the paired t value (n = 1). From these data, we calculated the SDM with the pooled SD for each study and used it for statistical analysis. We used the Z statistic (which follows the normal distribution) to test whether individual and study category SDMs were different from zero.32 However, before calculating the Z value, we assessed the heterogeneity of effect sizes among the studies using the Q statistic (which approximates the χ2 distribution). If the Q value was significant (ie, the between-studies variance was greater than chance expectations), we computed the Z statistic using a random-effects model. Otherwise, the Z statistic was calculated using a fixed-effects model.33 We also used the standardized residual to identify outcomes that were outliers. Studies with standardized residuals greater than or equal to 3.0 were deleted from the analysis.34
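For readers who want to follow the arithmetic, the sketch below shows one common way to compute an SDM from group means and SDs and to pool outcomes with an inverse-variance fixed-effects model. It is not a reproduction of the software's internal routines: the large-sample variance formula, the function names, and the sign convention (stable minus FAI, so that a positive SDM means the FAI ankles were weaker) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sdm(mean_stable, sd_stable, n_stable, mean_fai, sd_fai, n_fai):
    """Standardized difference in means using the pooled SD.

    Returns (d, variance). The variance is the usual large-sample
    approximation, assumed here for illustration.
    """
    pooled_sd = sqrt(((n_stable - 1) * sd_stable ** 2 + (n_fai - 1) * sd_fai ** 2)
                     / (n_stable + n_fai - 2))
    d = (mean_stable - mean_fai) / pooled_sd
    variance = (n_stable + n_fai) / (n_stable * n_fai) + d ** 2 / (2 * (n_stable + n_fai))
    return d, variance

def fixed_effect(outcomes):
    """Inverse-variance fixed-effects summary of (d, variance) pairs."""
    weights = [1.0 / v for _, v in outcomes]
    total_weight = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, outcomes)) / total_weight
    se = sqrt(1.0 / total_weight)
    z = pooled / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))          # 2-tailed P value
    q = sum(w * (d - pooled) ** 2 for w, (d, _) in zip(weights, outcomes))
    return {"sdm": pooled, "se": se, "z": z, "p": p,
            "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
            "q": q, "df": len(outcomes) - 1}
```

The Q value returned here would be compared with a χ2 distribution on k − 1 degrees of freedom; a significant Q would argue for a random-effects model instead of the fixed-effects summary shown.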
Multiple Isokinetic Velocities Within a Single Study
Several groups included strength assessments at more than 1 velocity (n = 6). Although the literature does not provide a clear distinction between fast and slow test velocities, it is common for authors to test at more than 1 velocity, ranging from relatively slow to relatively fast velocities.14,15,25,26,28,35 Depending on the source, the combined inversion-eversion motion ranges from 50°20 to 55°.36 By arbitrarily defining fast as completing the range of motion in 0.5 seconds or less, we determined that isokinetic velocities at or above 110°/s (55°/0.5 s = 110°/s) would be termed fast. Thus, using 110°/s as the cutoff, we created a moderator variable that divided velocities into fast and slow.
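The cutoff derivation above is simple enough to express directly. The sketch below assumes the 55° range of motion and 0.5-second criterion stated in the text; the constant and function names are hypothetical.

```python
RANGE_OF_MOTION_DEG = 55.0   # upper estimate of combined inversion-eversion motion
FAST_DURATION_S = 0.5        # "fast" = full range completed in 0.5 s or less
FAST_CUTOFF_DEG_PER_S = RANGE_OF_MOTION_DEG / FAST_DURATION_S  # 110 deg/s

def velocity_category(velocity_deg_per_s):
    """Code an isokinetic test velocity as 'fast' or 'slow'."""
    return "fast" if velocity_deg_per_s >= FAST_CUTOFF_DEG_PER_S else "slow"

# The slowest and fastest isokinetic velocities in the included studies.
categories = {v: velocity_category(v) for v in (30, 240)}
```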
Because we expected that our results might differ across the range of velocities used in ankle strength testing, we elected to treat multiple velocities within a single study as independent. However, multiple velocities within a study are not, in fact, independent, so the standard error and confidence interval (CI) for the overall effect are smaller than expected, the statistical test for the overall effect is liberal, and the comparison between fast and slow velocities is conservative. The alternative would have been to average the velocities within each study. Yet doing so would have eliminated the ability to compare fast and slow velocities.
To compare the effects of fast and slow velocities, our first step was to test for heterogeneity among the studies using the Q statistic. If the Q statistic was nonsignificant, no further testing was warranted. With a significant Q statistic, we grouped study outcomes based on test velocity as a moderator variable.
Bias was assessed using 3 techniques. First, a funnel plot was created to visually interpret the data (ie, data points should be symmetrical within the funnel). Next, we used the Egger regression intercept method37 and the Duval and Tweedie38 trim-and-fill procedure to confirm the funnel plot. We elected to conduct the bias assessment on the mean effect for each study (ie, the SDM calculated from the average of the study's outcomes). We did this because bias is more appropriately related to studies, not outcomes, and because bias can have multiple causes (eg, study quality39) that would be expected to affect all of a study's outcomes.
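As a rough illustration of the Egger approach, the sketch below regresses each study's standardized effect (SDM divided by its standard error) on its precision (1 divided by the standard error) and returns the intercept; an intercept near zero is consistent with a symmetrical funnel plot. The function name is hypothetical, ordinary least squares is assumed, and the significance test of the intercept is omitted.

```python
def egger_intercept(effects, standard_errors):
    """Intercept and slope of the Egger regression.

    effects: one mean SDM per study.
    standard_errors: the corresponding standard errors.
    """
    precision = [1.0 / se for se in standard_errors]
    standardized = [d / se for d, se in zip(effects, standard_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(precision, standardized))
    slope = sxy / sxx                      # estimates the underlying effect
    intercept = mean_y - slope * mean_x    # estimates small-study asymmetry
    return intercept, slope
```

When every study estimates the same effect regardless of its size, the regression line passes through the origin; small studies with inflated effects pull the intercept away from zero.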
Identification of Participant Characteristics
We extracted study and participant characteristics from either reported characteristics or reported inclusion and exclusion criteria (Tables 2 and 3). No consistency is evident in the participant characteristics reported. Because of this variability, we did not attempt to compare studies using these characteristics. However, studies that included statistical comparisons between injured and uninjured groups (or ankles) received additional credit as part of our quality assessment.
To evaluate our quality assessment, we calculated the intraclass correlation coefficient (2,1) on the independent study ratings given by our 3 evaluators. This produced a coefficient of 0.676 with a 95% CI of 0.360 to 0.882. This result is similar to previous reports on the PEDro quality assessment tool.40 The average quality score was 26.6% ± 9.4%, with a range of 15% to 42%. No relationship was found between study quality and the SDM (slope = −0.002, P = .78), indicating that study quality did not relate to greater or smaller effect sizes.
Our initial overall analysis revealed 1 outlier (ie, a standardized residual equal to or greater than 3.0)33 in the data set (ie, Willems et al).28 Thus, we deleted this study (and 2 corresponding outcomes) from the overall analysis. For the overall analysis investigating whether FAI was associated with ankle weakness, we found no heterogeneity across outcomes (Q = 29.9, df = 24, P = .187). Thus, the use of the fixed-effects model was warranted, revealing an SDM of 0.224 (Z = 4.0, P < .001, 95% CI = 0.115, 0.333). In other words, ankle weakness was associated with functionally unstable ankles (Figure 2).
Differences Between Velocities
Based on the nonsignificant Q value reported above, we determined that a separate analysis for velocity was unnecessary because the lack of heterogeneity meant that velocity was not affecting the data in a meaningful way. The SDMs for fast and slow velocities are presented in Figure 2.
The Egger intercept was 0.974 (P = .28, 2 tailed), and the Duval and Tweedie trim-and-fill procedure identified no studies to be filled in the bias funnel plot (Figure 3). These results indicate that our findings were not biased by unpublished or inaccessible (ie, “fugitive”)41 studies.
We conducted this meta-analysis because of conflicts in the literature. Several groups13–17,26,27,35 have failed to detect concentric evertor weakness in participants with FAI. In contrast, others22–25,28 have reported ankle weakness with FAI. Thus, our primary intent was to clarify this discrepancy. Our findings indicate that concentric evertor weakness is present in participants with FAI. Specifically, we noted the SDM between stable and unstable ankles to be significant but small (0.224). In other words, on average, the means of the FAI groups and stable groups were separated by 0.224 SDs (Figure 2). Interestingly, of the 27 outcomes included in this analysis, only 4 outcomes (limited to 1 study) were significant (as indicated by the P value associated with the Z value). This finding clearly illustrates the usefulness of meta-analysis: when the studies were combined, a significant result that most individual studies failed to demonstrate was revealed. The most likely reason for the lack of differences in the individual studies was inadequate statistical power. Using the Cohen method for determining sample size,42 the overall effect size of 0.224 from this analysis, an alpha level of .05, and a target power of 80%, we determined that 313 participants would be needed per group in a 2-group analysis to detect this effect. Of the studies we included, Lentell et al15 had the largest group (N = 42). Not surprisingly, none of the outcomes of Lentell et al displayed a significant difference.
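The sample-size figure can be checked with the standard normal-approximation formula for a 2-group comparison, n per group = 2(z for alpha + z for power)² / d². This is a sketch under that assumption rather than a reproduction of the Cohen tables, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Participants needed per group for a 2-tailed, 2-group comparison
    (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)

required = n_per_group(0.224)   # overall SDM from this analysis
```

With the overall SDM of 0.224, an alpha level of .05, and 80% power, the approximation gives 313 participants per group, far more than the largest group among the included studies.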
Based on these findings, we question whether ankle weakness can be detected clinically. One method of determining clinical detectability is by calculating the minimal detectable change (MDC). For differences between 2 measures, the MDC is calculated as SEM × √2, where the standard error of measurement (SEM) is SD × √(1 − intraclass correlation coefficient). Using the average intraclass correlation coefficient (0.89) from previous studies43,44 and the pooled SD of the injured ankles from our analysis (5.62 Nm), the calculated MDC was 2.64 Nm. Similarly, using our overall SDM (Figure 2) and multiplying it by the pooled SD, the difference between stable and unstable ankles was 1.26 Nm, below the MDC.
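The MDC arithmetic can be reproduced as follows. The sketch assumes that the MDC is the standard error of measurement (SEM) multiplied by √2, with SEM = SD × √(1 − ICC); this form is an assumption, chosen because it returns the reported 2.64 Nm. The function names are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, icc):
    """SEM from the SD and the intraclass correlation coefficient."""
    return sd * sqrt(1.0 - icc)

def minimal_detectable_change(sd, icc):
    """MDC for a difference between 2 measures (SEM x sqrt(2))."""
    return standard_error_of_measurement(sd, icc) * sqrt(2.0)

POOLED_SD_NM = 5.62   # pooled SD of the injured ankles
MEAN_ICC = 0.89       # average ICC from previous studies
OVERALL_SDM = 0.224   # overall effect from this analysis

mdc_nm = minimal_detectable_change(POOLED_SD_NM, MEAN_ICC)   # about 2.64 Nm
difference_nm = OVERALL_SDM * POOLED_SD_NM                   # about 1.26 Nm
detectable = difference_nm >= mdc_nm                         # False
```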
The clinical importance of this difference is unknown. The 1.26-Nm difference between stable and unstable ankles seems too small to be clinically important. However, it appears likely to us that the Nm unit may not be the best for ankle strength measurements. The moment arm of the peroneal muscles is only 21 to 22 mm.45 Thus, 1.26 Nm converts to a 57.3- to 60.0-N (12.9- to 13.5-lb) weakness of the peroneal muscles. We believe this may represent a clinically meaningful weakness of the ankle evertors.
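The conversion from joint torque to muscle force is a division by the moment arm. The sketch below repeats that arithmetic for the 21- to 22-mm peroneal moment arm; the function name is hypothetical, and the newton-to-pound conversion factor is the standard one.

```python
NEWTONS_PER_POUND_FORCE = 4.4482216152605

def torque_to_force(torque_nm, moment_arm_m):
    """Muscle force (N) needed to produce a joint torque (Nm)."""
    return torque_nm / moment_arm_m

TORQUE_DEFICIT_NM = 1.26

force_low_n = torque_to_force(TORQUE_DEFICIT_NM, 0.022)    # 22-mm moment arm
force_high_n = torque_to_force(TORQUE_DEFICIT_NM, 0.021)   # 21-mm moment arm
force_low_lb = force_low_n / NEWTONS_PER_POUND_FORCE
force_high_lb = force_high_n / NEWTONS_PER_POUND_FORCE
```

Because the moment arm is so short, a torque deficit that looks negligible in Nm corresponds to a force deficit of roughly 57 to 60 N at the muscle.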
Based on the relatively small overall SDM, we question previous suggestions that ankle strength training may be beneficial to only a minority of patients16 and that return-to-play criteria not be based on strength.26 Given the result of our analysis, we suspect that strength is an important factor. Strength might be better assessed using a measure of muscle force rather than joint torque. Finally, we believe that evidence of ankle evertor weakness in FAI is sufficient to warrant randomized controlled trials focused on strength training at the ankle.
Differences Between Velocities
Six of the 12 groups included in this analysis used more than 1 velocity. Although we are unaware of any previous research suggesting this is necessary, we believe it to be a common practice based on theoretical grounds. Specifically, faster velocities are perceived as being more representative of joint power than strength. We also suspect that clinicians and researchers believe that capturing strength across the spectrum of testing speeds offers a more complete assessment of ankle strength. However, our results suggest that this is not the case. Our analysis (Figure 2) did not reveal a difference between fast and slow velocities. Based on this finding, we would recommend performing clinical testing at only 1 velocity and using any velocity above 30°/s and below 240°/s. However, with the small differences described above and the force-velocity relationship of muscle, testing at slower velocities to maximize the difference would seem wiser to us.
Assessment of Bias
We did not detect any publication bias in our analysis. Publication bias may actually be the result of multiple factors, including true publication bias as well as bias in the retrieval method.37 The primary concern is that studies may be missing from the data set and that these missing studies may actually cause a false-positive result. One of the easiest ways to detect bias is via the funnel plot (Figure 3). When no bias exists, studies are expected to distribute equally to the left and right of the SDM (ie, the open diamond). When bias exists, the typical pattern is an absence of data points (ie, studies) on the lower, left-hand side of the funnel. This is because the left-hand side of the plot represents studies having or approaching either no effect or a negative effect. For example, in our analysis, studies between the open diamond (SDM = 0.19) and zero had no strength differences or small differences, with unstable ankles being weaker. Conversely, studies to the left of zero indicated that stable ankles were weaker than unstable ankles (ie, a negative effect). Studies tend to be missing from the lower portion of the plot because they have smaller sample sizes and, thus, larger standard errors. (It should be noted that, by convention, the funnel plot has a reversed y-axis, with higher values at the origin.) In other words, it is typical for small studies to be published only when they have large (ie, significant) effect sizes.
As can be seen from our funnel plot, studies are equally distributed to the left and right of the overall SDM, reflecting no bias. Although the funnel plot is a useful tool in visually identifying bias, as with regression plots, the funnel plot can be deceptive. Thus, we conducted confirmatory statistical analyses (ie, the Egger intercept and the Duval and Tweedie trim-and-fill tests). Both of these tests also failed to detect bias.
Assessment of Quality of Included Studies
One concern with any meta-analysis is the effect of study quality on the analysis; that is, does individual study quality influence its effect size (SDM), and does that influence the overall effect? We are not aware of any quality assessment tool that has been specifically designed for nonexperimental or quasi-experimental designs. Rather, the existing tools are designed for randomized controlled trials. The consequence is that studies that are not randomized or do not have appropriate control groups are necessarily scored lower. Although we agree that this is appropriate for studies comparing treatments, it seems inappropriate for studies that are not specifically designed to compare treatments. Therefore, we created a new quality scoring tool based on the threats of internal, construct, and external validity identified by Cook and Campbell.31 Our intertester reliability (0.676) was comparable with that of the PEDro.40 We would consider this intraclass correlation coefficient fair to good. Unlike laboratory measures that are quite stable, parts of this assessment were more subjective and potentially more error prone. The range of the CIs (0.360 to 0.882) suggests that the reliability may be quite high but could also be rather low. This finding suggests to us that users should practice with the tool before engaging in a serious quality assessment of studies.
There was no evidence that study quality affected the results, but the overall quality of the included studies was not high. It is important to note that quality is evaluated based on what the authors reported, not what they did. In our view, the quality score reflects at least 2 facets of the research process: how the research was conducted and how it was reported (which also includes the editorial process). Certain studies may have received lower-than-deserved scores because of how they were reported. For example, if multiple levels of outcome measures (eg, isokinetic velocities) were not reported as counterbalanced or randomly assigned (ie, monomethod bias31), studies lost points. This does not mean that levels of an outcome were not counterbalanced, only that such assignments were not reported. Thus, the low quality ratings of the included studies should be viewed cautiously, because they do not solely represent the quality of the research.
Despite previous authors' failure to demonstrate strength differences between participants with FAI and those with stable ankles, our meta-analysis clearly shows that weakness is associated with FAI. Based on this finding, we disagree with suggestions that strength assessment27 and strength training15 are not important parts of return-to-play criteria and rehabilitation, respectively. However, because of the small SDM and MDC calculated for this investigation, it may be necessary to rethink how ankle strength is measured, particularly whether newtons (force) or newton-meters (torque) are the preferred units. Finally, our results suggest that the velocity of strength testing is not a relevant factor in ankle strength testing.
Brent L. Arnold, PhD, ATC, FNATA, contributed to conception and design; acquisition and analysis and interpretation of the data; and drafting, critical revision, and final approval of the article. Shelley W. Linens, PhD, ATC, and Sarah J. de la Motte, PhD, ATC, contributed to conception and design; acquisition of the data; and drafting, critical revision, and final approval of the article. Scott E. Ross, PhD, ATC, contributed to conception and design; analysis and interpretation of the data; and drafting, critical revision, and final approval of the article.