Strength training in people with multiple sclerosis (MS) is an important component of rehabilitation, but it can be challenging for clinicians to quantify strength accurately and reliably. This study investigated the psychometric properties of a clinical strength assessment protocol using handheld dynamometry and other objective, quantifiable tests for the lower extremities and trunk in people with MS.
This study determined discriminant validity between 25 participants with MS and 25 controls and between participants with MS who had higher versus lower disability; test-retest reliability across 7 to 10 days; and response stability. The protocol included handheld dynamometry measurements of ankle dorsiflexion; knee flexion and extension; hip flexion, extension, abduction, and adduction; and trunk lateral flexion. Muscular endurance tests were used to measure trunk extension, trunk flexion, and ankle plantarflexion.
The protocol discriminated between participants with MS and controls for all muscles tested (P < .001–.003). The protocol also discriminated between low- and moderate-disability groups (P = .001–.046) for 80% of the muscles tested. Test-retest reliability intraclass correlation coefficients were high (0.81–0.97). Minimal detectable change as a percentage of the mean was 13% to 36% for 85% of muscles tested.
This study provides evidence for the discriminant validity, test-retest reliability, and response stability of a strength assessment protocol in people with MS. This protocol may be useful for tracking outcomes in people with MS for clinical investigations and practice.
Muscle weakness is present in up to 70% of people with multiple sclerosis (MS) and often occurs early in the disease process.1 As MS-related disability progresses to the point where assistance is needed to walk, virtually all people with MS experience weakness.1 Muscle weakness tends to be more pronounced in the lower than the upper extremities,2 and it also affects the trunk.3 Strength training for people with MS can improve strength and has been shown to have an effect on mobility, quality of life, and participation.2,4,5 Although there are a variety of measures with favorable psychometric properties available to clinicians to assess changes in mobility and participation after strength training, such as the Timed 25-Foot Walk test, the 2- and 6-Minute Walk Tests, and the Multiple Sclerosis Walking Scale,6,7 it can be challenging to quantify changes in strength accurately and reliably in a clinical setting.
The standard clinical strength assessment is manual muscle testing (MMT) rated on an ordinal scale (from 0 to 5).8 Although efficient, MMT is objective only at grades 0 to 3, whereas grades 4 and 5 require a more subjective interpretation by the tester. Furthermore, MMT may not detect subtle changes in strength or early weakness that can be present in people with MS.9–11 The gold standard of quantifying muscle strength is computerized or electromechanical isokinetic dynamometry.2 These devices provide valid and reliable assessments of strength but are primarily used in research. Typically, isokinetic dynamometry is not considered feasible for clinical use because it is expensive, requires special training, and is not time-efficient for testing multiple muscles.12
Handheld dynamometry (HHD) is a method that can be useful for both research and clinical purposes in people with MS. Compared with MMT, HHD quantifies strength more accurately,13 is more responsive to small changes in muscle strength,14 and may be more reliable.9,11 Although not as common in research as isokinetic dynamometry, HHD has been used in a variety of studies in people with MS.15–20 Handheld dynamometry has adequate concurrent validity with isokinetic dynamometry in both healthy and other neurologic populations and has been proposed as a clinically feasible alternative to isokinetic dynamometry in terms of cost, ease of use, and efficiency.12 Handheld dynamometry has the potential to be a useful clinical outcome measure for people with MS; however, little is known about its psychometric properties in this population.
Discriminant validity is an important psychometric property that measures the ability of a test to differentiate between two groups. Little has been reported on discriminant validity of strength measured by HHD in people with neurologic conditions.21 In people with MS, HHD measurements of knee strength have been shown to discriminate between disability levels,22 but, to our knowledge, discriminant validity of HHD between people with MS and controls has not been reported.
Test-retest reliability assesses the consistency of an instrument and is a fundamental property of an outcome measure. Handheld dynamometry has been found to have high test-retest reliability in controls,23–26 in elderly persons who fall,27 and in other neurologic populations.21,28–31 In people with MS, test-retest reliability of HHD has been reported only for knee flexion and extension, where it has been shown to have acceptable reliability.22 Test-retest reliability has not been investigated in the ankle, hip, or trunk in people with MS.
Response stability is another crucial property of an outcome measure because it assesses the consistency of repeated tests over time, accounts for the inherent variability present in a measure, and can provide insight into the effectiveness of an intervention. Response stability of HHD in healthy adults23,29,32 and in patients with other neurologic conditions31 is likely not directly applicable to people with MS, partly because HHD testing protocols and devices vary and partly because people with MS can experience fluctuating symptoms that might affect HHD measurements. In people with MS, response stability of HHD has been reported between raters for a single session in knee extension, hip extension, and ankle dorsiflexion.33 Response stability has not been reported for a single rater between sessions in people with MS.
Currently, there is not a standard strength testing protocol in MS, and, therefore, it is important to establish a protocol, based on HHD, that is feasible for clinical use. However, HHD may not be valid for assessing ankle plantarflexion,34 and little has been published on HHD assessments for the trunk.35 Therefore, it is also important to consider alternative strength assessments for these muscle groups.36–38 A clinically feasible, valid, and reliable protocol to quantify lower-extremity and trunk strength in people with MS would help clinicians to better identify muscle weakness, target interventions, and track outcomes.
The objective of this study was, therefore, to establish the psychometric properties of a strength assessment protocol using HHD and other strength tests in people with MS. First, the discriminant validity of the strength assessment protocol was determined between people with MS and controls and between people with MS who had mild versus moderate disability. Second, between-session test-retest reliability and response stability were determined in people with MS.
Methods
Participants
Before participation, all the participants signed an informed consent form approved by the Colorado Multiple Institutional Review Board. Twenty-five participants with MS and 25 controls were enrolled. Participants were recruited through the Rocky Mountain MS Center at the University of Colorado Hospital Anschutz Medical Campus.
Participants with MS enrolled in this study were recruited as part of a larger investigation on the relationship of strength and gait with the following eligibility criteria: age 18 to 65 years, confirmed diagnosis of MS, able to provide consent and follow simple directions, and ambulatory for 100 m without an assistive device. Age- and sex-comparable controls without neurologic, muscular, or skeletal disorders were included. The exclusion criteria for all the participants included pain or other conditions limiting ambulation or muscle strength, or inability to give consent, follow simple directions, or ambulate 100 m without an assistive device. People with MS were also excluded if they had more than minimal lower-extremity spasticity (≥2 on the Modified Ashworth Scale), had an exacerbation or changes to drug therapy in the previous month, or were currently undergoing physical therapy for strength training.
Outcomes
Muscle strength measured by HHD (Lafayette Manual Muscle Tester; Lafayette Instrument Co, Lafayette, IN) was recorded in kilograms and normalized to body-mass index (BMI) for group comparisons. The muscle testing protocol was originally developed and used as a clinical tool by the authors and is based on a combination of existing protocols.23,29,35 Handheld dynamometry was used to measure ankle dorsiflexion; knee flexion and extension; hip flexion, extension, abduction, and adduction; and trunk lateral flexion (Figure 1). The trunk extensors were assessed by a timed prone extensor endurance test,37 and the trunk flexors by a timed curl-up test (Supplementary Figures 1 and 2, which are published in the online version of this article at ijmsc.org).38 Ankle plantarflexion strength was measured by maximal number of single-leg heel raises.36 The testing protocol was performed in a standardized order and is detailed in Table 1. Disability was measured using the Kurtzke Expanded Disability Status Scale (EDSS)39; a score of 0 to 3.5 was defined as mild disability and 4.0 to 5.5 as moderate disability.40 The primary author (M.M.M.) conducted all the strength and disability assessments, has more than 11 years of clinical experience working with people who have MS, and received standardized training in EDSS assessment through the Rocky Mountain MS Center.
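The normalization and disability grouping described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and dividing the force reading by BMI is an assumption, because the text states only that values were normalized to BMI. The EDSS cut points are those given in the text.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def normalized_strength(force_kg: float, weight_kg: float, height_m: float) -> float:
    """HHD force (kg) divided by BMI. Simple division is an assumption."""
    return force_kg / bmi(weight_kg, height_m)


def disability_group(edss: float) -> str:
    """Classify an EDSS score using the cut points stated in the text."""
    if 0 <= edss <= 3.5:
        return "mild"
    if 4.0 <= edss <= 5.5:
        return "moderate"
    return "outside study range"
```

Scores above 5.5 are labeled as outside the study range because the eligibility criteria excluded people who could not walk 100 m without an assistive device.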
Figure 1. Strap placements and test position for the trunk lateral flexion test
Procedure and Protocol
At the first session, demographic data (age, sex, height, and weight) were recorded, followed by muscle testing. Participants with MS also underwent EDSS assessment on day 1, then returned 7 to 10 days later for a second session to repeat the muscle testing only. Strength assessed with HHD used a “break” protocol defined by the maximal force recorded at the time the examiner overcame the resistance of the participant.25,26 The HHD protocol was designed to allow participants multiple attempts because people with MS may have other impairments (coordination, fatigue, etc.) that could affect a single test. Each muscle was assessed until two maximal contractions were within 10% of each other, and those two values were averaged. No more than five assessments were made for any one muscle. To avoid excess fatigability, rest periods of 15 seconds were used between each muscle contraction. Ankle plantarflexion and trunk endurance tests were performed only once. The weaker side in participants with MS was compared with the nondominant side in the control group, and the stronger side in participants with MS was compared with the dominant side in the control group. If no limb was clearly weaker, the side with weaker knee flexion was used. To the extent possible, participants with MS returned at a similar time of day for the second muscle testing session.
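The trial-acceptance rule described above (repeat until two contractions agree within 10%, average them, and stop after five attempts) can be sketched in Python. The text does not say how the 10% is computed or which pair is used when several agree, so this sketch makes two assumptions: agreement is judged relative to the larger of the two values, and a new trial is paired with the largest earlier trial that it agrees with. The function name is hypothetical.

```python
def accepted_strength(trials, tolerance=0.10, max_trials=5):
    """Return the mean of the first pair of trials that agree within
    `tolerance`, or None if no pair agrees within `max_trials` attempts.

    Assumptions (not stated in the source): agreement is relative to the
    larger value, and the largest agreeing earlier trial is preferred.
    """
    seen = []
    for force in trials[:max_trials]:
        # Check the newest trial against earlier ones, strongest first.
        for earlier in sorted(seen, reverse=True):
            if abs(force - earlier) <= tolerance * max(force, earlier):
                return (force + earlier) / 2
        seen.append(force)
    return None
```

For readings of 20, 23, and 22 kg, the third trial agrees with the 23-kg trial, so the recorded value would be 22.5 kg under these assumptions.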
Statistical Analysis
Sample size was calculated based on detecting a mean difference in strength between the MS and control groups (discriminant validity). Preliminary data from the weaker limb of a clinical sample of people with MS was compared with published data for the nondominant limb in age- and sex-comparable controls.24 The HHD protocol used in the published data is similar to the protocol used in this study for ankle dorsiflexion (mean ± SD muscle strength, 25.20 ± 5.23 kg) and knee extension (mean ± SD muscle strength, 36.99 ± 6.12 kg).24 The preliminary mean ± SD muscle strength in people with MS for ankle dorsiflexion was 20.42 ± 5.18 kg and for knee extension was 23.60 ± 5.06 kg. The smaller of the two differences (ankle dorsiflexion) was used because it resulted in a more conservative sample size. To detect this 4.78-kg mean difference, at a power of 90% using a two-sided t test with a significance level of 5%, 25 participants per group were needed.
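The sample size above can be approximately reproduced with a standard two-sample formula. The sketch below uses the normal approximation and a pooled SD of the two ankle dorsiflexion samples; both choices are assumptions, because the text does not state which method or software was used, and an exact t-based calculation may give a slightly larger number. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(mean_diff, sd1, sd2, alpha=0.05, power=0.90):
    """Participants per group for a two-sided, two-sample comparison
    of means (normal approximation, equal group sizes)."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    n = 2 * ((z_alpha + z_beta) * pooled_sd / mean_diff) ** 2
    return ceil(n)


# Ankle dorsiflexion: controls 25.20 ± 5.23 kg, MS 20.42 ± 5.18 kg
ankle_diff = 25.20 - 20.42  # 4.78 kg
print(n_per_group(ankle_diff, 5.23, 5.18))  # 25
```

Under these assumptions the formula yields about 24.9 before rounding up, which agrees with the 25 participants per group reported.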
Between-group comparisons using independent t tests (α = .05) were made for baseline characteristics in the MS versus control groups, the mild versus moderate disability groups, and people with MS who had no spasticity versus those with mild spasticity. Discriminant validity was evaluated for each muscle group and was determined based on significant mean differences using independent t tests (P < .05). Test-retest reliability was assessed by two-way random-effects, single-measurement intraclass correlation coefficient (ICC[2,1]). The ICC values were considered excellent from 0.75 to 1.0, fair to good from 0.40 to 0.74, and poor if less than 0.40.41 Response stability of the strength assessments was calculated using the standard error of measurement (SEM) and minimal detectable change (MDC).42 The SEM is a measure of consistency of repeated measures, where SEM = SD × √(1 − ICC). The MDC is a measure of the minimal change needed to exceed the measurement error, where MDC95 = SEM × 1.96 × √2. Both SEM and MDC as a percentage of the mean (SEM% and MDC95%, respectively) were also calculated as a standardized way to compare response stability between the tests.
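The reliability and response stability statistics described above can be sketched in Python. The ICC function uses the usual mean-squares form for a two-way random-effects, absolute-agreement, single-measurement coefficient; the SEM and MDC95 functions follow the formulas given in the text. Function names and the data layout (one row per participant, one column per session) are assumptions made for illustration, not the authors' analysis code.

```python
from math import sqrt


def icc_2_1(scores):
    """ICC(2,1) from a table with one row per participant and one
    column per session (or rater)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-session mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


def mdc95(sem_value):
    """Minimal detectable change at 95% confidence: SEM * 1.96 * sqrt(2)."""
    return sem_value * 1.96 * sqrt(2)


def percent_of_mean(value, mean):
    """Express SEM or MDC as a percentage of the sample mean."""
    return 100 * value / mean
```

As a check on the formula, the largest SEM reported in the Results (20.22) gives `mdc95(20.22)` of approximately 56.05, which matches the largest MDC reported there.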
Results
Twenty-five people with MS (21 women) and 25 controls (21 women) consented to study participation. The mean ± SD age of the participants with MS was 45 ± 12.3 years, and the median EDSS score was 3.5. There were no significant differences between the participants with MS and the controls for age, sex, or BMI (Table 2) or between the disability groups in the participants with MS for age (P = .448) or BMI (P = .290). In addition, there were no significant differences between those with (n = 11) and without (n = 14) spasticity in strength for any muscle group (P = .25–.82).
There were no adverse events in this study, but discomfort during testing was reported for three participants with MS (one in the knee flexor test and two during the trunk extension test) and for one participant in the control group (during the hip flexion test). Five participants with MS were unable to perform any heel raises on the weaker limb, and two were unable to perform a single trunk curl-up. Data were still recorded for participants who experienced discomfort, and a value of zero was recorded when the test could not be performed; no data were excluded. In participants with MS, 35% of the muscles tested with HHD required more than two tests during the first session and 28% during the second. In the control group, only 17% of the muscles needed more than two assessments. Two strength values within 10% of each other were recorded within five repetitions for all the participants. The mean ± SD duration of follow-up for the second visit for participants with MS was 8.6 ± 3.2 days.
Discriminant validity was confirmed by significant differences between the participants with MS and the controls for all muscle groups, for both weaker versus nondominant (P < .001) and stronger versus dominant sides (P < .001–.003) (Table 3). Compared with the controls, where side-to-side differences were all within 5%, there were significant differences in the participants with MS between the weaker and stronger sides of 37% in ankle plantarflexion (P = .001) and 10% to 20% in ankle dorsiflexion (P = .026), knee flexion (P < .001) and extension (P = .024), and hip flexion (P < .001). Side-to-side differences in people with MS were less than 10% and nonsignificant for hip extension (P = .121), abduction (P = .054), adduction (P = .274), and lateral trunk flexion (P = .518). Discriminant validity between low- and moderate-disability groups in participants with MS was confirmed by significant differences in all the muscle groups (P = .001–.046) except for stronger-side hip extension (P = .155), weaker-side ankle plantarflexion (P = .520), stronger-side hip adduction (P = .055), and trunk extension (P = .651) (Table 4).
Table 4. Differences in normalized muscle strength values between low- and moderate-disability groups in participants with MS

Test-retest reliability ICC(2,1) values were statistically significant and excellent for all weaker-side (ICCs = 0.81–0.96) and stronger-side (ICCs = 0.83–0.97) muscles in the participants with MS (Table 5). The SEM ranged from 0.88 to 20.22, and SEM% ranged from 4.8% to 21.0%. The MDC ranged from 2.45 to 56.05, and MDC% ranged from 13.31% to 58.26% (Table 5).
Discussion
To our knowledge, this is the first study to determine the psychometric properties of a clinically feasible lower-extremity and trunk strength testing protocol for people with MS. The strength testing protocol discriminated between patients with MS who had mild-to-moderate disability and controls and, thus, may be a useful tool to identify weakness in people with MS. People with MS were significantly weaker in all muscle groups tested in the lower extremities and trunk, which is consistent with findings in people with Huntington's disease.21 Less has been reported on the discriminant validity of the non-HHD assessments used in this study, but the mean ± SD values from the present study are comparable with previously published values for people with MS in the trunk flexion test (29.48 ± 17.76 repetitions)38 and for controls in the ankle plantarflexion endurance (range, 28.0–29.8 repetitions)36 and trunk extension (range, 141–197 seconds)43 tests.
In addition to statistical significance, there were large differences in the strength values between the MS and control groups for the weaker versus nondominant limb (range, 49%–72%), stronger versus dominant limb (range, 64%–79%), and the trunk (range, 60%–81%). There were also notable side-to-side differences within the MS group, where six muscle groups were more than 10% weaker in the weaker versus stronger side. Meanwhile, differences between sides in the control group were 5% or less for all muscle groups, which is consistent with differences of up to 10% being considered normal in healthy populations.44 These data support the idea that weakness tends to affect one side of the body more than the other even early in MS.3,45 However, weakness can also be present in the stronger limb and in the trunk, so when prescribing strength training, clinicians might consider focusing on both limbs and the trunk.
This study was not powered to detect changes between disability groups in people with MS, yet there were significant differences in almost all muscle groups between those with low (n = 13) compared with moderate (n = 12) disability. These results extend previous findings of Pilutti et al.,22 who found that HHD was able to discriminate among mild, moderate, and severe disability in the knee extensors and flexors. Together, these findings support the ability of strength measurements to discriminate between level of disability in people with MS.
Test-retest reliability of the strength assessments across 7 to 10 days was excellent. This is consistent with results from other neurologic conditions for lower-extremity muscles tested via HHD, where comparably high values have been reported for both same-day (r = 0.91–0.99, ICCs = 0.86–0.99) and between-session (ICCs = 0.90–0.98) test-retest reliability.21,27–30 High reliability of a similar lateral trunk flexion protocol has also previously been reported in people with spinal cord injury (ICCs = 0.86–0.99).35 Comparable reliability has also been reported for the non-HHD assessments used in this study: the trunk flexion test has been shown to have high reliability in people with MS (ICC = 0.995),38 the ankle plantarflexion assessment has been shown to have good reliability in healthy males (ICCs = 0.78–0.84),36 and the range of values of the trunk extension test in controls and people with low back pain is fair-to-excellent (ICCs = 0.54–0.99).43 Based on these findings, the protocol described in this study can be considered for use as an outcome measure in clinical practice for people with MS.
Response stability as measured by the SEM and MDC was acceptable for most muscle groups and further supports the use of this protocol as an outcome measure. Only SEM and MDC values for ankle plantarflexion, trunk extension, and trunk flexion were unfavorable. Of note, these three muscle groups were also among the weakest in the participants with MS relative to controls and had some of the highest levels of variance. Possibly the difficulty of these tests led to more variable performance, which contributed to the larger SEM and MDC values. However, the remainder of the SEM values in this study were considerably smaller than those reported by Toomey and Coote33 for intertester reliability of HHD in people with MS. This suggests that HHD may be better suited for a single rater to assess changes after intervention, rather than between multiple raters.
The MDC% values from this study indicate that a change in weaker-side muscle strength of 22% to 28% in ankle dorsiflexion, knee flexion and extension, hip flexion and abduction, and trunk lateral flexion is needed to detect real change outside of the inherent variability of the tests. These values are within the range of published MDCs calculated for HHD in people with cerebral palsy.31 Changes in the remaining five muscles tested had a higher MDC95% (range, 33%–58%). Nevertheless, there have been reports of strength changes measured by HHD of up to 53% in hip extension and 95% in knee extension for people with MS.19,20 Other studies have measured comparable improvements using isokinetic dynamometry for ankle plantarflexion (range, 52%–55%)46,47 and maximal repetitions on leg press and reverse leg press (range, 29%–32%).48,49 Although data from this study on response stability are promising, more investigation is needed on the responsiveness of these measures, including meaningful change.
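One way a clinician might apply these thresholds is to compare an observed percentage change against the MDC95% for that muscle group. The sketch below is illustrative only. The function names are hypothetical, and expressing change relative to the individual's baseline value is an assumption, because MDC95% in this study was calculated as a percentage of the sample mean.

```python
def percent_change(baseline, follow_up):
    """Change from baseline as a percentage of the baseline value."""
    return 100 * (follow_up - baseline) / baseline


def exceeds_mdc(baseline, follow_up, mdc_percent):
    """True if the absolute percentage change is larger than MDC95%."""
    return abs(percent_change(baseline, follow_up)) > mdc_percent


# Hypothetical readings: 20 kg at baseline and 26 kg at follow-up,
# tested against the upper bound of the 22% to 28% range (28%).
print(exceeds_mdc(20.0, 26.0, 28.0))  # True (30% change)
```

A change that exceeds the MDC95% is likely beyond measurement error, but, as noted above, that does not establish that the change is clinically meaningful.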
This study has several limitations. The person performing the assessments in this study was experienced at using this protocol in this population, and another tester may have produced different results. Also, there may be a learning effect of the protocol, as more participants with MS required more than two HHD trials to obtain a consistent result on the first day than on the second day. Although reliable, the ankle plantarflexion, trunk extension, and trunk flexion tests had large MDCs and SEMs and so may not be as useful for tracking outcomes but still may be considered for evaluative purposes; future studies might explore alternative clinical strength assessments that allow for clinical assessment of these muscle groups.
This sample was part of a larger study that required stricter exclusion criteria; therefore, people with higher disability (EDSS score >6) were excluded. This limits generalizability of this study to people with MS who can ambulate unassisted for at least 100 m, and future studies are needed to determine the usefulness of this protocol in advanced disability. Generalizability was further decreased by excluding people who had more than minimal spasticity in the lower extremities. Although spasticity should not affect the strength assessment protocol any more than other impairments that are common in MS, and there were no differences in strength between those with mild spasticity and those without, caution should be applied if generalizing this protocol to people with MS who have moderate-to-severe spasticity.
Conclusion
This study provides evidence for the discriminant validity, test-retest reliability, and response stability of a clinically feasible strength assessment protocol in people with mild-to-moderate disability with MS. The equipment required for the protocol is affordable, and no formal training is required. Future investigations are needed to establish utility in people with MS who have higher disability and greater-than-minimal spasticity. Also, the responsiveness of this protocol to intervention needs to be determined. Meanwhile, this protocol can be considered for clinical practice to establish a baseline and to track changes in strength.
Practice Points
This study found that an objective and quantifiable clinically feasible strength assessment protocol for people with MS was reliable and valid. Use of this assessment protocol in the clinical setting may help clinicians better identify weakness and track changes in strength over time.
Response stability was acceptable for most muscle groups assessed using this protocol, providing clinicians with strength outcomes that offer insight into the effectiveness of the intervention on the impairment level.
The results of this study show that weakness in people with MS tends to be worse on one side of the body compared with the other. However, compared with controls, people with MS had significant weakness in weaker and stronger limbs and in the trunk. Strengthening in people with MS, therefore, should include both weaker and stronger limbs and the trunk.
Acknowledgments
We acknowledge the University of Colorado Hospital Department of Rehabilitation for providing testing equipment, administrative support, and use of clinical space.
Financial Disclosures
The authors have no conflicts of interest to disclose.
References
Author notes
From the Physical Therapy Program, Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Denver, Aurora, CO, USA (MMM, JRH, MS); and Department of Rehabilitation, University of Colorado Hospital, Aurora, CO, USA (MMM).
Note: Supplementary material for this article is available on IJMSC Online at ijmsc.org.