Predicting and promoting physical performance are important goals within the tactical professional community. Movement screens are frequently used in this capacity but are poor predictors of performance outcomes. It has recently been shown that prediction improved when movement quality was evaluated under load, but the mechanisms underlying this improvement remain unclear. Because balance, range of motion, and strength are mutually relevant to physical performance and movement quality, these attributes may mediate load-related decreases in movement quality and account for the resulting increase in performance prediction.
To quantify the roles of balance, range of motion, and strength in mediating load-related decreases in clinical movement-screen scores.
Crossover study.
Research laboratory.
Twenty-five male (age = 23.96 ± 3.74 years, height = 178.82 ± 7.51 cm, mass = 79.66 ± 12.66 kg) and 25 female (age = 22.00 ± 2.02 years, height = 165.40 ± 10.24 cm, mass = 63.98 ± 11.07 kg) recreationally active adults.
Participants completed a clinical movement screen under a control condition and while wearing an 18.10-kg weighted vest as well as tests of balance, range of motion, and strength.
Item score differences were assessed using Wilcoxon signed rank tests for matched pairs. Interactions between (1) balance, range of motion, and strength and (2) load condition were modeled using penalized varying-coefficients regression with item scores as the dependent measure.
Except for the hurdle step, item scores were lower in the weighted-vest than in the control condition for all tests (P < .05). Except for rotary stability, F statistics were significant for all models (P values < .05, R2 values = 0.22–0.77). Main effects of balance, range of motion, and strength on Functional Movement Screen scores were observed (P < .05); however, little evidence was found to suggest that these attributes mediated load-related decreases in Functional Movement Screen item scores.
Balance, range of motion, and strength affected movement quality but did not mediate the effect of the load treatment.
Functional Movement Screen item scores decreased when testing was administered with a standardized external load.
Balance, range of motion, and strength appeared to be important contributors to item score variability, with or without an external load.
Balance, range of motion, and strength did not mediate the effect of load on item scores.
Predicting and promoting physical performance are important goals in the tactical professional community. Within the military specifically, broad human-performance–optimization initiatives1 have been launched that take a multifaceted approach to addressing performance deficits. Movement screens have become increasingly integrated with these initiatives as a correlate of injury risk and performance, and their use in other tactical professional settings has followed suit.2,3
The popularity of movement screening has grown dramatically in recent years as evidenced by the number of screens that have been developed. Examples include the Functional Movement Screen (FMS),4,5 the Return to Duty screen,6 and the Athletic Ability Assessment7 to name a few. Of these, the FMS is likely the most popular and well researched with extensive application in tactical professional populations.2 However, despite its popularity, the FMS is a poor predictor of physical performance outcomes.8
It has recently been shown that FMS scores decreased when the screen was administered using a standardized external load.9 Further, item scores from an FMS administered when wearing an external load were better predictors of tactical performance than were conventional FMS item scores.9 This latter finding might suggest that a load-enhanced screening can sensitize tools such as the FMS to movement deficits that are relevant to physical performance. Yet it remains unclear what mechanisms might account for this improved predictability and to what extent they can be evaluated using load-enhanced screening. Importantly, understanding these mechanisms could potentially improve our ability to identify and develop high-performing tactical professionals.
Performance is a multidimensional construct with several underlying factors. Classical models10 of human performance identify components such as strength, speed, power, agility, balance, flexibility, coordination, and endurance, of which several are also suggested to influence performance on FMS movement tests.4,5,11 The factors that mutually affect physical performance and clinical movement screens—particularly when such screens are modified to incorporate external load carriage—may mediate the observed changes in movement quality and the resulting increase in its association with tactical performance. We propose here that balance, range of motion, and strength are potential factors underlying these observations. Of the traits that are theorized to be assessed by clinical movement quality scales, each of these 3 has a role in promoting athleticism12–14 and each has been shown to interact with external loading.15–17
This article presents part 1 of a 2-part work on the relationship between clinical movement screens and physical performance. Part 1 focused on clinically observable (ie, noninstrumented) aspects of movement-screen performance. The purpose of this investigation was to examine the roles played by balance, range of motion, and strength in mediating load-related decreases in clinically rated movement quality, specifically FMS item scores. Part 2 investigated the relationship between these same factors and the dynamic behavior of the movement system during similar clinical movement tests. The knowledge derived from these works could provide a theoretical basis for the development of tactical training programs and refined movement-quality assessments with increased predictive validity for a variety of outcomes. We hypothesized that decreases in FMS item scores in the weighted-vest condition would be smaller in individuals with high levels of balance, range of motion, and strength.
METHODS
This study used a randomized crossover trial to quantify the mediating effects of balance, range of motion, and strength on within-participants differences in movement quality related to external loading. Approval was obtained from the institutional review board of the University of North Carolina at Greensboro. Data were collected in a laboratory setting by a single investigator experienced in the required measurement techniques. Twenty-five male (age = 23.96 ± 3.74 years, height = 178.82 ± 7.51 cm, mass = 79.66 ± 12.66 kg) and 25 female (age = 22.00 ± 2.02 years, height = 165.40 ± 10.24 cm, mass = 63.98 ± 11.07 kg) recreationally active adults were recruited. Participation was limited to individuals between 18 and 34 years of age in order to reflect the recruitment pool for military and tactical occupations. Participation was additionally limited by the following exclusion criteria: (1) reporting less than 90 minutes of physical activity per week, (2) chronic instability of the lower extremity, (3) clinical hypermobility condition, (4) dizziness or balance problem, or (5) recent (<6 months) musculoskeletal injury. All volunteers provided written consent to participate and completed a physical activity readiness questionnaire to screen for disqualifying injuries or medical conditions before data collection.
Procedures
Participants reported to the laboratory for a single data-collection session. After consent and completion of the physical activity readiness questionnaire, participants proceeded through the data collection in the following order: (1) balance, (2) range of motion, (3) FMS testing, and (4) strength testing. This order was determined to be optimal for minimizing the influence of fatigue on our outcomes. Testing procedures lasted approximately 2 hours.
Balance
Balance was tested in quiet, single-legged stance using a portable Accusway force plate and Balance Clinic software (Advanced Mechanical Technology, Inc, Watertown, MA). Participants were instructed to remain as motionless as possible for the duration of the test. Testing was conducted using the nondominant (ie, nonkicking) limb with hands on hips and eyes closed. Mediolateral and anteroposterior center-of-pressure (COP) coordinates were calculated from the raw force data sampled at 100 Hz. These data were used to create a resultant displacement time series, which was then differentiated with respect to time to yield a resultant COP velocity (CPV) series. The first 10 seconds of CPV data were used for analysis in this investigation, and the mean of this velocity series was recorded for each participant.
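The CPV calculation described above can be illustrated with a minimal Python sketch. It assumes one common reading of the procedure, in which the resultant displacement between successive samples is divided by the sampling interval. The function name, argument names, and default values are hypothetical and are not taken from the Balance Clinic software.

```python
import math

def mean_cop_velocity(ml, ap, fs=100.0, window_s=10.0):
    """Mean resultant COP velocity over the first `window_s` seconds.

    ml, ap : mediolateral and anteroposterior COP coordinates (same units)
    fs     : sampling rate in Hz (100 Hz in the text)
    Assumption: velocity is the sample-to-sample resultant displacement
    divided by the sampling interval (finite-difference differentiation).
    """
    n = min(len(ml), len(ap), int(round(fs * window_s)))
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    speeds = [
        math.hypot(ml[i + 1] - ml[i], ap[i + 1] - ap[i]) / dt
        for i in range(n - 1)
    ]
    return sum(speeds) / len(speeds)
```

With coordinates in centimeters, the returned value is in centimeters per second. Samples beyond the first 10 seconds are ignored.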
Range of Motion
Range of motion was quantified using 3 validated clinical measures: (1) the Apley scratch test quantified range of motion in the shoulders and thoracic spine, (2) the sit-and-reach test measured hip and trunk flexibility, and (3) the weight-bearing–lunge test measured dorsiflexion range of motion.
The Apley scratch test closely mirrors the FMS shoulder-mobility test and has demonstrated high interrater (0.89–0.97) and intrarater (0.92–0.99) reliability.18 The test begins with participants standing with arms at their sides. When directed, the participant attempts to touch the hands together behind his or her back. With one hand, the participant reaches behind his or her head and down the back. The other hand reaches behind his or her lower back and up the spine. The distance between the participant's hands is measured with tape and recorded (in centimeters) as the score. In the present study, the average score of the left and right sides was used for analysis.
The sit-and-reach test was conducted using a 30.5-cm wooden box in accordance with the procedures outlined by Ayala et al,19 in which a test-retest reliability coefficient of 0.92 was reported. Participants sat on the floor with their legs together and fully extended. For each participant, the examiner positioned the wooden box so that it was touching the soles of the participant's feet, which were aligned with the 22-cm mark. Participants were instructed to place one hand on top of the other with palms facing down and to keep the knees and elbows extended. They were then instructed to reach forward along the measuring tape as far as possible and to hold the terminal position for 6 seconds. Participants repeated the testing procedures until their scores stabilized to within 1 cm for 3 successive efforts.
The weight-bearing–lunge test was conducted according to the methods of Hoch et al.20 This test began with the participant facing a wall and standing with the test foot aligned with a strip of measuring tape placed perpendicular to the wall. The nontest foot was stepped back 12 to 18 in (30.5–45.7 cm) for support. While keeping the heel of the test foot firmly on the ground, the participant was instructed to bend at the knee until his or her knee contacted the wall. After being familiarized with the task, participants moved progressively further away from the wall and repeated the procedure until they were unable to move any further away without lifting the heel of the test foot during the lunge. Interrater and intrarater correlation coefficients of 0.97 or greater have been reported for measuring the resulting angle or distance from the wall in this position.21 For this study, the distance between the wall and the great toe was recorded and the test was then repeated on the other side. The average distance of both feet was used for analysis.
Strength
Strength-testing procedures were selected on the basis of reliability and safety for the population of interest. One-repetition–maximum testing is the criterion standard for strength assessment; however, a reliable estimate cannot be obtained during a single session in untrained populations22 and may additionally be unsafe for these individuals.23 We therefore used alternative methods that are feasible in untrained populations and are strongly associated with their repetition-maximum analogs.
The modified YMCA bench-press test was used to assess upper body strength. The test was conducted using sex-specific standardized weights: 36.4 kg (80 lb) for men and 15.9 kg (35 lb) for women.23 The test began with the participant positioned on a standard weight bench grasping the bar at a comfortable position. A metronome was then set to 60 beats/min and the participant was instructed to perform bench presses at 30 repetitions/min such that each beat of the metronome coincided with the bar reaching the up (fully extended) or down (bar-on-chest) position. When the participant was no longer able to maintain the 30-repetitions/min cadence or could no longer continue, that number of repetitions was recorded as the final score. This score is a strong predictor of 1-repetition–maximum bench-press loads (Pearson r = 0.87, average standard error of the estimate = 5.55).23 A truncated familiarization trial was performed so as to allow participants to become accustomed to the weight and cadence of the test.
Countermovement-jump peak power has been shown to estimate 1-repetition–maximum back squat with high fidelity (test-retest reliability = 0.98, Pearson r = 0.92)24 and was therefore selected to assess lower body strength. Each jump test requires the participant to exert maximal effort. Participants were allotted 1 practice trial and 3 test trials with approximately 1 minute of rest between efforts. They began by standing on a force plate (model 4060-NC; Bertec Corp, Columbus, OH) with hands on hips. When instructed, they crouched to a preferred depth (countermovement), immediately jumped as high as possible, and then landed on the force plate. Vertical ground reaction force was sampled at 1000 Hz and low-pass filtered at 40 Hz using The MotionMonitor software (Innovative Sports Training Inc, Chicago, IL). Data were recorded from the sampling buffer starting 1 second before the activation of a threshold trigger that marked the initiation of the countermovement. Because the first 1 second of data corresponded to quiet standing, we assumed that initial center-of-mass velocity was zero. Instantaneous velocity was then calculated using the forward-dynamics approach.25 Next, a power time series was calculated as the product of the force and velocity curves. The peak of the power time series during the concentric phase of the countermovement jump was used for analysis.
Functional Movement Screen
After a familiarization round, the FMS4,5 was administered both under conventional conditions (FMSC) and while wearing an adjustable vest weighing 18.10 kg (FMSW; MiR Vest Inc, San Jose, CA). This is comparable with loads used in other studies15,26 involving military personnel but may be less than the average combat loads in recent conflicts. We elected to use a standardized 18.10-kg load as this amount is sufficient to challenge FMS performance9 and has a basis in previous research.26 For men and women, respectively, the vest added an average of 22.72% and 28.29% of the participant's body weight. The order of the weighted-vest and control conditions was randomized.
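The relative load values can be checked with simple arithmetic. The short Python sketch below divides the vest mass by the mean body mass reported for each sex. It assumes that the reported percentages correspond to the group mean masses, and the function name is hypothetical.

```python
VEST_MASS_KG = 18.10  # standardized vest load reported in the text

def relative_load_percent(body_mass_kg):
    """Vest mass expressed as a percentage of body mass."""
    return 100.0 * VEST_MASS_KG / body_mass_kg

men = round(relative_load_percent(79.66), 2)    # mean mass of male participants
women = round(relative_load_percent(63.98), 2)  # mean mass of female participants
```

Under this assumption the values reproduce the reported 22.72% for men and 28.29% for women.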
The FMS was administered by an experienced, although not FMS certified, member of the investigative team with established reliability in the relevant measures (Cohen κ = 0.67–1.00; S.M.G. et al, unpublished data, 2015). The tests administered are listed in the “Results” and were scored according to the following criteria: (1) participant was unable to complete the movement; (2) participant was able to complete the task with errors noted; (3) participant was able to complete the task without error.
Statistical Analysis
In order to compare our results with previous work, we tested decreases in FMS item scores related to the weighted-vest condition with directional Wilcoxon signed-rank tests for matched pairs. We then tested our hypotheses concerning effect modification using separate regression models with the log transform of each FMS test item serving as a dependent variable. (The log transform was used to facilitate analysis with the existing options available in the relevant software packages, detailed in the following paragraphs.) Other options for analyzing this type of data include ordinal logistic regression or multinomial logistic regression. Our choice of statistical approach was based on the need to account for within-subjects effects and interaction effects in the same model. We further required a suitable means of performing model selection in a high-dimensional predictor space.
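For readers unfamiliar with the paired comparison, the following Python sketch computes the Wilcoxon signed-rank statistics for control minus weighted-vest scores. It is an illustration only and not the routine used in this study. It assumes average ranks for ties, drops zero differences, and uses a normal approximation with a tie correction and no continuity correction, which is a simplification for small samples. The function name is hypothetical.

```python
import math

def wilcoxon_signed_rank(control, loaded):
    """Return (w_plus, w_minus, z, p_one_sided) for paired scores.

    Differences are control - loaded; the one-sided P value tests the
    directional hypothesis that control scores exceed loaded scores.
    """
    diffs = [c - w for c, w in zip(control, loaded) if c != w]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average rank for a run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(var)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return w_plus, w_minus, z, p_one_sided
```

Because FMS item scores take only a few values, ties are frequent, which is why the tie correction is included in the variance term.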
We modeled differential covariate effects in the 2 testing conditions using a varying coefficients structure.27 Although this type of model is traditionally used to analyze effects that vary over time, it can be applied similarly to analyze effects that vary by condition.27 Regardless of the order in which the tests were administered, the design matrix was specified such that data from the control condition were modeled as the first of 2 coefficients for each variable. The second coefficient, corresponding to the weighted-vest condition, represented the change in the effect of the covariate relative to the control condition. Thus, this coefficient can be interpreted as a covariate × condition interaction term. Note that, unlike the examples offered by Hess et al,27 the modifying factor in our model was not time but rather weighted-vest condition. Because all data were collected on the same day, our set of independent variables was the same for each model. This is reasonable because we did not expect intrinsic performance attributes to vary within the span of a few minutes and bias associated with condition order is accommodated through randomization.
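The coding scheme can be sketched in Python as follows. Each covariate contributes 2 columns: the first carries its value in both conditions, and the second carries its value only in the weighted-vest condition, so that the second coefficient acts as a covariate × condition interaction. This is a simplified illustration that omits the intercept and the main effect of condition. The function names, covariate values, and coefficients are hypothetical.

```python
def design_rows(covariates):
    """Build the control and weighted-vest rows for one participant.

    Each covariate x yields the column pair [x, x * vest], where vest is
    0 in the control condition and 1 in the weighted-vest condition.
    """
    control, weighted = [], []
    for x in covariates:
        control.extend([x, 0.0])
        weighted.extend([x, x])
    return control, weighted

def linear_predictor(row, coefficients):
    """Dot product of one design row with the coefficient vector."""
    return sum(r * b for r, b in zip(row, coefficients))

# Hypothetical participant with 2 covariates (illustrative values only)
control, weighted = design_rows([2.0, 5.0])
# Coefficients ordered as [main_1, change_1, main_2, change_2]
beta = [0.10, -0.02, 0.03, 0.00]
difference = linear_predictor(weighted, beta) - linear_predictor(control, beta)
```

The difference between the 2 predictions depends only on the second coefficient of each pair, which is why that coefficient is read as the change in a covariate's effect under load.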
We hypothesized that the decrease in weighted-vest FMS item scores relative to the control condition would be smaller for those participants showing greater levels of balance, range of motion, and strength. With 2 exceptions, this would be visible in our models as a positive relationship between the item score and our 3 mediators in the weighted-vest condition (ie, the second coefficient, which is analogous to time point 2 in a conventional varying-coefficients model). The 2 exceptions are the Apley scratch test and resultant CPV, for which higher values were interpreted as worse and the predicted signs of their coefficients in the weighted-vest condition were therefore negative.
Several nuisance variables, as well as their interactions with the weighted-vest condition, were also accounted for in our models: age, sex, height, and mass. It is apparent from our list of independent variables that the models of interest in our study were likely substantially underpowered for conventional regression techniques. This was especially true for the detection of interaction effects. Model selection under these circumstances can be achieved through penalization.28 Penalized regression methods minimize an error term just as more familiar forms of regression do but are subject to additional constraints on the magnitude of the coefficients. These constraints are incorporated using a data-driven tuning parameter, here denoted as Λ, which is usually selected on the basis of some information criterion or cross-validation procedure. The effect of using such a penalty is to prevent overfitting a model to the variance that is unique to a given sample. Once the models have been selected, standard methods of estimation and significance testing can be applied. Though estimation and significance testing are performed at an optimized value of Λ, relative importance among variables can be approximated by observing the order in which predictors are retained in the model as the penalty is progressively relaxed from a point at which all coefficients are equal to zero.27
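In general terms, the penalized estimation described above can be written as the following criterion. This is a sketch of a common formulation, not necessarily the exact scaling used by the software in this study. Here y denotes the vector of log-transformed item scores, X the design matrix, n the number of observations, and P a penalty function; for the group lasso, the penalty sums the Euclidean norms of the coefficient groups, with K_g denoting the number of coefficients in group g. Covariates that are left unpenalized do not appear in P.

```latex
\hat{\beta}(\Lambda)
  = \arg\min_{\beta}\;
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^{2}
    + \Lambda\, P(\beta),
\qquad
P_{\text{group lasso}}(\beta)
  = \sum_{g} \sqrt{K_g}\,\lVert \beta_g \rVert_2
```

Setting Λ to zero recovers ordinary least squares, whereas a sufficiently large Λ shrinks every penalized group to zero.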
In this investigation, the tuning parameter (Λ) associated with minimum cross-validation error was first determined using a 5-fold cross-validation routine. Model selection was then performed using the group lasso at the identified Λ value. Because we are interested in explaining variance after accounting for differences attributable to the nuisance covariates (age, sex, height, and weight), these variables were not penalized during tuning parameter identification or model selection. All analyses in the present study were conducted using R (The R Foundation, Vienna University of Economics and Business, Vienna, Austria) with add-on packages grpreg28 and boot.29 A significance level of α = .05 was specified a priori.
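To show how a group penalty retains or discards a set of coefficients together, the Python sketch below implements the group soft-thresholding operator that underlies many group lasso algorithms when the columns within a group are orthonormal. It is an illustration of the principle and is not presented as the grpreg implementation. The function name and example values are hypothetical.

```python
import math

def group_soft_threshold(z, threshold):
    """Shrink a coefficient group toward zero as a unit.

    z         : unpenalized solution for one group of coefficients
    threshold : penalty level applied to the group's Euclidean norm
    The whole group is set to zero when its norm does not exceed the
    threshold; otherwise every coefficient is scaled by the same factor.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= threshold:
        return [0.0] * len(z)
    scale = 1.0 - threshold / norm
    return [scale * v for v in z]

retained = group_soft_threshold([3.0, 4.0], 2.5)  # norm 5 exceeds threshold
dropped = group_soft_threshold([3.0, 4.0], 6.0)   # norm 5 is below threshold
```

As the threshold is relaxed, groups with larger norms enter the model first, which is the basis for ranking predictors by their order of entry.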
RESULTS
The weighted-vest condition was associated with a decrease in item scores for each FMS test except the hurdle step. A descriptive summary of the predictor variables modeled in this study is presented in Table 1. A count summary and paired difference tests for FMS item scores in the 2 experimental conditions are shown in Table 2. Model summary statistics are provided in Table 3. With the exception of the rotary-stability test, each model was significant at the .05 level and accounted for a moderate to large proportion of the variance (adjusted R2 = 0.22–0.77). Exponentiated coefficients for individual predictors are presented in Table 4. These coefficients may be interpreted as the factors by which the outcome score is expected to change in response to a 1-unit increase in the associated predictor. In this context, a value of 1 corresponds to no effect, whereas values greater than or less than 1 correspond to positive and negative effects, respectively. For a given model, the relative importance of the various predictors after accounting for the nuisance parameters can be seen in Table 5, which shows the order in which variables were selected as the penalty parameter was incrementally relaxed.
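The interpretation of exponentiated coefficients can be made concrete with a short Python sketch. Because the outcome was log transformed, a coefficient b on the log scale corresponds to a multiplicative factor of exp(b) on the original score scale per 1-unit increase in the predictor. The function names and numeric values below are hypothetical and are not taken from Table 4.

```python
import math

def score_ratio(beta, delta=1.0):
    """Factor by which the predicted score changes for a `delta`-unit
    increase in a predictor whose log-scale coefficient is `beta`."""
    return math.exp(beta * delta)

def vest_condition_ratio(beta_control, beta_change, delta=1.0):
    """Factor in the weighted-vest condition, where the effect is the
    control coefficient plus the condition-specific change."""
    return math.exp((beta_control + beta_change) * delta)

no_effect = score_ratio(0.0)       # a factor of 1 corresponds to no effect
positive = score_ratio(0.05)       # greater than 1: positive effect
negative = score_ratio(-0.05)      # less than 1: negative effect
```

A condition-specific change that exactly offsets the control coefficient yields a factor of 1 in the weighted-vest condition.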
Nuisance Parameters
In general, body weight was the most influential nuisance covariate, reducing test performance in the deep squat, active straight-leg raise, and trunk-stability push-up but increasing performance in the shoulder-mobility test (Table 4). The effect of sex also differed by test. Male sex was associated with poorer performance in the deep squat (Table 4) and better performance in the trunk-stability push-up (Table 4), each of these being relatively strong effects. Height was predictive of better hurdle-step (Table 4) performance. No effects related to height, body weight, or sex indicated a mediating role on the effect of the weighted vest. Age was the only nuisance parameter in which a mediating effect was observed, with lower scores in the weighted-vest condition of the shoulder-mobility test (Table 4) being associated with greater age. No general effects of age were observed.
Balance, Range of Motion, and Strength
A significant effect was observed for the YMCA bench-press repetitions, for which a greater number of repetitions was predictive of higher item scores in the deep-squat (Table 4) and trunk-stability push-up (Table 4) tests. For the countermovement jump, higher peak power was predictive of lower hurdle-step scores (Table 4).
High scores on the weight-bearing–lunge test predicted better performance in the deep squat and in-line lunge (Table 4). In the weighted-vest condition specifically, the same variable was predictive of greater shoulder-mobility (Table 4) test performance. Higher scores on the sit-and-reach test were associated with better performance in the active straight-leg raise (Table 4). Finally, lower (ie, better) Apley scratch test scores predicted better performance in the shoulder-mobility (Table 4) test.
Lastly, greater mean CPV was a significant predictor of better hurdle-step performance (Table 4).
DISCUSSION
Paired differences in FMS item scores closely mirrored previously reported changes.9 One noteworthy finding concerns the distribution of scores in the 2 conditions. Particularly in the control condition, very few scores of 1 were observed. This may support what has been suggested regarding an inability of clinical movement tests to differentiate performance at higher levels.7 The load-bearing treatment used in this study may therefore have served to normalize the score distribution.
Previous authors9 have shown that FMS item scores, except for the trunk-stability push-up, were not related to tactical criterion performance tasks unless the screen was performed with an external load. Considering the possibility that the additional load highlighted performance-relevant attributes during the screening process, we hypothesized that the external-load treatment preferentially taxed those participants with relatively low levels of balance, range of motion, and strength. We therefore expected that high levels of balance, range of motion, and strength would be associated with smaller decreases in FMS item scores when comparing FMSW with FMSC. This hypothesis would be supported by coefficients corresponding to the weighted-vest condition (“*W”) being selected early and exhibiting a relatively large effect in the appropriate direction, which was generally not the case. Most of the condition-specific covariates were among the last to be retained in the models and were usually not selected at the optimized value of the tuning parameter (Λ). The only condition-specific covariate selected was the weight-bearing–lunge test in the shoulder-mobility model, for which a positive relationship was observed. This is not an unreasonable finding, as both tests depend on range of motion; however, if any single range-of-motion predictor were to be selected for this outcome, we would expect it to be the Apley scratch test.
In contrast, several of our mediator variables showed noteworthy global effects. The weight-bearing–lunge test was the most important predictor of the deep squat, supporting what has been suggested by previous researchers,11 as well as the in-line lunge. These movements are particularly susceptible to bottom-up deviation cascades in which dorsiflexion range of motion is limited. Deep-squat performance was also promoted by higher levels of upper body strength as measured via the YMCA bench-press test, possibly indicating the importance of strength throughout the kinetic chain as this outcome focuses more closely on the lower body.
Our balance variable, mean CPV, was retained as the most important variable in the hurdle-step model. Traditionally, greater CPV would be interpreted to reflect poorer balance control.30 However, in this case, it predicted higher hurdle-step scores. Although balance is one of the attributes purported to be assessed during clinical movement screens, data relating static balance and FMS component tests are limited. One group8 found no relation between in-line–lunge scores and COP excursion, whereas another31 observed an inverse relationship between hurdle-step performance and COP standard deviation. Previous data from our laboratory9 indicated that anteroposterior CPV, albeit from double-legged standing, was positively associated with higher scores in the deep squat, weighted deep squat, and weighted in-line lunge. This was initially interpreted as a spurious result potentially related to the small sample size or the use of a double-legged–standing protocol, which may lack discriminatory ability in young, healthy populations. Although it is difficult to directly compare the present results with the previous findings, the pattern suggests at least 2 possibilities. First, it could be that the variance in CPV that is predictive of lower FMS component scores is actually related to a confounder variable such as height or weight. This possibility seems unlikely based on our control and model selection procedures. Alternatively, higher CPV in this data set may actually reflect better postural control. It is possible that balance limitations can result in compensatory decreases in postural motion when individuals are not confident in exploring their postural-control space. Although it is somewhat paradoxical to assert that the participants who followed instructions better actually demonstrated worse performance, research32 suggested that increases in “exploratory” sway behavior may play a functional role in maintaining balance when sensory information is lacking. 
The lower CPV in our sample may also indicate a more constrained postural-control strategy that restricted the performance of dynamic tasks.33
One limitation that should be considered in interpreting the present findings concerns the measures used to evaluate performance attributes. These measures were chosen on the basis of their reliability and feasibility among the population of interest. However, other performance metrics may exist that more precisely quantify attributes associated with functional movement, or other aspects of performance may be more relevant for a given clinical movement test. Specifically, outcomes related to agility, power, and coordination may play a role. These can be evaluated through timed cone drills, medicine-ball throws, or Olympic-style weightlifting, among other means.
Lastly, we note that the FMS was proposed to assess efficiency or quality of movement and was presented as a potentially separate component of performance that might complement multidimensional models such as that of Fleishman.10 In contrast, the hypotheses in our investigation considered clinical movement screens to be a convenient, feasible method of observing previously identified performance domains. Although some of the effects presented may be interpreted to support the notion of a latent movement-quality capacity, we interpreted the variation in predictors and directionality to suggest that the FMS items did not load on a single factor. This is consistent with the authors34 of a large-scale analysis who concluded that the underlying structure of the FMS composite score was not unidimensional.
CONCLUSIONS
Our findings confirm that a moderate to large portion of the variance (R2 = 0.22–0.77) in FMS scores was explained through models that included balance, range-of-motion, and strength predictors. These attributes may therefore be important constituents of performance on clinical movement screens after accounting for the influence of age, sex, height, and weight. At the same time, our analyses failed to show that balance, range of motion, and strength prevented movement-quality decreases related to external loading. This suggests that the differential abilities of FMSC and FMSW in predicting tactical performance outcomes are perhaps attributable to other factors. In conclusion, although load-enhanced clinical scoring of movement behaviors may be a viable means of predicting physical performance, further research is needed to understand the complex relationships among movement-quality and performance attributes. In addition to the agility, power, and coordination outcomes mentioned earlier, future authors should also analyze other features of clinical movement tests. Such features may include more specifically coded behaviors similar to the criteria for loss of points in FMS component tests or semistructured participant feedback regarding perceived effort and degree of difficulty required to complete a given task.