Context: Postconcussion reaction time deficits are common, but existing assessments lack sport-related applicability. We developed the Standardized Assessment of Reaction Time (StART) tool to emulate the simultaneous cognitive and motor function demands of sport, but its reliability is unestablished.
Objective: To determine the intrarater, interrater, and test-retest reliability of StART and to examine the dual-task effect, time effect, and relationships between StART and computerized and laboratory-based functional reaction time assessments.
Design: Prospective cohort study.
Setting: Clinical laboratory.
Patients or Other Participants: Twenty healthy, physically active individuals (age = 20.3 ± 1.8 years, females = 12, no concussion history = 75%).
Main Outcome Measure(s): Participants completed the StART, computerized reaction time (Stroop task via CNS Vital Signs), and laboratory-based jump landing and cutting reaction time under single-task and dual-task (subtracting by 6s or 7s) cognitive conditions at 2 testing sessions a median of 7 days apart. We used intraclass correlation coefficients (ICCs), repeated-measures analyses of variance, and Pearson r correlations to address our aims.
Results: Overall, interrater (ICC [2,k] range = 0.83–0.97), intrarater (ICC [3,k] range = 0.91–0.98), and test-retest (ICC [3,k] range = 0.69–0.89) reliability were good to strong. A significant reaction time assessment-by-cognitive condition interaction was present (P = .018, ηp² = 0.14), with StART having the largest dual-task effect. A main effect of time for dual-task conditions was seen across all reaction time assessments (mean difference = −25 milliseconds, P = .026, ηp² = 0.08), with improved performance at the second testing session. No StART outcomes correlated with computerized reaction time (P > .05), although some correlated with single-task (r range = 0.42–0.65) and dual-task (r range = 0.19–0.50) laboratory cutting reaction time.
Conclusions: The StART demonstrated overall reliable performance relative to other reaction time measures. Reliability coupled with a strong dual-task effect indicates that StART is a valid measure for examining functional reaction time and may have future utility for sport-related concussion return-to-play decision-making.
KEY POINTS
Excellent overall reliability with only 3 trials indicates that clinicians can accurately and quickly administer the Standardized Assessment of Reaction Time (StART).
Small minimal detectable changes indicate strong metric sensitivity for future clinical use.
The StART demonstrated a similar or stronger dual-task effect compared with other reaction time measures and thus may better induce cognitive-motor interference clinically.
Concussion is a widespread condition occurring across sport,1,2 hospital,3 and military4 populations and results in transient dysfunctional neurotransmission throughout the brain.5 Concussions present clinically with heightened symptoms, sensorimotor impairment, and dysfunctional neurocognition. Postconcussion deficits typically normalize for most people between 21 and 28 days after injury.1,6 To ensure that common deficits can be detected, best clinical practices call for standardized assessments to examine these areas and thus accurately diagnose concussions.7,8 With neurocognitive assessments, clinicians evaluate numerous cognitive domains, with reaction time being one robust and valid domain of focus.9 Postconcussion reaction time deficits are well established, are considered an important assessment outcome because deficits are observed from immediately after injury until 21 to 59 days postinjury,10 and are robust across all reaction time methods used.10–12 Thus, numerous methods ranging widely in complexity exist to ensure that reaction time can be tested in any setting after concussion.12–17
Reaction time is typically evaluated in sports medicine7 via either computerized neurocognitive assessments12,18 or clinical assessments (eg, the drop ruler test).14,15 One major concern is that computerized and clinical reaction time measures do not correlate with functional reaction time determined using whole-body motion-capture tracking.13,19 Researchers have also increasingly identified lingering impairments on dual-task (ie, simultaneously completing cognitive and motor tasks) gait assessments 2 months after concussion20 and have consistently observed a 2-fold increase in musculoskeletal injury risk for up to 2 years.21,22 The reaction time and gait findings13,20–22 collectively raise concerns about return-to-sport safety and the lack of functional, sport-like applicability: current clinical measures23 are not associated with heightened musculoskeletal injury risk after concussion,21,22 whereas dual-task gait outcomes are.24,25 The recommended diagnostic assessments may therefore not be well suited for determining return-to-sport readiness, as they do not emulate on-field demands.
Clinical assessments occur in a controlled, quiet testing environment26,27 and consist of simple finger movements (computerized neurocognitive testing) or static stances (postural stability). These circumstances differ greatly from the dynamic environments of most sport settings and the whole-body movement coordination needed to successfully compete and avoid injury. Such differences between clinical and sport environments may be important considerations for ensuring return-to-sport safety, and some components could perhaps be emulated via dual-task and whole-body functional movement assessments. We have developed a novel reaction time assessment battery called the Standardized Assessment of Reaction Time (StART) to address these concerns and potentially improve return-to-sport patient safety.28,29 However, before implementation in clinical practice, we must first understand the StART psychometric properties and their relationship with established computerized and laboratory-based functional reaction time assessments to ensure reliability and identify the minimal detectable change in scores.
The purposes of our study were to (1) determine the intrarater, interrater, and test-retest reliability; standard error of measurement; and minimal detectable change of StART; (2) examine the dual-task and time effect of StART compared with other reaction time assessments; and (3) examine the relationship between StART and previously established reaction time measures (computerized18 and functional movement13,30). We hypothesized that (1) StART would display good31 intrarater, interrater, and test-retest reliability (intraclass correlation coefficients [ICCs] ≥0.75) and relatively small measurement error and detectable change; (2) StART would display a similar dual-task effect (ie, worse reaction time during the dual-task relative to the single-task condition) and time effect as other reaction time assessments; and (3) StART would not correlate with computerized reaction time but would correlate with functional movement reaction time.
METHODS
Study Design and Participants
An a priori power analysis32 was conducted using the test-retest reliability ICCs reported by Lynall et al30 for their laboratory-based jump landing, single-legged hop, and cutting reaction time performance under single and dual tasks. The power analysis used 2-tailed tests (α = .05, 1 − β = 0.90) and the lowest published30 test-retest reliability ICC value of 0.75 relative to the alternative hypothesis ICC of 0.21 (the lowest fair ICC interpretation value).32,33 A sample size of 20 participants was needed to maintain adequate power for detecting true effects.
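For readers who wish to check this type of calculation, a simulation-based sketch is given below. It is illustrative only and assumes the psych and MASS R packages; the published power analysis may have used a different analytic method.

```r
# Illustrative, simulation-based check of the sample-size logic described
# above: how often does the ICC lower confidence bound exclude the null
# value of 0.21 when the true test-retest ICC is 0.75, n = 20, and k = 2
# sessions? A sketch, not the authors' original calculation.
library(psych)   # ICC()
library(MASS)    # mvrnorm()

simulate_icc_power <- function(n = 20, k = 2, true_icc = 0.75,
                               null_icc = 0.21, n_sims = 1000) {
  sigma <- matrix(true_icc, k, k)
  diag(sigma) <- 1                      # compound-symmetric correlation
  hits <- 0
  for (i in seq_len(n_sims)) {
    scores <- mvrnorm(n, mu = rep(0, k), Sigma = sigma)
    fit <- ICC(scores, alpha = .05, lmer = FALSE)
    lb <- fit$results["Single_fixed_raters", "lower bound"]  # ICC(3,1)
    if (!is.na(lb) && lb > null_icc) hits <- hits + 1
  }
  hits / n_sims                         # estimated power
}

set.seed(1)
simulate_icc_power()
```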
Accordingly, a convenience sample of 20 healthy, recreationally active, college-aged participants was enrolled and completed all assessments across 2 testing sessions a median of 7 (interquartile range = 7–7; range = 7–38) days apart as part of this prospective repeated-measures study. All participants were recruited across the university campus through fliers placed in university buildings; interested individuals contacted the research team to be screened and enrolled in the study. Volunteers were included if they were healthy young adults (age = 18–30 years), English was their primary language, and they were recreationally active (performed physical activity for a minimum of 30 minutes, 3 times per week).34 Volunteers were excluded if they had a lower extremity injury in the last 3 months resulting in physical activity time loss of ≥1 day; a history of lower extremity or low back orthopaedic surgery; a self-reported concussion within the past year; or any learning disability or attention-deficit/hyperactivity, psychiatric, balance, or mental health disorder. Participants received honoraria separately after completing each of the 2 testing sessions to encourage maximal performance throughout and minimize potential attrition. This investigation was approved by the University of Georgia institutional review board, and all individuals provided written informed consent before the study.
Instrumentation and Procedures
Participants completed the assessments in a block-randomized (StART, computerized, and laboratory) order established a priori. The assessment order was identical between an individual participant's testing sessions to keep any learning or fatiguing effects consistent. All single-task conditions were completed before dual-task conditions, where applicable. The assessments consisted of (1) StART, (2) computerized reaction time,18 (3) laboratory jump landing,13,30 and (4) laboratory cutting.13,30 All tests were completed in a designated, isolated laboratory space to mimic recommended concussion assessment conditions.26,27 Jump landing and cutting were performed under both single-task (completing the motor task as quickly as possible) and dual-task (completing the cognitive task while simultaneously completing the motor task as quickly as possible) conditions. The cognitive task was standardized across all assessments, including StART: participants subtracted by 6s or 7s from a random number between 90 and 150 throughout the motor task.13,30,35,36 Subtraction began when they were instructed to “get set” and stopped once the trial was completed. The subtraction task started from a different random number for each subsequent trial, as illustrated in the sketch below.
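As a concrete illustration of one trial's cognitive prompt, the brief sketch below generates a random starting number and step size; the variable names are ours, not part of the protocol.

```r
# One hypothetical serial-subtraction trial: random start between 90 and
# 150, counting down by 6s or 7s until the trial ends.
start <- sample(90:150, 1)   # random starting number
step  <- sample(c(6, 7), 1)  # subtract by 6s or 7s
seq(start, 0, by = -step)    # expected spoken response sequence
```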
Standardized Assessment of Reaction Time
All trials were video and audio recorded on an iPad (model A1701; Apple Inc) using either the OnForm (version 1.95)37 or Hudl (no longer available; sold and phased out to OnForm) application. Video was recorded at 240 Hz (ie, equivalent to the Slo-Mo function on an iPhone [Apple Inc]) with 720-pixel resolution while the iPad was fixed on a tripod approximately 3.05 m from participants to ensure that individuals of all heights would completely fit in the frame with space above and below their head and feet.28 This video frame rate is equivalent to or faster than that of criterion standard motion-capture cameras.36 A light-emitting diode penlight tip was placed in the camera recording frame and served as the time-synchronized visual stimulus from which we calculated reaction time. For all StART trials, participants were instructed to “get set,” and the penlight was then randomly illuminated 2 to 10 seconds later. Participants initiated and completed the specified condition as fast as possible after the penlight illumination. For visual depictions of each StART subtest, please see Lempke et al.28
The StART consisted of 5 trials of 3 movements (standing, single-legged balance, and cutting) across 2 conditions (single and dual task, 30 trials total in each testing session) and took approximately 10 minutes to complete.28 For the standing trials, individuals stood with their feet together and hands on hips. The standing trials were conducted as a relatively simple way of assessing reaction time that theoretically would display postconcussion deficits.10 For single-legged balance trials, participants stood on their nondominant leg (the leg they would not use to kick a ball) with hands on hips as they balanced throughout the trial. We chose single-legged balance because patients with concussion are known to have challenges integrating sensory information20,38,39 and may display a slower reaction time. For the cutting trials, individuals assumed a semisquatting athletic stance with hands on hips. The cutting trials were used to emulate sport-like functional movements that have been previously completed only in laboratory settings using motion-capture equipment.13,30
Participants moved their hands off their hips until their arms were straight out in a “T” position (parallel to the ground) as soon as they saw the penlight illuminate for the standing and single-legged balance tasks. Cutting trials required participants to sprint from the starting position to the left- or right-side target positioned approximately 3.05 m away at 45° by performing an athletic cutting motion. The time (milliseconds) between penlight activation and the first frame of arm movement (eg, hands off hips, elbows bending, fingers raising) was deemed the reaction time for standing and single-legged trials, and the first body movement (eg, foot pivoting, hands coming off hips, torso or head laterally deviating) was identified as the reaction time for cutting trials. A secondary reaction time metric was calculated for cutting trials that included the time from light activation to the first frame of the torso moving, as indicated by white tape placed on the sternum to replicate a center-of-mass area starting to move, similar to prior motion-capture work.13 Each trial from the 6 StART reaction time movement and cognitive combinations was scored individually and then averaged for each movement (standing, single-legged balance, and cutting) and cognitive condition (single task and dual task) separately. Additionally, we derived composite single-task and dual-task StART outcomes by averaging their respective trials separately, as well as an omnibus StART outcome comprising all StART outcomes averaged as detailed in the Data Processing and Statistical Analysis section.
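Because StART reaction time is derived from frame counts in the 240-Hz video, the conversion to milliseconds is simple arithmetic. The sketch below (with hypothetical frame numbers, not study data) shows the calculation performed for each trial.

```r
# Each frame at 240 Hz spans 1000/240, or approximately 4.17 ms. The frame
# numbers below are hypothetical examples for illustration only.
frames_to_ms <- function(stimulus_frame, movement_frame, fps = 240) {
  (movement_frame - stimulus_frame) * 1000 / fps
}

frames_to_ms(stimulus_frame = 1032, movement_frame = 1106)  # ~308 ms
```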
Computerized Reaction Time
The computerized reaction time was assessed via the Stroop Task test on a reliable and valid computerized neurocognitive platform (CNS Vital Signs).18,40 The Stroop Task is widely used in computerized neurocognitive programs.18,41 Participants were presented with the embedded standard instructions for each of the 3 subtests along with a practice sample before testing.13 The test randomly displayed color words (“red,” “yellow,” “blue,” or “green”) in different font colors for the subtests. The 3 subtests required participants to press the spacebar as quickly as possible after (1) any word was presented, (2) the color word was presented in the same color font, and (3) the color word was not presented in the same color font. The computerized reaction time composite (milliseconds) was calculated as a weighted score from the subtests per the neurocognitive test's standard method and was the main outcome measure.7,18
Laboratory Jump Landing and Cutting Reaction Time
Laboratory-based jump landing and cutting occurred in an 8-camera, 3-dimensional motion-capture space (model Miqus; Qualisys AB) recording at 240 Hz as previously described.13,30,36 Participants stood on a 30-cm box placed at a distance of 50% of their body height behind the landing target, with 3 reflective markers over their posterior-superior iliac spines and sacral body. They adopted an athletic stance after being told to “get set” by the research team, and a green light was then randomly triggered 2 to 5 seconds later. Individuals initiated the movement as quickly as possible after seeing the green light.42 The time (milliseconds) between the visual stimulus and sacral marker movement of ≥3 cm in either the sagittal or frontal plane from its mean position during the 0.5 seconds before the visual stimulus was deemed the reaction time for each assessment.13,30,36,42 Participants were given at least 1 practice trial before data collection and took a 1-minute break between the jump landing and cutting tasks to minimize any physical fatigue.
For jump landing, participants jumped forward off the box, landed on both legs, and performed a maximal-height countermovement jump (5 single-task and 5 dual-task trials).13,30,36 For cutting, participants jumped forward, landed on a single leg, and immediately executed a 45° athletic-cutting motion in the direction provided before the trial. They landed on their left foot and ran to the right side of the laboratory space for cuts to the right side and vice versa for left cuts. Individuals completed 5 trials in each direction under single- and dual-task conditions (10 trials total). The sacral marker positional data for the laboratory jump landing and cutting tasks were processed and filtered with a fourth-order, low-pass Butterworth filter and a 10-Hz cutoff frequency. The laboratory jump landing and cutting data were imported, processed, and analyzed using Visual3D (version 2021.02.1; C-Motion Inc) to calculate reaction time across all trials as previously defined.13,30,36,42 Each laboratory jump landing and cutting trial was processed individually and averaged for each movement (jump landing and cutting) and cognitive condition (single task and dual task) separately.
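The sketch below reconstructs this processing pipeline in R for illustration only; the authors used Visual3D, and the function name, input vectors, and the signal package are our assumptions.

```r
# Illustrative marker-processing pipeline: fourth-order low-pass Butterworth
# filter (10-Hz cutoff, 240-Hz capture), then reaction time as the first
# post-stimulus sample at which the sacral marker deviates >= 3 cm from its
# mean position during the 0.5 s before the stimulus.
library(signal)  # butter(), filtfilt()

detect_rt_ms <- function(sagittal_cm, frontal_cm, stim_idx,
                         fs = 240, threshold_cm = 3) {
  bf <- butter(n = 4, W = 10 / (fs / 2), type = "low")
  x  <- filtfilt(bf, sagittal_cm)                  # zero-lag filtering
  y  <- filtfilt(bf, frontal_cm)
  base <- (stim_idx - round(0.5 * fs)):(stim_idx - 1)  # pre-stimulus window
  dx <- abs(x - mean(x[base]))
  dy <- abs(y - mean(y[base]))
  crossed <- which(dx >= threshold_cm | dy >= threshold_cm)
  onset <- crossed[crossed > stim_idx][1]          # first crossing after light
  (onset - stim_idx) * 1000 / fs                   # reaction time (ms)
}
```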
Data Processing and Statistical Analysis
We averaged the 5 trials from all reaction time assessments and cognitive conditions to determine the 6 (single- and dual-task standing, single-legged balance, and cutting) main StART outcome scores separately. We used these 6 StART main outcomes to calculate the 3 following composite scores as the average score from the specified trials: single-task reaction time (all single-task StART trials), dual-task reaction time (all dual-task StART trials), and an omnibus StART composite reaction time (all StART trials combined) to comprehensively examine the measurement reliability and precision. The trials from laboratory-based jump landing and cutting were averaged separately to calculate mean single- and dual-task reaction time outcomes for each movement separately.
Descriptive statistics were computed for participant demographic variables and mean reaction time assessments. To determine interrater reliability, 2 raters (E.J.S. and T.A.P.) independently scored the first testing session StART reaction time for all trials and conditions while blinded to the other rater's scoring. Author E.J.S. scored all participants' first testing session StART trials on 2 occasions a median (interquartile range) of 23 (21–24.5) days apart to assess intrarater reliability and then scored all participants' first and second testing sessions to assess test-retest reliability. Intrarater, interrater, and test-retest reliability were evaluated using ICCs31,43 with 95% CIs. Intrarater and interrater ICCs were generated from the first testing session. All ICCs were interpreted as poor (<0.50), moderate (0.50–0.74), good (0.75–0.89), or strong (≥0.90).31 The test-retest reliability ICC was used to calculate the standard error of measurement (SEM), which in turn was used to calculate the minimal detectable change (MDC)44 for each StART subtest and composite score using the equations described by Weir.44 Test-retest reliability for center-of-mass cutting reaction time was poor (ICC [3,k] = 0.28); therefore, we excluded this outcome from all subsequent analyses. Sensitivity analyses via separate repeated-measures analyses of variance (ANOVAs) were also conducted to determine the minimum number of trials needed for each StART subtest before the outcomes were significantly altered. Specifically, we compared the average of all 5 trials versus the averages of the first 4, the first 3, the first 2, and the first trial alone for each StART outcome separately, with unadjusted post hoc t tests to detect any potential differences.
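As a worked sketch of these reliability computations, Weir's equations reduce to SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The R snippet below assumes two session vectors of one StART outcome and the psych package; it is illustrative, not the authors' exact script.

```r
# Hedged sketch of the test-retest workflow: session1 and session2 are
# assumed vectors of one StART outcome (ms) for the n = 20 participants.
library(psych)

scores <- cbind(session1, session2)
icc3k  <- ICC(scores, lmer = FALSE)$results["Average_fixed_raters", "ICC"]  # ICC(3,k)

sem   <- sd(as.vector(scores)) * sqrt(1 - icc3k)  # SEM = SD * sqrt(1 - ICC)
mdc95 <- 1.96 * sqrt(2) * sem                     # MDC at the 95% level
```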
A 5 × 2 repeated-measures ANOVA was computed to compare the StART subtests (standing, single-legged balance, and cutting) and laboratory jump landing and cutting reaction times across cognitive conditions (single task versus dual task) using the first testing session data. We conducted a 6 × 2 repeated-measures ANOVA to compare all reaction time measures across the 2 testing sessions to examine any time effect for the single-task condition and a 5 × 2 repeated-measures ANOVA (computerized reaction time excluded because it had no dual-task condition) to examine any dual-task time effect. Significant ANOVA interactions or main effects were followed up with post hoc Tukey t tests, mean differences, and 95% CIs.
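A minimal sketch of the 5 × 2 model in R follows, assuming a long-format data frame and the afex and emmeans packages; these packages and variable names are our assumptions, as the manuscript specifies only R 4.0.4.

```r
# rt_long is an assumed data frame with columns: id, assessment (5 levels),
# condition (single vs dual task), and rt_ms.
library(afex)     # aov_ez()
library(emmeans)  # post hoc comparisons

fit <- aov_ez(id = "id", dv = "rt_ms", data = rt_long,
              within = c("assessment", "condition"),
              anova_table = list(es = "pes"))  # partial eta squared
fit

# Follow up significant effects with Tukey-adjusted pairwise comparisons
pairs(emmeans(fit, ~ condition | assessment), adjust = "tukey")
```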
To determine the relationship between StART and previously established reaction time measures at the first testing session, we calculated Pearson correlation coefficients between all reaction time measures. Correlations were interpreted as negligible (≤0.20), low (0.21–0.40), moderate (0.41–0.60), high (0.61–0.80), or strong (≥0.81). All data were assessed for general linear model assumptions before analysis and analyzed in R (version 4.0.4; The R Foundation for Statistical Computing) with α = .05 set a priori. Specifically, all reaction time metrics were examined for normality of their residuals visually via QQ plots and statistically via Shapiro-Wilk tests; no statistical or visual violations were present (P values ≥ .067).
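The correlation and assumption checks can be sketched in base R as follows; rt_wide and its column names are illustrative placeholders, not the study's actual variable names.

```r
# rt_wide is an assumed n x m data frame of per-participant mean reaction
# times (one column per assessment and cognitive condition).
r_matrix <- cor(rt_wide, method = "pearson")  # full Pearson r matrix

# Pairwise test with P value, eg, StART cutting vs laboratory cutting
cor.test(rt_wide$start_cut_st, rt_wide$lab_cut_st)

# Normality screen mirroring the reported workflow
sapply(rt_wide, function(x) shapiro.test(x)$p.value)        # Shapiro-Wilk
qqnorm(rt_wide$start_cut_st); qqline(rt_wide$start_cut_st)  # visual QQ check
```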
RESULTS
Participants
All 20 participants completed both testing sessions. Their mean ± SD age was 20.3 ± 1.8 years, height was 173.0 ± 8.6 cm, and mass was 69.9 ± 13.4 kg. They reported 7.0 ± 1.2 and 7.0 ± 1.7 hours of sleep the night before the first and second testing sessions, respectively. A total of 60% of participants were female, 75% were without a history of concussion (n = 4 with 1 prior concussion, n = 1 with 2 prior concussions), 85% were right-hand dominant (identified as the hand with which the individual would throw), and 100% were right leg dominant.
Reliability, SEM, and MDC of StART and Trial Sensitivity Analysis
The interrater reliability for all StART outcomes was good to strong, with the lowest value for dual-task cutting (ICC [2,k] = 0.83; Table 1). Intrarater reliability was strong across all StART outcomes; the lowest value was for dual-task cutting (ICC [3,k] = 0.91; Table 1). Test-retest reliability was relatively lower and ranged from moderate to good across outcomes (ICC [3,k] range = 0.69–0.89). The associated StART SEM values ranged from 8 to 31 milliseconds, and MDC values ranged from 21 to 85 milliseconds, with the highest (worst) value for dual-task single-legged balance (Table 1).
The sensitivity analyses revealed that numerous StART outcomes were statistically influenced by the number of trials used (Table 2). Post hoc testing indicated that the StART outcomes differed when the 5-trial average was compared with the single trial alone (P ≤ .005) but not when compared with the first 2-, 3-, or 4-trial averages. Similarly, all 3 StART composite scores using the 5-trial average differed from the 1-trial and 2-trial averages but not from the 3- or 4-trial averages.
Dual-Task and Time Effect Across all Reaction Time Measures
A significant reaction time outcome-by-cognitive condition interaction was observed (F4,19 = 3.17, P = .018, ηp² = 0.14) such that the dual-task increase in reaction time varied by assessment outcome (Table 3). All measures resulted in increased reaction time during dual-task versus single-task (F4,19 = 155.83, P < .001, ηp² = 0.67) conditions, with StART single-legged balance having the largest mean difference and StART cutting the smallest.
No significant reaction time outcome-by-testing session interactions were seen for either single-task (F5,19 = 0.93, P = .466, ηp² = 0.05) or dual-task (F4,19 = 0.72, P = .582, ηp² = 0.04) comparisons. No main time effect was identified for single-task outcomes (F5,19 = 0.92, P = .351, ηp² = 0.01); however, for dual-task outcomes (F4,19 = 5.88, P = .026, ηp² = 0.08; Table 3), overall reaction time at the second testing session was 25 milliseconds (95% CI = 5, 45 milliseconds) faster. Mean differences between testing sessions for all outcomes are presented in Table 3, with StART single-legged balance and cutting during the dual task being faster at the second testing session.
Correlations Between Clinical, Laboratory, and StART Reaction Time Measures
None of the StART outcomes were significantly correlated with computerized reaction time (P ≥ .05; Figure). The StART single-task outcomes of standing (r = 0.42), single-legged balance (r = 0.49), and cutting (r = 0.65) were moderately to highly correlated with single-task laboratory cutting, whereas no StART outcomes were correlated with single-task jump landing. Under dual-task conditions, only StART cutting (r = 0.50) and the StART dual-task composite (r = 0.47) were moderately correlated with laboratory-based cutting. Among StART conditions, various correlation levels were noted (r range = −0.19 to 0.57). All single- and dual-task StART subtest scores were highly correlated with their respective composite scores (Figure).
Figure. Pearson correlations across reaction time outcomes (n = 20). Only correlations with P ≤ .05 are colored according to the correlation strength, with black numeric values. White cells with gray numeric values indicate nonsignificant correlations (P values > .05). Abbreviations: StART, Standardized Assessment of Reaction Time; ST, single task; DT, dual task.
DISCUSSION
Our findings provide foundational evidence for the StART measurement properties, which will be beneficial for potential future clinical use. Excellent overall intrarater, interrater, and test-retest reliability was coupled with relatively small SEM and MDC values for most StART outcomes. We also detected dual-task effects for StART that were similar to or stronger than those of the laboratory-based measures from which it was derived,13 correlations with laboratory-based reaction time, and no correlation with computerized reaction time. Significant time effects were present for the dual-task single-legged balance and cutting StART conditions, reflecting performance improvements upon retesting similar to those in other dual-task paradigms.30,45 Cumulatively, our results indicate that StART may serve as a quick, reliable, accurate, and clinically feasible measure that can be implemented using readily available tools. Yet research in participants postconcussion is warranted to understand its clinical utility and safety before it is used in clinical decision-making.
Measurement reliability is an important consideration for interpreting whether changes in assessment outcomes are due to error or injury. Our findings demonstrated reliability similar to that of laboratory-based, functional reaction time measures,30 and StART had overall excellent and clinically suitable reliability. Specifically, intrarater and interrater reliability were mostly strong across all StART conditions (Table 1), showing that the same examiner and different examiners can determine reaction time equally accurately. Test-retest reliability was lower than intrarater and interrater reliability but was still moderate to good (ICC [3,k] range = 0.69–0.89), with the lowest-performing subtest being dual-task cutting. The relatively lower reliability for dual-task cutting may be attributed to its being the most cognitively and motor-demanding task in the StART, which may inherently introduce more variability as cognitive-motor interference increases.46 Although the StART test-retest reliability was relatively lower, it was again similar to that of previously established laboratory-based cutting reaction time measures (ICC [3,k] range = 0.75–0.91).30 We also observed a relatively small MDC range across StART subtests and composites (MDC range = 21–85 milliseconds) that was smaller than that of laboratory-based cutting reaction time30 and may provide additional diagnostic precision for future postconcussion use. Based on our sensitivity analyses, 5 trials are not necessary to measure stable StART performance. Two or 3 trials were needed for all StART outcomes to establish stable metrics (Table 2); thus, we recommend 3 trials per condition to ensure measurement stability and reduce StART administration time from approximately 10 minutes to 5 minutes.
Dual-task paradigms are growing in clinical use due to their functional applicability.13,30,47 We found a dual-task effect for StART (mean difference range = 78–129 milliseconds; Table 3) that was similar to the laboratory-based reaction time measures in our study and prior reports.13 Importantly, the StART random light-stimulus window was 2 to 10 seconds, whereas the laboratory-based window was 2 to 5 seconds. The additional potential subtraction time during StART may have contributed to the observed dual-task effects. Regardless, our results indicate that StART can sufficiently elicit the intended dual-task effect via simpler and more clinically practical equipment.
Time effects during StART were assessed to understand whether performance improved across repeated test administrations. Most outcomes did not show any time effects across the 7-day retest window (Table 3); however, single-legged balance (32 milliseconds) and cutting (27 milliseconds) under dual-task conditions did improve at the second testing session. Improved dual-task single-legged balance and cutting performance at the second testing session, coupled with the test-retest reliability ICCs, implies that learning or practice effects took place. Better performance at the second testing session is commonly reported for dual-task30,45 and reaction time assessments generally30,47,48 when individuals are retested within a relatively short period. Improvements with repeat testing highlight the importance of determining test-retest reliability and using MDCs, which mathematically account for test-retest reliability,44 in clinical practice to better isolate the source of decreased performance.
Determining the relationship between assessments is a key consideration for ensuring that the constructs of interest are being measured accurately. We specifically examined the relationships of StART with laboratory-based reaction time and computerized reaction time to learn whether the appropriate factors were being assessed. The overall correlations supported our hypotheses (Figure): moderate to high correlations were evident between StART single-task subtests and single-task laboratory-based cutting reaction time (r range = 0.42–0.65) and between StART dual-task cutting and dual-task laboratory-based cutting reaction time (r = 0.50). No StART outcomes correlated with single- or dual-task jump landing except the StART dual-task composite score (r = 0.39), which may indicate that StART emulated only cutting rather than general functional movement. Conversely, negligible to low correlations were present between all StART outcomes and computerized reaction time (r range = 0.07–0.33). Thus, our cumulative findings support the notion that StART is a reliable and accurate assessment for measuring functional reaction time in clinical settings. Of note, moderate correlations between StART standing and single-legged balance outcomes (single task = 0.57, dual task = 0.53; Figure) may reflect redundancy. However, these possible redundancies may be attributed to the healthy cohort examined: individuals experiencing a concussion often display postural stability impairments,20 so evaluating the correlations in a concussed sample may elicit different relationships. Thus, future researchers should address how the StART metrics and conditions relate to concussion in order to further optimize StART for maximal efficiency.
Limitations
We investigated college-aged, physically active individuals attending a university, so our results may not be generalizable to collegiate athletes or to individuals younger or older than this cohort. Our research questions and study design required the use of numerous reaction time measures at each visit, which may have affected reaction time performance despite randomization. Hence, the outcome summary statistics may not represent expected performance in clinical practice and should not be used as healthy reference data. We also offered participant honoraria to support recruitment, compensate effort, and aid retention, but this might have produced a biased sample or altered effort levels relative to what could be expected in a clinical setting. Therefore, future researchers should implement StART in a more naturalistic sports medicine setting, both pre- and postconcussion, to better understand StART performance and its potential diagnostic properties before it can be recommended for clinical decision-making.
CONCLUSIONS
The StART demonstrated good to strong intrarater, interrater, and test-retest reliability coupled with a relatively small SEM and MDC. Dual-task effects similar to or stronger than those of prior laboratory-based reaction time measures were identified,13 along with correlations with laboratory-based reaction time and no correlation with computerized reaction time. Excellent reliability combined with appropriate dual-task effects indicates that StART is an appropriate measure for examining functional reaction time using relatively low-cost tools and may translate efficiently to clinical settings. However, future authors should study StART in patients after concussion to understand its clinical utility before it is used in clinical decision-making.
ACKNOWLEDGMENTS
We thank undergraduate students Abby Moss and Rachel Stupp for their assistance in data processing in this project, which was funded by the University of Georgia Louise E. Kindig Doctoral Grant.