A mobile application neurocognitive assessment has been used in place of equipment-intensive computerized neurocognitive-assessment protocols. A previous study showed high to very high test-retest reliability of neurocognitive assessment using the mobile application in healthy adults, but no researchers have explored test-retest reliability, reliable change indices (RCIs), or sex effects in middle school and high school populations when the tests are administered 1 year apart.
To examine the test-retest reliability and RCIs of baseline data collected at 2 time points approximately 1 year apart using a mobile application neurocognitive test in middle school and high school athletes. The secondary purpose was to investigate sex differences in neurocognitive measures.
Cross-sectional study.
Institution.
A total of 172 middle school and high school healthy student-athletes (mean age = 13.78 ± 1.59 years).
Mobile application neurocognitive test scores (reaction time, impulse control, inspection, and memory).
Neurocognitive measures had low test-retest reliability across a 1-year time period in the middle and high school settings. Upon retesting, reaction time and inspection time improved in both middle and high school athletes, and impulse control improved in middle school athletes. More athletes in middle school showed RCI improvements compared with high school athletes. Although both males and females demonstrated improvements in neurocognitive measures throughout adolescence, males outperformed females in reaction time and impulse control.
A mobile application neurocognitive test displayed unacceptably low test-retest reliability, most likely due to the cognitive development occurring throughout adolescence. Additionally, the RCIs identified reliable improvements in a proportion of athletes. These naturally occurring improvements due to cognitive development could mask postconcussion deficits. Age and sex warrant consideration with respect to the neurocognitive performance of middle and high school athletes.
Clinicians should be aware of the influences of age and sex on baseline and postconcussion neurocognitive tests in adolescents.
Further research to identify the ideal frequency of neurocognitive baseline testing for adolescents is needed.
In the last decade, education, policy interventions, and research regarding sport-related concussions (SRCs) in youth athletes have grown remarkably. Neurocognitive testing that includes baseline and postinjury testing is considered the best practice for monitoring the cognitive recovery of individuals with concussion.1 These neurocognitive tests are commonly delivered through a computer platform that offers advantages such as rapid scoring, standardized and easy administration, and increased test-retest reliability.2 Despite these advantages, computerized neurocognitive tests can be cumbersome, time consuming, impractical to perform on the field, and expensive, which makes them challenging to implement, especially in high schools with a high proportion of low-income students.3 In a survey of respondents in high school settings, only 39.9% used computerized neurocognitive tests to manage patients with concussions.4 Therefore, more feasible computerized neurocognitive testing options for concussion management in high school settings are needed.
New technology has enhanced the efficiency of health care systems by enabling remote monitoring of health status and physical activity.5,6 According to Dufau et al,7 mobile devices will likely be the primary platform for the next generation of clinical practice and research. Recently, the US Food and Drug Administration cleared a mobile application (Sway Medical, Inc) that can be downloaded to smartphones and tablets as a quick, objective test of balance and neurocognitive function that can be performed on the field to assist SRC management.8 The neurocognitive tests included in this mobile application assess reaction time, impulse control, inspection time, and memory. The balance tests evaluate static postural control using the mobile device’s 3-dimensional accelerometer and gyroscope, quantifying the magnitude of postural changes more precisely than a simple count of total errors allows. Previous researchers reported that the balance test of the Sway Medical Test had high to excellent reliability (intraclass correlation coefficient [ICC] = 0.72–0.92)9–11 and concurrent validity when compared with the Biodex Balance System12 and inertial measurement units10 in both healthy and clinical populations. Although the reliability and validity of the Sway Medical Test balance tests have been well studied and established, only a few authors have investigated its cognitive tests. Examining the cognitive tests separately from the balance tests may enable further analysis of outcomes; this separation is possible because the Sway Medical Test produces a score for each test rather than an overall score, so balance measures do not affect cognitive measures. Burghart et al8 described the reliability of reaction time in 27 healthy adults as high to very high (ICC = 0.84–0.90), indicating that the reaction time test in this mobile application is reliable. However, their study was limited to reaction time. 
Therefore, the reliability of the cognitive tests used in this mobile application is currently unclear. In addition to establishing the ICC for test-retest reliability, it is also important to determine a reliable change index (RCI), which quantifies the magnitude of change in neurocognitive test scores over time relative to measurement error, practice effects, or both.13
An additional factor to consider when assessing reliability using the test-retest model is the interval between tests, as postconcussion management involves repetitive administration of neurocognitive tests. Most importantly, postconcussive neurocognitive assessment requires comparison with the baseline neurocognitive scores to account for individual differences. The National Athletic Trainers’ Association position statement on the management of sport-related concussion suggests that baseline neurocognitive assessments should be administered to adolescent athletes annually.14 McCrory et al15 proposed more frequent baseline testing, especially during the period of rapid cognitive development (8–15 years old), to account for rapid improvements in cognitive ability; however, the effects of neurocognitive development on baseline scores are currently unclear.16,17 Furthermore, sex differences in brain development are an important factor in neurocognitive testing of adolescents.18,19 Males demonstrated superior performance to females in visual-spatial working memory,20 whereas females displayed superior performance to males in linguistic working memory.21 These differences should be considered when implementing this novel mobile application neurocognitive assessment for concussion management in secondary school settings.
Therefore, the purpose of our study was to examine the test-retest reliability and RCI measures of neurocognitive baseline data collected using a mobile application in middle and high school athletes. We hypothesized that test-retest reliability would be acceptable (ICC > 0.60) and no reliable change (RCI < 1.96) would be evident in neurocognitive test scores. Second, we investigated sex differences in the neurocognitive tests. We hypothesized that neurocognitive test scores would not differ between females and males.
METHODS
Study Design
A within-participants design was used to evaluate the test-retest reliability of the mobile application neurocognitive test (Sway Medical Test, versions 4.2.8–5.3.2).
Participants
Participants were middle and high school athletes aged 11 to 18 years at the beginning of the study. During the study period, 184 athletes completed the mobile application neurocognitive test before the 2021 and 2022 sports seasons; however, 12 athletes were excluded from further analysis because of missing (n = 4) or extreme outlier (n = 8) data. The 8 extreme outlier data points were removed because they fell outside the lower or upper extreme limits of the box plots. Thus, a total of 172 athletes (age = 13.78 ± 1.59 years) were included in the study: 100 males (58.1%) and 72 females (41.9%). Participants who were 11 to 13 years old at the beginning of the study were assigned to the middle school group, and those who were 14 to 18 years old were assigned to the high school group. A summary of the demographic data is presented in Table 1.
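The box-plot screening and age-based grouping described above can be sketched as follows. This is a minimal illustration, not the authors' SPSS procedure: it assumes the common box-plot convention that an "extreme" value lies more than 3 interquartile ranges beyond the first or third quartile, and it assumes Python's inclusive quartile method, which may differ slightly from the hinges SPSS draws. The helper names and reaction times are hypothetical.

```python
import statistics

def extreme_fences(values, k=3.0):
    """Lower/upper fences lying k interquartile ranges beyond the quartiles.

    k = 3.0 follows the common box-plot convention for "extreme" values.
    Assumption: quartiles come from statistics.quantiles(method="inclusive"),
    which may differ slightly from the hinges SPSS uses for its box plots.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_extremes(values, k=3.0):
    """Keep only the values that fall inside the extreme fences."""
    low, high = extreme_fences(values, k)
    return [v for v in values if low <= v <= high]

def school_level(age_years):
    """Age-based grouping from the text: 11-13 -> middle, 14-18 -> high."""
    if 11 <= age_years <= 13:
        return "middle school"
    if 14 <= age_years <= 18:
        return "high school"
    raise ValueError(f"age outside the study range: {age_years}")

# Hypothetical reaction times (ms), not study data
reaction_ms = [250, 260, 255, 270, 265, 258, 262, 900]
kept = drop_extremes(reaction_ms)  # the 900-ms value falls above the upper fence
```

Whether fences are computed per outcome, per time point, or per school level is not stated in the text; the sketch applies them to a single list of scores.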
Materials
The Sway Medical Test is a smartphone and tablet application that consists of 3 main sections: a symptom checklist, balance tests, and cognitive tests. For our purposes, only the cognitive test data were used. The cognitive tests have 4 modules: reaction time, impulse control, inspection time, and memory. The reaction time test requires athletes to tilt their mobile device as quickly as possible when they see a single stimulus, permitting evaluation of stimulus recognition, processing, and initiation of a neuromotor response. In the impulse control test, athletes differentiate 2 stimuli by reacting differently to each as quickly as possible, allowing assessment of response inhibition and the ability to process information and initiate the correct response. The inspection time test involves quickly identifying a slight difference between 2 images of the letter “T” displayed next to each other as a measure of visual information processing speed. The memory test asks athletes to memorize 3 letters given at the beginning of the test and recall them after a distraction task, allowing evaluation of both working and delayed memory. Reaction time, impulse control, and inspection time are recorded in milliseconds. Memory is scored on a scale from 0 (minimum) to 100 (maximum) using a proprietary algorithm.
The concurrent validity of Sway Medical Test reaction time, compared against 7 Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) Quick Test (QT) reaction time measures, ranged from r = −0.46 to 0.22, and that of Sway Medical Test impulse control and inspection time ranged from r = −0.25 to −0.46.22 A potential concern with these concurrent validity estimates is that Sway Medical Test reaction time is calculated from a single test, whereas ImPACT QT uses composite scores, each derived from a different combination of neurocognitive tests. Therefore, the authors compared Sway Medical Test reaction time with all 7 ImPACT QT tests and reported the range of correlations. Burghart et al8 examined the validity of Sway Medical Test reaction time in 27 healthy adults against Computerized Test of Information Processing measures and found a moderate positive correlation (r = 0.59).
Procedures
The data for this study were obtained as part of the mandatory annual preseason baseline neurocognitive test, which every athlete must complete each year. The Sway Medical Test data were collected at 2 schools at 2 time points, before the 2021 and 2022 sports seasons, with a mean interval between tests of 0.93 ± 0.21 years. Group data-collection sessions of approximately 15 athletes were conducted in a quiet classroom and supervised by the same 2 certified athletic trainers (ATs), who had completed a training session on administering the Sway Medical Test. The ATs provided each athlete with a unique code to start the Sway Medical Test, and standardized instructions and a familiarization trial were supplied for each neurocognitive test. Each session lasted 10 to 15 minutes. This procedure was approved by the University of Hawaii Human Studies Program Institutional Review Board. All participants provided informed consent before taking part.
Statistical Analysis
All data were exported to SPSS (version 28.0; IBM Corp), and means and SDs were calculated for all tests.
To address our first hypothesis, we assessed test-retest reliability using single-measures, 2-way mixed ICCs. Paired-samples t tests were conducted to evaluate differences in neurocognitive scores between the 2 time points. The RCI was calculated to analyze the magnitude of change between the 2 neurocognitive assessments.13 The standard error of measurement (SEM) was computed at each time point as

$$\mathrm{SEM}_i = \mathrm{SD}_i\sqrt{1 - r_{12}}, \qquad i = 1, 2,$$

where $\mathrm{SD}_i$ was the SD at time $i$ and $r_{12}$ was the correlation between time 1 and time 2, and the standard error of the difference ($S_{\mathrm{diff}}$) was calculated as

$$S_{\mathrm{diff}} = \sqrt{\mathrm{SEM}_1^2 + \mathrm{SEM}_2^2}.$$

The RCI was then calculated as

$$\mathrm{RCI} = \frac{X_2 - X_1}{S_{\mathrm{diff}}},$$

where $X_1$ and $X_2$ were the neurocognitive outcomes observed at times 1 and 2, respectively. The RCIs were evaluated against 90% CIs based on the criteria used by ImPACT to define deficits due to concussion.2
The percentage of RCI values falling outside the 90% CI indicated the proportion of athletes whose cognitive performance reliably improved or declined at time 2 compared with time 1. A reliable improvement was defined as a shorter (faster) reaction time, impulse control time, or inspection time, or a higher memory score, relative to time 1. These analyses were conducted by school level (ie, middle school versus high school).
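Classifying each athlete's RCI and tallying percentages, as described in the paragraph above, can be sketched as follows. The sketch assumes a two-sided 90% cutoff of z = 1.645 and encodes the scoring directions given in the Materials section (fewer milliseconds is better for the timed tests; a higher memory score is better). The measure keys and function names are hypothetical.

```python
from collections import Counter

# Assumption: 1.645 is the two-sided z critical value for a 90% CI
Z_90 = 1.645

# Timed tests are scored in ms (lower = better); memory is 0-100 (higher = better)
LOWER_IS_BETTER = {
    "reaction_time": True,
    "impulse_control": True,
    "inspection_time": True,
    "memory": False,
}

CATEGORIES = ("reliable improvement", "reliable decline", "no reliable change")

def classify_change(rci, measure, z=Z_90):
    """Label one athlete's RCI as improvement, decline, or no reliable change."""
    if abs(rci) <= z:
        return "no reliable change"
    # A negative RCI means a faster time, which is an improvement for timed tests
    improved = rci < 0 if LOWER_IS_BETTER[measure] else rci > 0
    return "reliable improvement" if improved else "reliable decline"

def percent_by_category(rcis, measure, z=Z_90):
    """Percentage of athletes falling in each category for one measure."""
    counts = Counter(classify_change(r, measure, z) for r in rcis)
    return {cat: 100.0 * counts[cat] / len(rcis) for cat in CATEGORIES}
```

Keeping the scoring direction in a lookup table avoids flipping signs by hand for each outcome, which is an easy place to mislabel an improvement as a decline.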
To address our second hypothesis, mixed-design (within × between) repeated-measures analyses of variance were conducted for each neurocognitive outcome by school level to identify sex differences and interaction effects. Post hoc analyses were conducted using paired-samples t tests. The α level was set at .05 for all analyses.
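With only 2 time points and 2 sex groups, the time × sex interaction in the mixed analysis of variance can be checked with a simpler, equivalent computation: the interaction F equals the square of a pooled-variance independent-samples t statistic computed on each athlete's change score (time 2 minus time 1). The sketch below illustrates that equivalence with hypothetical change scores; it is not the SPSS procedure used in the study, the function name is hypothetical, and it omits the P value, which could be obtained from the F distribution (eg, with `scipy.stats.f.sf` if SciPy is available).

```python
import math
import statistics

def time_by_group_interaction_f(change_a, change_b):
    """Time x group interaction F for a 2 (time) x 2 (group) mixed design.

    With only 2 time points, the interaction F equals the square of a
    pooled-variance independent-samples t test on each athlete's change score
    (time 2 minus time 1). Returns (F, (df_numerator, df_denominator)).
    """
    n_a, n_b = len(change_a), len(change_b)
    mean_a, mean_b = statistics.fmean(change_a), statistics.fmean(change_b)
    pooled_var = ((n_a - 1) * statistics.variance(change_a)
                  + (n_b - 1) * statistics.variance(change_b)) / (n_a + n_b - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t ** 2, (1, n_a + n_b - 2)

# Hypothetical inspection-time changes (ms), not study data
males = [-10, -20, -30]
females = [0, 5, -5]
f_stat, df = time_by_group_interaction_f(males, females)
```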
RESULTS
Means and SDs of the neurocognitive measures and the results of the paired-samples t test, ICC, and RCI analyses are presented in Table 2. The ICC value by school level showed low test-retest reliability (ICC = 0.20–0.54)23 for all neurocognitive scores. The paired-samples t test indicated improvement from time 1 to time 2 in reaction time and inspection time in both middle and high school athletes (all P values ≤ .01). Impulse control showed a distinctly different trend: improvement at time 2 in middle school athletes (P = .01) but no difference in high school athletes (P = .99) between times 1 and 2. The RCI analysis by school level demonstrated reliable improvements in a higher proportion of middle school than high school athletes in reaction time, impulse control, and inspection time (middle school = 9.88%–16.05%, high school = 6.59%–9.98%).
The effect of sex on neurocognitive test scores was examined by school level via repeated (within [times 1 and 2] × between [males and females])-measures analysis of variance (Table 3). We noted a significant interaction effect for inspection time in middle school athletes (P = .03). Post hoc analysis revealed a sex difference in inspection time at time 1 (P = .04), with females outperforming males, but no difference at time 2 in middle school participants. Inspection time improved from time 1 to time 2 in middle school males (P < .001) but not in middle school females. No significant main effects were found.
DISCUSSION
The primary purpose of our study was to examine the test-retest reliability and RCIs of the mobile application neurocognitive test in middle and high school athletes when used as an annual baseline assessment. We found that neurocognitive test scores had low test-retest reliability across a 1-year period in both the middle school and high school settings (Table 2). Similar results have been reported for computer-based neurocognitive assessment tools in adolescents. MacDonald et al24 described low to marginal reliability (ICC = 0.40–0.67) for 4 cognitive measures of the Axon Sports CCAT (attention, processing speed, learning, and memory) administered 1 year apart. Brett and Solomon25 examined ImPACT test-retest reliability among high school athletes and observed low to high reliability (ICC = 0.47–0.83) for tests conducted 2 years apart. These neurocognitive tests use different methods and scoring systems (for example, ImPACT involves more comprehensive assessment and scoring, whereas each Sway Medical Test neurocognitive score is generated from a single test), which may affect the ICC values. Nevertheless, our data provide additional evidence of low reliability in adolescents, here using a mobile application neurocognitive test. A generally acceptable ICC for psychological tests is >0.6,26 yet ICC values used for clinical decision-making should be >0.9 according to a comprehensive literature review of neurocognitive tests for SRCs, whose authors stated that neurocognitive tests administered 1 year apart in adolescents did not meet this standard.27 Low test-retest reliability may be associated with several factors, including practice effects, confusion about some aspects of the tests, and rapid cognitive development.
Reaction time and inspection time improved at time 2 compared with time 1 in both middle and high school athletes, and impulse control improved in middle school athletes (Table 2). The RCI analyses indicated that the proportion of athletes who showed reliable improvement in these outcomes (ie, reaction time, impulse control, and inspection time) ranged from 9.88% to 16.05% in middle school and from 6.59% to 9.98% in high school. McCrory et al15 reported similar improvements in reaction time, working memory, and learning with CogSport among participants 9 to 18 years old, with the largest improvement between ages 9 and 15. Researchers using paper-and-pencil tests determined that visual motor processing speed consistently improved every year in participants 12 to 17 years old.28 Furthermore, visual motor speed is believed to improve through early adulthood, as the performance of college athletes was better than that of high school athletes.29 Our data provide additional evidence of rapid neurocognitive development throughout adolescence, which is often apparent in reaction time, impulse control, inspection time, and processing speed. These findings raise concerns about conducting baseline neurocognitive tests annually in adolescent athletes, as recommended by the National Athletic Trainers’ Association position statement on management of concussion.14 McCrory et al15 suggested obtaining a baseline test every 6 months for athletes younger than 15 years to accommodate rapid cognitive maturation. Although further research is needed to identify appropriate timelines for baseline testing with the mobile application neurocognitive test, our data suggest that reaction time, impulse control, and inspection time measures may require more frequent baseline testing for accurate comparison of baseline and postconcussion outcomes.
Evidence regarding sex differences in cognitive performance among adolescents is limited and inconclusive. Our only significant finding involving sex was the interaction effect on inspection time in middle school athletes; that is, males’ inspection time improved more quickly than that of females. Our result contradicts that of Burns and Nettelbeck,30 who noted no sex differences in inspection time among children, adolescents, and adults. Their cross-sectional study used an inspection time test similar to ours and compared the outcomes of boys (n = 113) and girls (n = 55) aged 8 to 15 years (mean age = 11.3 years). This age range represents the third grade (elementary school) to 10th grade (high school sophomore) population. The sex difference we observed was in middle school–aged adolescents, aged 11 to 13 years. The main differences between the 2 investigations were the study design and the age ranges of participants. In general, a repeated-measures design increases statistical power, which may have contributed to the different results. Also, our middle school population differed from the broader adolescent population of Burns and Nettelbeck.30 Additional evidence reported by Jorm et al31 indicated trends toward faster reaction time in males and greater accuracy in females among middle school athletes, the latter a trade-off that might affect inspection time. Sex differences in memory performance and impulse control have also been reported, with females performing better in verbal memory and males performing better in spatial memory,32 and males showing worse impulse control than females.33 Although inconclusive, these findings suggest that sex influences cognitive performance during development. 
These differences are possibly associated with sex hormone levels, genetics, total brain volume, brain physiology, and psychosocial factors.34,35 Moreover, cognitive abilities, such as attention and spatial navigation,36,37 are influenced by hormonal changes during the menstrual cycle. These individual factors associated with sex development increase the complexity of neurocognitive development, make assessing sex differences more challenging, and warrant further investigation. Our study supplies additional evidence of sex differences, specifically for inspection time in the middle school age group, to the current body of literature.
Our investigation was not without limitations. Our sample consisted of middle and high school athletes; thus, the results are applicable only to these populations. Because the actual grade levels of the student-athletes were not available, the middle and high school groups were divided based on age, which might not correspond to each student-athlete’s grade level. Although all assessments were obtained in group settings using a standardized protocol under the supervision of the same certified ATs, administrative procedures by school may have differed.
In conclusion, we examined the test-retest reliability of baseline mobile application neurocognitive tests conducted 1 year apart in middle and high school athletes. Our results indicated low test-retest reliability, most likely due to rapid cognitive improvement occurring throughout adolescence. These naturally occurring improvements from cognitive development could mask postconcussion deficits and lead to a premature return to play. For example, neurocognitive test results after a concussion may appear normal if the improvement due to cognitive development is greater than the decline due to postconcussion deficits. Thus, the developmental change from baseline to postconcussion testing in adolescents should be considered when health care providers make clinical decisions. Further examination is needed to identify the ideal frequency with which to administer baseline tests in the adolescent population.
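The masking effect described above can be made concrete with a purely hypothetical arithmetic example; none of these numbers come from the study. Suppose an athlete's reaction time speeds up by 40 ms over a year of maturation and then slows by 35 ms after a concussion: the post-injury score is 5 ms faster than the year-old baseline. The sketch assumes an $S_{\mathrm{diff}}$ of 20 ms and a 90% cutoff of 1.645, and the function name is hypothetical.

```python
def reliable_decline(baseline_ms, post_ms, s_diff, z=1.645):
    """True if a timed score (ms; lower is better) got reliably slower."""
    return (post_ms - baseline_ms) / s_diff > z

# All values below are hypothetical and chosen only for illustration
baseline_ms = 300.0            # preseason baseline reaction time
developmental_change = -40.0   # assumed speed-up from a year of maturation
concussion_slowing = 35.0      # assumed slowing caused by the injury
s_diff = 20.0                  # assumed standard error of the difference

post_injury_ms = baseline_ms + developmental_change + concussion_slowing  # 295 ms

# Against the year-old baseline, the athlete even looks slightly faster (RCI = -0.25)
flagged_vs_old_baseline = reliable_decline(baseline_ms, post_injury_ms, s_diff)
# Against a baseline that already reflects maturation (260 ms), the 35-ms
# slowing gives RCI = 1.75, which exceeds the 1.645 cutoff
flagged_vs_current_baseline = reliable_decline(
    baseline_ms + developmental_change, post_injury_ms, s_diff)
```

In this illustration the same injury is invisible against the outdated baseline but flagged against an up-to-date one, which is the argument for more frequent baseline testing in adolescents.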