Although the prevalence of invalid baseline neurocognitive testing has been documented, and repeated administration after obtaining invalid results is recommended, no empirical data are available on the utility of repeated assessment after obtaining invalid baseline results.
To document the utility of readministering neurocognitive testing after an invalid baseline test.
Schools, colleges, and universities.
A total of 156 athletes who obtained invalid results on ImPACT baseline neurocognitive testing and were readministered the ImPACT baseline test within a 2-week period (mean = 4 days).
Overall prevalence of invalid results on reassessment, specific invalidity indicators at initial and follow-up baseline, dependent-samples analysis of variance, with Bonferroni correction for multiple comparisons.
Reassessment resulted in valid test results for 87.2% of the sample. Poor performance on the Design Memory and Three-Letter subscales was the most common reason for athletes obtaining an invalid baseline result, on both the initial assessment and the reassessment. Significant improvements were noted on all ImPACT composite scores except for Reaction Time on reassessment. Of note, 40% of athletes showed slower reaction time scores on reassessment, perhaps reflecting a more cautious approach taken the second time. Invalid results were more likely to be obtained by athletes with a self-reported history of attention-deficit disorder or learning disability on reassessments (35%) than on initial baseline assessments (10%).
Repeat assessment after the initial invalid baseline performance yielded valid results in nearly 90% of cases. Invalid results on a follow-up assessment may be influenced by a history of attention-deficit disorder or learning disability, the skills and abilities of the individual, or a particular test-taking approach; in these cases, a third assessment may not be useful.
Baseline neurocognitive test results should be checked for validity.
If the athlete's initial baseline performance is invalid, repeat assessment is warranted.
Nearly 90% of athletes will obtain valid baseline results on repeat administration.
Sport-related concussion continues to receive increased attention in the media, literature, and legislative arenas. Researchers have noted an increase in visits to the emergency department from 1997 to 2007 among children ages 8–13 years (100%) and ages 14–19 years (>200%),1 with similar increases noted in high school athletes.2 As of April 2013, 45 states require mandatory education on concussion management for coaches. Although legislation does not require or specify that athletes undergo baseline or postconcussion neurocognitive testing, “consensus experts” have identified that the assessment of cognitive function is an important component in the overall assessment of concussion.3 It is important to note, however, that neurocognitive testing is only one tool to be used in the assessment of concussion, along with clinical review of symptoms and balance testing.3
Following a model established by Barth et al,4 athletes typically complete preseason neurocognitive testing to establish a baseline level of functioning, and then postconcussion test data are compared with baseline test results to document the neurocognitive effects of concussion. Baseline testing was originally conducted using traditional paper-based neuropsychological measures,5 but computer-based neurocognitive test batteries have since been developed and used by numerous high school, collegiate, and professional sport organizations. The development and use of computer-based neurocognitive test measures have received considerable attention in the literature, and among the areas of focus is the validity of an athlete's approach to baseline and postconcussion testing.
Clinicians and researchers5–7 have speculated that athletes might underreport symptoms after a concussion to facilitate and expedite the return to competition, and others8,9 have focused on identifying those athletes who attempt to purposefully perform poorly on baseline testing (ie, “sandbag”). A number of factors have been shown to affect baseline neurocognitive performance (eg, depression,10 distractions,11 computer problems12) and other factors have been shown to affect cognitive performance (eg, dehydration,13 anxiety or stress,14 lack of sleep or fatigue13). Athletes have recently reported intentionally underperforming on baseline tests,15,16 seemingly unaware that test developers (eg, of the ImPACT test battery) have identified symptom validity cutoffs to identify patterns of performance that are outliers or reflective of inadequate effort.17 The incidence or prevalence of invalid baseline test results has been documented in the literature,18 and repeating baseline testing after obtaining invalid baseline test results is recommended.17 Despite these recommendations, no data are available on the utility of readministration of baseline assessments after invalid performance. The purpose of our study was to document the utility of readministering baseline computerized neurocognitive testing after an invalid baseline testing performance.
Participants were 156 athletes who reported English as a first language, obtained invalid results on the online baseline ImPACT test battery (ImPACT Applications, Inc, Pittsburgh, PA), and were subsequently reassessed on the baseline ImPACT within 2 weeks. The resultant sample comprised athletes ages 11–22 years (mean = 14.9 ± 2.4) who were predominantly male (68%) and completed another ImPACT baseline assessment approximately 4 days after their initial assessment (range = 1–14 days, SD = 3.8 days). The athletes participated in a variety of sports, including football (43%), soccer (15%), and basketball (8%), with 9.6% reporting a history of concussion. A total of 9 athletes (5.8%) self-reported a history of attention-deficit disorder (ADD), 6 (3.8%) reported a history of learning disorder (LD), and 1 (<1%) self-reported a history of both LD and ADD. Data were obtained from several athletics programs and clinical practices supporting athletics programs, and university institutional review board approval was obtained for retrospective analysis of deidentified data. More specifically, athletes who met the inclusion criteria were extracted from regional databases from Pennsylvania, New Jersey, Tennessee, and Texas during the years 2009–2012. All data were obtained from regional schools and colleges or universities that had a relationship with an independent neuropsychological practice, hospital-based practice, or sports medicine professional at a college or university. All athletes were assessed in groups of 10 to 20, supervised by a certified athletic trainer or member of the school's medical staff.
Materials and Procedures
All participants completed a baseline ImPACT test (online version) as part of their institution's ongoing concussion-assessment and -management program. ImPACT consists of 6 neuropsychological test modules, each designed to target different aspects of cognitive functioning, including attention, memory, visual motor (processing) speed, and reaction time. From these 6 tests, 5 separate composite scores are generated: Verbal Memory, Visual Memory, Reaction Time, Visual Motor Speed, and Impulse Control. More thorough descriptions of the ImPACT subscales contributing to the composite scores and the formula for the composite scores are presented in Table 1, and more comprehensive descriptions are available in the literature.19–21 Athletes were automatically flagged as having an invalid baseline (ie, with a ++ on the test report) on the basis of preestablished validity indicators.17 Subscale scores, composite scores, validity indicators, and demographic data are presented in Tables 1 and 2.
As stated, all participants completed 2 baseline assessments. The ImPACT test randomizes presentation of stimuli for the X's and O's, Symbol Match, Color Match, and Three Letter Memory subscales across test administrations. Words and stimuli for word memory and design memory are randomized with respect to the order of presentation, but the actual word lists and collections of visual stimuli are only randomized from baseline to postconcussion assessments.
Only 18.6% of athletes (156 of 837) who obtained invalid baseline results were reassessed within 2 weeks. Of those reassessed, 87.2% obtained valid results on reassessment within 2 weeks (mean = 4 days). The most common cause of an athlete obtaining an invalid baseline was poor performance on the Three Letter Memory subscale (60% of the sample at the initial baseline), followed by Design Memory learning percentage (30%) and Impulse Control (14%). On reassessment, only 20 athletes had invalid scores (12.8% of the initial sample), with the most common causes being the Three Letter Memory subscale (14 of 20; 70%), followed by Design Memory learning percentage (7 of 20; 35%) and Word Memory learning percentage (3 of 20; 15%) (Table 3). Although not listed as invalidity indicators in the ImPACT Manual,17 reaction time composite scores above 0.80 represent 3 standard deviations above the mean and are considered a red flag for possible “sandbagging.” In this sample, 4.5% (n = 7) were above 0.80 on initial baseline, and 9.6% (n = 15) at reassessment, with 40% of athletes obtaining slower reaction time scores on reassessment.
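The percentages in this paragraph follow directly from the reported counts. As a minimal sketch (the variable and function names are illustrative and not part of the study), the arithmetic can be checked as follows:

```python
# Counts as reported in the Results text; variable names are illustrative.
INVALID_INITIAL = 837    # athletes with an invalid initial baseline
REASSESSED = 156         # of those, reassessed within 2 weeks
INVALID_ON_RETEST = 20   # still invalid on reassessment
RT_FLAG_INITIAL = 7      # Reaction Time composite > 0.80, initial baseline
RT_FLAG_RETEST = 15      # Reaction Time composite > 0.80, reassessment


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


rates = {
    "reassessed": pct(REASSESSED, INVALID_INITIAL),
    "valid_on_retest": pct(REASSESSED - INVALID_ON_RETEST, REASSESSED),
    "invalid_on_retest": pct(INVALID_ON_RETEST, REASSESSED),
    "rt_flag_initial": pct(RT_FLAG_INITIAL, REASSESSED),
    "rt_flag_retest": pct(RT_FLAG_RETEST, REASSESSED),
}

for name, value in rates.items():
    print(f"{name}: {value}%")
```

Each computed value matches the corresponding figure in the text (18.6%, 87.2%, 12.8%, 4.5%, and 9.6%).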
A total of 7 of 20 (35%) of those who obtained invalid test results on the reassessment reported a history of either ADD or LD, compared with only 16 of 156 (10.3%) who obtained invalid results on the initial baseline (χ²₁ = 9.55; P = .002). The sample of 20 athletes receiving invalid results on reassessment had an average age of 14.2 years (SD = 1.4; t₁₅₄ = 1.38; P = .17) and was composed of 75% males (χ²₁ = 0.52; P = .47). Comparisons between athletes receiving valid and invalid baselines revealed poorer scores on initial, invalid baseline assessments (Table 4) across all composites and the symptom scale. Dependent-samples t tests demonstrated differences (indicating improvement) between scores on initial (invalid) versus valid follow-up baselines for the Verbal Memory, Visual Memory, and Visual Motor Speed composite scores, as well as for the Symptom Scale score, but not for the Reaction Time composite (Table 5). With respect to symptom endorsement, 42% of the sample endorsed no concussion-related symptoms at the time of their first baseline, and 56% endorsed no concussion-related symptoms at the follow-up assessment. Comparisons between symptom endorsement at initial and follow-up assessments showed that 43% endorsed the same number of symptoms, 14% endorsed more symptoms on follow-up, and 43% endorsed fewer symptoms on follow-up. Scores on follow-up baseline assessments were considered within valid ranges but remained significantly below normative data (yet within 1 SD) for Visual Memory, Reaction Time, and Visual Motor Speed (Table 6).
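For readers unfamiliar with the dependent-samples comparison used here, the sketch below shows how a paired t statistic and a Bonferroni-adjusted significance threshold are typically computed. It is an illustration only: the scores are made-up numbers, not study data, the function names are hypothetical, and the number of comparisons (5) is an assumed example rather than the count used in the analysis. The P value is left out because the Python standard library does not provide a t distribution.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD of the differences
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise rate at alpha."""
    return alpha / n_comparisons


# Made-up composite scores for 4 hypothetical athletes (not study data).
initial_scores = [70, 65, 80, 75]
retest_scores = [78, 70, 85, 85]

t_stat, df = paired_t(initial_scores, retest_scores)
threshold = bonferroni_alpha(0.05, 5)  # 5 comparisons is an assumed example

print(f"t({df}) = {t_stat:.2f}; per-comparison alpha = {threshold:.3f}")
```

A positive t statistic here means scores were higher at retest. Whether higher is better depends on the measure: for a Reaction Time composite reported in seconds, a higher value reflects slower performance.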
From analysis of the number of flagged validity indicators out of the 5 principal validity indicators (listed in Table 1), at the time of the first baseline assessment 89% of the sample had only 1 invalidity indicator, 10% had 2 indicators, and only 1% had 3 indicators. At the reassessment, 11% had 1 invalidity indicator, and 1% had 2 or 3 indicators.
Although it is recommended that athletes who obtain invalid results at baseline be reassessed, we are the first to document the utility of repeating baseline assessments for those who produce invalid results in the initial baseline assessment. We found that even though only 18.6% of athletes with invalid baselines were reassessed within 14 days, nearly 90% obtained valid results on reassessment, suggesting there is considerable utility in readministration. However, performance on the reassessment was still below average, and a subsample of athletes continued to demonstrate invalid performance, even after a second assessment. In this respect, it is not clear whether athletes put forth optimal effort on reassessment or were performing to the best of their abilities, albeit below the average range corresponding to normative data.
A large portion of individuals with invalid baselines also showed reductions in symptoms with repeated testing, which may support the need to repeat baseline testing, given that the level of symptom endorsement at baseline is often used as a comparator for postinjury. Both cognitive performance and symptom endorsement appear to improve after repeat administration for many athletes obtaining invalid baseline assessments.
In this regard, definitively identifying athletes who put forth their best effort on baseline testing remains an enigma. The concern about sandbagging by professional athletes has received attention in the popular press. Also, the need to be very cautious in the postconcussion treatment and return-to-play decision making for youth athletes has been well emphasized, making the need for valid baseline results a critical issue. However, to date the identification of an invalid baseline assessment has not been systematically evaluated with respect to the clinical utility of the results (eg, how these scores affect comparison with postconcussion performance), nor is the relationship between low scores and sandbagging fully understood. Although researchers have documented the efficacy of identifying students attempting to sandbag in laboratory simulations (Schatz and Glatts9), as well as athletes attempting to feign impairment (Erdal8), it is not known whether the methods that ImPACT recommends for identifying sandbagging (eg, reaction time scores >0.80) definitively identify individuals who volitionally misrepresent themselves (as the term implies). The increased incidence of invalid baseline performance by athletes with ADD or LD in this study is consistent with previous research documenting similar results in high school athletes completing the online version of ImPACT.18 In addition, athletes with LD have been shown to perform more poorly on baseline neurocognitive assessments using both traditional, paper-based neuropsychological test measures22 and computer-based measures.23 The overall prevalence of ADD and LD was nearly double in our subsample of athletes with invalid baselines, but the diagnoses were self-reported and may not be entirely accurate. Also, it is not clear whether invalid performance among athletes with ADD or LD reflects inherent cognitive weaknesses, decreased understanding of test instructions, decreased motivation, variable attention, or simply the best performance by that athlete.
The present study addresses these issues, in part, and confirms the following:
For most athletes who initially generated invalid baselines, retesting produced stronger, improved (and valid) test results. These data provide a strong incentive to retest athletes with invalid baselines, both in large-scale testing programs and in the small clinic, where time and resources may be scarce.
Athletes with ADD or LD appeared to represent a larger portion of the invalid baselines. This observation may not be the result of effort problems or sandbagging but likely represents the nature of these disorders, increasing the importance of carefully and contextually evaluating test results in this group of athletes. More in-depth research in this area is needed, especially with regard to comparing baseline with postconcussion testing in this population, which can be a challenge.
On reassessment, Reaction Time scores tend to worsen: twice as many athletes scored above the 0.80 indicator for sandbagging on reassessment (9.6% versus 4.5% on initial assessment). It is not clear whether this reflects a more careful approach to test taking the second time. If completing baseline assessments a second time results in a slower Reaction Time score, this might also apply to an athlete who has a valid initial baseline, becomes concussed, and then takes a postconcussion test. As such, a slower Reaction Time score on postconcussion assessment could be due to a concussion deficit or a more cautious test-taking approach. Differentiating between these phenomena is difficult. Moser et al24 recently documented the case of a youth athlete who completed a baseline assessment after recovering from a concussion (ie, to “reestablish” his baseline) but performed poorly because he had been advised by his mother to take his time and do his best.
Interestingly, despite the general improvement in the rate of valid tests with reassessment, scores still tended to remain low compared with normative data. Perhaps this population of test takers tends to be borderline performers and thus more prone to invalid test results.
If an athlete obtains an invalid baseline, it is reasonable to repeat testing. This allows correction for possible artifacts that might have affected performance: low effort, distractions, malfunctioning equipment, lack of seriousness, not taking the time to read and understand the directions, etc. Given the increased likelihood of invalid results for individuals with a history of ADD or LD, it may be helpful to interview the athlete (or his or her parent or guardian) to discuss a history of attentional or learning concerns that have not been explored or evaluated. However, if a follow-up (ie, second) baseline is also invalid, then these results may simply reflect the individual's skills, and retesting (ie, a third assessment) may not be valuable. When testing within a large sample or population, there will be outliers, the interpretation of which requires clinical judgment and consideration of other factors. As we know, concussion screening is recommended as only 1 tool in the return-to-play decision-making model.3 As such, postconcussion test results can be compared with normative data, but a more comprehensive neuropsychological evaluation would not be warranted based solely on the results of an invalid baseline.
Future authors need to focus on, and perhaps control for, effort, preparation, and environment in baseline testing. We obtained data from different sites, and there was no controlled, standardized protocol for administration across sites or administrators. We do not know how much the rate of invalid results would have improved with retesting if a consistent, standardized, serious, and controlled approach had been used across the entire sample. Computerized administration of neuropsychological tests and group testing of computerized tests have been critiqued because of a lack of test-administration standards that may affect the validity of test results.11,25 Perhaps standardization of computerized baseline testing instructions and environments will result in fewer invalid results overall, and it may be that the percentage of individuals with ADD or LD who make up the population with invalid baseline results will be greater. In addition, although many of the stimuli are randomized from test to test, some stimuli are simply reordered, so there may be some learning effects upon reassessment. Finally, it is unclear from this retrospective study exactly what athletes who produced invalid baseline results were told about their initial test performance and about why they had to retake the baseline test. Future researchers might assess the utility of different feedback instructions to those athletes whose initial baseline test results are invalid to determine which instructions yield the highest rate of valid tests upon retaking a baseline test. Future investigators may also determine whether there are alternative or additional indicators for invalid baselines. Two of the invalidity indicators (Word Memory learning percentage and Design Memory learning percentage) do not directly contribute to the formulas for calculating composite scores. It is possible that other subscales contributing to ImPACT composite scores could also contribute as invalidity indicators.