Context:

Although the prevalence of invalid baseline neurocognitive testing has been documented, and repeated administration after obtaining invalid results is recommended, no empirical data are available on the utility of repeated assessment after obtaining invalid baseline results.

Objective:

To document the utility of readministering neurocognitive testing after an invalid baseline test.

Design:

Case series.

Setting:

Schools, colleges, and universities.

Patients or Other Participants:

A total of 156 athletes who obtained invalid results on ImPACT baseline neurocognitive testing and were readministered the ImPACT baseline test within a 2-week period (mean = 4 days).

Main Outcome Measure(s):

Overall prevalence of invalid results on reassessment, specific invalidity indicators at initial and follow-up baseline, dependent-samples analysis of variance, with Bonferroni correction for multiple comparisons.

Results:

Reassessment resulted in valid test results for 87.2% of the sample. Poor performance on the Design Memory and Three Letter Memory subscales was the most common reason for athletes obtaining an invalid baseline result, on both the initial assessment and the reassessment. Significant improvements were noted on all ImPACT composite scores except for Reaction Time on reassessment. Of note, 40% of athletes showed slower reaction time scores on reassessment, perhaps reflecting a more cautious approach taken the second time. Athletes with a self-reported history of attention-deficit disorder or learning disability accounted for a larger share of the invalid results on reassessment (35%) than on the initial baseline assessment (10%).

Conclusions:

Repeat assessment after the initial invalid baseline performance yielded valid results in nearly 90% of cases. Invalid results on a follow-up assessment may be influenced by a history of attention-deficit disorder or learning disability, the skills and abilities of the individual, or a particular test-taking approach; in these cases, a third assessment may not be useful.

Key Points
  • Baseline neurocognitive test results should be checked for validity.

  • If the athlete's initial baseline performance is invalid, repeat assessment is warranted.

  • Nearly 90% of athletes will obtain valid baseline results on repeat administration.

Sport-related concussion continues to receive increased attention in the media, literature, and legislative arenas. Researchers have noted an increase in visits to the emergency department from 1997 to 2007 among children ages 8–13 years (100%) and ages 14–19 years (>200%),1  with similar increases noted in high school athletes.2  As of April 2013, 45 states require mandatory education on concussion management for coaches. Although legislation does not require or specify that athletes undergo baseline or postconcussion neurocognitive testing, “consensus experts” have identified that the assessment of cognitive function is an important component in the overall assessment of concussion.3  It is important to note, however, that neurocognitive testing is only one tool to be used in the assessment of concussion, along with clinical review of symptoms and balance testing.3 

Following a model established by Barth et al,4 athletes typically complete preseason neurocognitive testing to establish a baseline level of functioning, and then postconcussion test data are compared with baseline test results to document the neurocognitive effects of concussion. Baseline testing was originally conducted using traditional paper-based neuropsychological measures,5 but computer-based neurocognitive test batteries have since been developed and used by numerous high school, collegiate, and professional sport organizations. The development and use of computer-based neurocognitive test measures have received considerable attention in the literature, and among the areas of focus is the validity of an athlete's approach to baseline and postconcussion testing.

Clinicians and researchers5–7 have speculated that athletes might underreport symptoms after a concussion to facilitate and expedite the return to competition, and others8,9 have focused on identifying those athletes who attempt to purposefully perform poorly on baseline testing (ie, “sandbag”). A number of factors have been shown to affect baseline neurocognitive performance (eg, depression,10 distractions,11 computer problems12), and other factors have been shown to affect cognitive performance (eg, dehydration,13 anxiety or stress,14 lack of sleep or fatigue13). Athletes have recently reported intentionally underperforming on baseline tests,15,16 seemingly unaware that test developers (eg, of the ImPACT test battery) have identified symptom validity cutoffs to identify patterns of performance that are outliers or reflective of inadequate effort.17 The incidence or prevalence of invalid baseline test results has been documented in the literature,18 and repeating baseline testing after obtaining invalid baseline test results is recommended.17 Despite these recommendations, no data are available on the utility of readministration of baseline assessments after invalid performance. The purpose of our study was to document the utility of readministering baseline computerized neurocognitive testing after an invalid baseline testing performance.

METHODS

Participants

Participants were 156 athletes who reported English as a first language, obtained invalid results on the online baseline ImPACT test battery (ImPACT Applications, Inc, Pittsburgh, PA), and were subsequently reassessed on the baseline ImPACT within 2 weeks. The resultant sample comprised athletes ages 11–22 years (mean = 14.9 ± 2.4) who were predominantly male (68%) and completed another ImPACT baseline assessment approximately 4 days after their initial assessment (range = 1–14 days, SD = 3.8 days). The athletes participated in a variety of sports, including football (43%), soccer (15%), and basketball (8%), with 9.6% reporting a history of concussion. A total of 9 athletes (5.8%) self-reported a history of attention-deficit disorder (ADD), 6 (3.8%) reported a history of learning disorder (LD), and 1 (<1%) self-reported a history of both LD and ADD. Data were obtained from several athletics programs and clinical practices supporting athletics programs, and university institutional review board approval was obtained for retrospective analysis of deidentified data. More specifically, athletes who met the inclusion criteria were extracted from regional databases from Pennsylvania, New Jersey, Tennessee, and Texas during the years 2009–2012. All data were obtained from regional schools and colleges or universities that had a relationship with an independent neuropsychological practice, hospital-based practice, or sports medicine professional at a college or university. All athletes were assessed in groups of 10 to 20, supervised by a certified athletic trainer or member of the school's medical staff.

Materials and Procedures

All participants completed a baseline ImPACT test (online version) as part of their institution's ongoing concussion-assessment and -management program. ImPACT consists of 6 neuropsychological test modules, each designed to target different aspects of cognitive functioning, including attention, memory, visual motor (processing) speed, and reaction time. From these 6 tests, 5 separate composite scores are generated: Verbal Memory, Visual Memory, Reaction Time, Visual Motor Speed, and Impulse Control. The ImPACT subscales contributing to the composite scores and the formulas for the composite scores are presented in Table 1, and more comprehensive descriptions are available in the literature.19–21 Athletes were automatically flagged as having an invalid baseline (ie, with a ++ on the test report) on the basis of preestablished validity indicators.17 Subscale scores, composite scores, validity indicators, and demographic data are presented in Tables 1 and 2.

Table 1.

Subtests and Composite Scores and Validity Indicators for the ImPACT Test (Online Version)a

Table 2.

Demographics of the Study Samples


As stated, all participants completed 2 baseline assessments. The ImPACT test randomizes presentation of stimuli for the X's and O's, Symbol Match, Color Match, and Three Letter Memory subscales across test administrations. Words and stimuli for word memory and design memory are randomized with respect to the order of presentation, but the actual word lists and collections of visual stimuli are only randomized from baseline to postconcussion assessments.

RESULTS

Only 18.6% of athletes (156 of 837) who obtained invalid baseline results were reassessed within 2 weeks. Of those reassessed, 87.2% obtained valid results on reassessment within 2 weeks (mean = 4 days). The most common cause of an athlete obtaining an invalid baseline was poor performance on the Three Letter Memory subscale (60% of the sample at the initial baseline), followed by Design Memory learning percentage (30%) and Impulse Control (14%). On reassessment, only 20 athletes had invalid scores (12.8% of the initial sample), with the most common causes being the Three Letter Memory subscale (14 of 20; 70%), followed by Design Memory learning percentage (7 of 20; 35%) and Word Memory learning percentage (3 of 20; 15%) (Table 3). Although not listed as invalidity indicators in the ImPACT Manual,17  reaction time composite scores above 0.80 represent 3 standard deviations above the mean and are considered a red flag for possible “sandbagging.” In this sample, 4.5% (n = 7) were above 0.80 on initial baseline, and 9.6% (n = 15) at reassessment, with 40% of athletes obtaining slower reaction time scores on reassessment.
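The percentages in the preceding paragraph follow directly from the reported counts. As a minimal sketch for readers who wish to check the arithmetic, the snippet below recomputes them; the variable and function names are illustrative and are not part of ImPACT or of the study's analysis.

```python
# Counts reported in the Results; variable names are illustrative only.
invalid_initial = 837      # athletes with an invalid initial baseline
reassessed = 156           # of those, reassessed within 2 weeks
invalid_on_retest = 20     # still invalid on reassessment
rt_flag_initial = 7        # Reaction Time composite > 0.80, initial baseline
rt_flag_retest = 15        # Reaction Time composite > 0.80, reassessment


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "reassessed_of_invalid": pct(reassessed, invalid_initial),
    "valid_on_retest": pct(reassessed - invalid_on_retest, reassessed),
    "invalid_on_retest": pct(invalid_on_retest, reassessed),
    "rt_flag_initial": pct(rt_flag_initial, reassessed),
    "rt_flag_retest": pct(rt_flag_retest, reassessed),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Note that the Reaction Time percentages use the reassessed sample (n = 156) as the denominator, as does the 12.8% invalid rate on reassessment.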

Table 3.

Percentage of Individuals Below Cutoffs for Validity Indicators on Initial and Follow-up Baseline Assessments


A total of 7 of the 20 athletes (35%) who obtained invalid test results on the reassessment reported a history of either ADD or LD, compared with only 16 of the 156 (10.3%) who obtained invalid results on the initial baseline (χ²(1) = 9.55; P = .002). The sample of 20 athletes receiving invalid results on reassessment had an average age of 14.2 years (SD = 1.4; t(154) = 1.38; P = .17) and was composed of 75% males (χ²(1) = 0.52; P = .47). Comparisons between athletes receiving valid and invalid baselines revealed poorer scores on initial, invalid baseline assessments (Table 4) across all composites and the symptom scale. Dependent-samples t tests demonstrated differences (indicating improvement) between scores on initial (invalid) versus valid follow-up baselines for the Verbal Memory, Visual Memory, and Visual Motor Speed composite scores, as well as for the Symptom Scale score, but not for the Reaction Time composite (Table 5). With respect to symptom endorsement, 42% of the sample endorsed no concussion-related symptoms at the time of their first baseline, and 56% endorsed no concussion-related symptoms at the follow-up assessment. Comparisons between symptom endorsement at initial and follow-up assessments showed that 43% endorsed the same number of symptoms, 14% endorsed more symptoms on follow-up, and 43% endorsed fewer symptoms on follow-up. Scores on follow-up baseline assessments were considered within valid ranges but remained significantly below normative data (yet within 1 SD) for Visual Memory, Reaction Time, and Visual Motor Speed (Table 6).
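The chi-square comparison of ADD or LD history can be recomputed from the counts given above. The sketch below is illustrative only. It assumes that the 2 groups (7 of 20 and 16 of 156) were entered as independent rows of a 2 × 2 table and that an uncorrected Pearson chi-square was used; the text does not state either point, and the function name is hypothetical.

```python
import math

# 2x2 table built from the counts reported in the text.
# Assumption: the two groups are treated as independent samples and the
# statistic is an uncorrected Pearson chi-square (no continuity correction).
a, b = 7, 20 - 7        # invalid on reassessment: ADD/LD history, no history
c, d = 16, 156 - 16     # invalid initial baseline: ADD/LD history, no history


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


chi2 = pearson_chi2_2x2(a, b, c, d)
# With 1 degree of freedom, the upper-tail probability of a chi-square
# variable x equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square(1) = {chi2:.2f}, P = {p_value:.3f}")
```

Under these assumptions, the statistic rounds to the reported value of 9.55.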

Table 4.

Comparisons Between Initial Invalid and Valid Baseline Data

Table 5.

Initial and Follow-up Baseline Scores for Athletes with Valid Follow-up Baseline Assessments

Table 6.

Comparisons Between Follow-up Baseline and Sample Data for Athletes With Valid Follow-up Baseline Assessments


From analysis of the number of flagged validity indicators out of the 5 principal validity indicators (listed in Table 1), at the time of the first baseline assessment 89% of the sample had only 1 invalidity indicator, 10% had 2 indicators, and only 1% had 3 indicators. At the reassessment, 11% had 1 invalidity indicator, and 1% had 2 or 3 indicators.

DISCUSSION

Although it is recommended that athletes who obtain invalid results at baseline be reassessed, we are the first to document the utility of repeating baseline assessments for those who produce invalid results in the initial baseline assessment. We found that even though only 18.6% of athletes with invalid baselines were reassessed within 14 days, nearly 90% obtained valid results on reassessment, suggesting there is considerable utility in readministration. However, performance on the reassessment was still below average, and a subsample of athletes continued to demonstrate invalid performance, even after a second assessment. In this respect, it is not clear whether athletes put forth optimal effort on reassessment or were performing to the best of their abilities, albeit below the average range corresponding to normative data.

A large portion of individuals with invalid baselines also showed reductions in symptoms with repeated testing, which may support the need to repeat baseline testing, given that the level of symptom endorsement at baseline is often used as a comparator for postinjury. Both cognitive performance and symptom endorsement appear to improve after repeat administration for many athletes obtaining invalid baseline assessments.

In this regard, definitively identifying athletes who put forth their best effort on baseline testing remains an enigma. The concern about sandbagging by professional athletes has received attention in the popular press. Also, the need to be very cautious in the postconcussion treatment and return-to-play decision making for youth athletes has been well emphasized, making the need for valid baseline results a critical issue. However, to date the identification of an invalid baseline assessment has not been systematically evaluated with respect to the clinical utility of the results (eg, how these scores affect comparison with postconcussion performance), nor is the relationship between low scores and sandbagging fully understood. Although researchers have documented the efficacy of identifying students attempting to sandbag in laboratory simulations (Schatz and Glatts9), as well as athletes attempting to feign impairment (Erdal8), it is not known whether the methods that ImPACT recommends for identifying sandbagging (eg, reaction time scores >0.80) definitively identify individuals who volitionally misrepresent themselves (as the term implies). The increased incidence of invalid baseline performance by athletes with ADD or LD in this study is consistent with previous research documenting similar results in high school athletes completing the online version of ImPACT.18 In addition, athletes with LD have been shown to perform more poorly on baseline neurocognitive assessments using both traditional, paper-based neuropsychological test measures22 and computer-based measures.23 The overall prevalence of ADD and LD was nearly double in our subsample of athletes with invalid baselines, but the diagnoses were self-reported and may not be entirely accurate. Also, it is not clear whether invalid performance among athletes with ADD or LD reflects inherent cognitive weaknesses, decreased understanding of test instructions, decreased motivation, variable attention, or simply the best performance by that athlete.

The present study addresses these issues, in part, and confirms the following:

  1. For most athletes who initially generated invalid baselines, retesting produced stronger, improved (and valid) test results. These data provide a great incentive to support retesting athletes with invalid baselines, both in large-scale testing programs and in the small clinic, where time and resources may be scarce.

  2. Athletes with ADD or LD appeared to represent a larger portion of the invalid baselines. This observation may not be the result of effort problems or sandbagging but likely represents the nature of these disorders, increasing the importance of carefully and contextually evaluating test results in this group of athletes. More in-depth research in this area is needed, especially with regard to comparing baseline with postconcussion testing in this population, which can be a challenge.

  3. On reassessment, Reaction Time scores tend to worsen, with twice as many athletes scoring above the 0.80 indicator for sandbagging on reassessment (9.6% versus 4.5% on initial assessment). It is not clear whether this reflects a more careful approach to test taking the second time. If completing baseline assessments a second time results in a slower Reaction Time score, this might also apply to an athlete who has a valid initial baseline, becomes concussed, and then takes a postconcussion test. As such, a slower Reaction Time score on postconcussion assessment could be due to a concussion deficit or a more cautious test-taking approach. Differentiating between these phenomena is difficult. Moser et al24 recently documented the case of a youth athlete who completed a baseline assessment after recovering from a concussion (ie, to “reestablish” his baseline) but performed poorly because he had been advised by his mother to take his time and do his best.

Interestingly, despite the general improvement in the rate of valid tests with reassessment, scores still tended to remain low compared with normative data. Perhaps this population of test takers tends to be borderline performers and thus more prone to invalid test results.

If an athlete obtains an invalid baseline, it is reasonable to repeat testing. This allows correction for possible artifacts that might have affected performance: low effort, distractions, malfunctioning equipment, lack of seriousness, not taking the time to read and understand the directions, etc. Given the increased likelihood of invalid results for individuals with a history of ADD or LD, it may be helpful to interview the athlete (or his or her parent or guardian) to discuss a history of attentional or learning concerns that have not been explored or evaluated. However, if a follow-up (ie, second) baseline is also invalid, then these results may simply reflect the individual's skills, and retesting (ie, a third assessment) may not be valuable. When testing within a large sample or population, there will be outliers, the interpretation of which requires clinical judgment and consideration of other factors. As we know, concussion screening is recommended as only 1 tool in the return-to-play decision-making model.3  As such, postconcussion test results can be compared with normative data, but a more comprehensive neuropsychological evaluation would not be warranted based solely on the results of an invalid baseline.

Future authors need to focus on, and perhaps control for, effort, preparation, and environment in baseline testing. We obtained data from different sites, and there was no controlled, standardized protocol for administration across sites or administrators. We do not know the extent to which improvement in invalid rates with retesting would have occurred if a consistent, standardized, serious, and controlled approach had been used across the entire sample. Computerized administration of neuropsychological tests and group testing of computerized tests have been critiqued because of a lack of test-administration standards that may affect the validity of test results.11,25 Perhaps standardization of computerized baseline testing instructions and environments will result in fewer invalid results overall, and it may be that the percentage of individuals with ADD or LD who make up the population with invalid baseline results will be greater. In addition, although many of the stimuli are randomized from test to test, some stimuli are simply reordered, so there may be some learning effects upon reassessment. Finally, it is unclear from this retrospective study exactly what athletes who produced invalid results on their initial baseline test were told about why they had to retake the baseline test. Future researchers might assess the utility of different feedback instructions to those athletes whose initial baseline test results are invalid to determine which instructions yield the highest rate of valid tests upon retaking a baseline test. Future investigators may also determine whether there are alternative or additional indicators for invalid baselines. Two of the invalidity indicators (Word Memory learning percentage and Design Memory learning percentage) do not directly contribute to the formulas for calculating composite scores. It is possible that other subscales contributing to ImPACT composite scores could also contribute as invalidity indicators.

REFERENCES

1. Bakhos LL, Lockhart GR, Myers R, Linakis JG. Emergency department visits for concussion in young child athletes. Pediatrics. 2010;126(3):e550–e556.
2. Lincoln AE, Caswell SV, Almquist JL, Dunn RE, Norris JB, Hinton RY. Trends in concussion incidence in high school sports: a prospective 11-year study. Am J Sports Med. 2011;39(5):958–963.
3. McCrory P, Meeuwisse WH, Aubry M, et al. Consensus statement on concussion in sport: the 4th International Conference on Concussion in Sport held in Zurich, November 2012. Br J Sports Med. 2013;47(5):250–258.
4. Barth JT, Alves W, Ryan T, Macciocchi SN, Rimel RW, Nelson WE. Mild head injury in sports: neuropsychological sequelae and recovery of function. In: Levin HS, Eisenberg HM, Benton AL, eds. Mild Head Injury. New York, NY: Oxford University Press; 1989:257–275.
5. Lovell MR, Collins MW. Neuropsychological assessment of the college football player. J Head Trauma Rehabil. 1998;13(2):9–26.
6. Echemendia RJ, Cantu RC. Return to play following sports-related mild traumatic brain injury: the role for neuropsychology. Appl Neuropsychol. 2003;10(1):48–55.
7. Lovell MR, Collins MW, Maroon JC, et al. Inaccuracy of symptom reporting following concussion in athletes [abstract]. Med Sci Sports Exerc. 2002;34(suppl 1):S298.
8. Erdal K. Neuropsychological testing for sports-related concussion: how athletes can sandbag their baseline testing without detection. Arch Clin Neuropsychol. 2012;27(5):473–479.
9. Schatz P, Glatts C. “Sandbagging” baseline test performance on ImPACT, without detection, is more difficult than it appears. Arch Clin Neuropsychol. 2013;28(3):236–244.
10. Covassin T, Elbin RJ III, Larson E, Kontos AP. Sex and age differences in depression and baseline sport-related concussion neurocognitive performance and symptoms. Clin J Sport Med. 2012;22(2):98–104.
11. Moser RS, Schatz P, Neidzwski K, Ott SD. Group versus individual administration affects baseline neurocognitive test performance. Am J Sports Med. 2011;39(11):2325–2330.
12. Schatz P, Neidzwski K, Moser RS, Karpf R. Relationship between subjective test feedback provided by high-school athletes during computer-based assessment of baseline cognitive functioning and self-reported symptoms. Arch Clin Neuropsychol. 2010;25(4):285–292.
13. Neylan TC, Metzler TJ, Henn-Haase C, et al. Prior night sleep duration is associated with psychomotor vigilance in a healthy sample of police academy recruits. Chronobiol Int. 2010;27(7):1493–1508.
14. Law R, Groome D, Thorn L, Potts R, Buchanan T. The relationship between retrieval-induced forgetting, anxiety, and personality. Anxiety Stress Coping. 2012;25(6):711–718.
15. Marvez A. Players may try to beat concussion tests. Fox Sports Web site. Updated April 21, 2011. Accessed December 17, 2013.
16. Reilly R. Talking football with Archie, Peyton, Eli. ESPN Web site. Updated April 27, 2011. Accessed December 17, 2013.
17. Lovell MR. Clinical Interpretive Manual Online ImPACT 2007–2012. ImPACT Web site. http://www.impacttest.com. Accessed December 17, 2013.
18. Schatz P, Moser RS, Solomon GS, Ott SD, Karpf R. Prevalence of invalid computerized baseline neurocognitive test results in high school and collegiate athletes. J Athl Train. 2012;47(3):289–296.
19. Iverson GL, Gaetz M, Lovell MR, Collins MW. Relation between subjective fogginess and neuropsychological testing following concussion. J Int Neuropsychol Soc. 2004;10(6):904–906.
20. Lovell MR, Collins MW, Iverson GL, et al. Recovery from mild concussion in high school athletes. J Neurosurg. 2003;98(2):296–301.
21. Podell K. Computerized assessment of sports-related brain injury. In: Lovell MR, Echemendia RJ, Barth J, Collins MW, eds. Traumatic Brain Injury in Sports: An International Neuropsychological Perspective. Lisse, The Netherlands: Swets & Zeitlinger; 2004:375–396.
22. Collins MW, Grindel SH, Lovell MR, et al. Relationship between concussion and neuropsychological performance in college football players. JAMA. 1999;282(10):964–970.
23. Elbin RJ, Kontos AP, Kegel N, Johnson E, Burkhart S, Schatz P. Individual and combined effects of LD and ADHD on computerized neurocognitive concussion test performance: evidence for separate norms. Arch Clin Neuropsychol. 2013;28(5):476–484.
24. Moser RS, Schatz P, Lichtenstein J. The importance of proper administration and interpretation of neuropsychological baseline and post-concussion computerized testing. Appl Neuropsychol Child. 2013. In press.
25. Bauer RM, Iverson GL, Cernich AN, Binder LM, Ruff RM, Naugle RI. Computerized neuropsychological assessment devices: joint position paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology. Arch Clin Neuropsychol. 2012;27(3):362–373.