ABSTRACT

Background

In-training examinations (ITEs) are intended for low-stakes, formative assessment of residents' knowledge, but are increasingly used for high-stakes purposes, such as to predict board examination failures.

Objective

The aim of this review was to investigate the relationship between performance on ITEs and board examination performance across medical specialties.

Methods

A search of the literature for studies assessing the strength of the relationship between ITE and board examination performance from January 2000 to March 2019 was completed. Results were categorized based on the type of statistical analysis used to determine the relationship between ITE performance and board examination performance.

Results

Of 1407 articles initially identified, 89 articles underwent full-text review, and 32 articles were included in this review. There was a moderate to strong relationship between ITE and board examination performance, and ITE scores significantly predicted board examination scores in the majority of studies. Performing well on an ITE predicts a passing outcome for the board examination, but there is less evidence that performing poorly on an ITE will result in failing the associated specialty board examination.

Conclusions

There is a moderate to strong correlation between ITE performance and subsequent performance on board examinations. That the predictive value for passing the board examination is stronger than the predictive value for failing calls into question the “common wisdom” that ITE scores can be used to identify “at risk” residents. The graduate medical education community should continue to exercise caution and restraint in using ITE scores for moderate to high-stakes decisions.

Introduction

In-training examinations (ITEs) have been used as an objective measure of residents' and fellows' medical knowledge since the 1970s. ITE scores and reports provide program directors with information on the strengths and weaknesses of their trainees' medical knowledge in various content areas, which can be used in a low-stakes, formative fashion to support development of individualized learning plans. ITE scores may also be utilized by program directors at the program level, with areas of poor performance across trainees suggesting potential gaps in program curricula and identifying areas on which to focus for continuous program improvement. Ultimately, graduate medical education (GME) programs are responsible for ensuring their trainees are equipped to succeed in passing the qualifying examination (QE) and/or certifying examination (CE), administered by their respective specialty board, at the conclusion of their training. It is unclear, however, if ITEs are predictive of trainees' success in the board certification process.

Validity evidence for the interpretation of scores from assessment tools can be organized into 5 categories based on Messick's unified framework: content, response process, relationship to other variables, internal structure, and consequences.1 The category most relevant to ITE scores is relationship to other variables. If the ITE and respective specialty board examinations had similar test content, ITE scores would share a strong relationship with board examination scores. The predictive ability of ITEs has been an area of interest since the early 1990s, and the number of investigations of this topic has continued to increase in recent years. Furthermore, some specialties and programs have begun to expand the use of ITEs beyond the original low-stakes formative intent to more high-stakes decisions, including formal academic actions, such as formal remediation, probation, non-advancement, and non-retention within the training program, which has significant implications for the consequences of ITE scores.2–4

Given that ITEs could be utilized in a manner that impacts a trainee's future in terms of promotion and program completion, ensuring that there is validity evidence for the relationship between ITE scores and board examination scores is of the utmost importance. To date, there has neither been a review synthesizing the literature on the use of ITEs across medical specialties nor a synthesis of correlations/prediction results between ITE scores and board examination scores. Thus, the purpose of this study was to complete a systematic review of the literature on relationships to other variables' evidence for interpretation of GME ITE scores, with the other variable being performance on board examinations. A secondary aim of the study was to identify current use of ITEs across specialties.

Methods

Selection of Studies

We conducted a systematic review of the research on the association between ITEs and board examinations published from January 2000 to March 2019 using the following databases: PubMed, Embase, Cochrane Library, and Scopus. Major medical subject heading terms used for the systematic review included: in-training examination, in-service examination, medical education, and certification. Two authors (B.K.S. and H.C.M.) independently reviewed titles, abstracts, and full-text articles to determine if they met inclusion criteria. This process was completed with the assistance of systematic review software (Covidence, 2019). Phase 1 included screening of titles and abstracts for relevance. Phase 2 included evaluation of the full text. The search methods are reported using relevant items of the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) checklist (Figure).

Figure

PRISMA Diagram Demonstrating Study Selection

Eligibility Criteria

Studies were included if: (1) they reported quantitative analysis of an association between performance on the ITE and performance on the respective specialty board examinations; (2) the study population included US GME trainees (residents or fellows); (3) manuscripts were available in the English language; (4) the full-text article could be obtained; and (5) articles were published after the year 2000. The criterion of including only studies published after 2000 was based on our assessment of the availability of literature, which increased substantially after that year.

Title/Abstract and Full-Text Review

Two authors (B.K.S. and H.C.M.) independently reviewed the titles and abstracts of all 1407 articles captured by the search, removing duplicates and articles obviously not meeting predetermined eligibility criteria. Discrepant opinions were discussed until consensus was reached during the abstract and full-text review stages. Two authors (B.K.S. and H.C.M.) completed the abstract review phase, while all 4 authors participated in the full-text review. A full-text review of 89 articles determined eligibility for inclusion in the final review, with a total of 32 articles ultimately included (Figure).

Relationship to Other Variables' Evidence

In the Messick validity evidence framework, relationship to other variables evidence refers to gathering information to show that assessment scores relate to scores from similar assessments. Such evidence generally takes 3 forms: correlation coefficient, regression equation, and area under the receiver operating characteristic curve (AUC). For continuous scores (eg, 0%–100%), relationships are measured with a correlation coefficient, where a strong positive correlation value is a metric for validity evidence. For educational purposes, correlation values ≥ 0.50 are considered strong, 0.30–0.49 moderate, and < 0.30 low.5 A significant regression equation is another potential metric for validity evidence where either continuous scores or dichotomous outcomes (eg, pass/fail) are used to predict future performance on another variable measured on a continuous scale (linear regression) or as dichotomous outcomes (logistic regression). Finally, an AUC with good accuracy/predictive value is a third potential metric for validity evidence where a particular score (eg, cut score) or outcome is used to discriminate between true positives and false positives of future performance.
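
As a minimal sketch of the first and third forms of evidence, the snippet below labels a correlation coefficient using the thresholds cited above and computes an AUC by pairwise comparison of scores. The function names and the example scores are hypothetical illustrations, not data from any included study. Using the absolute value of the coefficient and treating exactly 0.50 as strong are assumptions made for the sketch.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the thresholds cited in the text.

    Assumption: the magnitude |r| is classified, and 0.50 itself counts as strong.
    """
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "moderate"
    return "low"


def auc(scores_pass: list[float], scores_fail: list[float]) -> float:
    """AUC as the share of (passer, failer) pairs in which the passer scored higher.

    Tied pairs count as half a win.
    """
    wins = 0.0
    for p in scores_pass:
        for f in scores_fail:
            if p > f:
                wins += 1.0
            elif p == f:
                wins += 0.5
    return wins / (len(scores_pass) * len(scores_fail))


# Hypothetical ITE percent scores, grouped by later board examination outcome.
ite_passed = [72.0, 65.0, 80.0, 58.0]
ite_failed = [55.0, 60.0]

print(correlation_strength(0.62))
print(correlation_strength(0.41))
print(auc(ite_passed, ite_failed))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.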

Data Extraction and Analysis

Results were categorized based on the type of statistical analysis used to determine the relationship between ITE performance and board examination performance: correlation, linear regression, logistic regression, and/or AUC. Additionally, the type of ITE performance data (eg, percent score or rank) used for the analysis was extracted. Data were also collected from publicly available websites for each specialty society in terms of the format and number of ITE questions, and national pass rates for board examinations (Table 1).

Table 1

Summary of Specialty ITEs and Board Examinations

Two authors (B.K.S. and H.C.M.) independently assessed the quality of the studies included in the final analysis using the Medical Education Research Study Quality Instrument (MERSQI). The MERSQI scoring system includes 10 items that are used to evaluate the quality of medical education research, including study design, institutions, response rate, type of data, validity, appropriateness of analysis, sophistication of analysis, and outcome.6  Each item is scored (total possible score of 18), with Reed et al citing the mean as 9.6 in a cross-sectional study of 100 medical education research studies.6  The validity and response rate items were not applicable to the studies included in our analysis; thus, these criteria were discarded, resulting in a total possible score of 13.5 points. Any discrepancies in scoring were resolved through group consensus. Importantly, the MERSQI scoring system is not intended to generate an absolute indicator of the validity or reliability of the research results. Furthermore, “cut-points” for “excellent” or “poor” quality have not been defined. Rather, the scores can be used to compare the quality of evidence between studies within a specific body of literature.

Given that there are differences in language across specialties in terms of what QE and CE mean, the term board examination will henceforth refer to the written examination for each given specialty, unless a study evaluated how the ITE compared with oral board examination results. This study is consistent with the definition of non-human subjects research; therefore, no Institutional Review Board review was sought.

Results

Thirty-two articles were included in the final review, representing 21 medical specialties. National first-time pass rates for specialty board examinations are high across these specialties, ranging from 83% to 99% (Table 1). Table 2 includes a summary of the characteristics, results, and quality assessment of all studies included in our final analysis.

Table 2

Summary of Included Studies

ITE Performance Data

The statistical analyses in the studies utilized a variety of quantification methods for ITE performance. Two studies (5%) grouped ITE performance into stanines (scaling of test scores on a 9-point scale with a mean of 5 and standard deviation of 2), 14 studies (38%) used ITE absolute scores, 11 studies (30%) used ITE percentiles, and 10 studies (27%) used both absolute scores and percentile rank. A total of 16 studies used board examination pass/fail rates (43%), 13 studies (35%) used absolute or percentile board examination scores, and 8 (22%) used both absolute and percentile scores.
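
To make the stanine definition concrete, the sketch below converts raw scores to stanines by standardizing them and applying the conventional half-standard-deviation cut points. It assumes approximately normally distributed scores and a cohort with some score variation; the function names and example scores are hypothetical and do not reproduce any specialty's actual scoring procedure.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Conventional stanine boundaries in z-score units: each band is half a
# standard deviation wide, with stanine 5 centered on the mean.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine_from_z(z: float) -> int:
    """Map a z-score to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTS, z) + 1


def stanines(scores: list[float]) -> list[int]:
    """Standardize scores within the cohort, then assign stanines.

    Assumption: the cohort's own mean and population SD define the scale.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [stanine_from_z((s - mu) / sigma) for s in scores]


# Hypothetical ITE percent scores for a small cohort.
cohort = [50.0, 60.0, 70.0, 80.0, 90.0]
print(stanines(cohort))
```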

Relationship to Other Variables' Validity Evidence

About half of the studies (17, 53%) conducted a single type of statistical analysis to provide relationship to other variables' evidence, 8 (25%) conducted 2 types of statistical analyses, 6 (19%) conducted 3 types of statistical analyses, and 1 (3%) conducted all 4 types of analyses. Nineteen studies used correlations, 12 used linear regressions, 18 used logistic regressions, and 6 used AUC values for the statistical analysis. Two studies reported sensitivity and specificity values but did not provide an AUC value and thus were not included in the AUC category.

Forty-seven percent (9) of the 19 correlation studies found a strong relationship2,7–14 between ITE performance and board examination performance for all residents and fellows in the respective study samples, and 1 found a moderate relationship (Withiam-Leitch and Olawaiye, obstetrics and gynecology15) for all residents. The other 9 correlation studies found mixed results by postgraduate year (PGY) or specialty.16–24 Eleven of the 12 studies using linear regression found that ITE scores significantly predicted board examination performance.4,7,9,10,13,25–29 The remaining study showed significant prediction for PGY-3–PGY-4 residents, but not PGY-1–PGY-2 residents (Swanson et al, orthopaedic surgery21).

For logistic regression analysis, studies either used ITE scores as a predictor on a continuous scale or categorized ITE scores into 2 categories (eg, < 10th percentile, > 10th percentile). AUC analysis was used to determine the precision in prediction as a complement to logistic regression results or was done without logistic regression analysis. For predicting a board examination passing outcome, 6 studies showed ITE scores significantly predicted who would pass the board examination.4,9,13,26,27,29 Three additional studies showed that a particular high score, quartile, or stanine significantly predicted who would pass the board examination (Puscas 2012, otolaryngology30), along with good AUC accuracy/predictive value (Lingenfelter et al, obstetrics and gynecology31; Puscas 2018, otolaryngology32). O'Neill et al (family medicine)14 also found good AUC accuracy/predictive value for a particular high ITE score. Two additional studies showed that passing the ITE predicted passing the board examination (Johnson et al, ophthalmology33) with good AUC accuracy/predictive value (Indik et al, cardiovascular disease fellows10).

For predicting a board examination failing outcome, 2 studies showed ITE scores significantly predicted who would fail the board examination (Swanson et al, orthopaedic surgery21), but only with a moderate AUC accuracy/predictive value (Withiam-Leitch and Olawaiye, obstetrics and gynecology15). Three studies showed that a particular low score or quartile significantly predicted who would fail the board examination (de Virgilio et al, surgery3; Kay et al, internal medicine11), with a good AUC accuracy/predictive value for PGY-2 and PGY-3 residents' ITE scores, but poor predictive value for PGY-1 ITE scores (Carey and Drucker, ophthalmology16). Babbott et al (internal medicine)34 did not perform logistic regression and found good AUC accuracy/predictive value for a low quartile score. Only 1 study showed that failing an ITE significantly predicted failing the board examination (Carey and Drucker, ophthalmology16), but with a low positive predictive value, and the finding applied only to PGY-2 and PGY-3 residents' ITE scores. McClintock and Gravlee (anesthesiology)29 applied a logistic regression to see how well the model predicted board examination fail/pass outcomes. The accuracy in prediction value was low-moderate for predicting a fail outcome and moderate-high for predicting a pass outcome. Finally, 2 studies found ITE scores had weak to no prediction for board examination pass/fail outcomes (Collichio et al, hematology and oncology8; Monaghan et al, hematology35). Additionally, Puscas (otolaryngology)32 and O'Neill et al (family medicine)14 were not able to predict who would fail the board examination based on their respective AUC analyses.

In terms of quality assessment of the articles included in this study, the average MERSQI score was 7.9 out of a possible 13.5 points (range 7–9). This is within the range of reported MERSQI scores of medical education research more broadly.36 All the included studies were retrospective cohorts; no studies were randomized controlled trials.

Discussion

This systematic review finds there is generally strong evidence that strong trainee performance on ITEs is predictive of subsequent passing performance on specialty board examinations. However, there is limited evidence that poor performance on the ITE predicts subsequent failure on board examinations, which calls into question the appropriateness of programs using the ITE to make high-stakes decisions. These results are important, as performance on ITEs has been widely accepted as predictive of subsequent performance on specialty board examinations, with pervasive beliefs that low-scoring residents are at risk of failing their board examination, resulting in some specialties reporting high-stakes use of ITE performance.

National first-time pass rates for specialty board examinations are high across specialties, which makes it difficult to predict trainees who will fail the examination (Table 1). In a cohort of otolaryngology residents, even those who scored in the bottom 3 stanines for each of the 4 years they took the ITE still had an 82% pass rate on their board examination.32 If a nephrology program director simply predicted that all nephrology fellows would pass the nephrology board examination, they would be correct 89% of the time; using the ITE to make the same prediction, they would be correct 90% of the time. This suggests that, despite correlations between ITE and board performance, prediction of board examination pass/fail using the ITE for an individual resident is of little practical benefit.26 Even residents who perform very poorly on the ITE have a reasonable likelihood of passing their board examination.
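
The base-rate argument in this paragraph can be worked through with the figures it quotes. The sketch below is illustrative only: the function names are hypothetical, and the only inputs are the 89% and 90% accuracies for nephrology and the 82% pass rate among low-stanine otolaryngology residents reported above.

```python
def majority_baseline_accuracy(pass_rate: float) -> float:
    """Accuracy of predicting 'pass' for every trainee, which equals the pass rate."""
    return pass_rate


def accuracy_gain(model_accuracy: float, pass_rate: float) -> float:
    """Improvement of a prediction model over the predict-everyone-passes baseline."""
    return model_accuracy - majority_baseline_accuracy(pass_rate)


def failure_rate_in_flagged_group(pass_rate_in_group: float) -> float:
    """Share of trainees flagged as 'at risk' who actually go on to fail."""
    return 1.0 - pass_rate_in_group


# Nephrology: 89% correct by predicting all pass, 90% correct using the ITE.
nephrology_gain = accuracy_gain(0.90, 0.89)

# Otolaryngology: residents in the bottom 3 stanines every year still passed 82% of the time.
flagged_failure_rate = failure_rate_in_flagged_group(0.82)

print(f"Gain over baseline: {nephrology_gain:.2f}")
print(f"Failure rate among flagged residents: {flagged_failure_rate:.2f}")
```

Under these figures, the ITE adds about 1 percentage point of accuracy over the baseline, and labeling every persistently low-scoring otolaryngology resident as a future failure would be wrong for roughly 4 of every 5 residents so labeled.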

The studies that did find a significant outcome of failing may not generalize to all trainees taking that particular ITE; those results may be useful only for the individual program, whereas the studies that found a significant outcome of passing were more likely to use national samples of all residents and fellows. Additionally, because the number of trainees who fail an ITE is small, accurately predicting whether they will fail their boards is statistically difficult: a single such trainee passing the board examination can greatly affect whether the outcome is significant. The number of trainees who pass the ITE is much larger, so a few of them can fail the board examination without eliminating a significant prediction of passing.

It is important to note the different formats of board examinations. Specialties including pediatrics, family practice, pathology, preventive medicine, neurology, internal medicine (and associated subspecialties), and psychiatry typically have 1 written examination that serves as the CE. Thus, evaluating the relationship between the ITE and CE in these fields may represent a more accurate comparison. Within surgical specialties, obstetrics and gynecology, ophthalmology, and anesthesiology, there are 2 separate examinations. The QE is a written examination designed to evaluate knowledge in principles and applied science in a given specialty.37 The CE among these specialties is an oral examination with the intent of evaluating a candidate's clinical judgement, reasoning skills, and problem-solving skills.38 The ITE has limited ability to predict performance on oral board examinations. Additional tools that specifically assess application of knowledge and demonstration of clinical judgement in an oral format are needed to predict passage of oral CEs.

ITEs were originally developed as a formative assessment tool to assist learners and programs in identifying deficiencies in medical knowledge. Scores were meant to be used for no or low-stakes decisions and to guide development of individualized learning plans. To maintain the original intent of these examinations, further efforts at delineating “cut-scores” that predict board examination failure should not be undertaken. It remains similarly challenging to predict who will fail board examinations, with few studies designed to address this issue. Even if a significant fail outcome is found, the predictive value is low. The paucity of data regarding ITE prediction of board examination failure suggests that program directors should exercise caution in the interpretation and use of low ITE scores at the individual resident level, particularly regarding high-stakes uses to inform formal academic actions (probation, repeating PGY, and requiring remediation) within a program. The majority of studies describe the use of ITE performance as low-stakes and formative for trainees or GME programs, with 2 (6%) studies in pediatrics and ophthalmology using the information for continuous program improvement.2,33 Three studies (9%) in pediatrics and general surgery describe moderate to high-stakes use of ITE performance, including decisions regarding formal academic actions.2–4 Finally, as expected, ITE performance increases with PGY. Therefore, when a resident is in their final year of training, when the correlations between ITE and board examination performance are strongest, it may be too late to help struggling residents “catch up” in time to pass board examinations.

This study has several limitations. First, the heterogeneity of the assessment instruments and specialties limited our ability to perform a pooled meta-analysis of the data. Furthermore, the studies included in this review vary in population size, from single institutions to a national review of how ITEs correlated with board examinations. There were also variations in study design, with some studies including data on interventions performed within a given residency versus large national data on how ITEs correlate with board examination scores. Future studies should involve national samples and investigate precision in predicting failing or passing board examinations utilizing other assessment data and contextual variables in addition to ITE scores.

Conclusions

This systematic review demonstrates that strong performance on ITEs is associated with passing subsequent board examinations, while the reverse is not necessarily true. Ultimately, this suggests that the GME community should continue to exercise caution and restraint in using ITE scores for moderate to high-stakes decisions.

References

1. Messick S. Meaning and values in test validation: the science and ethics of assessment. Educ Res. 1989;18(2):5–11.
2. Aeder L, Fogel J, Schaeffer H. Pediatric board review course for residents “at risk.” Clin Pediatr (Phila). 2010;49(5):450–456.
3. de Virgilio C, Yaghoubian A, Kaji A, Collins JC, Deveney K, Dolich M, et al. Predicting performance on the American Board of Surgery qualifying and certifying examinations: a multi-institutional study. Arch Surg. 2010;145(9):852–856.
4. Jones AT, Biester TW, Buyske J, Lewis FR, Malangoni MA. Using the American Board of Surgery In-Training Examination to predict board certification: a cautionary study. J Surg Educ. 2014;71(6):e144–e148.
5. Cohen J. A power primer. Psychol Bull. 1992;112(1):155–159.
6. Reed DA, Beckman TJ, Wright SM, Levine RB, Kern DE, Cook DA. Predictive validity evidence for medical education research study quality instrument scores: quality of submissions to JGIM's Medical Education Special Issue. J Gen Intern Med. 2008;23(7):903–907.
7. Bedno SA, Soltis MA, Mancuso JD, Burnett DG, Mallon TM. The in-service examination score as a predictor of success on the American Board of Preventive Medicine Certification Examination. Am J Prev Med. 2011;41(6):641–644.
8. Collichio FA, Hess BJ, Muchmore EA, Duhigg L, Lipner RS, Haist S, et al. Medical knowledge assessment by hematology and medical oncology in-training examinations are better than program director assessments at predicting subspecialty certification examination performance. J Cancer Educ. 2017;32(3):647–654.
9. Grabovsky I, Hess BJ, Haist SA, Lipner RS, Hawley JL, Woodward S, et al. The relationship between performance on the infectious diseases in-training and certification examinations. Clin Infect Dis. 2015;60(5):677–683.
10. Indik JH, Duhigg LM, McDonald FS, Lipner RS, Rubright JD, Haist SA, et al. Performance on the cardiovascular in-training examination in relation to the ABIM Cardiovascular Disease Certification Examination. J Am Coll Cardiol. 2017;69(23):2862–2868.
11. Kay C, Jackson JL, Frank M. The relationship between internal medicine residency graduate performance on the ABIM certifying examination, yearly in-service training examinations, and the USMLE Step 1 examination. Acad Med. 2015;90(1):100–104.
12. Kerfoot BP, Baker H, Connelly D, Joseph DB, Matson S, Ritchey ML. Do chief resident scores on the in-service examination predict their performance on the American Board of Urology qualifying examination? J Urol. 2011;186(2):634–637.
13. Lohr KM, Clauser A, Hess BJ, Gelber AC, Valeriano-Marcet J, Lipner RS, et al. Performance on the adult rheumatology in-training examination and relationship to outcomes on the rheumatology certification examination. Arthritis Rheumatol. 2015;67(11):3082–3090.
14. O'Neill TR, Li Z, Peabody MR, Lybarger M, Royal K, Puffer JC. The predictive validity of the ABFM's In-Training Examination. Fam Med. 2015;47(5):349–356.
15. Withiam-Leitch M, Olawaiye A. Resident performance on the in-training and board examinations in obstetrics and gynecology: implications for the ACGME Outcome Project. Teach Learn Med. 2008;20(2):136–142.
16. Carey A, Drucker M. Standardized training examinations among ophthalmology residents and the American Board of Ophthalmology written qualifying examination first attempt: the Morsani College of Medicine experience. J Acad Ophthalmol. 2014;7(1):e8–e12.
17. Dougherty PJ, Walter N, Schilling P, Najibi S, Herkowitz H. Do scores of the USMLE step 1 and OITE correlate with the ABOS part 1 certifying examination?: a multicenter study. Clin Orthop Relat Res. 2010;468(10):2797–2802.
18. Ellis E 3rd, Haug RH. A comparison of performance on the OMSITE and ABOMS written qualifying examination. Oral and maxillofacial surgery in-training examination. American Board of Oral and Maxillofacial Surgery. J Oral Maxillofac Surg. 2000;58(12):1401–1406.
19. Klein GR, Austin MS, Randolph S, Sharkey PF, Hilibrand AS. Passing the boards: can USMLE and orthopaedic in-training examination scores predict passage of the ABOS part 1 examination? J Bone Joint Surg Am. 2004;86(5):1092–1095.
20. Ponce B, Savage J, Momaya A, Seales J, Oliver J, McGwin G, et al. Association between orthopaedic in-training examination subsection scores and ABOS Part I examination performance. South Med J. 2014;107(12):746–750.
21. Swanson D, Marsh JL, Hurwitz S, DeRosa GP, Holtzman K, Bucak SD, et al. Utility of AAOS OITE scores in predicting ABOS Part I outcomes: AAOS exhibit selection. J Bone Joint Surg Am. 2013;95(12):e84.
22. Juul D, Schneidman BS, Sexson SB, Fernandez F, Beresin EV, Ebert MH, et al. Relationship between resident-in-training examination in psychiatry and subsequent certification examination performances. Acad Psychiatry. 2009;33(5):404–406.
23. Juul D, Flynn FG, Gutmann L, Pascuzzi RM, Webb L, Massey JM, et al. Association between performance on neurology in-training and certification examinations. Neurology. 2013;80(2):206–209.
24. Juul D, Sexson SB, Brooks BA, Beresin EV, Bechtold DW, Lang JA, et al. Relationship between performance on child and adolescent psychiatry in-training and certification examinations. J Grad Med Educ. 2013;5(2):262–266.
25. Althouse LA, McGuinness GA. The in-training examination: an analysis of its predictive value on performance on the general pediatrics certification examination. J Pediatr. 2008;153(3):425–428.
26. Jurich D, Duhigg LM, Plumb TJ, Haist SA, Hawley JL, Lipner RS, et al. Performance on the nephrology in-training examination and ABIM nephrology certification examination outcomes. Clin J Am Soc Nephrol. 2018;13(5):710–717.
27. Kempainen RR, Hess BJ, Addrizzo-Harris DJ, Schaad DC, Scott CS, Carlin BW, et al. Pulmonary and critical care in-service training examination score as a predictor of board certification examination performance. Ann Am Thorac Soc. 2016;13(4):481–488.
28. Kim PY, Wallace DA, Allbritton DW, Altose MD. Predictors of success on the written anesthesiology board certification examination. Int J Med Educ. 2012;3:225–235.
29. McClintock JC, Gravlee GP. Predicting success on the certification examinations of the American Board of Anesthesiology. Anesthesiology. 2010;112(1):212–219.
30. Puscas L. Otolaryngology resident in-service examination scores predict passage of the written board examination. Otolaryngol Head Neck Surg. 2012;147(2):256–260.
31. Lingenfelter BM, Jiang X, Schnatz PF, O'Sullivan DM, Minassian SS, Forstein DA. CREOG in-training examination results: contemporary use to predict ABOG written examination outcomes. J Grad Med Educ. 2016;8(3):353–357.
32. Puscas L. Junior otolaryngology resident in-service exams predict written board exam passage. Laryngoscope. 2019;129(1):124–128.
33. Johnson GA, Bloom JN, Szczotka-Flynn L, Zauner D, Tomsak RL. A comparative study of resident performance on standardized training examinations and the American Board of Ophthalmology written examination. Ophthalmology. 2010;117(2):2435–2439.
34. Babbott SF, Beasley BW, Hinchey KT, Blotzer JW, Holmboe ES. The predictive validity of the internal medicine in-training examination. Am J Med. 2007;120(8):735–740.
35. Monaghan SA, Felgar RE, Kelly MA, Ali AM, Anastasi J, Bellara AP, et al. Does taking the fellowship in-service hematopathology examination and performance relate to success on the American Board of Pathology hematology examination? Am J Clin Pathol. 2016;146(1):107–112.
36. Cook DA, Reed DA. Appraising the quality of medical education research methods: the Medical Education Research Study Quality Instrument and the Newcastle–Ottawa Scale-Education. Acad Med. 2015;90(8):1067–1076.
37. The American Board of Surgery. General Surgery Qualifying Exam. 2019. 2020.
38. The American Board of Surgery. General Surgery Certifying Exam. 2019. 2020.
39. The American Board of Allergy & Immunology. Statistics. 2020. 2020.
40. American Board of Anesthesiology. 2017 Examination Results. 2019.
41. American Board of Internal Medicine. Initial Certification Pass Rates. 2018–2019. 2020.
42. Hayag MV, Berman B, Weinstein A, Frankel S. American Board of Dermatology certification examination: preparation and perceptions. Arch Dermatol. 2002;138(4):544–546.
43. American Board of Emergency Medicine. Application and Exam Statistics. 2019. 2020.
44. American Board of Family Medicine. 2017 Family Medicine Certification Examination Pass Rates. 2020.
45. The American Board of Surgery. General Surgery Exam Pass Rates. 2018. 2020.
46. American Board of Medical Genetics and Genomics. History of Pass Rates. 2011–2019. 2020.
47. American Board of Psychiatry and Neurology. Pass Rates for First-time Takers. 2020.
48. The American Board of Neurological Surgery. Frequently Asked Questions 2020. 2020.
49. The Nuclear Medicine Technology Certification Board Inc. Annual Examination Report 2018. 2020.
50. American Board of Oral and Maxillofacial Surgery. Stats/Fees/Timelines. 2020.
51. American Board of Orthopaedic Surgery. Part I examination statistics. 2020.
52. American Board of Orthopaedic Surgery. Part II examination statistics. 2020.
53. The American Board of Pediatrics. Initial Certifying Examination First-Time Taker Passing Rates. 2018. 2020.
54. American Board of Physical Medicine and Rehabilitation. Exam statistics 2020. 2020.
55. The American Board of Plastic Surgery Inc. Statistics. 2020.
56. American Board of Radiology. Initial Certification for Diagnostic Radiology. Scoring and Results. 2020.
57. American Board of Radiology. Initial Certification for Radiation Oncology. Qualifying Exam. 2020.
58. American Board of Radiology. Initial Certification for Radiation Oncology certifying (oral) exam. 2020.
59. American Board of Thoracic Surgery. American Board of Thoracic Surgery Report to the Thoracic Surgery Directors Association. 2017. 2020.
60. The American Board of Urology. Qualifying examination (part 1). 2020.
61. The American Board of Surgery. Vascular Surgery pass rates. 2019. 2020.

Author notes

All authors are with the University of Utah School of Medicine. Hilary C. McCrary, MD, MPH, is an Otolaryngology Resident, Department of Surgery; Jorie M. Colbert-Getz, PhD, MS, is Assistant Dean, Education Quality Improvement, and Associate Professor, Department of Internal Medicine; W. Bradley Poss, MD, MMM, is Associate Dean, Graduate Medical Education, and Professor, Department of Pediatrics; and Brigitte K. Smith, MD, MHPE, is Vice Chair of Education, Program Director, Vascular Surgery Fellowship, and Assistant Professor, Department of Surgery.

Funding: The authors report no external funding source for this study.

Competing Interests

Conflict of interest: The authors declare they have no competing interests.