ABSTRACT
The US Medical Licensing Examination (USMLE) Step 1 and Step 2 scores are often used to inform a variety of secondary medical career decisions, such as residency selection, despite the lack of validity evidence supporting their use in these contexts.
We compared USMLE scores between non–chief residents (non-CRs) and chief residents (CRs), selected based on performance during training, at a US academic medical center that sponsors a variety of graduate medical education programs.
This was a retrospective cohort study of residents' USMLE Step 1 and Step 2 Clinical Knowledge (CK) scores from 2015 to 2020. The authors used archived data to compare USMLE Step 1 and Step 2 CK scores between non-CR residents in each of the eligible programs and their CRs during the 6-year study period.
Thirteen programs enrolled a total of 1334 non-CRs and 211 CRs over the study period. There were no significant differences overall between non-CRs' and CRs' average USMLE Step 1 (239.81 ± 14.35 versus 240.86 ± 14.31; P = .32) or Step 2 CK (251.06 ± 13.80 versus 252.51 ± 14.21; P = .16) scores.
There was no association between USMLE Step 1 and Step 2 CK scores and CR selection across multiple clinical specialties over a 6-year period. Reliance on USMLE Step 1 and Step 2 scores to predict success in residency, as measured by CR selection, is not recommended.
The US Medical Licensing Examination (USMLE) Step 1 and Step 2 are designed as measures of acquired knowledge intended to inform medical licensure decisions, but they are often used to inform a variety of secondary medical career decisions, such as chief resident selection, for which they lack validity evidence.
This study compared USMLE scores between non–chief residents and chief residents at a US academic medical center.
The study was conducted at a single institution over a 6-year period, with one method of selecting chief residents, limiting generalizability.
There was no association between USMLE Step 1 and Step 2 Clinical Knowledge scores and chief resident selection across multiple clinical specialties over a 6-year period.
Introduction
The US Medical Licensing Examination (USMLE) Step 1 and Step 2 are designed as norm-referenced measures of acquired knowledge intended to inform medical licensure decisions.1 However, USMLE Step 1 and Step 2 scores are also used to inform a variety of secondary medical career decisions, predominantly postgraduate residency selection by program directors, with almost two-thirds of postgraduate medical education programs reporting use of minimum cutoff scores.2,3 Such secondary use may stem from convenience, the ability to compare applicants across schools,4 and the belief that USMLE performance is correlated with success in residency. The medical education community recognizes that performance on USMLE Step 1 evaluates medical knowledge and predicts performance on other standardized tests, such as in-training and certification examinations.5–10 However, there are few data to suggest that USMLE Step 1 and 2 scores predict performance across other critical competency domains, such as patient care, communication, and professionalism.11–14
The Association of American Medical Colleges, American Medical Association, Educational Commission for Foreign Medical Graduates, Federation of State Medical Boards, and National Board of Medical Examiners (NBME) convened a multistakeholder Invitational Conference on USMLE Scoring (InCUS) in March 2019 to address issues concerning USMLE score reporting, primary and secondary uses, and the transition from undergraduate medical education to graduate medical education (GME).15 InCUS addressed such themes as the unintended consequences of USMLE Step 1 scoring, including its impact on medical student well-being and concerns about racial differences in USMLE performance. After the conference, a summary statement and a list of recommendations were published, including the need to decrease the emphasis on USMLE scores in residency selection and to pursue additional research on the relationship of USMLE scores with actual measures of residency performance.15 In February 2020, the NBME announced that USMLE Step 1 scoring would be changed to pass/fail no earlier than January 2022.15,16
Selection as chief resident (CR) by program faculty or peers is a marker of exceptional achievement during residency education.17,18 Medical historian Kenneth Ludmerer, MD, asserts that such recognition is the “crown jewel of graduate medical education.”19 While little published literature reports how CRs are selected across a wide range of programs, anecdotal information suggests that these positions are offered to residents with exceptional leadership, clinical, and academic skills. These residents are likely highly regarded by their peers, coworkers, and program administration. We therefore used CR status as a proxy for a highly successful resident. Assuming that such a resident is the kind of candidate a program director would want to select when reviewing applications, we aimed to compare USMLE scores between non-CRs and CRs at a US academic medical center that sponsors a wide range of GME programs.
Methods
Setting and Participants
This is a retrospective cohort study20 of Step 1 and Step 2 Clinical Knowledge (CK) scores of residents in their final year of training conducted at the McGaw Medical Center of Northwestern University from 2015 to 2020. The McGaw Medical Center is a consortium of urban, suburban, specialized, and general hospitals sponsoring GME programs in metropolitan Chicago. McGaw currently has 26 residency programs accredited by the Accreditation Council for Graduate Medical Education. For this study, programs (and their residents) were included if (1) the training programs operated throughout the study period, and (2) the programs designated some, but not all, of their upper-level residents to be CRs. For example, programs in which all of the final-year residents were given the title of CR were excluded. All data were collected as part of routine operational activities, and results were reported in aggregate with no individual examinee identified.
Outcome Measures
We used data archived in the McGaw GME office to compare USMLE Step 1 and Step 2 CK scores between non-CR residents in each of the eligible programs and their CRs during the 6-year study period. Each resident was included only once in the analysis, although he or she may have completed multiple years of training during the study period. All of the program directors from the eligible programs agreed with the statement that their CRs were selected based on academic and clinical performance.
The study was deemed exempt from review by the Northwestern University Institutional Review Board.
Analysis
In 2017, the average USMLE scores for first-time takers from the United States and Canada were 229 (SD = 20) for Step 1 and 242 (SD = 17) for Step 2 CK.21 Hypothesizing a 5-point difference between non-CRs (based on the national averages) and CRs, a sample size of 1356 (1159 non-CRs, 197 CRs) has 90% power to detect a difference between a non-CR Step 1 score of 229 and a CR Step 1 score of 234, assuming a common SD of 20 and using an independent 2-sided t test with an α of .05. Similarly, for Step 2, a sample size of 978 (836 non-CRs, 142 CRs) has 90% power to detect a difference between a non-CR Step 2 CK score of 242 and a CR Step 2 CK score of 247, assuming a common SD of 17 and using an independent 2-sided t test with an α of .05. This 5-point difference was meant as a conservative estimate for sample size. Given data from the USMLE website,21 the 5-point difference approximately represents the difference between the 47th and 56th percentiles for Step 1 and the 42nd and 54th percentiles for Step 2. A larger point differential would require a smaller sample size to detect differences. We compared overall mean USMLE Step 1 and 2 scores between non-CRs and CRs using independent samples t tests. Multiple mean comparisons were also made by training program using independent samples t tests with a Bonferroni correction. All statistical analyses were performed using Stata 14 (StataCorp LLC, College Station, Texas).
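For readers who wish to verify these a priori power calculations, the arithmetic can be reproduced outside of Stata. The following is a minimal sketch in Python using the statsmodels library (our choice for illustration; the study itself used Stata 14), where the effect size is the hypothesized 5-point difference expressed in units of the assumed common SD:

    # Reproduce the a priori power calculations (illustrative sketch;
    # the original analysis was performed in Stata 14).
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Step 1: 5-point difference, common SD of 20 -> Cohen's d = 0.25
    power_step1 = analysis.solve_power(
        effect_size=5 / 20,     # standardized mean difference
        nobs1=1159,             # non-CR group size
        ratio=197 / 1159,       # CR group size relative to non-CR group
        alpha=0.05,
        alternative="two-sided",
    )

    # Step 2 CK: 5-point difference, common SD of 17 -> Cohen's d ~= 0.29
    power_step2 = analysis.solve_power(
        effect_size=5 / 17,
        nobs1=836,
        ratio=142 / 836,
        alpha=0.05,
        alternative="two-sided",
    )

    print(f"Step 1 power: {power_step1:.2f}")   # ~0.90
    print(f"Step 2 power: {power_step2:.2f}")   # ~0.90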
Results
Thirteen of 26 current McGaw residency programs (50%) and their trainees qualified for inclusion in the analysis (Table 1). Five programs (family medicine [2], interventional radiology, thoracic surgery–integrated, and vascular surgery–integrated) were excluded because they did not exist for the entire study period. Eight additional programs (child neurology, family medicine, neurological surgery, ophthalmology, otolaryngology, plastic surgery, radiation oncology, and urology) were excluded because program leadership did not select CRs or used a hybrid approach. In these programs, each of the graduating residents served as a CR for part of his or her final year.
Overall, McGaw programs enrolled a total of 1334 non-CRs and 211 CRs over the 6-year study period. These residents were drawn from 144 unique US medical schools and 23 international medical schools. Residency program size ranged from 11 (dermatology) to 118 (internal medicine) trainees annually. The number of CRs ranged from 2 to 5 per year.
USMLE Step 1 scores ranged from 185 to 273 for non-CRs and from 200 to 274 for CRs. USMLE Step 2 CK scores ranged from 199 to 286 for non-CRs and from 198 to 283 for CRs. The Figure shows the spread, quartiles, and outliers of all non-CR and CR USMLE Step 1 and Step 2 CK scores. There were no significant differences overall between non-CRs' and CRs' average USMLE Step 1 (239.81 ± 14.35 versus 240.86 ± 14.31; P = .32) or Step 2 CK (251.06 ± 13.80 versus 252.51 ± 14.21; P = .16) scores.
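Because group means, SDs, and sizes are reported, the overall comparisons can be checked from the summary statistics alone. The following sketch uses scipy's ttest_ind_from_stats and assumes that the full group sizes (1334 non-CRs, 211 CRs) apply to both Step examinations; the published analysis was performed in Stata 14:

    # Reproduce the overall t tests from the reported summary statistics
    # (sketch only; group sizes of 1334 and 211 are assumed for both Steps).
    from scipy.stats import ttest_ind_from_stats

    # Step 1: non-CRs 239.81 +/- 14.35 vs CRs 240.86 +/- 14.31
    t1, p1 = ttest_ind_from_stats(
        mean1=239.81, std1=14.35, nobs1=1334,
        mean2=240.86, std2=14.31, nobs2=211,
        equal_var=True,          # pooled-variance (Student) t test
    )

    # Step 2 CK: non-CRs 251.06 +/- 13.80 vs CRs 252.51 +/- 14.21
    t2, p2 = ttest_ind_from_stats(
        mean1=251.06, std1=13.80, nobs1=1334,
        mean2=252.51, std2=14.21, nobs2=211,
        equal_var=True,
    )

    # Negative t statistics reflect the slightly higher CR means.
    print(f"Step 1: t = {t1:.2f}, P = {p1:.2f}")     # P ~= .32
    print(f"Step 2 CK: t = {t2:.2f}, P = {p2:.2f}")  # P ~= .16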
USMLE comparisons within each program revealed similar results. Using a Bonferroni-adjusted α level of .0019 per test (.05/26 comparisons), scores between groups were similar overall (Table 2).
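A sketch of this per-program procedure follows, again in Python with scipy rather than Stata; the program names and scores below are hypothetical placeholders, since individual-level data are not reproduced here and the actual per-program results appear in Table 2:

    # Bonferroni-corrected per-program comparisons (hypothetical data).
    from scipy.stats import ttest_ind

    # 26 comparisons: 13 programs x 2 Step examinations
    alpha_adjusted = 0.05 / 26   # ~= .0019, as reported above

    # Placeholder structure: {program: (non_cr_scores, cr_scores)}
    program_scores = {
        "dermatology": ([238.0, 241.0, 245.0], [240.0, 243.0]),
        "internal medicine": ([236.0, 239.0, 244.0], [241.0, 246.0]),
        # ... one entry per program and Step examination
    }

    for program, (non_cr, cr) in program_scores.items():
        t, p = ttest_ind(non_cr, cr, equal_var=True)
        verdict = "significant" if p < alpha_adjusted else "not significant"
        print(f"{program}: t = {t:.2f}, P = {p:.4f} ({verdict})")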
Discussion
This study shows that there was no association between USMLE Step 1 and Step 2 CK scores and selection as a CR across multiple clinical specialties over a 6-year period. This finding confirms earlier research11 showing that standardized examination scores are not a proxy for excellence in competencies such as communication and patient care.
Overreliance on USMLE scores in the residency selection process is the result of several factors. First, these standardized examination scores can be compared between applicants, while grades and rankings from various schools are challenging to compare. Additional factors include an increased number of medical students in the residency Match and the fact that applicants are applying to more programs than in prior years.22 Even with the future change in scoring of Step 1 to pass/fail,16 changing these patterns will not be easy. We believe that a good first step would be to develop an examination that measures the qualities educators seek in residency applicants rather than the ability to memorize facts. One possibility is to consider a score for the Step 2 Clinical Skills examination that measures a variety of skills, such as history taking, communication and interpersonal skills, and diagnostic reasoning. Validity evidence for this approach exists; a large study of internal medicine residents showed that Step 2 Clinical Skills communication and interpersonal skills scores predicted performance during the first year of training.23
This study has several limitations. First, it was conducted at a single institution over a 6-year period with 1 method of selecting CRs. These findings should be assessed at other GME centers to determine generalizability. Not all residency programs were included, although the programs in our study represent the specialties chosen by 70% of US medical school seniors in the 2020 Match.24 Additionally, we acknowledge that Northwestern residents scored higher than the average of all first-time USMLE test takers on Steps 1 and 2 CK. It is possible that a threshold score might differentiate residency performance but that our residents exceeded this score. It is also not known whether additional residents were approached to serve as CRs and declined. Although this is conceivable, the residents in this study were high performing, and no differences between these residents' and their peers' USMLE Step scores were found overall. Finally, we used selection as CR as an outcome measure and did not consider other variables, such as quality of clinical care provided, performance in simulated procedures and skills, and evaluations from clinical team members, such as nurses and patients. Further study is needed to assess these variables and their association with USMLE Step scores.
Conclusions
USMLE scores were not associated with selection as CR across 13 programs over a 6-year period. Based on these findings, reliance on USMLE scores to predict markers of success in residency, such as CR selection, is not recommended.
References
Author notes
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
The authors would like to thank all of the dedicated residents at the McGaw Medical Center of Northwestern University.