Emergency medicine (EM) residency programs want to employ a selection process that will rank the best possible applicants for admission into the specialty.
We tested whether application data are associated with resident performance as measured by EM milestone assessments. We hypothesized that a weak correlation would exist between some selection factors and milestone outcomes.
Using data from 5 collaborating residency programs, a secondary analysis was performed on residents trained from 2013 to 2018. Factors in the model were gender, underrepresented in medicine status, United States Medical Licensing Examination Step 1 and 2 Clinical Knowledge (CK), Alpha Omega Alpha (AOA), grades (EM, medicine, surgery, pediatrics), advanced degree, Standardized Letter of Evaluation global assessment, rank list position, and controls for year assessed and program. The primary outcomes were the milestone levels achieved in the core competencies. Multivariable linear regression models were fitted for each of the 23 competencies, and the results were compared across models.
For the most part, academic performance in medical school (Step 1, Step 2 CK, grades, AOA) was not associated with clinical performance on residency milestones. Isolated correlations were found for specific milestones (eg, a higher surgery grade was associated with a higher wound care score), but most selection factors had no correlation with residency performance.
Our study did not find consistent, meaningful correlations between the most common selection factors and milestones at any point in training. This may indicate our current selection process cannot consistently identify the medical students who are most likely to be high performers as residents.
Residency programs want to employ a selection process that will rank the best possible applicants for admission into the specialty. Residency selection factors have a largely unknown predictive correlation with training outcomes or have demonstrated poor predictive value.
A secondary analysis of residents trained from 2013 to 2018 in 5 residency programs examined correlations between program selection factors and milestone outcomes.
Stability of program leadership was not assessed, and it is possible that intra-program differences in assessments were due to differences in assessors rather than resident-specific attributes.
The study did not find consistent, meaningful correlations between the most common selection factors and residency milestones at any point in training.
The current residency selection process is a time-consuming, expensive venture for training programs and their departments.1 While training programs actively seek applicants who will succeed and thrive in residency, they also attempt to identify and avoid applicants who will require significant, dedicated, time-consuming resources to meet the minimum clinical and professional competency standards. Identifying the factors associated with thriving (or struggling) in training remains an enigma but merits further investigation.
Residency program directors have long considered which metrics to use in an attempt to make reasoned selection decisions.2,3 These metrics often include standardized testing, clinical grades, and a residency interview process.3–8 These metrics may become more important with the change of Step 1 scoring to pass/fail and the “pause” of the United States Medical Licensing Examination (USMLE) clinical skills examination. Unfortunately, the predictive value of these metrics for success in residency is generally very low, and most of the reported positive correlations are between one standardized testing outcome and another.7,9–14
One of these assessments is the Standardized Letter of Evaluation (SLOE) used in emergency medicine (EM).20 The SLOE provides a specialty-based, norm-referenced global assessment of each student and their projected position on the program's match rank list.21 Other methods used to measure noncognitive attributes of applicants and to reduce bias include novel interview techniques,18,22 standardized letters of evaluation,23 and placing less weight on standardized examinations to reduce racial bias.24 Yet bias clearly continues to exist at each step of the selection process.25 All of these factors increase the priority of ensuring a selection process that is rapid, equitable, and reliable in selecting candidates who will be successful in residency.
Although the challenge of resident selection and selection metrics has been a topic of repeated research, residency selection factors have a largely unknown predictive correlation with training outcomes or have demonstrated poor predictive value.11,26–28 This research has been limited by single-institution or single-program designs,29 study populations from a single homogeneous region,30 a focus on just a few core competencies,19 short study periods with few residents, or a lack of standard outcome measures (ie, varying definitions and measures of success).31
The development of milestone assessments has provided a potential solution to the issue of non-standardized residency outcomes. Milestones were developed as assessment tools “from a close collaboration among the ABMS certifying boards, the review committees, medical-specialty organizations, program-director associations, and residents… to provide meaningful data on the performance that graduates must achieve before entering unsupervised practice.”32 In addition to their proposed benefits of enhancing the quality of residency education and patient safety and of driving innovation in graduate medical education, milestones were also designed to allow for “comparative data” across residency programs.32 The milestone assessment process has continued to undergo revision, reiteration, and validation to better represent the specific needs of each medical specialty.33–36
The objective of this study is to explore whether application and selection factors predict residents' performance at the conclusion of postgraduate year 1 (PGY-1). We reviewed factors commonly used in selection decisions as well as factors previously identified as predictive of success or remediation.19,29,30,37–39 Because the milestone assessment was designed to provide a standard, generalizable outcome for residency performance across graduate medical education programs in the same specialty, we used milestones as the outcomes in this study.
Setting and Participants
The study uses secondary data from 5 EM residency programs. The combined dataset included all residents from the entering intern classes of 2010 to 2018. Because the EM Milestones were first published in 2012, the outcome data range from academic years 2013 to 2018. The EM Milestones were updated in 2015, but there were no substantive changes to the prompts and no changes to the milestones themselves aside from the order in which they were listed.
Selection factors in the model were gender; underrepresented in medicine (UiM) status; USMLE Step 1 and Step 2 CK scores; AOA award; grades in EM, medicine, surgery, and pediatrics; advanced degree; SLOE global assessment; and rank list position. These were selected based on the available literature about which factors EM residency program directors use in their decision-making, as well as demographic identities that may be associated with bias in applicant ranking. Gender, UiM status, and AOA award were measured as binary factors. Standardized coefficients were used in the calculation and reporting of results for all continuous measures. Models were initially created with clerkship grades entered as categorical variables. These were compared with alternative models that treated the same variables as continuous, with each step up in grade category counted as an increase of 1 point. Ultimately, the continuous specification was used because it made comparisons across so many models and variables feasible. As in similar prior studies in EM,40 interview scores were considered but were not thought to be generalizable across programs, since each program uses different interview processes and scoring rubrics. Rank list position is believed to correlate with interview performance; however, because the model also includes other factors thought to correlate with rank list position (grades, Step scores, etc), rank list position represents otherwise unaccounted-for decision-making by program directors based on interview performance and other unrecorded factors.41 Controls for training start date and specific residency program were also included. A variable for PGY-1–PGY-3 vs PGY-1–PGY-4 program format was considered but was dropped because it was collinear with the individual residency program identifier. The primary outcomes of the study were the milestone levels achieved in each of the core competencies after year 1 of training. Milestones were scored in 0.5 increments, which allowed for scoring between the competency anchor statements. Given this fluidity of scoring between anchoring categories, each core competency outcome was treated as a continuous variable for analysis.
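The two predictor-coding decisions described above (ordered clerkship grades scored as integers, continuous measures standardized) can be sketched as follows. This is a minimal illustration, not the study's code: the grade labels in the hypothetical `GRADE_SCALE` list are assumptions, since the actual grading categories are not listed, and the sketch assumes only predictors (not milestone outcomes) were z-scored, which matches the Discussion's reporting of effects in milestone points per SD.

```python
from statistics import mean, stdev

# Hypothetical ordered grade labels; the study does not list its categories.
# Treating them as 0, 1, 2, 3 assumes equal spacing between adjacent grades.
GRADE_SCALE = ["Fail", "Pass", "High Pass", "Honors"]


def encode_grade(grade: str) -> int:
    """Score an ordered clerkship grade so each step up the scale adds 1 point."""
    return GRADE_SCALE.index(grade)


def standardize(values: list[float]) -> list[float]:
    """Return z-scores (mean 0, sample SD 1), so a regression coefficient
    on the result is read as the effect of a 1-SD change in the raw measure."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    print([encode_grade(g) for g in ["Pass", "Honors", "High Pass"]])  # [1, 3, 2]
    print(standardize([230.0, 245.0, 260.0]))  # [-1.0, 0.0, 1.0]
```

The integer coding is what makes a single coefficient per clerkship possible; the categorical alternative would need one indicator per grade level in every one of the models.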
Analysis of the Outcomes
Multivariable linear regression models were fitted separately for each of the 23 competencies (table 1), and the results were compared across models, for a total of 23 regression models in this study. For each variable, the coefficients for all core competencies were presented in 6 regression coefficient plots (patient care and non–patient care core competencies by year).42 Given the multiple comparisons,43 a Bonferroni correction to control the family-wise error rate and a Benjamini–Hochberg procedure to control the false discovery rate were also applied and are provided for comparison as more conservative estimates of potential correlation.
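A minimal sketch of the per-competency modeling strategy, assuming ordinary least squares with an intercept: the same predictor matrix is reused and a separate model is fitted for each competency outcome. The helper names and toy data are hypothetical; in the study, program and start-year controls would enter as additional (eg, dummy-coded) columns, and a statistics package would normally replace these hand-rolled normal equations.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_per_competency(X, outcomes: dict[str, list[float]]) -> dict[str, list[float]]:
    """Fit one separate model per competency, reusing the same predictors."""
    return {name: ols(X, y) for name, y in outcomes.items()}


if __name__ == "__main__":
    # Toy data (not study data): 2 predictors, 4 residents, 2 competencies.
    X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
    outcomes = {
        "MK": [1.5, 3.0, 5.5, 7.0],    # exactly 1 + 2*x1 + 0.5*x2
        "ICS1": [3.0, 2.0, 1.0, 0.0],  # exactly 3 - 1*x1
    }
    for name, coefs in fit_per_competency(X, outcomes).items():
        print(name, [round(c, 3) for c in coefs])
```

Because every model shares the same predictors, the coefficient for a given selection factor can be lined up across all 23 outcomes, which is what the regression coefficient plots display.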
Institutional Review Board review was solicited at the primary site (where the centralized database was housed and statistical analysis was performed) and at all other participating residency programs. The study was determined to be exempt from further review in all cases. Data use agreements were established between the primary site and all other residency sites for deidentified data transfer.
A full account of the 5 participating residency programs identified 418 individuals for whom demographic data were available. Excluding individuals whose milestone records were not available reduced the sample to 329. Restricting the sample to residents with data for all 12 selection variables (plus 2 control variables: training start date and individual residency program identifier) reduced it to 213 (table 2). Demographic information on residents included in the initial study group is presented in table 2. Variables including Step 1 and Step 2 CK scores, clinical grades, and rank list position were not associated with EM residents' performance after the first year of residency (figure 1). Having an advanced degree before the onset of residency training had a small negative partial correlation (-0.19, 95% CI -0.34 to -0.05) with ICS1 (Patient-Centered Communication; figure 2). The SLOE global assessment had a small positive partial correlation (0.08, 95% CI 0.01–0.16) with PC11 (Anesthesia and Acute Management) after year 1 (figure 1). USMLE Step 2 CK had a small positive partial correlation with MK (Medical Knowledge) (0.01, 95% CI 0–0.02). No other significant partial correlations were found between the selection criteria and core competencies after year 1.
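The sample attrition described above is consistent with listwise (complete-case) deletion. A minimal sketch under that assumption follows; the field names are hypothetical labels for the 12 selection variables and 2 controls, not the study's actual dataset schema.

```python
# Hypothetical field names standing in for the 12 selection variables and
# 2 controls; the study's actual dataset schema is not described.
SELECTION_VARS = [
    "gender", "uim", "step1", "step2ck", "aoa",
    "grade_em", "grade_medicine", "grade_surgery", "grade_pediatrics",
    "advanced_degree", "sloe_global", "rank_position",
]
CONTROL_VARS = ["start_year", "program"]


def has_all(record: dict, fields: list[str]) -> bool:
    """True if every listed field is present and not missing (None)."""
    return all(record.get(f) is not None for f in fields)


def attrition(residents: list[dict]) -> list[int]:
    """Sample size at each listwise-deletion step: all residents with demographic
    data, those with milestone records, and complete cases for all model variables."""
    with_milestones = [r for r in residents if r.get("milestones") is not None]
    complete = [r for r in with_milestones if has_all(r, SELECTION_VARS + CONTROL_VARS)]
    return [len(residents), len(with_milestones), len(complete)]


if __name__ == "__main__":
    # Retention implied by the reported counts (418 -> 329 -> 213).
    for label, n in [("with milestone records", 329), ("complete cases", 213)]:
        print(f"{label}: {n}/418 = {n / 418:.1%}")
```

Applied to the reported counts, the final analytic sample retained 213 of 418 residents with demographic data, or about 51%, which is worth keeping in mind when judging how representative the complete cases are.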
The results reported above were based on an independent analysis of each milestone and therefore represent the most generous count of potential partial correlations in our dataset. Because multiple comparisons were made in the statistical analysis of each core competency, the probability of a false-positive inference increases. We therefore applied a Bonferroni correction based on the 23 milestones assessed in each PGY outcome to obtain a more conservative P value threshold for statistical significance (P = .05/23 ≈ .002). Following this correction, the partial correlations between having an advanced degree before the onset of residency training and ICS1, between SLOE and PC11, and between Step 2 CK and MK no longer reached statistical significance.
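Both multiple-comparison corrections named in the Methods can be sketched in a few lines. The Bonferroni threshold reproduces the .05/23 arithmetic above; the Benjamini–Hochberg function is the standard step-up procedure. The function names are illustrative, and the p-values in the example are made up, not study results.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate <= alpha."""
    return alpha / m


def benjamini_hochberg(pvals: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure: return True where the null is rejected
    while controlling the false discovery rate at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k_max = 0  # largest rank k with p_(k) <= (k / m) * q
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject


if __name__ == "__main__":
    print(round(bonferroni_threshold(0.05, 23), 3))  # 0.002
    # Made-up p-values for illustration only (not study results).
    print(benjamini_hochberg([0.01, 0.03, 0.035, 0.9]))  # [True, True, True, False]
```

Bonferroni compares every p-value with the same fixed threshold, whereas Benjamini–Hochberg rejects every hypothesis up to the largest rank k whose p-value is at most (k/m) × q. In the example, 0.03 misses its own rank-2 cutoff (0.025) but is still rejected because 0.035 passes at rank 3, which is why the procedure is less conservative than Bonferroni.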
All 23 regression models, with the coefficients (partial correlations) of all included variables, are available as online supplemental material. Significant differences in competency scores were also identified between programs and between intern class years; however, these variables were included as controls and were not the focus of the study (provided as online supplemental material).
Virtually none of the traditional metrics used in residency selection correlated with milestone performance in the first year of residency. The only partial correlation that survived statistical correction for multiple comparisons was the one between USMLE Step 2 CK and MK. Of note, the absolute effect was small: a 1-SD increase in USMLE Step 2 CK score was associated with a 0.08-point increase in the MK milestone rating when all other factors were held constant. Because milestone ratings are generally applied in 0.5 increments, a change of more than 6 SD (0.5/0.08 ≈ 6.25) would be required to produce a practical change in score. While “negative studies” often receive little consideration, the most important finding of this study is not which partial correlations were found between selection factors and milestone outcomes but rather their notable absence. These findings demonstrate the ongoing challenge of resident selection: no single factor independently predicts success (or failure) in graduate medical education training.
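The practical-significance argument in the preceding paragraph reduces to one division: the size of a rating step divided by the per-SD coefficient. The function name below is illustrative, and the sketch assumes the effect is linear across the whole score range, which the regression model implies but which may not hold at the extremes.

```python
def sds_per_rating_step(coef_per_sd: float, step: float = 0.5) -> float:
    """Number of predictor SDs needed to move the outcome by one rating step,
    assuming a linear effect with all other covariates held constant."""
    if coef_per_sd == 0:
        raise ValueError("coefficient must be nonzero")
    return step / abs(coef_per_sd)


if __name__ == "__main__":
    # Step 2 CK -> MK: 0.08 milestone points per SD, as reported in the Discussion.
    print(round(sds_per_rating_step(0.08), 2))  # 6.25
```

Since a 6-SD shift lies far outside the range of almost any applicant pool, the coefficient can be statistically detectable while remaining practically irrelevant for milestone ratings.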
Gender and UiM status remain complex factors in resident selection and future residency success. We included both of these factors in the model to examine our data for signs of bias in scoring. We did not find significant differences in first-year milestone scores between men and women or between UiM and non-UiM trainees in our data set. Regarding gender differences in similar studies, a review by Klein et al reported that 5 of the 9 studies they examined found “a difference in outcomes attributed to gender including gender-based differences in traits ascribed to residents, consistency of feedback, and performance measures.”44 These included studies by Dayal et al, who found a significant gender gap in assessments that persisted until graduation,45 Rand et al, who found that male internal medicine residents scored higher than female residents in 6 of 9 categories,46 and Mueller et al, who found qualitative differences in the content of attending feedback to female EM residents.47 However, a more recent study that incorporated national data from the Accreditation Council for Graduate Medical Education did not find clinically significant differences based on gender.48
The interview itself, while not directly included in our study, has also been found to be poorly predictive of training outcomes.7,12 Residency interviews can also be costly and can introduce greater bias into selection.1,12,49–51 As early as 1979, Keck et al argued that above a certain threshold, traditional cognitive academic criteria likely reach saturation in predicting who is capable of completing medical training, and that noncognitive factors such as personality and artistic and social achievement need to be considered.52 We have not yet identified the “secret sauce” for success in graduate medical education training; however, we can continue to strive for residency application metrics that more accurately predict training success and/or for more granular measures of residency performance.
Several important limitations exist in this study. First, while this is the largest cohort studying this issue that the authors could identify, and it represents programs spread throughout the country, it still represents only a small portion of the population. A larger or different cohort could find different partial correlations. Although the programs themselves remained the same, the stability of program leadership was not assessed, and it is possible that intra-program differences in assessments were due to differences in assessors rather than resident-specific attributes. Second, while milestone ratings are designed to be a universally applied measure of resident outcomes, they are still surrogates for overall resident performance and may not be applied in the same way across all residency programs or fully represent the breadth of resident abilities or markers of success. Finally, we conducted multiple comparisons both with and without controlling for the resulting increase in false-positive findings. When a more conservative standard controlling for multiple comparisons (a Bonferroni correction) was applied, the significant partial correlations disappeared. In constructing this article, we have included both approaches to provide the most transparent description of how we arrived at our conclusions regarding the lack of predictive accuracy of selection factors for residency outcomes.
Despite efforts to increase standardization of EM clerkship grading and objective assessment of residents with specific measures and prompts, there do not appear to be residency selection factors that partially correlate with resident success during intern year. These findings add to the literature suggesting that residency application data that predict performance in residency remain elusive.
Editor's Note: The online version of this article contains regression models with coefficients and competency scores.
Funding: This research was supported in part by a grant from the University of Michigan Graduate Medical Education Innovation Fund.
Conflict of interest: The authors declare they have no competing interests.