Background Aligning resident and training program attributes is critical. Many programs screen and select residents using assessment tools not grounded in available evidence, which can introduce bias and lead to inappropriate trainee recruitment. Prior reviews of this literature did not include the important lens of diversity, equity, and inclusion (DEI).
Objective This study’s objective is to summarize the evidence linking elements in the Electronic Residency Application Service (ERAS) application with selection and training outcomes, including DEI factors.
Methods A systematic review was conducted on March 30, 2022, concordant with PRISMA guidelines, to identify the data supporting the use of elements contained in ERAS and interviews for residency training programs in the United States. Studies were coded into the topics of research, awards, United States Medical Licensing Examination (USMLE) scores, personal statement, letters of recommendation, medical school transcripts, work and volunteer experiences, medical school demographics, DEI, and presence of additional degrees, as well as the interview.
Results The 2599 identified unique studies were reviewed by 2 authors, with conflicts adjudicated by a third. Ultimately, 231 studies met inclusion criteria and were included in the review (Cohen's kappa=0.53).
Conclusions Based on the studies reviewed, low-quality research supports use of the interview, Medical Student Performance Evaluation, personal statement, research productivity, prior experience, and letters of recommendation in resident selection, while USMLE scores, grades, national ranking, attainment of additional degrees, and receipt of awards should have a limited role in this process.
Introduction
Misalignment of graduate medical education (GME) resident and program attributes is associated with poor resident performance, dissatisfaction, and attrition.1-3 However, the resident recruitment process is complicated and opaque.4,5 Though best practices for identifying applicants who will meet program expectations during GME training have received attention, selecting optimal candidates and predicting resident performance remain challenging, prompting dissatisfaction on both sides, turnover, and occasional dismissal.6,7 Many programs select residents using assessments not grounded in available evidence.8 This creates potential for bias and misalignment of candidates with programs, and leaves these selection strategies poorly defensible if challenged.9-11
The objective of this study was to critically examine evidence associated with elements of the US residency application process regarding selection and future performance of matriculants. The intention is that education leaders will use this information to review and update their recruitment practices consistent with the most recent evidence.12,13 Systematic review methodology was selected over other approaches to integrative scholarship to comprehensively address our research question, given the goal to “identify, critically appraise, and distill” the existing literature on this topic.14,15
Methods
A search strategy was developed in conjunction with a medical librarian (T.K.) to capture elements of resident selection criteria and educational outcomes. Comprehensive searches were conducted in Ovid MEDLINE, Ovid Embase, ERIC, Web of Science, and the Cochrane Central Register of Controlled Trials on March 30, 2022. A combination of controlled vocabulary and keywords was used along with truncation and adjacency operators. No date, language, or publication type restrictions were used. The full search strategy is included in the online supplementary data. Although a health care education-focused systematic review would usually include health professions outside medicine, those studies were not included given the focus on outcomes specific to residents in the United States.
A systematic review was then conducted concordant with PRISMA guidelines using Covidence software.16 All aspects of the review were performed manually with no computerized automation of review employed. Inclusion criteria were created through an iterative research team consensus to examine studies investigating the alignment of outcomes for US residents with information available through the Electronic Residency Application Service (ERAS) and interviews. All team members participated in publication screening to identify those addressing the research question. Two team members reviewed each work for inclusion, with conflicts adjudicated by a third. Following screening, each included study was again reviewed and coded by 2 researchers based on ERAS application metrics (research, awards, United States Medical Licensing Examination [USMLE] scores, personal statement, letters of recommendation [LORs], medical school transcript, work and volunteer experience, medical school demographics, and presence of additional degrees). An additional code was applied to studies investigating the impact of ERAS elements on diversity, equity, and inclusion (DEI). These were identified either as explicitly stating they were examining DEI or by their investigation of recruiting those underrepresented in medicine (UIM). The studies associated with each metric were then reviewed in detail and a narrative synthesis generated. Most studies investigated multiple domains and thus were included in the review and synthesis of all associated metrics. Interrater reliability was calculated with Cohen’s kappa using Covidence.
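For reference, Cohen's kappa adjusts the raw agreement observed between 2 reviewers for the agreement expected by chance alone:

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

where \(p_o\) is the observed proportion of agreement and \(p_e\) is the proportion of agreement expected by chance. By the commonly cited Landis and Koch benchmarks, values between 0.41 and 0.60 indicate moderate agreement.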
“Holistic review” is defined here as it is by the Association of American Medical Colleges (AAMC) as “mission-aligned admissions or selection processes that take into consideration applicants’ experiences, attributes, and academic metrics as well as the value an applicant would contribute to learning, practice, and teaching.”17
Results
The search returned 3360 abstracts for screening, from which 761 duplicates were removed. Of the remaining 2599 abstracts, 2215 were excluded as irrelevant to the study question. A total of 383 full-text articles were reviewed by 2 reviewers, with a third review required for 62 of these (50 of which were removed). Overall, 152 were excluded due to misalignment with study outcome, design, or setting. Ultimately, 231 were included in the final review (online supplementary data).18 Interrater reliability was moderate, with an average Cohen's kappa of 0.53.
Included studies were published between 1978 and 2023. General concepts or multiple specialties were examined in 73 studies (32.7%). Among specialty-specific work, most were in surgical specialties followed by internal medicine, emergency medicine, and radiology (Table 1).
USMLE Step 1 and 2 Clinical Knowledge Scores as Criteria
Conclusions regarding the association of Step 1 and 2 Clinical Knowledge (CK) scores with performance metrics are widely mixed. Table 2 provides a summary of the associations between USMLE and UIM recruitment, specialty board outcome, in-training examination (ITE) scores, and clinical performance.
Medical school deans have identified that the transition of Step 1 to pass/fail may increase reliance on Step 2 CK to filter applications.71 Only 3 low-quality studies were identified to support a specific Step 2 CK score cutoff for this purpose. While a score of 225 on Step 2 CK is the highest reported cutoff associated with improved ITE or board examination performance, this number is of little value given the yearly variability in mean and passing scores.25,26,35
Medical School Grades as Criteria
While some articles in this review noted an association between medical school grades and performance in residency,48,72,73 others were equivocal.12,74,75 One group of retrospective studies found that clerkship grades were not predictive of clinical performance in residency.1,27,36,43-45,54,65,76-80 In contrast, other studies found an association.8,33,37,51,53,60,61,81,82 One study examining pediatric intern performance found that a model containing number of clerkship honors grades, LOR strength, medical school ranking, and attainment of a master's degree explained 18% of the variance in residents' performance on Accreditation Council for Graduate Medical Education (ACGME) Milestones at the end of internship,61 leaving the remaining 82% of variance unexplained by academic variables. Likewise, academic performance in medical school was found to be associated with residency ITE27,77,78 and board scores,78,81 though the correlation was weak.78 Other studies found no such relationship.26,65 The evidence regarding the association between medical student academic problems and resident performance is also equivocal. While an association was identified between "red flags" in an emergency medicine clerkship (deficiencies in LORs or written comments from clerkship rotations) and negative outcomes in residency,52 other studies found no significant associations between problematic outcomes in residency and medical school academic performance.3,38,45,83
Notably, most studies examining grades as predictor variables were carried out at single institutions.2,27,33,36,43,44,50,51,53,61,65,77-82,84,85 As outcome measures differed across studies, results may not be generalizable.51 In addition, resident performance was often defined subjectively and determined at the end of residency,37,60,78 limiting conclusions about the predictive capability of grades. At least 2 studies cautioned that range restriction likely affected results, given the competitive nature of their programs.2,65 Several studies were conducted before ACGME competencies were introduced50,65,77,79-82,86 and thus cannot be easily compared with more recent studies utilizing Milestone assessments as outcomes.84
Clerkship grades are frequently used to differentiate residency applicants. Many authors have noted the variability of grading systems37,87 and criteria for honors grades,88,89 precluding accurate comparison of applicants across medical schools.45,87,88,90 In addition, significant variability exists across clerkships within and between institutions.90 Concerns regarding the influence of instructor bias on grades have also been noted.87,91 One study found that race and ethnicity were significantly associated with core clerkship grades.91 Due to inconsistency in grading and grading systems,87 clerkship grades may not be a reliable metric for comparing students across institutions53,87,88,90 or offer an unbiased representation of performance.91
Medical Student Performance Evaluations as Criteria
Most studies examining the Medical Student Performance Evaluation (MSPE) are descriptive and single-institutional.92 These demonstrate that inconsistencies remain in how medical schools apply the AAMC's standardized MSPE template when reporting overall medical student performance,83,87,93 normative comparisons such as class rank and grading nomograms,93-95 or appendices.94 Furthermore, discourse analysis of MSPE text suggests the presence of bias associated with MSPE authorship,96,97 medical school region,97 and applicants' demographic characteristics.96,97 Reporting of clerkship grades in MSPEs is more consistent across medical schools in retrospective studies,93,95 as is accuracy of Alpha Omega Alpha (AOA) awards.98 However, one report noted that 30% of top 20 U.S. News & World Report medical schools did not report grades in MSPEs, as compared to 10% of other schools, which may reflect medical schools' transition to competency-based assessment.88 The sparse MSPE literature provides no38,65 to weakly positive3,49,62,83,99-107 correlational evidence between MSPE content and downstream resident performance. Possible MSPE predictors of suboptimal performance during residency include remediation and course failures,3,83 medical school leave of absence,51 negative comments in the MSPE,3,83 and lower class rank.3 For instance, a 20-year retrospective case-control study included 40 psychiatry residents with performance or professionalism concerns during and after residency. Of these, 30 were classified as having minor issues, in which performance fell below program standards but was successfully remediated, and 10 were classified as having major issues requiring severe program or external governing body action. When compared to 42 matched controls, the 40 who underperformed had more negative MSPE comments, especially the 10 with major performance deficits.83 Total number of clerkship honors reported in the MSPE provided low, positive correlational evidence for chief residency status.51 Another retrospective study of anesthesiology residents showed weak, positive correlations between medical school class rank and satisfactory clinical performance, passing ITEs, publishing one peer-reviewed article, and entering academic practice.37 Importantly, the extent to which medical schools underreport the weaknesses of their graduates is unknown. An older study identified a 34% prevalence of underreporting events such as leaves of absence and failing grades in MSPEs as compared to school transcripts.99
Letters of Recommendation as Criteria
A recent study suggests that structured LORs and standardized letters of evaluation provide more objective and actionable information than traditional narrative LORs.49 Structured letters also show improved interrater agreement among readers and wider use of the available rating categories, thus enhancing their discriminating power.8,100
LORs are inherently subjective and therefore subject to bias. Many studies have examined whether LORs are systematically biased based on gender,101,102 UIM status, or other criteria, with mixed results: some studies show no gender bias, while others show bias toward male applicants and still others toward female applicants. There is more consistent evidence for bias against UIM applicants in LORs.103
There is little evidence that LORs predict success in training or subsequent practice, except in limited ways. The strongest evidence for the predictive value of the LORs regards the professionalism and humanistic characteristics of applicants.54 Compared with standardized test scores and medical school grades, LORs are better predictors of clinical performance during training.27
Personal Statements as Criteria
Personal statements are generally valued by resident selection committees. Most surveyed program leaders report that personal statements are at least moderately important in selecting whom to interview, assigning rank order, and assessing applicants during interviews. However, this review found no studies associating personal statements with outcomes during GME training.1,74 Their evaluation also shows relatively poor interrater reliability, even between evaluators from the same training program.105
Program leaders who value personal statements tend to use them to assess communication skills and personality.107 Brevity, precise language, and original thought are considered favorable attributes. Most believe the personal statement is the appropriate place to explain potentially concerning application elements.104 Problems with personal statements include deceptive or fabricated information, susceptibility to implicit bias in their evaluation, and plagiarism.108-110
Medical School Ranking or Affiliation as Criteria
Adequate data to support the use of U.S. News & World Report medical school ranking in a residency application screening tool were not identified in this review.111 There was mixed evidence regarding whether this ranking is associated with resident clinical performance. One study of radiology residents found that the perceived prestige of the applicant's medical school did not predict resident performance.74 The tier of medical school was also not significantly associated with anesthesiology resident performance on any examination, clinical outcome, likelihood of academic publication, or academic career choice.37 In one retrospective study of 46 otolaryngology graduates, a weak correlation was found between the decile rank of the medical school attended and subjective performance evaluation by clinical faculty.112 The authors speculated that residents who attended top-ranked medical schools were a highly select group, which may explain why school rank appeared to predict future success. They also noted their findings may be affected by affinity bias, as their program typically enrolls students from its affiliated medical school, which is ranked in the top decile. There was no statistically significant difference in average ITE scores between residents who attended medical school at the same institution as their orthopedic residency and those who attended a different institution (n=60 residents, 2 programs).46
Additional Degrees as Criteria
Few studies have examined whether holding an additional advanced degree, beyond the MD or DO, predicts success during residency. Multivariate analysis did not show an association between an advanced degree and higher ratings on multisource assessments, higher ITE scores, or odds of passing board examinations, though having an advanced degree was associated with higher patient communication scores.45 In one study, anesthesiology residents with additional degrees performed at levels similar to their peers on most outcomes but tended to be rated lower on clinical performance.37
Research Experience as Criteria
Previous research experience is a readily quantifiable metric in the ERAS application. However, this review did not find associations between resident performance outcomes and research experiences prior to residency across various specialties.37,45,46,63,65 Several studies showed weak to moderate correlations between the number of research publications completed prior to application and those completed during residency.113-115 One manuscript found applicants with more first-author publications prior to residency were more likely to pursue fellowship, have a higher h-index (an author-level metric that measures the productivity and citation impact of publications), and publish more during and after residency.115 This review also identified several studies finding applicants with publications prior to residency were more likely to pursue an academic career.115-117
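For context, the h-index cited above has a standard formal definition: if an author's publications are ordered by citation count so that \(c_1 \ge c_2 \ge \dots \ge c_n\), then

\[ h = \max \{\, i : c_i \ge i \,\} \]

that is, the largest \(i\) such that the author's \(i\) most-cited papers have each received at least \(i\) citations.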
Volunteer and Work Experience as Criteria
A 7-year retrospective cohort study of 110 residents showed no association between volunteerism and clinical performance. However, one study found an association between having had a career of at least 2 years before medical school and competency in interpersonal and communication skills and systems-based practice.125 Excellence in athletics, specifically in a team sport, was associated with otolaryngology faculty assessment of clinical excellence,112 clinical performance and completion of general surgery residency,44 and selection as chief resident in radiology, with a stronger association noted among college and elite athletes.51 In one study of an anesthesiology residency program, leadership experience was negatively associated with ITE and board examination performance, and service experience was associated with lower ITE scores.37
There is a paucity of data regarding the strength of association between personal and professional commitment to service and clinical performance in residency. Prior excellence in a team sport may align with success in training.112 No study was identified that evaluated the association between performance in residency and service to underresourced communities, membership in medical school affinity groups, or health care or nonprofit work experience.74
Medical School Honors and Awards as Criteria
There is mixed evidence regarding the association between AOA membership and residency clinical performance in multiple specialties. AOA membership was associated with higher faculty-defined clinical performance evaluations in anesthesiology and orthopedics programs37,78,126 and with selection as a chief resident (OR=6.63, P=.002).33 However, AOA membership was not predictive of performance in multiple other specialties.1,43,49,50,54,65,74,80,112 A retrospective review of internal medicine applications demonstrated a strong association of AOA membership with selection (P=.0015) but not with performance in residency as determined by faculty evaluations.79
The association between AOA membership and performance on ACGME Milestones across multiple specialties is equivocal. Although AOA membership was associated with the top third of resident performers, defined by ACGME competencies in 9 emergency medicine programs,60 it was not associated with first-year performance in emergency medicine or internal medicine, or with professionalism.7,53,84,127 AOA status had a negative correlation with patient care Milestones.61
Evidence from 2 orthopedics studies suggests an association between AOA membership and passing or higher scores on the ITE, with conflicting evidence on board examination outcomes.26,33,46 Studies from internal medicine and general surgery suggest an association of AOA membership with board examination performance.26,81 No relationship was found between AOA membership and faculty assessment of technical skills in general surgery65 or selection for achievement awards.70 As noted below, multiple studies have demonstrated a significant bias against UIM applicants in AOA induction (OR 0.16, 95% CI 0.07-0.37).21,22,128-130
This review found a paucity of evidence related to Gold Humanism Honor Society (GHHS) membership and performance in residency. A prior literature review reported a lack of data regarding its impact on ophthalmology residency selection.74 A retrospective review of internal medicine residents found a positive association of GHHS with Milestone performance in medical knowledge.84
The Interview as a Criterion
This review found mixed evidence regarding resident interviews as predictors of performance.48,131 Of the studies reviewed, 24 of 25 (96%) analyzed data collected during the pre-COVID-19, in-person interview process. One review that examined the virtual interview experience of residency programs before and during the COVID-19 pandemic found that faculty and applicant feedback was variable.132
One finding shared by several studies is that structured interviews, in which all applicants are asked the same standardized, job-related questions linked to desired program traits, are more likely to predict resident performance than unstructured, conversational interviews.74,133,134 Another is that multiple factors can bias interview scores, including interviewer knowledge of board scores and other academic metrics66 as well as applicant physical appearance.67,135 Applicants' attractiveness can bias interview evaluations and invitations, especially for women applicants.136 Reported associations between interviews and resident performance are provided in Table 3.
Diversity, Equity, and Inclusion
USMLE scores, AOA membership, clinical grades, and LORs were found to be affected by gender, racial, and ethnic bias.22,91,97,129,130 Reliance on these metrics was found to reduce the number of UIM individuals selected for residency interviews.97,128 Three studies found that holistic review of applications is an effective strategy to reduce bias and increase UIM representation.22,143,144 Specific strategies reported as effective included de-emphasizing USMLE Step 1 scores, AOA membership, and grades. Some studies also reported that bias was reduced by developing selection criteria that consider individual applicants' experiences and attributes alongside academic achievement.21,22,67,128,135,143,144
Assessment of applications is subject to reviewer bias, which substantially impacts the resident selection process.9 Therefore, understanding the role of bias is inextricably interwoven with other factors in resident selection. Several studies recommend implicit bias training for those reviewing residency applications, including training to detect bias in letters of recommendation.21,22,135,143,144 Such training is associated with recognition of discrimination, personal awareness of bias, and engagement in equity-promoting behaviors.144 This review did not identify any study analyzing whether reviewer training is effective in increasing resident diversity. One study found that personal awareness of implicit bias mitigated its effect in the selection process, even without additional training.145
Discussion
The findings of this review suggest there is minimal evidence linking residency performance to USMLE scores, grades, U.S. News & World Report ranking, attainment of additional degrees, technical skills assessment, and receipt of awards. As such, these elements may warrant only a limited role in the assessment of applicants. The MSPE, personal statement, research productivity, prior experience, and LORs may be incorporated in applicant review, with attention to their known limitations. Interviews should be structured, consistent, and include rater training and bias mitigation.
The best-studied parameter in this review is the interview, although this literature is limited by the absence of interview format descriptions in most studies and by minimal tracking of resident performance over time. While studies were identified that support an association between interview ratings and resident performance, it is evident that the potential for bias is high. The studies reviewed did not examine potential biasing factors other than gender, such as race, ethnicity, marital or parental status, and sexual orientation. It is important to acknowledge and mitigate biases against UIM applicants.146 Supplemental assessments such as situational judgment tests are valuable and cost-effective but require significant effort and expertise to create.140,141
Holistic review of residency applications represents an effective strategy to reduce bias and increase UIM representation.22,143,144 Holistic review allows admissions committees to consider the whole applicant, rather than disproportionately focusing on any one factor. The AAMC recommends a 2-step holistic review process in which a program first identifies the experiences, attributes, and academic metrics that align with its goals and values, and then determines how to measure those they have identified.17
USMLE Step 1 and 2 CK scores are frequently cited as criteria for resident screening. Although Step 1 is now reported only as pass or fail, some applicants still have numeric scores on their applications. Given the prior reliance on Step 1 scores, it is likely the numeric score on Step 2 CK will replace Step 1 as a screening metric.
The results of this review should be interpreted in the context of its focus on recruitment and selection practices for US GME training programs. Though ample literature addresses resident recruitment and selection in international settings, the distinctive features of training in the United States inform the focus of this review.147-150 Additionally, an extensive body of research has been developed on recruitment practices for other health and nonhealth professions. However, these articles were not included because they introduce many potential confounders.149-153
Limitations
A significant limitation of this study was the inability to provide summary statistical analysis of the findings. Given the significant heterogeneity of the data, including numerous specialties, institutions, and methodologies, such analysis would not be accurate or meaningful. Further, most studies were single-institution and used small samples, making extrapolation of results difficult even when pooled. Future research should include larger, multi-institutional studies that can more effectively examine the association between recruitment metrics and residents' performance outcomes across institutions.
Conclusions
This review provides education leaders a summary of the available literature as they consider resident recruitment practices. Though many studies within this systematic review have examined the strength of association between ERAS application criteria and resident performance outcomes, well-designed research is sparse, and results regarding application criteria are mixed.
References
Editor’s Note
The online version of this article contains the full search strategy used in the study and the PRISMA summary.
Author Notes
Funding: The authors report no external funding source for this study.
Conflict of interest: The authors declare they have no competing interests.