ABSTRACT
Background
Residency selection integrates objective and subjective data sources. Interviews help assess characteristics such as insight and communication but are susceptible to bias. Structured multiple mini-interviews may mitigate some elements of bias; however, a halo effect has been described in assessments of medical trainees, and the degree of familiarity with applicants may remain a source of bias in interviews.
Objective
To investigate the extent of interviewer bias that results from pre-interview knowledge of the applicant by comparing file review and interview scores for known versus unknown applicants.
Methods
File review and interview scores of applicants to the University of Ottawa General Surgery Residency Training Program from 2019 to 2021 were gathered retrospectively. Applicants were categorized as “home” if from the institution, “known” if they completed an elective at the institution, or “unknown” otherwise. The Kruskal-Wallis H test was used to compare median interview scores between groups, and Spearman's rank-order correlation (rs) was used to determine the correlation between file review and interview scores.
Results
Over a 3-year period, 169 applicants were interviewed; 62% were unknown, 31% were known, and 7% were home applicants. There was a statistically significant difference (P=.01) between the median interview scores of home, known, and unknown applicants. The positive correlation between file review and interview scores increased with applicant familiarity (rs=0.15, 0.36, and 0.55 for unknown, known, and home applicants, respectively).
Conclusions
The positive correlation between file review and interview scores increases with applicant familiarity. The interview process may carry inherent bias insufficiently mitigated by the current structure.
Objectives
To identify the extent to which prior knowledge of applicants influences interviewers during residency selection in a multiple mini-interview (MMI) format.
Findings
Interview scores of applicants better known to the residency program correlated more strongly with file review scores than applicants who had no previous interactions with the program.
Limitations
Small sample size, difficulty quantifying the degree to which applicants were truly familiar to their raters, and variability in scoring systems across the country limited generalizability.
Bottom Line
Residency interview processes may suffer from biases that result in discrepancies between how known and unknown applicants are scored and ultimately selected.
Introduction
The Canadian residency selection process, overseen by the Canadian Resident Matching Service (CaRMS), follows common guidelines; however, the scoring systems and methodology used for selection vary across programs. The process comprises at least 2 common phases—the file review and the interview. Each applicant's file includes their curriculum vitae (CV), personal statement, academic records of rotations completed, and letters of reference. In Canada, applications do not include standardized test results, and medical schools disclose grades only in pass/fail form. Files are reviewed and scored using criteria unique to each program. Based on the generated application scores, only applicants above a determined threshold are invited to an interview. The style of interview and scoring is at the program's discretion. The results of the interview, file review, and any other tool used in the selection process are then collated, and a rank list is generated.
Several studies have demonstrated a strong correlation between file review scores, interview scores, and final rank.1 That said, studies have also demonstrated large variations in the rank list depending on the stage at which interview scores are incorporated into the process.2 Blinding the interviewer to applicant data, such as the CV, academic records, and choice of electives, has consistently been shown to decrease the correlation between interview and file review scores. Up to 30% of the variance in interview scores is influenced by grades and standardized test results in US studies,3 suggesting that factors external to the interview may bias the process.
Applicants and programs place importance on the interview to aid in residency selection.4,5 The interview is an opportunity to assess factors, including communication, insight, motivation, and compassion, among other personal factors thought to be predictors of success. Programs that emphasize the importance of these subjective criteria have reported higher degrees of satisfaction with selection processes6; however, evidence to support interview performance as a predictor of residency performance is limited.7 There are data to support that familiarity with applicants in residency interview and oral examination settings may contribute to inflated assessments,8,9 although to our knowledge these studies have not been replicated in the setting of a structured multiple mini-interview (MMI).
A review of interview formats in medical school and residency selection demonstrates great heterogeneity, but certain factors, including structured questions, multiple observers, rating scales, rater training, and blinding to cognitive application data, are thought to improve reliability.10 The MMI is one such interview format with good feasibility, reliability, and predictive value because it employs multiple raters, which is thought to decrease the effect of any one interviewer's personal biases.11 However, concerns have been raised about the MMI process and its perceived tendency to favor certain personality traits (eg, extroversion) and interviewees with an understanding of the local cultural norm.12
As such, the objectives of this study were to investigate whether, and if so to what extent, interviewer bias results from knowing applicants, and to provide a framework that would allow other programs to assess for bias within their own processes. To do so, file review and interview scores for known and unknown applicants applying to a Canadian general surgery residency program were compared. Given that file review is best suited to assessing more objective achievements, such as research productivity and academic awards, while interviews can more easily assess skills such as communication and collaboration, we hypothesized that, in the absence of bias, the correlation between the two scores would be weak. If any pre-existing positive or negative sentiment toward an applicant biased the rating, we would therefore expect higher correlation between these scores. Determining the effect of bias on interviewers will inform our institution's residency selection process and identify strategies to mitigate these biases for future CaRMS cycles.
Methods
Setting and Participants
This retrospective cohort study included all Canadian medical graduates (CMGs) and international medical graduates (IMGs) in the 2019-2021 CaRMS cycles applying to the University of Ottawa General Surgery Residency Training Program—an urban, university-based program with 32 residents. There were 6 available residency positions each year; over the 3 years, these comprised 14 CMG and 4 IMG positions. Applicants were categorized as “home” applicants if they were enrolled in the University of Ottawa's undergraduate medical education (UME) program, “known” applicants if they were enrolled in another institution's UME program but had completed an elective in general surgery at our institution, or “unknown” applicants if neither applied.
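For illustration, this grouping reduces to a simple three-way classification. A minimal sketch in Python follows; the function and field names are hypothetical and are not drawn from the CaRMS database schema.

```python
# A minimal sketch of the three-way familiarity grouping used in this study.
# Field names are hypothetical; they are not the CaRMS database schema.
def categorize_applicant(home_school: bool, local_elective: bool) -> str:
    """Return the familiarity group for one applicant."""
    if home_school:
        return "home"     # enrolled in the University of Ottawa UME program
    if local_elective:
        return "known"    # completed a general surgery elective at our institution
    return "unknown"      # no prior in-person exposure to the program
```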
The data for each applicant in the 2019-2021 CaRMS cycles were gathered retrospectively from the CaRMS database at our institution. Data included the candidate's name, home school, whether they had completed an elective in Ottawa, whether they were a CMG or IMG applicant, file review score, and interview scores. All data were de-identified prior to analysis to ensure the anonymity of applicants who may be known to some of the study authors. The de-identification was performed by one author (D.D.), who is the program administrator and already had access to the data given her role within the program. She does not have a role in the scoring of the file review or interview processes, nor does she contribute to resident rankings, acceptance, or assessments. One study author (C.T.) extracted the following information from the de-identified CaRMS database for each applicant: (1) CMG or IMG status; (2) enrolled at our institution (yes/no); (3) completed elective at our institution (yes/no); (4) file review score; and (5) interview score.
Selection Process
To generate file review scores, teams of residents and staff surgeons evaluate each applicant's personal statement, CV, letters of reference, and elective experience using a rubric. The file review process also considers residents' and staff surgeons' feedback about their experience working with any of the applicants on elective. This informal feedback was not gathered in 2021 so as not to disadvantage students whose electives were cancelled due to the COVID-19 pandemic. The applicants who score highest on the file review are offered an interview, and the final rank list is then calculated using a weighted combination of the interview and file review scores.
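A minimal sketch of that weighted combination, assuming both scores are on the same scale; the weight w is hypothetical, as the program's actual weighting is not reported here.

```python
# A sketch of the weighted rank-score combination described above.
# The weight w is hypothetical; the program's actual weighting is not reported.
def final_rank_score(file_review: float, interview: float, w: float = 0.5) -> float:
    """Combine file review and interview scores (both out of 100) into one ranking score."""
    return w * file_review + (1 - w) * interview
```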
Interview scores are generated through an MMI, in which applicants rotate through structured stations. There are 8 to 10 stations per year, designed to highlight attributes identified by the program as essential to success as a surgical resident. Raters (2 per station: one senior resident and one staff surgeon) are asked to score applicants using a standardized rubric. Rater training is performed on the day of the interview to promote reliability of scoring; use of the full scale and the importance of assessing only the applicant's interview performance are reviewed. The MMI score is calculated by averaging both raters' scores to generate a station score and summing the values across stations. Because the total number of MMI stations differed each year, the sum of the station scores was converted to a score out of 100 for the purposes of this study, allowing comparison of interview scores across all 3 cycles. Interviewers do not have access to the applicant files or the aggregate file review score during the interview; however, some interviewers may have participated in the file review process. During the 2021 cycle, interviews were conducted on a virtual video platform; otherwise, the format was unchanged.
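A minimal sketch of this scoring arithmetic, assuming every station carries the same maximum score (function and variable names are illustrative):

```python
# A sketch of the MMI scoring arithmetic described above, assuming each station
# is scored by exactly 2 raters and all stations share the same maximum score.
def mmi_score_out_of_100(
    station_scores: list[tuple[float, float]],  # (resident rater, staff rater) per station
    max_per_station: float,
) -> float:
    """Average the 2 raters per station, sum across stations, and rescale to /100."""
    station_means = [(resident + staff) / 2 for resident, staff in station_scores]
    return 100 * sum(station_means) / (max_per_station * len(station_scores))

# Example: 8 stations, each scored out of 10 by both raters
print(mmi_score_out_of_100([(7, 8), (6, 7), (9, 9), (5, 6), (8, 7), (7, 7), (6, 8), (9, 8)], 10))
```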
Statistical Analysis
Descriptive statistics were used to report the medians and interquartile ranges for file review and interview scores for the “home,” “known,” and “unknown” applicant groups. The Kruskal-Wallis H test was used to compare the median interview scores between the 3 groups of applicants. A comparison of the median file review scores was not conducted because, in the 2019 and 2020 CaRMS cycles, points were awarded to applicants who had completed an elective in Ottawa, whereas in 2021 they were not due to the COVID-19 pandemic. Spearman's rank-order correlation (rs) was calculated to determine the correlation between file review and interview scores for each group. Interpretation of Spearman's rs was based on precedents set in the psychology literature, because education research, like psychology, is focused on human factors rather than clinical outcomes. Accordingly, rs<0.3 is considered a weak association, 0.4 to 0.6 a moderate association, and >0.7 a strong correlation.13 P values <.05 were considered statistically significant. SPSS 25 (IBM Corp, Chicago, IL) software was used for all analyses.
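The analyses were run in SPSS; as an illustration only, a minimal Python equivalent of the group comparison is shown below, assuming a de-identified table with hypothetical columns group, file_review, and interview.

```python
# A minimal Python sketch of the Kruskal-Wallis comparison described above.
# The CSV file and column names (group, interview) are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("applicants_deidentified.csv")

# Compare interview score distributions across the home/known/unknown groups
samples = [g["interview"].to_numpy() for _, g in df.groupby("group")]
h_stat, p_value = stats.kruskal(*samples)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_value:.3f}")
```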
Ethics approval was waived by the Ottawa Health Science Network Research Ethics Board because the primary purpose was to identify bias within our selection process to inform and enable quality improvement.
Results
The study cohort included 169 applicants who were interviewed between 2019 and 2021. The majority were “unknown” applicants (62%, 104 of 169), while “known” (31%, 53 of 169) and “home” applicants (7%, 12 of 169) were the minority. The 2021 cycle was a clear outlier due to the COVID-19 pandemic: 88% (51 of 58) of applicants were unknown, compared to previous cycles in which the known-to-unknown ratio was close to 1:1 (Table 1). In 2020 and 2021, all 9 home applicants who applied to the program (100%) were interviewed, as were 28 of 34 known applicants (82%) and 78 of 244 unknown applicants (32%). These data were not retained from 2019. File review scores of all home applicants were above the cut-off for an interview offer.
There was a statistically significant difference (H=8.51, P=.01) between the median interview scores, with home applicants scoring highest at a median (IQR) of 76.0 (13.8), compared to known applicants at 73.0 (10.0) and unknown applicants at 68.0 (10.3), out of a total of 100 (Table 2). Comparing interview and file review scores, unknown applicants showed a weak positive correlation (rs=0.15, P=.14). The strength of the positive association increased with familiarity between applicant and program: known applicants showed a weak-to-moderate correlation (rs=0.36, P=.006) and home applicants a moderate correlation (rs=0.55, P=.06) between interview and file review scores (Table 2).
Discussion
This study demonstrates that there is greater correlation between file review and interview scores for applicants who are known to our program and that the extent of correlation increases with the level of familiarity. It also demonstrates a statistically significant difference in interview scores between “home,” “known,” and “unknown” applicants, with “home” applicants scoring the highest. Identifying applicant familiarity as a source of bias may help optimize residency selection processes. To our knowledge, this is the first study isolating the potential impact of familiarity on the residency selection process when an MMI is used.
Residency interviews are supposed to measure different characteristics of the applicants as compared to the academic dossier.10 The ideal interview format has not been established, but current literature suggests that the interview should contribute to the final rank list by identifying and assessing applicants on specialty-specific traits.10
The MMI has been proposed as a solution to certain rater biases found in unstructured interviews14; however, the MMI has also been found to be susceptible to bias.11,12,15 Recently, authors have suggested that local institutions should attempt to collect validity evidence for their own MMI.15 Our study isolates applicant familiarity as a confounding factor that contributes to higher levels of correlation between file review and interview scores, suggesting that this bias may dilute the value of the interview in residency selection. This is in keeping with the well-established halo effect, whereby a rater's overall perception of an applicant impacts the assessment of the applicant's individual attributes, a phenomenon demonstrated at many levels of medical student and resident assessment.16,17 This study provides a framework for programs to analyze their file review and interview scores and determine Spearman's rs, which may provide clarity on the degree of halo effect intrinsic to their process. The online supplementary data provide a sample data set programs can use to perform their own analysis.
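As a sketch of that framework, a program could compute group-wise Spearman's rs on its own de-identified scores in a few lines (file and column names are hypothetical, matching the sketch in the Methods):

```python
# A sketch of the proposed framework: Spearman's rs between file review and
# interview scores, computed separately for each familiarity group.
import pandas as pd
from scipy import stats

df = pd.read_csv("applicants_deidentified.csv")  # hypothetical file, as in Methods

for group_name, g in df.groupby("group"):
    rs, p = stats.spearmanr(g["file_review"], g["interview"])
    print(f"{group_name}: rs = {rs:.2f}, P = {p:.3f}")
```

Rising rs across the unknown, known, and home groups would suggest that pre-existing impressions are influencing interview scores.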
Unblinded interviews, where interviewers access applicant files, have higher levels of correlation between file review components and interview scores as compared to blinded interviews.3,18,19 In our MMI, although some interviewers may have been involved in the file review process in preceding months, they do not have direct access to the applicants' files. Overall, our interview process is reflective of a structured and blinded interview format, which according to existing literature should mitigate the risk of rater bias.
Positive bias toward familiar applicants in interviews is partially mitigated by the file review process, where a more directed and analytic comparison is performed.9 The higher interview scores noted in home applicants are in keeping with previous studies that suggest a degree of positive bias toward better-known applicants.16,17,20 Known applicants who are viewed favorably (as evidenced by higher file review scores) do seem to score higher than strong unknown applicants, suggesting that familiarity may yield a positive bias in interview scores. Higher interview scores among known and home applicants may also be secondary to those applicants' greater understanding of the program's cultural norms.
Assessment of unknown applicants is particularly relevant for the upcoming selection cycle given that visiting medical student electives in Canada remain on hold. The dynamics of the applicant pool have shifted as a result, with a small in-group of students who are very well known and a very large out-group of students who will interact with the program only virtually.21 Performance during visiting electives has been identified as the single most important variable in the selection process by certain programs.22,23 Our study shows that elective rotations can give an advantage to strong applicants. However, elective rotations are costly, so equity is a concern,24 and the criteria used by programs to allocate them remain poorly defined.25 As programs have been asked to revisit their selection processes,26 they should be aware of the potential risk of bias linked to familiarity with known applicants and the implications for diversity, equity, inclusion, and justice. Efforts to decrease bias in the interview process have shown that rater training and raising awareness of implicit biases can help decrease their influence.27
Limitations of our study relate to the small sample size, especially of “home” students; this was mitigated by pooling data from 3 years in which the selection process was comparable. Given the small number of home students, demographic information (ie, age, gender) was not collected, as it would have been identifying, which also limited the potential for further statistical analyses. Correlation analyses carry intrinsic limitations, given that even weak correlations can be statistically significant; in this study, the increasing strength of correlation across the 3 groups provides the most meaningful comparison. Further limitations include an inability to quantify the degree to which applicants were truly familiar to their raters. With respect to generalizability of this framework, we note that, even though all programs have access to the same information, internal scoring systems may vary.
Efforts to reduce bias in favor of known applicants could include de-identifying files for review and integrating anecdotal feedback regarding applicants after interview scores have been collected to decrease interviewer bias. Future studies should investigate the impact of additional mitigation measures on interviewer bias, as well as study the degree to which secondhand knowledge of an applicant may bias interviews.
Conclusions
The degree of familiarity with applicants to a residency program corresponded to higher levels of correlation between file review and interview scores during the residency selection process. Interview scores were significantly higher among the “home” applicants most familiar to the residency program.
References
Author notes
Editor's Note: The online version of this article contains a sample data set programs can use to perform their own analysis.
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.