ABSTRACT
While program director (PD) letters of recommendation (LOR) are subject to bias, especially against those underrepresented in medicine, these letters are one of the most important factors in fellowship selection. Bias manifests in LOR in a number of ways, including biased use of agentic and communal terms, doubt raising language, and description of career trajectory. To reduce bias, specialty organizations have recommended standardized PD LOR.
This study examined PD LOR for applicants to a cardiology fellowship program to determine the mechanisms by which bias is expressed and whether the 2017 Alliance for Academic Internal Medicine (AAIM) guidelines reduce bias.
Fifty-six LOR from applicants selected to interview at a cardiology fellowship during the 2019 and 2020 application cycles were chosen using convenience sampling. LOR for underrepresented (Black, Latinx, women) and non-underrepresented applicants were analyzed using directed qualitative content analysis. Two coders used an iteratively refined codebook to code the letters. Data were analyzed using outputs from these codes, analytical memos were maintained, and themes were summarized.
With AAIM guidelines, there appeared to be reduced use of communal language for underrepresented applicants, which may represent less bias. However, in both LOR adherent and not adherent to the guidelines, underrepresented applicants were still more likely to be described using communal language, doubt raising language, and career trajectory bias.
PDs used language in a biased way to describe underrepresented applicants in LOR. The AAIM guidelines reduced but did not eliminate this bias. We provide recommendations to PDs and the AAIM on how to continue to work to reduce this bias.
We examined program director (PD) letters of recommendation (LOR) for applicants to a cardiology fellowship program to determine the mechanisms by which bias is expressed and whether the 2017 Alliance for Academic Internal Medicine (AAIM) guidelines reduce bias.
Bias against underrepresented in cardiology (URC) applicants was expressed in all types of LOR through different forms of language use, but letters following AAIM guidelines appeared to have reduced use of communal language, possibly representing less bias.
This was a study at a single cardiology fellowship program, and the coders were not blind to race or gender.
While language is used in a biased pattern toward URC applicants in all types of LOR, there are opportunities to reduce this bias, including anti-bias training and the expansion and widespread adoption of the AAIM guidelines for PD LOR.
Introduction
Despite the well-documented bias in program director (PD) letters of recommendation (LOR) against people who are underrepresented in medicine,1–4 these letters persist as an important factor in fellowship selection.5–7 The Alliance for Academic Internal Medicine (AAIM) put forth guidelines in 2017 (provided as online supplementary data) to standardize the PD LOR in order to decrease bias and increase quality, but it remains unclear whether this standardization reduces bias.8 In-depth analyses could determine how and where the language of bias appears in LOR and whether standardization mitigates this bias.
The presence of bias through language in LOR prior to the AAIM guidelines is well documented. In medical student LOR, descriptive words differ based on race and gender. Men and White applicants are more likely to be described as “exceptional” or “leaders,” Black applicants as “competent,” and women as “empathetic” or “compassionate.”9–11 These patterns persist when applicants apply to fellowship.12
Bias can manifest in many ways in LOR. First, authors use agentic and communal terms differently to describe applicants. Agency and communalism describe 2 interconnected, fundamental qualities of human existence: agency reflects concerns about meeting one's own needs (eg, behaviors of leadership and confidence), while communalism reflects concerns with interpersonal issues (eg, behaviors of empathy and interpersonal skills).13 In LOR, men and White applicants tend to be described using agentic terms, while women and people of color tend to be described using communal terms. Communal terms used in LOR negatively affect hiring in academia, even after controlling for objective measures of productivity and performance.14,15 Second, bias manifests through the use of doubt raising language to describe underrepresented applicants. Examples of doubt raising include negative language, hedges (eg, "he appears to be motivated"), and faint praise (eg, "she is better than average").13 Third, bias can manifest through career trajectory bias, in which non-underrepresented groups are described as researchers or professionals, while underrepresented groups are described as students.13 These patterns reflect current societal racial and gender stereotypes as well as a long history of highly prevalent bias against women and people of color entering the health professions.16–19
Specialty organizations have recognized issues with traditional open narrative LOR and have recommended standardized LOR with predetermined elements.20–22 We have observed variable adherence to these guidelines by internal medicine PDs, despite the AAIM guidelines for standardized LOR.8 This study examined PD LOR for applicants to a highly regarded cardiology fellowship program to explore how bias is expressed, including mechanisms for the expression of bias and potential mitigation of bias by the AAIM guidelines.
Methods
Study Design
This was a directed qualitative content analysis,23 an approach we chose because the study involved interpreting meaning from text data and because existing theory and prior research on bias could inform our analysis. We examined LOR for applicants selected to interview in 2019 and 2020 at a cardiology fellowship program ranked within the top 20 in the world by US News & World Report. We chose this program, based at a single quaternary care teaching hospital, to reduce variability and to provide a large pool of LOR for underrepresented in cardiology (URC) and non-URC applicants. URC was defined as self-identified Black, Latinx, and female applicants (as extracted from their Electronic Residency Application Service [ERAS] applications). We included women in our definition of URC applicants because in the United States in 2018, only 25% of first-year cardiology fellows were women, and 11.6% of cardiology fellows self-identified as underrepresented in medicine by race/ethnicity.24,25
Data Source
We selected a convenience sample of LOR to obtain an even distribution of URC applicants, as well as of LOR adhering to AAIM guidelines (LOR-AAIM) and LOR not adhering to AAIM guidelines (LOR-NonAAIM). One author (A.Q.) reviewed all letters for applicants chosen to interview and categorized them as LOR-AAIM based on the presence of key sections from the AAIM guidelines. LOR were not categorized as LOR-AAIM if they did not consistently complete the AAIM-recommended sections of program description, achievement in core competencies, and overall assessment. This categorization was reviewed by 2 coders (N.Z., S.B.) without disagreements. All LOR-AAIM were included in the study, as were all LOR-NonAAIM for Black and Latinx applicants. Finally, a comparable number of LOR-NonAAIM were randomly selected, with slight oversampling of letters for URC applicants. Identifying features other than race and gender were removed by 2 of the authors (A.Q., D.A.) who were not involved in the coding process. The study was approved by the University of California, San Francisco Institutional Review Board. The 2017 AAIM guidelines for standardized LOR are provided as online supplementary data.8
Analysis
A PubMed literature review on bias in medical LOR identified key concepts for preliminary coding categories (including agentic vs communal terms, doubt raising, and career trajectory bias). Initial exploratory coding of the letters generated new codes until theoretical saturation was reached.26 A codebook was created and iteratively refined. Codes regarding the structure of LOR and the format of evaluative comments were developed during coding. Our final codebooks can be found in the online supplementary data. A primary coder (N.Z.), an Asian man and internal medicine resident, and a second coder (S.B.), a White woman and congenital cardiology fellow, neither of whom was an expert in bias, coded all of the letters in Dedoose (SocioCultural Research Consultants, Los Angeles, CA). Because the letters were anonymized, coders could not determine whether they had ever interacted with any applicants. Coders were not blind to race/gender and intentionally looked for evidence both supporting and not supporting bias. A senior author (P.O.), who analyzed the alignment of selected quotes with themes, was blinded to race/gender. Disagreements in coding were resolved by consensus. Data were analyzed using outputs from these codes, analytical memos were maintained, and themes were summarized.
Results
Fifty-six LOR were studied. Figure 1 provides the distribution of letters by compliance with guidelines, gender, and URC status. Due to sample constraints, we had more LOR-NonAAIM than LOR-AAIM, and we purposefully oversampled LOR-NonAAIM for URC applicants. Sample constraints also meant that more LOR-AAIM were for non-URC applicants and that there were no LOR-AAIM for Latinx applicants.
In LOR-NonAAIM, PDs typically included 4 sections: an applicant's pre-residency story, scholarly contributions, clinical performance, and an overall assessment. PDs often included a fifth section on special attributes, such as personal characteristics, contributions to residency, passion for education, or unique background. In LOR-AAIM, PDs wrote letters with 5 sections consistent with the guidelines. When discussing a resident's achievement in the core competencies section, PDs used different strategies: (1) providing only numerical ratings for each core competency; (2) providing a separate narrative description for each core competency; (3) providing a separate description for each core competency that mixed quotations and narrative; (4) combining the description of all core competencies in a single narrative with a separate section for quotations; and (5) combining the narrative description of all core competencies with no quotations. When descriptions were combined in narrative form or when quotations were used to describe competencies, all 6 core competencies were rarely addressed. The scholarly contributions, personal characteristics/skills, and performance-related extensions in training sections were completed inconsistently.
We identified 3 themes from these LOR: what and where agentic and communal language were used, doubt raising, and career trajectory bias. Each theme is described below.
Agentic and Communal Language: What and Where
What:
We identified different patterns of agentic and communal language use based on presence (whether terms were used), mechanism of delivery (whether language appeared in narrative descriptions or evaluative quotations), and location (where the language was used in the LOR). Regarding presence, both agentic and communal language were used to describe all applicants in both letter formats; all letters had at least one instance, and typically multiple instances, of both types of language. However, URC applicants were described more frequently using communal language, whereas non-URC applicants were described more frequently using agentic language. This pattern was similar for LOR-NonAAIM and LOR-AAIM (Table 1). Communal language was delivered through both PD narrative descriptions and attending quotations selected from residents' evaluations; examples of these 2 formats are in Table 1.
Where:
The location of communal language varied in LOR-NonAAIM and LOR-AAIM.
LOR-NonAAIM:
Communal language was used in the clinical performance, special attributes, and overall assessment sections. Throughout the clinical performance section, PDs relied on communal language to describe URC applicants because they focused on these applicants' interpersonal skills. In the special attributes section, PDs discussed personality traits, especially communal characteristics, more for URC applicants than for non-URC applicants. In some cases, the entire paragraph described only communal characteristics, focusing the reader on these attributes.
On a personal level, X has a calm demeanor that places patients at ease. His friendly smile conveys his desire to help the patient… He has an unending enthusiasm for medicine and a positive attitude that resonated with his peers. (4: Black man, special attributes)
For non-URC applicants, these narrative paragraphs were often about a passion for education or unique background.
Beyond being an excellent researcher, leader, and clinician, X is a well-known, outstanding teacher, having received excellent reviews for teaching medical students and residents. (16: Asian man, special attributes)
The overall assessment paragraph ending most LOR-NonAAIM included a description of the most notable aspects of each applicant. For URC applicants, these sentences described and focused attention on communal characteristics as opposed to agentic characteristics.
In summary, we are delighted to present X to you for consideration for your rigorous fellowship in cardiology. X is an exceptional young physician who has excelled in every stage of her medical career. She is energetic, compassionate, and committed… Her enthusiasm, dedication, and warm personality have been valued assets to our department. (6: White woman, overall assessment)
In summary, X is a compassionate and conscientious physician who has shown aptitude and research throughout her career. She is an energetic dedicated clinician who is an outstanding communicator and a pleasure to interact with due to her enthusiasm for all she does. (5: Black woman, overall assessment)
PDs tended to describe non-URC applicants in the overall assessment with agentic characteristics.
In summary, X is a highly motivated and extremely bright outstanding young physician. His engineering background, commitment to academic pursuits, and superior clinical acumen make him well poised to become a leader in cardiac electrophysiology. (15: Asian man, overall assessment)
LOR-AAIM:
In LOR-AAIM, communal language appeared in the core competencies, personal characteristics, and overall assessment sections. Communal language was used less for URC applicants in these structured LOR as compared to LOR-NonAAIM.
For the core competencies, PDs often used communal language in the patient care and interpersonal and communication skills sections, but rarely in the medical knowledge, systems-based practice, practice-based learning and improvement, and professionalism sections. This confined use contrasted with LOR-NonAAIM, in which PDs used communal language throughout. Non-URC applicants continued to be described primarily with agentic language.
X involves all members of the clinical care team effectively. He communicates well with consultants, nurses, primary care providers, patients, and families. He has a unique ability to connect with patients on a personal level when they are at their most vulnerable. (34: Asian man, narrative from interpersonal and communication skills)
Not all PDs included the personal characteristics/skills portion of the LOR-AAIM. Similar to the special attributes paragraphs in LOR-NonAAIM, these paragraphs tended to focus on communal characteristics more for URC applicants than for non-URC applicants.
X has a warm, welcoming demeanor that helps him connect with patients… his peers consider him a great role model of compassionate care and repeatedly comment about his kindness towards team members, patients, and everyone around him. (24: Black man, personal characteristics)
However, in contrast to LOR-NonAAIM, when PDs did include a description of personal characteristics, they typically also included a description of skills mastered beyond residency requirements.
In the overall assessment section, PDs focused on communal characteristics when describing URC applicants. The final 2 sentences of this LOR, shown in the excerpt below, focus on 2 communal characteristics, humility and integrity, by calling attention to them as the applicant's "strongest characteristics."
She will impress you with her compassion and kindness, as well as with her powerful intellect and reasoning skills. Humility and integrity are her strongest characteristics; she is highly receptive to feedback and never needs to be told anything twice. (29: White woman, overall assessment)
Doubt Raising
The 3 kinds of doubt raising found in both letter formats were hedging, faint praise, and negative language. Doubt raising was less common than the ubiquitous use of agentic and communal language.
The few examples of faint praise and hedging occurred only in letters for URC applicants. In the excerpt below from a LOR-NonAAIM, 2 sentences raise doubt. First, the discussion of the applicant's difficulty with the electronic record and lack of interest in general medicine is seemingly resolved by the next sentence, which describes his improvement with feedback. Second, the sentence discussing his newfound insight into how individual patients differ from those in trials is an example of faint praise, as these are insights most applicants glean in medical school.
Early in internship, X was challenged by the extensive amount of clinical data presented in the electronic record and the necessity to focus his management plans in areas outside his interest in cardiology. He improved with feedback from our academic hospitalist team, and he developed excellent work habits to help him prioritize and streamline his problem list… Over time, he gained understanding about how the individual patient may differ from patients in research trials, especially from the psychosocial or socioeconomic aspect. (4: Black man, narrative from clinical skills)
In the excerpt below from a LOR-AAIM, the phrase "frequently completes most required tasks" is an example of faint praise that suggests the applicant does not complete all required tasks in a timely manner.
“X frequently completes most required tasks within the expected timeframe including documentation, responding to calls from teammates and patients as well as completing required documentation and paperwork for administrative purposes.” (10: Black man, quote from professionalism)
In both LOR-NonAAIM and LOR-AAIM, we commonly found that communal terms were framed negatively in letters for URC applicants, whereas for non-URC applicants, communal terms tended to be framed positively. This occurred when applicants were described using both agentic and communal language within the same narrative, typically linked by a conjunction or preposition that served to frame the communal characteristic as negative (eg, "but," "despite") or positive (eg, "and"). In the following excerpt from a LOR-AAIM, the applicant is described as a person who does not call attention to herself (ie, humble, a communal characteristic) and as intelligent (an agentic characteristic). The conjunction "but" subtly casts the humble descriptor as negative language and also serves to broadly undervalue this communal characteristic:
“X is not the type of resident who calls attention to herself, but her medical knowledge, commitment to patients, and work ethic are readily apparent.” (17: Asian woman, narrative paragraph about core competencies)
In contrast, a non-URC applicant is described as both intellectual (an agentic term) and compassionate (a communal term): “X is simultaneously a compassionate caregiver and an intellectually curious scientist” (8: White man, scholarly contributions). The conjunction “and” serves to elevate both characteristics as positive. Table 2 provides additional examples of conjunctions and prepositions as doubt raising devices.
Bias in Career Trajectory
In both LOR-NonAAIM and LOR-AAIM, PDs tended to describe URC applicants as early in their careers, while non-URC applicants were described as advanced in their careers. One non-URC applicant is not only described as having a future career in academic medicine, but is also described with active verbs that frame him as a researcher.
X has already demonstrated an interest in cardiology and research that shows he will be successful in a future career in academic medicine. While in medical school, he conducted research to improve the quality of care for patients with A… He designed an analysis that measured [this quality]… He identified areas for QI… (16: Asian man, scholarly contributions)
In contrast, a URC applicant is described using passive and weak verbs.
Over the course of her academic training, X has been involved in a significant amount of research… X has worked on several accomplished cardiology research teams including a project looking at A. She has also been working on a project in the use of B echocardiography to evaluate C. (33: Asian woman, scholarly contributions)
Table 3 shows additional examples of bias in career trajectory. Notably, all applicants in Table 3 were rated in the top tier of research productivity. Despite this, URC applicants were framed as students or participants "working" with others, while non-URC applicants were framed as either already being scientists or having high potential to become scientists/researchers.
Discussion
We observed that with the AAIM guidelines, there appeared to be reduced use of communal language for URC applicants, which may represent less bias. However, bias still existed in both types of letters. PDs described URC applicants using communal language and non-URC applicants using agentic language, regardless of format. This pattern existed in both narrative descriptions and selected quotations. Both letter types had examples of doubt raising and bias in career trajectory. This language was readily apparent even to non-experts in the field with minimal bias training. Finally, both letter types varied widely in format despite the structure suggested by the AAIM guidelines. Below we discuss our main findings: the helpfulness of the structured AAIM guidelines in reducing bias, the persistence of bias despite these guidelines, and the potential sources of this bias.
Two components of the structure created by the AAIM guidelines appeared to reduce bias. First, the core competencies sections forced PDs to elaborate on clinical performance areas not traditionally covered. Second, the personal characteristics and skills sections reminded PDs to discuss both aspects of an applicant. Our findings are consistent with results from a previous study of LOR-AAIM, in which fellowship PDs felt that structured LOR were clearer than LOR-NonAAIM in communicating residents' performance across the 6 core competency domains.5
Bias persisted within LOR-AAIM despite the AAIM guidelines. This finding aligns with previous literature from otolaryngology residency, in which standardized LOR reduced but did not eliminate bias, especially between men and women.21 In our analysis, bias persisted in 3 different forms. First, it occurred when AAIM guidelines were only partially followed, as exemplified by PDs describing all clinical competencies in the same section rather than in 6 separate sections, thus incompletely addressing the competencies and straying into the patterns of bias. Second, we uniformly observed patterns of bias in the scholarly contributions and overall assessment sections; the lack of structure in the AAIM guidelines for these sections contributed to this pattern. Third, hedging and faint praise were used only to describe URC applicants, so writers should be especially vigilant in this area.
Evaluative quotations and written narratives implicitly introduced bias in both letter formats. The use of evaluative quotations in LOR is a long-standing practice requiring careful application, as selecting others' words introduces additional possibilities of bias. Furthermore, our finding that communal terms were framed as negative language for URC applicants but more positively for non-URC applicants exemplifies how communal language is perpetuated as a negative characteristic.
Our analysis generated recommendations for PDs and for the AAIM guidelines (see Table 4). We recommend the creation of a new "section for growth" in LOR-AAIM. Researchers report the pervasiveness of hedging in evaluations of residents.27 To rank-order residents, faculty must "read between the lines" of these evaluations, but the lack of a standard "hidden code" risks variable interpretation of evaluations.28 Our experience is that a de facto system to report trainee areas for growth is already in use, often communicated with doubt raising language. A required section covering both areas of strength and areas for growth could diminish the use of doubt raising language by requiring such comments for all applicants; the business world uses a similar section.29

We also recommend further work by the Accreditation Council for Graduate Medical Education (ACGME) to create a Milestones-based system to track resident competency in research or scholarly activities. The ACGME Internal Medicine Subspecialty Milestones have a scholarship subsection (MK3) that does not exist as a Residency Milestone.30 Expanding residency clinical competencies to include scholarly activities would help PDs systematically evaluate applicants.

Finally, we wish to acknowledge that the bias in these letters is part of a long history of oppression against women and people of color in the United States. Despite our recommendations, as long as systemic racism and sexism remain, bias will continue to make its way, both overtly and insidiously, into letters of recommendation.31
Our study has limitations. First, the LOR were for applicants accepted to interview at a single cardiology fellowship program. We feel that the existence of biased language in the LOR for applicants to this program suggests that bias toward URC applicants is likely omnipresent. Second, our study did not consider the gender or race of the letter writers, which can affect language, letter length, and overall appraisal of the applicant being evaluated.32,33 Third, coders were not blind to race/gender, introducing the possibility of confirmation bias. Fourth, given our sample, we could not comment on the intersectionality of gender and racial bias, which has previously been shown to influence achievement word use in LOR.34
Conclusions
We found that language, including communal and agentic terms, doubt raising, and bias in career trajectory, was used in a biased pattern toward URC applicants. This bias appeared reduced, though not eliminated, when PDs followed the AAIM guidelines. We have provided recommendations on how to continue to work to reduce this bias.
References
Author notes
Editor's Note: The online version of this article contains the 2017 AAIM guidelines for standardized LOR and the codes for LOR-NonAAIM and LOR-AAIM.
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.