Background Despite existing guidelines for writing clerkship summative assessment narratives, their quality, structure, and utility remain variable. Categorizing Medical Student Performance Evaluation (MSPE) narratives using a framework can reveal patterns and gaps in content, offering actionable insights.
Objective This study aimed to (1) categorize MSPE narrative comments using the PRIME+ framework (professionalism, reporting, interpreting, managing, and educating skills, plus areas for improvement [+]), and (2) examine differences in length and content by gender, race, origin of medical school, and final clerkship grade.
Methods Seven hundred twenty applications to our obstetrics and gynecology (OB/GYN) residency program in 2023 were reviewed, focusing on the OB/GYN core clerkship narrative. Narratives were categorized using the PRIME+ framework, and differences in length and content were assessed by gender, race, origin of medical school, and final grade. Differences between groups were evaluated with nonparametric tests.
Results Six hundred fifty-three narratives from 231 medical schools were included. Fifty-one unique grading systems were reported. PRIME+ domains were represented as follows: professionalism (94.8%, 619 of 653), reporter (71.1%, 464 of 653), interpreter (37.5%, 245 of 653), manager (69.1%, 451 of 653), educator (69.7%, 455 of 653), and areas for improvement (3.7%, 24 of 653). For each domain, <13% of narratives included ≥1 specific example. Median word count differed between US-based (155 words; 95% CI, 148-162) and international (61 words; 95% CI, 51-75) applicants (P=.001). Students earning “honors” had longer narratives (median words 149; 95% CI, 131-164 vs 117; 95% CI, 97-134; P=.001) with more specific examples (1.2 examples; 95% CI, 0.97-1.4 vs 0.88; 95% CI, 0.53-1.2; P=.024) and advanced PRIME+ domains, specifically educator (P=.016). The number of specific examples differed by race (P=.02) but not gender.
Conclusions MSPE narratives for the OB/GYN clerkship demonstrate variability in content and length.
Introduction
Despite the availability of clear national guidelines on writing summative clerkship narratives for the Medical Student Performance Evaluation (MSPE; Box 1),1 the structure and utility of these narratives remain variable.2-6 Along with MSPE variability and incompleteness,7 the transition to pass/fail scoring for the United States Medical Licensing Examination (USMLE) Step 1 and inconsistency in grade reporting across medical schools8 have created a system in which distinguishing students’ clinical performance is often challenging. Eight years after the latest revision of the Association of American Medical Colleges (AAMC) recommendations for the MSPE,9 experts continue to call for additional best practice guidelines to enhance the ability to accurately and fairly compare residency applicants.10 The increased relative importance of narrative assessment data necessitates an evaluation of how widely the AAMC practice guidelines have been adopted in writing narrative summaries of student clinical performance.
Box 1 Summary of the AAMC’s Best Practice Guidelines for Writing MSPE Narrative Summaries
Provides specific comments
Avoids bias
Bases comments on direct observation of the learner by the evaluator
Describes areas of strength with clear examples
Describes areas for growth/improvement
Relates comments to specific competencies (eg, ACGME Competencies, PRIME+, EPAs)
The MSPE is a traditional component of the residency application. In the National Resident Matching Program (NRMP) 2020 Program Director Survey, the MSPE was identified as the fifth most important factor in selecting applicants to interview, surpassing other factors including USMLE Step 1 and 2 scores, letters of recommendation, and the personal statement.11 In a survey of internal medicine program directors, the “Academic Progress” section was rated the most influential of the MSPE’s 12 components. Although this section traditionally includes written narratives of student performance for each required clerkship, program directors rated narrative comments as only the seventh most important component, with concerns about honesty and transparency emerging as common themes.12 Historically, these narratives have been muddled by vague language and bias, ultimately limiting the reader’s ability to accurately assess a student’s true clinical performance.2-6 To address the lack of specific guidance in the 2016 MSPE revision, the AAMC’s 2019 Writing Effective Narrative Feedback Working Group developed resources that are now widely available online.1
In reviewing obstetrics and gynecology (OB/GYN) summative narratives from MSPEs received at our institutions, we observed significant variability in language, length, and the frameworks used to present performance data. To investigate this inconsistency, we adopted the PRIME+ framework, a widely utilized tool for assessing medical students in clinical settings. PRIME+ assesses professionalism (P), reporting (R), interpreting (I), managing (M), educating (E) skills, and areas for improvement (+), providing a consistent lens for analyzing narrative summaries.2 Understanding the categories into which MSPE narrative comments fall can provide insight into the content and consistency of these evaluations. By systematically analyzing how different domains are described using an established framework, we may identify areas where narratives lack clarity or alignment with guidelines, paving the way for more standardized and actionable evaluations. For residency program directors, the findings could provide clearer guidance on interpreting narrative content, particularly when comparing applicants across institutions.
The objectives of this study were to: (1) categorize MSPE narrative comments using the PRIME+ framework, and (2) examine differences in the length and content of these narratives by gender, race, origin of medical school (United States or international), and final clerkship grade. We hypothesized that the use of the PRIME+ framework would reveal variability in narrative content and highlight potential biases in the inclusion of specific examples based on student demographics.
KEY POINTS
What Is Known
Despite existing guidelines for its composition, the content and structure of Medical Student Performance Evaluation (MSPE) narratives in clerkship assessments are inconsistent and prone to potential bias.
What Is New
One OB/GYN residency program used the PRIME+ framework to categorize content from 653 clerkship narratives from applicant MSPEs. Variations were found in narrative length and content based on the origin of medical school, final clerkship grades, and racial backgrounds, with US-based students and those receiving higher grades having longer narratives.
Bottom Line
This study adds evidence that more work is needed to standardize MSPE narratives and to include specific examples of behavior, in hopes of creating consistent and fair assessments across diverse student backgrounds.
Methods
This was a retrospective cohort study of OB/GYN clerkship MSPE summative narratives received at the New York Presbyterian Hospital–Columbia Campus OB/GYN residency program in 2023. Out of 720 applications, 67 (9.3%) were excluded for the following reasons: (1) no MSPE letter included (14 applications); (2) the applicant had not completed or been assessed for the OB/GYN clerkship (2 applications); (3) the curriculum included multiple OB/GYN clerkships (5 applications); or (4) the MSPE letter lacked OB/GYN clerkship narrative comments (46 applications).
For each MSPE letter, the following medical school characteristics were recorded: name, type (allopathic or osteopathic, public or private), and location. Applicant demographics, including gender, race, and ethnicity, were also recorded. For the OB/GYN clerkship portion of “Academic Progress,” the clerkship grading system, grade distribution, and student grade were recorded. Grading systems were categorized as unique if their specific terminology differed from that of other schools or from the 4 main systems (honors/high pass/pass/fail; honors/pass/fail; pass/fail; letter). AAMC best practice guidelines suggest adopting a particular framework when writing narrative summaries, including PRIME+, Accreditation Council for Graduate Medical Education (ACGME) Core Competencies, and entrustable professional activity (EPA)-based frameworks.1 We selected PRIME+ over these alternatives for its relative simplicity, its universal and accessible language, and its demonstrated utility among OB/GYN clerkship directors as a framework for assessing clinical performance within our specialty.13
The clerkship narrative was de-identified and quantitatively coded by 1 of 3 reviewers (M.S., M.A., S.S.) for the number of PRIME+ domains and subdomains2 and specific examples, using Qualtrics (Qualtrics) for data entry and management. Each narrative was analyzed line by line, with individual sentences coded for one or more PRIME+ domains. A single sentence could reflect multiple domains if the described behavior aligned with more than one competency. Similarly, each narrative could contain multiple instances of the same domain. The number of specific examples for each subdomain was also counted. A specific example was defined as the inclusion of a unique instance in which a student demonstrated the broader behavior described by the PRIME+ subdomain. An initial in-person scoring session allowed the reviewers to practice and clarify questions for one another. After extracting data from 15 narratives and reaching consensus on the definitions of the domains and subdomains, the MSPE paragraphs were divided among the researchers, and the reviewers coded them independently. A secure electronic group chat was established for posting and resolving categorization questions through group consensus. Box 2 details a sample coding scheme for a student summative narrative.
Box 2 Sample Coding of MSPE Paragraph
“Throughout her OB/GYN clerkship, X conducted herself in a professional manner, demonstrating genuine intellectual curiosity in women’s health topics and a superb work ethic.1 She obtained accurate histories that were appropriately tailored to the clinical context,2 homing in on pertinent elements to help refine her assessment.3 She delivered well-organized and concise presentations,4 and similarly, her notes were strong for her level.5 Her physical examination and technical skills were appropriate, and she could be relied upon to perform level-appropriate components of the patient assessment independently.6 She did a great job with vaginal deliveries, requiring little redirection.7 She advocated for her patients’ care and even provided prenatal education to patients unprompted.8 Multiple evaluators applauded X for her depth of medical knowledge9 and sound clinical reasoning.10 She clearly prepared ahead of time by reading relevant material and asking questions to solidify her understanding.11 In a busy clinic, she evaluated patients promptly and efficiently12 while still coming up with well-thought-out assessments and plans.13 Evaluators encourage her to continue to increase her confidence in her role on a busy inpatient team, as she clearly had much to offer to the clinical dialogue.14”
Word Count: 187. 1Professionalism. 2Reporter (obtain information from interview). 3Interpreter (interpret data from history, physical, labs, imaging). 4Reporter (report findings in an oral presentation). 5Reporter (report findings in written note). 6Manager (performing simple procedures). 7Manager (performing simple procedures); 1 specific example. 8Educator (educate patients or their team members); 1 specific example. 9Educator (other: fund of knowledge). 10Interpreter (interpret data from history, physical, labs, imaging). 11Educator (educate themselves via self-directed learning). 12Manager (managing one’s own time). 13Manager (formulate therapeutic plan). 14An area for improvement is suggested; 1 specific example.
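To illustrate how a coding scheme like the one in Box 2 could be tallied quantitatively, the sketch below shows one possible data structure for a coded narrative. It is illustrative only: the study’s data entry was performed in Qualtrics, and the class names, fields, and example values here are hypothetical.

```python
from dataclasses import dataclass, field
from collections import Counter

# PRIME+ domains used for coding, with "improvement" standing in for the "+" component
DOMAINS = ["professionalism", "reporter", "interpreter", "manager", "educator", "improvement"]

@dataclass
class CodedSentence:
    text: str
    domains: list           # one or more PRIME+ domains reflected in the sentence
    specific_examples: int  # count of unique, concrete instances described

@dataclass
class CodedNarrative:
    school: str
    word_count: int
    sentences: list = field(default_factory=list)

    def domain_counts(self) -> Counter:
        """Tally how often each PRIME+ domain appears across the narrative."""
        return Counter(d for s in self.sentences for d in s.domains)

    def total_specific_examples(self) -> int:
        return sum(s.specific_examples for s in self.sentences)

# Hypothetical entries mirroring two of the annotations in Box 2
narrative = CodedNarrative(school="School A", word_count=187)
narrative.sentences.append(
    CodedSentence("She did a great job with vaginal deliveries, requiring little redirection.",
                  domains=["manager"], specific_examples=1)
)
narrative.sentences.append(
    CodedSentence("Evaluators encourage her to continue to increase her confidence...",
                  domains=["improvement"], specific_examples=1)
)
print(narrative.domain_counts(), narrative.total_specific_examples())
```

Storing each sentence with its coded domains allows a single sentence to count toward multiple domains, mirroring the coding rules described above.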
Descriptive statistics were used to summarize applicant, medical school, and narrative characteristics. Although the data for specific examples were not normally distributed, means were reported instead of medians to facilitate interpretation, given the small range of values and the nature of the data. Differences between groups were evaluated with chi-square, Fisher exact, Kruskal-Wallis, and Mann-Whitney nonparametric tests. Bonferroni corrections were applied to post hoc pairwise comparisons.
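As an illustration of the nonparametric comparisons described above (not the authors’ actual analysis code), a minimal sketch in Python might look like the following, assuming a hypothetical data set with one row per narrative and columns for word count, final grade, and race.

```python
import pandas as pd
from scipy import stats
from itertools import combinations

# Hypothetical data frame: one row per MSPE narrative
df = pd.DataFrame({
    "word_count": [150, 162, 140, 60, 75, 58, 155, 120],
    "honors":     [True, True, True, False, False, False, True, False],
    "race":       ["A", "B", "A", "C", "B", "C", "A", "B"],
})

# Mann-Whitney U test: word count for honors vs non-honors narratives
honors = df.loc[df["honors"], "word_count"]
non_honors = df.loc[~df["honors"], "word_count"]
u_stat, p_value = stats.mannwhitneyu(honors, non_honors, alternative="two-sided")
print(f"Mann-Whitney U={u_stat:.1f}, P={p_value:.3f}")

# Kruskal-Wallis test across racial groups, followed by Bonferroni-corrected
# pairwise Mann-Whitney comparisons
groups = [g["word_count"].values for _, g in df.groupby("race")]
h_stat, p_kw = stats.kruskal(*groups)
print(f"Kruskal-Wallis H={h_stat:.2f}, P={p_kw:.3f}")

pairs = list(combinations(df["race"].unique(), 2))
alpha_corrected = 0.05 / len(pairs)  # Bonferroni correction for multiple comparisons
for a, b in pairs:
    _, p = stats.mannwhitneyu(df.loc[df["race"] == a, "word_count"],
                              df.loc[df["race"] == b, "word_count"],
                              alternative="two-sided")
    print(f"{a} vs {b}: P={p:.3f} (significant if < {alpha_corrected:.4f})")
```

Dividing the nominal alpha by the number of pairwise comparisons reflects the Bonferroni correction applied to post hoc tests in the analysis.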
The study was reviewed and deemed exempt by the Columbia University Institutional Review Board.
Results
The final analysis included 653 MSPE narrative assessments for the core OB/GYN clerkship from 231 medical schools, representing approximately 30.5% of the 2143 applicants to OB/GYN residency programs in 2023.14 Most applicants identified as women (88.8%, 580 of 653), and the most commonly reported race was White (39.7%, 259 of 653). Most medical schools were US-based (70.6%, 163 of 231) and allopathic (60.6%, 140 of 231), and 36.4% (84 of 231) were publicly funded (Table 1).
Fifty-one unique grading systems were reported across 231 medical schools. Of the traditional grading systems, the most common was “honors/high pass/pass/fail” (32.9%, 76 of 231) and the least common was “pass/fail” (5.2%, 12 of 231; Table 2). In schools that used an “honors” grade designation (N=98), 64% (186 of 290) of students earned a final grade of “honors” in the OB/GYN clerkship, and the distribution of “honors” grades in academic year 2023 ranged from 1% to 99% (mean 37.5%, SD 18.8).
PRIME+ domains across all summary narratives were represented as follows: professionalism (94.8%, 619 of 653), reporter (71.1%, 464 of 653), interpreter (37.5%, 245 of 653), manager (69.1%, 451 of 653), educator (69.7%, 455 of 653), and areas for improvement (3.7%, 24 of 653; Table 3). The proportion of narratives that included at least one specific example varied by domain, ranging from 1% to 12%. The number of PRIME+ domains did not differ by race (P=.49) or gender (P=.31). The number of specific examples differed by race (P=.02) but not gender (P=.55). Pairwise comparisons with a Bonferroni correction identified a significant difference only between applicants identifying as Asian (154 specific examples) and those who preferred not to state their race (45 specific examples), with a small effect size (r=-0.242). Twenty-four narratives (3.7%) representing 13 medical schools (5.6%) listed an area for improvement; of these, 6 (25%) originated from the same school. Median word count differed between US-based (155 words; 95% CI, 148-162) and international (61 words; 95% CI, 51-75) applicants (P<.001). The median number of PRIME+ domains (P=.001), but not the number of specific examples (P=.07), differed between US-based (4 domains; 95% CI, 4-4) and international (2 domains; 95% CI, 3-4) applicants. While not a formal part of the framework, 383 narratives (58.7%) commented on “medical knowledge” and 8 (1.2%) commented on involvement in OB/GYN research.
Among schools with “honors/high pass/pass/fail” and “honors/pass/fail” grading systems (N=98), students who earned a final grade of “honors” had longer narratives (median 149 words; 95% CI, 131-164 vs 117 words; 95% CI, 97-134; P=.001) with more specific examples (1.2 examples; 95% CI, 0.97-1.4 vs 0.88; 95% CI, 0.53-1.2; P=.024) and more frequent inclusion of advanced PRIME+ domains than students who did not earn “honors.” Specifically, a greater percentage of narratives for students earning “honors” contained comments on the “educator” domain (73% vs 61%, P=.016).
Discussion
Our quantitative analysis demonstrated that the professionalism domain was represented in almost all narratives, which were otherwise widely variable in structure and length. Behavioral comments within PRIME+ were seldom supported by specific, illustrative examples. The number of unique grading systems (N=51) utilized by the medical schools is higher than previously reported in the United States,8 partially due to the inclusion of international programs. Clerkship summative assessment narratives suggested differentiation between “honors” and “non-honors” students based on content and length. The number of PRIME+ domains did not differ by race or gender; the number of specific examples differed by race, although only between 2 groups and with a small effect size, and did not differ by gender.
The apparent emphasis on professionalism in narrative summaries is not surprising. This focus may reflect the relative ease of commenting on these behaviors compared with other domains. Additionally, it may satisfy the needs of program directors who are willing to overlook deficiencies in certain domains but not violations of professional conduct.15 While the prevalence of the reporter, manager, and educator domains was nearly identical, fewer narratives commented on the interpreter domain. We suspect this is because providing quality feedback on this domain requires substantial direct observation and engagement from clinical evaluators, whose time is scarce and divided among multiple responsibilities. The relatively high prevalence of comments in the manager and educator domains is likely driven by comments on “procedures” (under which the ability to perform a pelvic examination was coded) and the “ability to self-direct learning.” It may also be easier to observe and comment on educator subdomains, including the ability to ask for feedback and engage in self-directed learning. Lastly, most narratives lacked specific areas for improvement, perhaps because writers find including them counterintuitive and potentially disadvantageous to students. The relative lack of areas for improvement in our cohort may also reflect sampling bias introduced by the type of applicant, a direct result of our program’s reputation as highly competitive. In our cohort, the inclusion of areas for improvement tended to cluster around specific institutions, suggesting an institution-level practice.
The Standardized Letter of Evaluation (SLOE) is now strongly recommended for applicants in multiple specialties, including OB/GYN, and has become an integral component of the application process by offering a structured, standardized evaluation based on ACGME Core Competencies.16,17 Its introduction may address many of the challenges identified in MSPE narratives. The interplay between the 2 documents is particularly important to understanding how they complement one another to meet the needs of programs and applicants. The SLOE provides a focused, specialty-specific evaluation that allows direct comparison of applicants within the specialty, while the MSPE offers a longitudinal, institution-specific perspective on an applicant’s performance across all clerkships, often contextualizing their broader trajectory as a medical student. In a residency application landscape increasingly shaped by changes such as pass/fail Step 1 scoring and program signaling, the role of the MSPE must evolve to complement tools like the SLOE. Our PRIME+ analysis highlights specific practices that could make the “Academic Progress” section more transparent, actionable, and comparable: ensuring consistent inclusion of PRIME+ domains, providing clear and specific examples to illustrate student performance, and aligning narrative content with established guidelines. Additionally, incorporating areas for improvement, which were notably absent in most narratives, would offer a more balanced and transparent evaluation of student performance.
Such improvements would benefit applicants and program directors by contextualizing the student’s performance outside their specialty of interest and providing greater differentiation among candidates. This synthesis invites a broader reflection on the existential purpose of the MSPE in an era when the SLOE and other tools address many of the problems it traditionally sought to solve. We must also consider the impact of the variability in existing grading schemes and distributions. As it stands, the variety in grading systems and standards provides limited information for distinguishing student performance, an issue that could be explored further by examining Match outcomes.
Our study has several limitations that affect the generalizability and interpretation of the results. Only one-third of the 2023 applicant pool was included, representing a cohort interested in one specific residency program and thereby introducing sampling bias. A multi-institutional approach capturing a broader applicant pool could mitigate this bias and enhance generalizability. We utilized the PRIME+ framework exclusively to analyze the narratives; while this provided a consistent lens, it confined our interpretations to the specific parameters of this framework. Adopting multiple frameworks, such as combining ACGME Core Competencies or EPA frameworks with PRIME+, could offer a more comprehensive understanding of the narratives and reveal different insights. Lastly, we included only OB/GYN narratives and as such have a narrow, specialty-specific view of the MSPE.
Conclusions
MSPE narratives for the OB/GYN clerkship demonstrate variability in content and length. Length and content did not differ by gender, differed minimally by race, and varied by medical school of origin (United States or international) and final clerkship grade.
References
Author Notes
Funding: The authors report no external funding source for this study.
Conflict of interest: The authors declare they have no competing interests.