Background The integration of entrustable professional activities (EPAs) within objective structured clinical examinations (OSCEs) has opened a valuable avenue for delivering timely feedback to residents. However, concerns about feedback quality persist.
Objective This study aimed to assess the quality and content alignment of verbal feedback provided by examiners during an entrustment-based OSCE.
Methods We conducted a progress test OSCE for internal medicine residents in 2022, assessing 7 EPAs. The immediate 2-minute feedback provided by examiners was recorded and analyzed using the Quality of Assessment of Learning (QuAL) score. We also analyzed the degree of alignment with EPA learning objectives: competency milestones and task-specific abilities. In a randomized crossover experiment, we compared the impact of 2 scoring methods used to assess residents’ clinical performance (3-point entrustability scales vs task-specific checklists) on feedback quality and alignment.
Results Twenty-one examiners provided feedback to 67 residents. The feedback demonstrated high quality (mean QuAL score 4.3 of 5) and substantial alignment with the learning objectives of the EPAs: on average, examiners addressed 2.5 milestones (61%) and 1.2 task-specific abilities (46%) in their feedback. The scoring method used had no significant impact on QuAL scores (95% CI -0.3, 0.1, P=.28), alignment with competency milestones (95% CI -0.4, 0.1, P=.13), or alignment with task-specific abilities (95% CI -0.3, 0.1, P=.29).
Conclusions In our entrustment-based OSCE, examiners consistently offered valuable feedback aligned with intended learning outcomes. Notably, we explored high-quality feedback and alignment as separate dimensions, finding no significant impact from our 2 scoring methods on either aspect.
Introduction
While a traditional objective structured clinical examination (OSCE) primarily assesses specific clinical skills, an entrustment-based OSCE assesses learners’ readiness to independently carry out essential professional activities.1 Beyond its primary purpose of assessing autonomy, the inclusion of entrustable professional activities (EPAs) in OSCEs establishes a solid framework for delivering timely verbal feedback to residents.1-3 Observation-based feedback provided promptly after performance and closely aligned with learning objectives has the potential to enhance residents’ autonomy and professional growth.4-7 Concerns have been raised, particularly with the introduction of new scoring methods, about the quality and alignment of feedback in OSCEs, underscoring the need for a comprehensive analysis.8,9
In a study by Martin et al,10 residents emphasized the significance of verbal feedback, even over their performance scores, particularly when the feedback is constructive and fosters their development as professionals. However, although numerous methods exist for assessing feedback quality, few demonstrate strong psychometric properties, underscoring the necessity of using validated instruments.11 Examiners’ self-assessments and students’ perceptions have limitations as objective and reliable measures: physicians often lack self-awareness regarding their abilities,12 while students tend to misremember the content of feedback8 and to favor praise over constructive feedback.13
Another aspect of feedback is its coherence with the principle of alignment, which stresses consistency between assessment, learning activities, and intended learning outcomes.14 If EPAs represent the expected learning outcomes and competency milestones are indicators of their achievement, feedback should be congruent with both elements.15 Further, it is essential that feedback align with the task-specific abilities encompassing the necessary knowledge and skills, since such abilities constitute the fundamental criteria for entrusting autonomous practice.16 Failure to provide aligned feedback not only overlooks residents’ needs but also contradicts the purpose of a progress test within competency-based education.1
The scoring method used by the examiner to assess the clinical performance of the resident might influence the frequency, specificity, and timeliness of feedback.10,17 Previous studies on OSCEs have encountered limitations in determining the impact of scoring methods on feedback due to the simultaneous utilization of global scales and checklists.8,18
To address these gaps and concerns, our study aims to conduct a comprehensive analysis of the verbal feedback provided by examiners in an entrustment-based OSCE. We will assess the quality of feedback and its alignment with EPAs’ learning objectives (competency milestones and task-specific abilities).16 To assess residents’ clinical performance, examiners will employ either a 3-point entrustability scale or a task-specific checklist. Through a randomized crossover experiment, we will explore the effect of these 2 scoring methods on feedback quality and alignment.
KEY POINTS
What Is Known
Educators in graduate medical education continue to strive for ways to improve the quality of feedback given to residents and identify potential barriers to providing quality feedback.
What Is New
This study of internal medicine objective structured clinical examination (OSCE) examiners showed that feedback quality was high, aligned with learning objectives, and was not affected by choice of grading rubric framework.
Bottom Line
Program directors incorporating OSCEs can feel reassured, based on this evidence, that both entrustability scales and task-specific checklists can produce equally valuable feedback as measured by Quality of Assessment of Learning scores.
Methods
Setting
Our study was conducted at Laval University School of Medicine in an urban Canadian setting. Since 2019, the internal medicine residency program has embraced EPAs as part of its competency-based teaching and assessment approach. The OSCE serves as an annual mandatory progress test. While OSCE outcomes are primarily formative, contributing to ongoing learning without standalone consequences, they are incorporated into programmatic assessments by a competency committee. The 2022 OSCE marked the institution’s inaugural implementation of an entrustment-based OSCE with immediate feedback. Sixty-seven of 76 postgraduate year (PGY)-2 and PGY-3 residents participated across 3 waves. The OSCE comprised 10 stations: 7 for EPA assessments, 2 for questionnaire completion, and 1 designated break station. Immediate feedback, lasting 2 minutes, was provided to residents by examiners after each EPA station. At that time, residents were unaware of the study’s focus, the scoring methods utilized, or their individual performance scores.
A committee of 4 clinician educators in internal medicine determined the clinical scenarios, task-specific abilities (2 or 3 per EPA), and checklist items (on average 20 per EPA). An example of the 2 scoring methods used in our study (3-point entrustability scale or task-specific checklist) is provided as online supplementary data. As shown in Table 1, the committee selected 7 core EPAs (3 complex medical situations, 2 acute care situations, and 2 shared decisions with patients and families) from the Royal College of Physicians and Surgeons of Canada framework, encompassing competency milestones in medical expertise, communication, collaboration, scholarship, and professionalism for PGY-2 and PGY-3 residents in internal medicine.19 The inclusion of these milestones is based on their universal presence across multiple competency frameworks.20
Participants
Clinical preceptors and PGY-4 and PGY-5 residents from the Department of Medicine volunteered as examiners and were assigned to stations based on their expertise. Throughout the OSCE, examiners had access to a modified version of Table 1 detailing EPAs, competency milestones, and task-specific abilities. A written questionnaire asked about their familiarity with the clinical content and milestones on a 5-point Likert-type scale (5=very familiar; 4=sufficiently familiar; 3=moderately familiar; 2=somewhat familiar; 1=not familiar), aiming to identify potential confounding factors influencing feedback provision. Two videos were presented to all examiners, explaining best feedback practices and proper use of entrustability scales (online supplementary data). Examiners were unaware of the study’s focus on feedback or of the instruments employed for feedback measurement.
Intervention
In a randomized crossover design detailed in the Figure, every examiner assessed two-thirds of residents using a 3-point entrustability scale (nonautonomous, partially autonomous, or autonomous) and one-third using a task-specific checklist. Examiners did not use both scoring methods simultaneously.
Figure. Crossover Study Design: Examiners Assessed Residents With Entrustability Scales (N=45) or Checklists (N=22)
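To make the 2:1 allocation concrete, the following is a minimal sketch of how residents could be randomly split so that each examiner assesses two-thirds of residents with the entrustability scale and one-third with the checklist. This is an illustration only, not the study’s actual randomization procedure (which is not described); the resident labels and seed are invented.

```python
import random

random.seed(2022)  # illustrative seed for a reproducible example
residents = [f"R{i:02d}" for i in range(1, 68)]  # 67 hypothetical resident IDs

# 2:1 random split matching the Figure: 45 residents scored with the
# 3-point entrustability scale, 22 with the task-specific checklist.
random.shuffle(residents)
scale_group, checklist_group = residents[:45], residents[45:]
assert len(scale_group) == 45 and len(checklist_group) == 22
```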
Outcome
We assessed feedback quality using recordings of the feedback, randomly distributed to 2 blinded educators in internal medicine (A.L., D.T.L.). Each educator listened to 60% of the audio recordings, with a 20% overlap to calculate interrater agreement. A.L. and D.T.L. rated the quality of the feedback using the Quality of Assessment of Learning (QuAL) score,21 with 3 questions for a total of 5 points:
Does the examiner comment on the performance? (0=no comment at all; 1=comment, but not about the performance; 2=somewhat; 3=yes/full description)
Does the examiner provide a suggestion for improvement? (0=no; 1=yes)
Is the examiner’s suggestion linked to the behavior described? (0=no; 1=yes)
The QuAL score has validity evidence as a workplace-based assessment and offers a promising framework for evaluating feedback quality based on best practices.21
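As a worked illustration of the rubric above, here is a minimal sketch of how the 3 item ratings combine into the 5-point total. The function name and input validation are ours for clarity and are not part of the published QuAL instrument; the rule that a suggestion must exist before it can be linked is our reading of items 2 and 3.

```python
def qual_score(evidence: int, suggestion: int, linked: int) -> int:
    """Total QuAL score (0-5) from the 3 item ratings.

    evidence:   0-3, how fully the examiner described the performance
    suggestion: 0 or 1, whether a suggestion for improvement was given
    linked:     0 or 1, whether that suggestion is tied to the behavior
                described (only scoreable when suggestion=1)
    """
    if not (0 <= evidence <= 3 and suggestion in (0, 1) and linked in (0, 1)):
        raise ValueError("item rating out of range")
    if suggestion == 0 and linked == 1:
        raise ValueError("a suggestion must exist before it can be linked")
    return evidence + suggestion + linked

# Full performance description, plus a suggestion tied to that behavior:
assert qual_score(3, 1, 1) == 5
```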
To assess feedback quality through different lenses, examiners’ perspectives were collected through a single written question. Following each interaction with a resident, examiners were prompted with the question: “How confident are you that the feedback provided will enhance this student’s autonomy?” answered using a 5-point Likert-type scale (5=very confident; 4=sufficiently confident; 3=moderately confident; 2=somewhat confident; 1=not confident).
To assess feedback alignment, A.L. and D.T.L. counted the number of competency milestones and task-specific abilities (detailed in Table 1) mentioned in the examiner’s feedback. For example, the feedback “Among the exams to prescribe, what was not mentioned was the abdominal ultrasound” contained the milestone “ME2: Select and interpret investigations based on clinical priorities” and the task-specific ability “Ask for imaging.”
Residents’ perspectives were collected through a single written question. Following their participation in both the spondyloarthritis and diabetes stations, residents were prompted to provide an answer to: “Following the feedback received in the previous station, your autonomy level to accomplish the same task in a clinical setting is…” using a 5-point Likert-type scale (5=clearly increased; 4=increased; 3=similar; 2=decreased; 1=clearly decreased).
Analysis
We used SPSS Statistics version 21 (IBM Corp) to perform paired t tests comparing the 2 scoring methods and to calculate interrater absolute agreement with the intraclass correlation coefficient (2-way mixed-effects model).
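For readers working outside SPSS, the following is a minimal sketch of the 2 analyses in Python. All data values are invented for illustration; we assume the scipy and pingouin packages, and we report pingouin’s absolute-agreement single-rater estimate (the ICC2 row) as the closest analogue to the SPSS setting described above.

```python
import pandas as pd
import pingouin as pg
from scipy import stats

# Paired t test: each examiner contributes one mean QuAL score per
# scoring method (hypothetical values for 6 examiners).
qual_scale     = [4.5, 4.1, 4.4, 4.0, 4.6, 4.2]   # entrustability scale
qual_checklist = [4.3, 4.2, 4.5, 3.9, 4.4, 4.1]   # task-specific checklist
t, p = stats.ttest_rel(qual_scale, qual_checklist)

# Interrater agreement: both raters (A.L., D.T.L.) score the same
# overlapping recordings, in long format.
long = pd.DataFrame({
    "recording": [1, 2, 3, 4, 5] * 2,
    "rater":     ["AL"] * 5 + ["DTL"] * 5,
    "qual":      [5, 4, 4, 3, 5,  5, 4, 3, 3, 5],
})
icc = pg.intraclass_corr(data=long, targets="recording",
                         raters="rater", ratings="qual")
# Absolute-agreement, single-rater estimate with its 95% CI.
print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])
```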
The Research Ethics Committee of Laval University approved this study (No. 2019-390).
Results
All 21 OSCE examiners agreed to take part in the study. They provided feedback to 67 residents, all of whom consented to participate. All the 2-minute feedback sessions were recorded (469 audio recordings); 4 recordings were excluded due to technical issues. All 469 examiners’ questionnaires and 198 of 201 residents’ questionnaires were collected (3 printed questionnaires were lost during collection). Examiners’ characteristics are presented in Table 2. On average, across all cases and examiners, examiners reported a familiarity of 4.6 of 5 (SD=0.7) with the clinical content (eg, initiating insulin) and 3.5 of 5 (SD=1.1) with the milestones (eg, developing patient-centered management plans) of the EPA.
Study results pertaining to feedback quality and feedback alignment are detailed in Table 3. Regarding feedback quality, the mean QuAL score was 4.3 of 5 (SD=0.4), with an interrater agreement of 0.68 (substantial). Table 4 presents illustrative feedback quotes corresponding to each competency milestone, alongside their respective QuAL scores. On the single-question self-assessment, examiners rated their confidence that their feedback would enhance residents’ autonomy at a mean of 4.2 of 5 (SD=0.4).
Table 3 Examiners’ (N=21) Feedback Quality and Alignment in an Entrustment-Based OSCE Assessing Residents With Entrustability Scales or Checklists
In terms of feedback alignment, examiners on average gave feedback on 2.5 of 4.1 (SD=0.5) competency milestones and 1.2 of 2.6 (SD=0.5) task-specific abilities, with interrater agreements of 0.60 (moderate) and 0.71 (substantial), respectively. In other words, examiners gave feedback on 61% of the preestablished milestones and 46% of the task-specific abilities of the EPA. No notable differences in feedback quality or alignment were observed across the types of EPAs (complex medical situations, acute care situations, and shared decisions).
The 2 scoring methods used, 3-point entrustability scales vs task-specific checklists, had no significant impact on QuAL score (95% CI -0.3, 0.1, P=.28), alignment with competency milestones (95% CI -0.4, 0.1, P=.13), or alignment with task-specific abilities (95% CI -0.3, 0.1, P=.29).
Residents rated the effect of the feedback on their autonomy at an average of 3.4 of 5 (SD=1.0) when assessed using entrustability scales and 3.5 of 5 (SD=0.8) when assessed with checklists.
Discussion
In the transition from traditional to entrustment-based OSCEs, attention is directed not only toward their assessment characteristics but also toward the resulting feedback.7,10,22 This study yielded 3 key findings. First, examiners consistently provided high-quality feedback across various types of EPAs, offering practical suggestions based on direct observations. Second, there was strong alignment between examiners’ feedback content and the intended learning outcomes of the OSCE, covering multiple competency milestones for each EPA assessed. Finally, the 2 scoring methods had no noticeable impact on feedback quality or alignment.
Prior studies assessing feedback quality have employed different approaches. Moineau et al18 employed a 5-point rating scale encompassing 7 questions, while Humphrey-Murto et al8 assessed feedback quality based on the examiner’s discussion of neutral, positive, or negative points. Well adapted to analyzing brief feedback, the QuAL score uses 3 objective questions (2 of which are dichotomous) and demonstrates substantial interrater agreement.21 A high QuAL score not only involves a comprehensive performance description and suggestions for improvement but, more importantly, establishes a crucial link between suggestions and the observed performance with the aim of fostering autonomy.1
Our study quantified the alignment with intended learning outcomes (ie, competency milestones at 61% and task-specific abilities at 46%). These findings are consistent with an analysis of case discussions in our internal medicine residency program, showing 56% alignment of supervisors’ feedback with intended learning outcomes.23 We believe our OSCE’s constructive alignment with an entrustment-based curriculum utilizing EPAs as learning outcomes and milestones as indicators of achievement contributed to our results.14,16 Knowing the importance of examiners’ backgrounds, it is noteworthy that our examiners demonstrated a considerable familiarity with the content and milestones of their respective stations.24,25
Our crossover design allowed us to isolate the effect of the 2 scoring methods from other variables (clinical scenario, examiners’ ability, examiners’ familiarity). We allocated two-thirds of the assessments to the entrustability scale, a decision that stemmed from our principal objective of conducting a thorough analysis of an entrustment-based OSCE. Renting et al26 suggested that a competency-based scoring method, in the right clinical scenario, should help align supervisors’ feedback. However, in most studies, including ours, the scoring methods tested had minimal impact on feedback quality, which primarily reflects the abilities of the examiners.27,28
Our study takes into account the limitations of relying solely on examiners’ self-assessment or residents’ perception of feedback, particularly in high-pressure assessment scenarios.12,13,29 Even when feedback is delivered effectively, various emotional and cognitive factors can impact its reception.22 Therefore, we opted against directly comparing residents’ and examiners’ perspectives for 3 reasons: QuAL score results offer greater validity; the 2 questionnaires differed; and residents were surveyed after only 2 EPAs, which were both complex medical situations.
The generalizability of the study findings may be limited by the single-institution design and its relatively small, yet typical, group of examiners; the large number of recordings offsets this limitation, at least partially. However, the knowledge of being recorded may have influenced examiners’ feedback. Training videos and EPA descriptions encapsulated our expectations and possibly contributed to improving feedback, so future OSCEs should involve similar training interventions to yield comparable results. Our examiners, being volunteers, might have exhibited higher motivation and familiarity with EPAs and feedback than the average clinician, potentially explaining the higher quality of feedback in our study compared with the study by Marcotte et al27 and a recent analysis of written EPA comments by Madrazo et al.30 This divergence might also be due to our study’s use of audio recordings, which could have captured a more comprehensive view of feedback content than written comments. Finally, the QuAL score is a relatively new tool that has not previously been employed to analyze feedback in OSCEs, and it may not capture all dimensions of feedback quality, especially when feedback is longer and more complex.
As part of a quality assurance process, or in future studies examining the discrepancies noted above, QuAL scores could be used during the OSCE by students and/or examiners to self-assess feedback. Furthermore, incorporating qualitative data through resident focus groups would delve deeper into the complex factors that influence residents’ receptivity to feedback during OSCEs.22,31 Future studies on programmatic assessment could aim to promote residents’ progression by strategically selecting EPAs that target low-opportunity situations of clinical practice within the design of an OSCE, expecting to deliver high-quality feedback aligned with specific learning objectives.28
Conclusions
In our entrustment-based OSCE, examiners consistently offered valuable feedback aligned with intended learning outcomes. Notably, we explored high-quality feedback and alignment as separate dimensions, finding no significant impact from our 2 scoring methods on either aspect.
The authors wish to thank all the examiners and residents for their contribution to this project; the OSCE expert committee: Gabriel Demchuk, Rémi Lajeunesse, Marie-Josée Dupuis, Rémi Savard-Dolbec, and Julie Bergeron; the Internal Medicine Residency Program (Dr. Isabelle Kirouac), the Department of Medicine (Dr. Jacques Couët), and the Assessment Unit Team at the Faculty of Medicine of Université Laval for supporting this project (Dr. Christine Drouin); Ms. Sarah-Caroline Poitras and Julie Bouchard for helping with many logistical aspects; Dr. Gabriel Demchuk, Dr. Marianne Bouffard-Côté, and Jérôme Bourgoin for helping with the training videos; and Douglas Michael Massing for copyediting the first version.
References
Editor’s Note
The online supplementary data contains a 3-point entrustability scale and task-specific checklist for the diabetes station and further descriptions of the videos used in the study.
Author Notes
Funding: The authors report no external funding source for this study.
Conflict of interest: The authors declare they have no competing interests.
This work was previously presented at the Canadian Society of Internal Medicine Annual Meeting, Quebec City, Quebec, Canada, October 11-14, 2023.