Background Competency-based medical education (CBME) has been implemented in many residency training programs across Canada. A key component of CBME is the documentation of frequent low-stakes workplace-based assessments to track trainee progression over time. Critically, high-quality narrative feedback is imperative for trainees to accumulate a body of evidence of their progress. Suboptimal narrative feedback challenges accurate decision-making, such as promotion to the next stage of training.
Objective To explore the quality of documented feedback provided on workplace-based assessments by examining and scoring narrative comments using a published quality scoring framework.
Methods We employed a retrospective cohort secondary analysis of existing data using a sample of 25% of entrustable professional activity (EPA) observations from trainee portfolios across 24 programs at one institution in Canada from July 2019 to June 2020. Statistical analyses explored variance in scores between programs (Kruskal-Wallis rank sum test) and potential associations with program size, CBME launch year, and medical versus surgical specialty (Spearman’s rho).
Results Mean quality scores of 5681 narrative comments ranged from 2.0±1.2 to 3.4±1.4 out of 5 across programs. A significant and moderate difference in the quality of feedback across programs was identified (χ2=321.38, P<.001, ε2=0.06). Smaller programs and those with an earlier launch year performed better (P<.001). No significant difference was found in quality score when comparing surgical/procedural and medical programs that transitioned to CBME in this institution (P=.65).
Conclusions This study illustrates the complexity of examining the quality of narrative comments provided to trainees through EPA assessments; overall, documented feedback scored in the low to middle range and could be improved.
Introduction
Competency-based medical education (CBME) has been implemented in many residency training programs across Canada through a staged cohort approach that is anticipated to be complete by 2027.1 The transition to CBME is mandated nationally; Pan-Canadian specialty committees with representation from all provinces determine the best timing for all schools to launch each specific discipline. A key component of the CBME curriculum is the documentation of frequent low-stakes workplace-based assessments, in addition to high-stakes assessments, to track trainee progression over time.2,3 Entrustable professional activities (EPAs) break down what it means to be a competent, safe, and effective practitioner into smaller, measurable, and achievable practice components, guiding programs and trainees toward relevant experiences, and they are one way that learner performance on workplace-based assessments is tracked.4-6 Critically, high-quality narrative feedback is imperative for trainees to accumulate a body of evidence that tracks their progression to competence. Suboptimal narrative feedback used by committees responsible for reviewing a trainee’s portfolio will challenge accurate decision-making, such as promotion to the next stage of training.7
Recent evidence suggests that the philosophy behind the development of a system of EPA-based observations is logical and promising; however, programs have encountered challenges with on-the-ground implementation.8-11 Factors contributing to challenges and variance in the quality of narrative feedback documented on EPA assessments may include the preceptor-to-trainee ratio, program size, specialty type (ie, surgical/procedural versus medical), the level of supervision required to complete EPA observations, preceptor interpretation of the scale anchors, and the perceived administrative burden of providing feedback.12-21
A noteworthy reflection about the role of EPAs as core building blocks of CBME is the uncertainty about translating the activities of CBME from theory into practice and whether they will lead to the anticipated outcomes. More specifically, there is uncertainty about the influence of EPA assessments on the competency of graduates entering unsupervised practice.19 The purpose of this study is to explore the quality of documented feedback provided on workplace-based assessments at one institution in Canada by examining and scoring EPA narrative comments using a published quality scoring framework. Given previous literature outlining the challenges of providing feedback to trainees, this work measures the quality of feedback on EPAs in relation to CBME launch year, program size, and program type, and provides insight into the narrative comments that committees review when tasked with making decisions about a trainee’s progression (or not) to the next stage of training.
KEY POINTS
Narrative feedback is a significant component of competency-based medical education, but the quality of such feedback has not been robustly reported at this scale or with this method to date.
This quantitative study of entrustable professional activity (EPA) narrative comments from multiple specialties at a single institution in Canada showed that smaller programs and those with an earlier launch year were associated with higher-quality feedback; medical and surgical specialties showed no difference in quality scores.
Understanding patterns where high-quality feedback can be found may help programs identify best practices as they look to improve their own EPA feedback quality.
Methods
Data Sources
To explore the quality of narrative feedback on EPA observation forms, we employed a retrospective cohort secondary analysis of existing data using de-identified EPA observations from trainee portfolios.
Data included in the analysis are from 24 programs in the July 2019 to June 2020 academic year at one institution in Canada. The institution is in an urban setting with more than 60 residency training programs and, at the time of this study, more than 980 enrolled trainees. This study included only records from trainees who were in active Royal College of Physicians and Surgeons of Canada CBME programs (approximately 23 000 records) (Table 1). Program size ranged from 1 to 117 trainees, and all programs had been active in CBME for 1 to 3 years at the time of data collection.
Table 1. List of Included Medical and Surgical Programs, CBME Launch Year, and Number of Residents During the July 2019-June 2020 Academic Year

EPA observation form adaptations at this institution include mandatory fields and prompts to guide documentation of quality feedback. For example, each EPA observation form contains one text box prompting the observer to describe what the trainee is doing well and why they should keep doing it, and another prompting the observer to describe something the trainee can do to improve their performance next time. The included narrative comments on EPA observation forms represent feedback provided to trainees from 24 specialty programs that have transitioned to CBME, spanning a broad range of clinical disciplines, including medical, surgical, and diagnostic programs, as well as both primary specialty and subspecialty programs; as such, they also represent a large variety of EPA topics and contexts (see Table 1 for details).

All narrative comments were exported into a Microsoft Excel document (version 16.61.1), and every fourth entry was scored, representing a selection of 25% of all EPA observations. Three raters (D.M.H., E.A.C., S.G.), 1 senior trainee and 2 PhD scientists, independently scored a subset of 10 narrative comments captured through EPA observations to establish initial agreement. An a priori minimum interrater agreement threshold of 80% was set, with agreement calculated using the commonly used percentage agreement score (the number of agreements divided by the total number of EPA narratives assessed).22 The raters met frequently to discuss quality scores, and discrepancies were resolved through consensus and group scoring of an additional 5 EPA narratives to calibrate a working understanding between team members. The senior scientist (D.M.H.) audited a subset of ratings (10%) to ensure rater agreement on the scoring (82% agreement score).
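As an illustration only, the sketch below shows how the two computations described above could look in practice: taking every fourth exported narrative comment (a 25% sample) and calculating a percentage agreement score between two raters. This is not the authors' actual pipeline; the file name, column layout, and example scores are hypothetical.

```python
import pandas as pd

# Load the exported narrative comments (hypothetical file name and layout).
comments = pd.read_excel("epa_narratives_2019_2020.xlsx")

# Keep every fourth entry, ie, a 25% sample of all EPA observations.
sample = comments.iloc[::4].reset_index(drop=True)

def percentage_agreement(scores_a, scores_b):
    """Number of exact agreements divided by the total number of narratives scored."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Rater score lists must be the same length.")
    agreements = sum(a == b for a, b in zip(scores_a, scores_b))
    return agreements / len(scores_a)

# Example: a calibration subset of 10 narratives scored independently by 2 raters.
rater_1 = [3, 4, 2, 5, 1, 3, 3, 4, 2, 5]
rater_2 = [3, 4, 2, 4, 1, 3, 2, 4, 2, 5]
print(f"Percentage agreement: {percentage_agreement(rater_1, rater_2):.0%}")  # 80%
```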
Quality Scoring Tool
A recently published quantitative framework was used to score the quality of narrative feedback. The Quality of Assessment of Learning (QuAL) score developed by Chan et al is specifically designed to rate short, workplace-based narrative comments on trainee performance, evaluating narrative comments based on 3 questions intended to capture evidence of quality feedback (Figure).23 The QuAL score yields a numerical quality rating, with 5 being the highest score possible. A recent study explored the utility of the QuAL score based on narrative feedback provided on workplace-based assessments.24
Figure. Criteria Used to Assess the Quality of Entrustable Professional Activity Observation Narratives Using the Quality of Assessment of Learning (QuAL) Score
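For readers unfamiliar with the tool, the following minimal sketch shows how a single QuAL rating could be tallied, assuming the commonly described three-part structure (evidence of performance scored 0-3, presence of a suggestion for improvement scored 0-1, and linkage of that suggestion to the evidence scored 0-1). This structure is stated here as an assumption for illustration; consult Chan et al23 for the authoritative criteria and wording.

```python
from dataclasses import dataclass

@dataclass
class QuALRating:
    evidence: int    # 0-3: how well the comment describes what the trainee did (assumed range)
    suggestion: int  # 0-1: whether a suggestion for improvement is given (assumed range)
    connection: int  # 0-1: whether the suggestion is linked to the evidence (assumed range)

    def total(self) -> int:
        """Total QuAL score, out of a maximum of 5."""
        assert 0 <= self.evidence <= 3
        assert self.suggestion in (0, 1) and self.connection in (0, 1)
        return self.evidence + self.suggestion + self.connection

# Example: a detailed comment with a linked suggestion receives the maximum score of 5.
print(QuALRating(evidence=3, suggestion=1, connection=1).total())  # 5
```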
Statistical Analyses
The Kruskal-Wallis rank sum test for nonparametric data was used to explore variance between programs, and epsilon squared (ε2) was used to quantify the magnitude of that variance. Associations between quality scores and program size, CBME launch year, and medical versus surgical/procedural specialties were assessed using Spearman’s rho (ρ).
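The following sketch, which is not the authors' code, shows how analyses of this kind could be run with SciPy: a Kruskal-Wallis rank sum test across programs, epsilon squared as an effect size (one commonly used formula is H/(n - 1), where n is the total number of observations), and Spearman's rho for the association between quality scores and a program-level variable. The program names, scores, and sizes are hypothetical.

```python
from scipy import stats

# QuAL scores grouped by program (hypothetical example data).
program_scores = {
    "Program A": [2, 3, 3, 4, 2, 5, 3],
    "Program B": [1, 2, 2, 3, 2, 1, 2],
    "Program C": [3, 4, 4, 5, 3, 4, 4],
}

# Kruskal-Wallis rank sum test for differences between programs.
h_stat, p_value = stats.kruskal(*program_scores.values())

# Epsilon squared effect size, using H / (n - 1).
n_obs = sum(len(scores) for scores in program_scores.values())
epsilon_squared = h_stat / (n_obs - 1)
print(f"H = {h_stat:.2f}, P = {p_value:.4f}, epsilon^2 = {epsilon_squared:.3f}")

# Spearman's rho between each observation's QuAL score and its program's size
# (program sizes here are hypothetical).
program_size = {"Program A": 12, "Program B": 45, "Program C": 8}
scores = [s for program, vals in program_scores.items() for s in vals]
sizes = [program_size[program] for program, vals in program_scores.items() for s in vals]
rho, rho_p = stats.spearmanr(scores, sizes)
print(f"rho = {rho:.2f}, P = {rho_p:.4f}")
```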
This study was approved by the institution’s Research & Ethics Board.
Results
A total of 5681 documented EPA observations (every fourth narrative comment scored; 25% selected from each program) from 24 CBME programs were examined using the QuAL score. Table 2 illustrates examples of narrative feedback with QuAL scores.
Table 2. Examples of Low-, Moderate-, and High-Quality QuAL Scores on Narrative Feedback in EPA Observation Forms

Mean QuAL scores, calculated within programs, ranged from 2.0±1.2 to 3.4±1.4 (± represents the margin of error for 95% confidence intervals) out of 5 across programs. Overall, a significant and moderate difference in the quality of feedback provided to trainees across programs was identified (χ2=321.38, P<.001, ε2=0.06). The quality of feedback was significantly associated with program size as well as launch year; smaller programs and those with an earlier launch year performed better (P<.001). These associations, however, were very weak (ρ=0.08 and 0.05, respectively). Further, there was no significant difference in performance when comparing surgical and medical specialty programs (P=.65, ρ=-0.08).
Discussion
Results from scoring 5681 de-identified EPA observation records from 24 CBME programs indicate a wide range of quality scores for documented feedback on EPA observation forms within and between programs, yet scores overall fall in the lower to middle range of the QuAL scoring tool (2.0±1.2 to 3.4±1.4 out of 5) and could be improved. These results build on the published literature about the implementation of CBME, in which the importance of quality documented feedback is reported by both trainees and clinical teachers, yet the actual documentation of consistent quality feedback can be challenging.16,19,25,26 Considering the critical role documented feedback plays in the CBME training model, such as supporting learner growth over time and facilitating group decision-making about learner progression through training, these results add rigour to existing evidence suggesting a discrepancy between the grand aspirations behind the design of this model and its subsequent local implementation, a discrepancy that may challenge long-lasting change of CBME activities.27
Despite these results, it is important to note that the quality of documented feedback is not the sole indicator of the quality of the feedback exchange between trainees and clinical teachers. Recent literature suggests that, in some cases, trainees prefer the informal feedback conversation that occurs before an EPA observation form is completed, and that this feedback tends to be richer than what is documented.16,28 Faculty development initiatives have been created to support the new practices and the shift to a culture of CBME. These often aim to develop supervisors’ skills in holding rich verbal coaching conversations with trainees while also supporting their ability to document these encounters sufficiently to support progression recommendations by competence committees.18,29,30 Other areas to consider include (1) the administrative burden of EPA assessment on both preceptors and trainees, specifically balancing timeliness of feedback with quality of content; (2) similarities and differences in how quality feedback is conceptualized by preceptors and trainees; and (3) the subsequent utility of the content provided in different contexts (eg, feedback used to make progress decisions versus feedback to foster actionable growth of the trainee).19,31,32
While the QuAL scoring tool demonstrates validity evidence for assessing the quality of brief workplace-based assessments, like EPA observations, raters’ interpretation of the criteria is subjective and may not align with constructs of helpful or useful feedback in all contexts. As such, a different group of raters could score aspects of the QuAL, like the evidence criterion, differently. There is also a limitation to the commonly used percentage-agreement method of calculating interrater reliability, as the value does not account for agreements that may be due to chance22,33; however, chance agreements are more likely to occur in binary yes/no decisions. The authors acknowledge the subjective nature of assessing the quality of narrative comments and have detailed their processes for transparency.
Another limitation is that the EPA observation narrative comments included in this study are from one institution. Additionally, the narrative comments examined include those collected during the COVID-19 pandemic. For privacy and confidentiality, timestamps of when EPA narrative comments were logged were removed from the dataset by external data stewards, so we were unable to perform an analysis that accounts for changes during this time. Finally, this methodology cannot determine the reasons why the quality of narrative feedback may be lower than desired, such as electronic system barriers, skills deficits among feedback providers, availability of resources, and administrative burden. However, assessing the quality of documented feedback provides valuable information about the real-world experiences of group decision-makers, who are responsible for making progress recommendations about trainees and identifying subsequent targets for improvement using EPA narratives as a source of evidence.
Future studies that include demographic information about learners and preceptors, and that explore the contexts and factors contributing to the quality of documented workplace-based assessments, are necessary. These could be multi-institutional studies to promote the transferability of findings, or qualitative studies to better understand why quality was lower than desired. It would also be helpful to examine how efforts to improve supervisors’ narrative comments affect subsequent quality scores over time, as well as what influence external factors, such as COVID-19, have on the quality of narrative comments. Of practical importance would be exploring the potential of automated narrative comment quality rating systems and their comparability with the QuAL score, which could allow live quality reporting to be incorporated into electronic portfolio systems. Additional studies exploring ways to balance the challenges of implementing the practices and activities of CBME with the ambitions of the change initiative are needed, along with mobilizing these findings into practice.
Conclusions
The results suggest that smaller programs and those that have implemented CBME training for a longer time have somewhat higher-quality narrative comments on EPAs, with no difference observed between medical and surgical/procedural specialties.
References
Author Notes
Funding: The authors report no external funding source for this study.
Conflict of interest: The authors declare they have no competing interests.
This work was previously presented at Resident Research Day, University of Alberta, April 22, 2021, Edmonton, Alberta, Canada, and the virtual International Conference on Residency Education, October 22, 2021.