ABSTRACT
Whether written comments in entrustable professional activities (EPAs) translate into high-quality feedback remains uncertain.
We aimed to evaluate the quality of EPA feedback completed by faculty and senior residents.
Using retrospective descriptive analysis, we assessed the quality of feedback from all EPAs for 34 first-year internal medicine residents from July 2019 to May 2020 at Western University in London, Ontario, Canada. We assessed feedback quality in 4 domains: timeliness, task orientation, actionability, and polarity. Four independent reviewers, blinded to the names of assessors and learners, were randomly assigned to rate each EPA on the 4 domains. Statistical analyses were completed using R 3.6.3. Chi-square or Fisher's exact test and the Cochran-Armitage test for trend were used to compare the quality of feedback provided by faculty versus resident assessors, and to compare the effect of timely versus not timely feedback on task orientation, actionability, and polarity.
A total of 2471 EPAs were initiated by junior residents. Eighty percent (n=1981) of these were completed, of which 61% (n=1213) were completed by senior residents. Interrater reliability was almost perfect for timeliness (κ=0.99), moderate for task orientation (κ=0.74), strong for actionability (κ=0.81), and moderate for polarity (κ=0.62). Of completed EPAs, 47% (n=926) were timely, 85% (n=1679) were task oriented, 83% (n=1649) consisted of reinforcing feedback, 4% (n=79) contained mixed feedback, and 12% (n=240) had neutral feedback. Thirty percent (n=595) were semi- or very actionable.
The written feedback in the EPAs was task oriented but was neither timely nor actionable. The majority of EPAs were completed by senior residents rather than faculty.
The purpose of this study was to evaluate the quality of entrustable professional activity (EPA) feedback completed by faculty and senior residents.
The written feedback in the EPAs was task oriented but was neither timely nor actionable; most were completed by senior residents rather than faculty.
Study findings were from a single program and institution, which may limit generalizability.
This study offers an approach to assessing the quality of written EPA feedback that can be adapted to other institutions that implement EPAs.
Introduction
Residency training in Canada has shifted to competency-based medical education (CBME) to restructure curricula around physician competencies and better prepare clinicians to serve patients.1 This transition introduced entrustable professional activities (EPAs), which are specialty-specific clinical tasks that can be entrusted to trainees once they demonstrate competence in completing the task independently.2,3 Because tasks outlined by EPAs are always contextualized by an assessment, we use the term EPA to encompass both. EPAs are distinct from the traditional in-training evaluation report (ITER), which is an overall rotation-based summative evaluation.4-6 EPAs are meant to both increase and capture formative, timely, and task-specific7 feedback in addition to existing ITERs. One of the goals of CBME is to provide opportunities for feedback and coaching for residents. However, it is unclear whether use of EPAs results in high-quality feedback. From the literature, high-quality feedback is timely, task oriented, and actionable.8,9 Moreover, corrective feedback tends to provide more useful information than feedback that is positive or neutral in polarity.10
Few studies have assessed the quality of written feedback captured through EPAs. Two studies from the Queen's University School of Medicine medical oncology program showed that both faculty and residents valued high-quality written feedback captured in EPAs.7 Their pilot study found that 33% of feedback from the 17 EPAs analyzed was actionable,7 and a follow-up study found that 56% of feedback from 157 EPAs was actionable,11 suggesting an increased prevalence of actionable feedback over time. In a different center, a psychiatry residency program evaluated a newly implemented mobile app to facilitate EPAs and found that 95% (94 of 99) of comments were task specific.12 Additionally, focus groups comprising residents from multiple specialties at McMaster University revealed a perceived higher frequency of feedback with EPAs but poor-quality feedback as a result of “assessment fatigue.”13 These studies in smaller programs suggest mixed quality of feedback received through EPAs. Ascertaining the quality of EPA feedback from larger residency programs, where CBME implementation is likely to be most challenging, may better gauge the practical application of EPAs more generally. A standardized approach to assessing feedback quality was also lacking. The Canadian Excellence in Residency Accreditation requires demonstration of ongoing continuous quality improvement (CQI) program initiatives. As part of this CQI initiative, and with the implementation of CBME for internal medicine (IM) programs across Canada in July 2019,14 our objective was to assess the quality of the written feedback being documented within EPAs for postgraduate year (PGY)-1 IM residents at Western University in the first year of CBME implementation. Specifically, our study sought to examine EPA feedback for timeliness, task orientation, actionability, and polarity, as well as differences between feedback provided by faculty members and senior residents. In doing so, we also sought to develop a method of feedback analysis that would be translatable to other institutions using EPAs.
Methods
Setting
The IM program at Western University in London, Ontario, Canada began a preliminary implementation of CBME in the 2018-2019 academic year. This included faculty and resident education throughout the year on the use and purpose of EPAs, as well as education on the characteristics of high-quality feedback. Faculty development also included meetings with each division and out-of-town elective sites in the lead-up year to provide specialty-specific education and examples. Official implementation commenced on July 1, 2019.
EPAs at our institution are requested by a junior resident and completed by an assessor (senior resident or attending physician) electronically through an online platform (Elentra) accessed through computers or mobile devices. Residents must be assessed on a specific number of each EPA as required by the Royal College of Physicians and Surgeons of Canada.15 Only faculty or residents more senior than the learner may complete EPAs. Each EPA has a 5-point rating scale for entrustability and 2 sections for narrative feedback: (1) Comments, where assessors provide feedback on learner performance on the task in question, and (2) Next Steps, where assessors provide recommendations for further development (see online supplementary data). These sections were the focus of our analyses of feedback quality. There are 10 unique EPAs across the 2 stages of training in the first year (see online supplementary data). There are multiple contextual variables for each EPA, and residents are required to obtain multiple observations.
Study Population
We analyzed all EPAs completed between July 2019 and May 2020 for 34 PGY-1 residents in the IM program.
Feedback Analysis
From the literature, we reviewed several examples of feedback analysis7,10,12,16 to assess the quality of written feedback in the EPAs. We identified and defined the following domains as important and measurable qualities of good feedback: timeliness, task orientation, actionability, and polarity. Variable definitions were modified and finalized after adjudication of a test set of 30 EPAs independently graded by all 4 investigators (L.M., N.C., J.D., S.K.).
Timely feedback was defined as EPA completion within 7 days of the clinical encounter. Our data captured the number of days between the date of the clinical encounter and the date the EPA was triggered by the learner (time from encounter to trigger [TET]). We ascertained the number of days between the trigger date and the date of EPA completion by the assessor (trigger to completion [TTC]). We gauged timeliness as the sum of TET and TTC. The 7-day measure for timeliness was based on the measure used by Tomiak and colleagues.7
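As an illustration of this calculation, the minimal R sketch below flags an EPA as timely when the sum of TET and TTC is 7 days or fewer; the data frame and column names (tet_days, ttc_days) are hypothetical and shown only to make the definition concrete.

```r
# Minimal sketch (R): flag an EPA as timely when the total interval from
# clinical encounter to completion (TET + TTC) is 7 days or fewer.
# The data frame and column names below are hypothetical.
epa <- data.frame(tet_days = c(1, 3, 10), ttc_days = c(0, 2, 10))
epa$total_days <- epa$tet_days + epa$ttc_days  # encounter-to-completion interval
epa$timely <- epa$total_days <= 7              # TRUE if within the 7-day threshold
table(epa$timely)
```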
Written feedback was labeled as “task oriented” if it commented on specific tasks or actions. Feedback was labeled as “very actionable” if recommendations gave targeted specific actions or behaviors, and “not actionable” if feedback gave no recommendations for development. Through our adjudication process, we identified comments that held value as feedback but fell in between our a priori definitions of very actionable and not actionable. We therefore thought it important to distinguish these comments and categorized them as “semi-actionable.” Finally, narrative feedback was analyzed for polarity and was deemed as “reinforcing” if feedback complimented learners' performance, “corrective” if feedback identified problematic performance, “mixed” if comments contained both reinforcing and corrective elements, and “neutral” if no feedback was given or if comments did not address learner performance. Because feedback in the Next Steps section is meant to provide constructive recommendations, we based polarity only on feedback written in the Comments section. Please see Table 1 for a summary of these definitions and relevant examples.
Each EPA was randomly assigned to 2 of the 4 independent reviewers (L.M., N.C., J.D., S.K.). All identifying data of the assessors and residents were removed by the program administrator prior to the study. Reviewers read the narratives within each EPA and assigned a code for each domain of quality feedback: timeliness (yes or no), task orientation (yes or no), actionability (very, semi-, or not actionable), and polarity (reinforcing, corrective, mixed, or neutral). Coding was completed in Microsoft Excel. Reviewers then met to discuss any disagreements in coding, which were resolved through consensus.
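For readers wishing to replicate the assignment step, the sketch below shows one way such a random allocation could be performed in R; this is an illustrative scheme under our assumptions, not necessarily the exact procedure used by the program administrator, and the object names are hypothetical.

```r
# Illustrative sketch (R): randomly allocate each completed EPA to 2 of the
# 4 reviewers. One possible scheme only; object names are hypothetical.
set.seed(2019)
reviewers <- c("LM", "NC", "JD", "SK")
n_epas <- 1981
assignments <- t(replicate(n_epas, sample(reviewers, 2)))  # 2 distinct reviewers per EPA
colnames(assignments) <- c("reviewer_1", "reviewer_2")
head(assignments)
```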
Statistical Analysis
Statistical analyses were completed using R 3.6.3 (The R Foundation). Interrater reliability for each domain within the feedback analyses was determined using Cohen's kappa (for nominal/binary variables) or weighted kappa (for ordinal variables). The level of agreement was interpreted as no (≤0.20), minimal (0.21-0.39), weak (0.40-0.59), moderate (0.60-0.79), strong (0.80-0.90), or almost perfect (>0.90).17
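As one possible implementation, the sketch below computes Cohen's kappa for a binary domain and a weighted kappa for an ordinal domain using the kappa2() function from the irr package; the toy ratings and column names are hypothetical, and the package choice is an assumption (any kappa implementation would serve).

```r
# Illustrative sketch (R): interrater agreement for one binary and one
# ordinal domain. kappa2() is from the 'irr' package; the toy ratings and
# column names are hypothetical.
library(irr)

ratings <- data.frame(
  timely_r1     = c("yes", "no", "yes", "yes", "no"),
  timely_r2     = c("yes", "no", "yes", "no", "no"),
  actionable_r1 = factor(c("not", "semi", "very", "semi", "not"),
                         levels = c("not", "semi", "very"), ordered = TRUE),
  actionable_r2 = factor(c("not", "very", "very", "semi", "not"),
                         levels = c("not", "semi", "very"), ordered = TRUE)
)

kappa2(ratings[, c("timely_r1", "timely_r2")])                           # Cohen's kappa (binary)
kappa2(ratings[, c("actionable_r1", "actionable_r2")], weight = "equal") # linearly weighted kappa (ordinal)
```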
Chi-square or Fisher's exact test (for nominal/binary variables) and Cochran-Armitage test for trend (for ordinal variables) were used to compare the type of feedback provided by faculty vs resident assessors, and to compare timely vs not timely feedback.
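A minimal sketch of these comparisons in R is shown below, using simulated data with hypothetical column names; the Cochran-Armitage test is implemented here with base R's prop.trend.test(), one of several available implementations.

```r
# Illustrative sketch (R): compare feedback characteristics between resident
# and faculty assessors. The data frame and column names are hypothetical.
set.seed(1)
dat <- data.frame(
  assessor   = sample(c("resident", "faculty"), 200, replace = TRUE, prob = c(0.6, 0.4)),
  timely     = sample(c("yes", "no"), 200, replace = TRUE),
  actionable = factor(sample(c("not", "semi", "very"), 200, replace = TRUE),
                      levels = c("not", "semi", "very"), ordered = TRUE)
)

# Binary outcome vs assessor type: chi-square (fisher.test() for sparse tables)
chisq.test(table(dat$assessor, dat$timely))

# Ordinal outcome vs assessor type: Cochran-Armitage test for trend on a
# 2 x k table, via base R's prop.trend.test()
tab <- table(dat$actionable, dat$assessor)
prop.trend.test(x = tab[, "faculty"], n = rowSums(tab), score = seq_len(nrow(tab)))
```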
Lastly, to evaluate whether there was a timeliness by polarity interaction in the type of feedback received, a multivariable logistic regression model was used, including timely, polarity, and their interaction as covariates. Separate models were run for task oriented and actionable feedback as the outcome. A significant interaction term (P<.05) was indicative of an interaction. Given the sparse data for some categories, the polarity and actionable variables were dichotomized for the multivariable models.
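A sketch of one such model in R is shown below; the outcome here is task orientation, both predictors are dichotomized as described, and the variable names and simulated data are hypothetical.

```r
# Illustrative sketch (R): logistic regression testing a timeliness-by-polarity
# interaction on whether feedback was task oriented. Variable names and the
# simulated data are hypothetical; polarity is dichotomized (reinforcing vs other).
set.seed(1)
dat <- data.frame(
  task_oriented = rbinom(300, 1, 0.85),
  timely        = rbinom(300, 1, 0.47),
  reinforcing   = rbinom(300, 1, 0.83)
)

fit <- glm(task_oriented ~ timely * reinforcing, data = dat, family = binomial)
summary(fit)  # the 'timely:reinforcing' row gives the interaction term and its P value
```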
As this study was conducted as part of a programmatic CQI initiative, ethics approval was not required according to local policy.18
Results
A total of 2471 EPAs were initiated by PGY-1 residents. Of these, 1981 (80%) were completed by assessors and were included in our analyses. Of these EPAs, 1213 (61%) were completed by senior resident or fellow physician supervisors, and the remainder were completed by attending physicians.
Interrater reliability of adjudicators was almost perfect for timeliness (κ=0.99), moderate for task orientation (κ=0.74), strong for actionability (κ=0.81), and moderate for polarity (κ=0.62). Senior resident assessors were all PGY-2 to PGY-5 residents. Analysis of the feedback showed that 47% (926 of 1981) of EPAs were timely. Median TET was 3 days (25th and 75th percentiles: 1 and 10 days), and median TTC was 2 days (25th and 75th percentiles: 0 and 10 days). Eighty-five percent (1679 of 1981) of feedback was task oriented, and 30% (595 of 1981) was semi- or very actionable. Regarding polarity of feedback, 83% (1649 of 1981) was reinforcing, 4% (79 of 1981) was mixed, 12% (240 of 1981) was neutral, and the remaining 1% was corrective.
Differences Between Resident and Faculty Assessors
Table 2 presents the type of feedback provided by resident and faculty assessors. Resident assessors were associated with providing more reinforcing feedback compared with faculty assessors (P=.007) based on Fisher's exact test. Residents and faculty did not differ with respect to timeliness (χ2(1)=0.5, P=.48) or task orientation (χ2(1)=0.4, P=.52). There was no difference between faculty and resident assessors in TTC. The Cochran-Armitage test for trend for ordinal data showed no difference in actionability of feedback between residents and faculty (z=0.37, P=.71).
Differences Between Timely and Not Timely Feedback
Table 3 presents the type of feedback provided, stratified by timeliness. The Cochran-Armitage test for trend showed that timely feedback was associated with feedback that was very actionable (z=3.11, P=.002). No difference in task orientation (χ2(1)=0.16, P=.69) or polarity (χ2(3)=1.76, P=.62) was identified between timely and not timely feedback. Lastly, the multivariable logistic regression did not identify a significant interaction between timeliness and polarity of feedback in terms of whether the feedback was task oriented (P=.16) or actionable (P=.40).
Discussion
Our study showed that, while most written feedback in EPAs was task oriented, fewer than half of the EPAs were completed in a timely manner. Moreover, timely feedback was correlated with greater actionability. Lastly, a greater percentage of mixed or corrective feedback was given by faculty, although faculty completed fewer EPAs compared to senior residents.19 That only 47% (926 of 1981) of EPAs were completed in a timely manner suggests that this parameter can be improved. This finding, though concerning, is not surprising in that it likely reflects previously reported difficulties with allotting time to complete the forms themselves.7 The recent national survey of Canadian residents by the Resident Doctors of Canada on the implementation of CBME reported that 32.9% of respondents perceived a lack of time to complete evaluations,20 with written survey comments describing the time-consuming process of completing EPAs. Notably, 66.9% named evaluation fatigue as another barrier to CBME implementation.20 Previous research has demonstrated how EPA completion in small programs, such as radiology, can place a significant administrative burden on those involved in the assessment process.21 Thus, one potential way to alleviate this burden is to make the process itself more efficient by way of improved technology and dissemination processes.21 A survey of Canadian neurological surgeons showed that staff neurological surgeons were willing to complete an EPA if it took less than 3 minutes and if it was accessible through a mobile application.22 One study of a mobile app for EPAs among psychiatry residents showed that the average time to complete an EPA via the app was 76 seconds.12 These improved technologies are promising avenues to increase the efficiency of EPA completion.
However, we note that speed in completing EPAs may not correlate with the quality of feedback and may in fact compromise it. Therefore, an important yet more challenging approach to the issue of timeliness would be to reconsider the balance between the number of EPAs required and the quality of feedback and data each required EPA yields. Even though our data affirm the intuition that timely feedback correlates with actionable feedback, this finding does not account for the time it takes to complete multiple EPAs at once.
Regardless of timeliness, the overall prevalence of actionable feedback was low in our study. This finding is similar to those of the Queen's University studies during the initial phases of CBME implementation.7,11 The increased prevalence of actionable feedback over time observed in those studies may reflect a learning curve with CBME implementation. In the meantime, further faculty and resident development may be needed to support their roles as coaches and assessors and to standardize the actionability of feedback.23 Simple interventions such as the addition of prompts to elicit richer narrative feedback may also be effective.5
Improving the actionability of feedback, especially corrective feedback, remains important because of evidence showing a lack of improvement in this area with CBME.24 Our findings support this concern, as only 1% of the feedback we analyzed had corrective polarity. This scarcity may reflect the tension assessors face between their roles as assessor and as mentor or coach,23 as well as a prevailing culture of “failure to fail”25,26 described in the literature.
Lastly, we note that a greater proportion of EPAs in our study were completed by senior residents compared to staff physicians. Possible explanations include the ability of senior residents to complete EPAs that require direct observation while on call. Junior residents may also have an increased level of comfort and trust9,23 when asking senior residents for feedback and EPA completion. While there is evidence to suggest that near-peer assessors provide ratings similar to those of staff physicians in low-stakes settings,27 peer assessors also tend toward giving more favorable ratings,28 a finding that we observed in our study. Thus, in the context of CBME, this tendency raises the question of whether senior residents can be relied upon as the prevailing drivers of completing EPA assessments and of gauging the competence of their junior colleagues. Moreover, it raises the question of whether the burden of assessment is disproportionately placed on residents rather than on attending physicians, who were meant to give more feedback with CBME implementation.
Our study has several limitations. It was done in a single program and institution; therefore, our results may not be generalizable to other settings. Furthermore, assessing the quality of feedback remains subjective and context dependent. As reviewers, we interpreted EPA feedback apart from the original clinical context. We recognize that the quality of written feedback in EPAs does not necessarily reflect the feedback conversations that may have taken place during the respective clinical encounters. While our study was done within the context of IM, our methodology is translatable to other specialties that use EPAs, providing an approach to evaluating the quality of written feedback. Future studies should explore factors that contribute to the timeliness of EPA completion. Whether the proportion of EPAs completed by faculty versus resident assessors differs across institutions and specialties, and the reasons why, would also be important to explore further based on our study.
Conclusions
Overall, the written feedback in the EPAs we analyzed was task oriented but was neither timely nor actionable. Most of these EPAs were completed by senior residents rather than faculty.
The authors would like to thank Ms. Ana Malbrecht for her work in preparing the data for analysis and Dr. Christopher Watling for his input in conceptualizing this paper.
References
Author notes
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
Some of the content discussed in this article was previously presented at the virtual Canadian Conference on Medical Education, April 2021; the Western University Competency Based Medical Education Innovators Half Day, April 2021, London, Ontario, Canada; the Royal College 2021 Competency Based Medical Education Program Evaluation Summit, October 2021; and the virtual International Conference on Residency Education, October 20-22, 2021.
Editor's Note: The online version of this article contains the 5-point rating scale for entrustability and sections for narrative feedback, and an analysis of the entrustable professional activities.