ABSTRACT
Research suggests that workplace-based assessment (WBA) tools using entrustment anchors provide more reliable assessments than those using traditional anchors. There is a lack of evidence describing how and why entrustment anchors work.
The purpose of this study is to better understand the experience of residents and faculty with respect to traditional and entrustment anchors.
We used constructivist grounded theory to guide data collection and analysis (March–December 2017) and semistructured interviews to gather reflections on anchors. Phase 1 involved residents and faculty (n = 12) who had only used assessment tools with traditional anchors. Phase 2 involved participants who had used tools with entrustment anchors (n = 10). Data were analyzed iteratively.
Participants expressed that the pragmatic language of entrustment anchors made WBA (1) concrete and justifiable; (2) transparent, as they explicitly link clinical assessment and learning progress; and (3) aligned with training outcomes, enabling better feedback. Participants with no prior experience using entrustment anchors outlined contextual concerns regarding their use. Participants with experience described how they addressed these concerns. Participants expressed that entrustment anchors leave a gap in assessment information because they do not provide normative data.
Insights from this analysis contribute to a theoretical framework of benefits and challenges related to the adoption of entrustment anchors. This richer understanding of faculty and resident perspectives of entrustment anchors may assist WBA developers in creating more acceptable tools and inform the necessary faculty development initiatives that must accompany the use of these new WBA tools.
What was known and gap
Research suggests that workplace-based assessment tools using entrustment anchors provide more reliable assessments than those using traditional anchors, but there is a need for evidence that describes how and why entrustment anchors work.
What is new
Semistructured interviews explored why residents and faculty think entrustment anchors are better than traditional anchors.
Limitations
Interviews were conducted at a single site and focused on specific anchors used at that site, limiting generalizability.
Bottom line
The pragmatic language used in entrustment anchors makes them more concrete and transparent, and aligns them with training outcomes.
Introduction
The medical education system is transitioning from a time-based, linear curriculum to a competency-based medical education (CBME) curriculum. The Accreditation Council for Graduate Medical Education (ACGME) Milestones and the Royal College of Physicians and Surgeons of Canada Competence by Design are examples of this. The goal of this change is to ensure that trainees progress after demonstrating competence in a given area instead of simply completing a rotation. This represents a significant shift in pedagogical values and leaves educators searching for better ways to assess trainees within revised learning frameworks. Research suggests that workplace-based assessment (WBA) is an optimal method of assessing professional competence.1 Consequently, there is a meaningful emphasis on this type of assessment within CBME curricula.2 With the growth of CBME comes the need to develop and deploy WBA tools that accurately reflect the workplace performance of medical students and residents.3,4
There are numerous tools available for WBA, including the mini-clinical evaluation exercise (mini-CEX) and direct observation of practical skills.5 These tools typically have a list of items to be assessed on an anchored or partially anchored rating scale and a space for narrative comments. Traditional anchors for these scales include rating trainees according to the stated expectations of performance (1, rarely meets expectations, to 5, consistently exceeds expectations) or the quality of the performance (1, poor, to 5, excellent).
The reliability of WBAs that use these traditional rating scale anchors is a concern.6–8 Reliability issues have been attributed to several sources of error, including distributional rater errors such as the leniency/severity error (“dove/hawk”) and range restriction error (central tendency), which is a failure to use the whole scale.9,10 Recently, various WBA tools have been published that use the standard of competence, or independent performance, as the “top score” for their rating scale anchors.11–18 Assessment tools with these anchors have demonstrated increased reliability when compared with tools using traditional anchors.11–18 These anchors have various names in the literature, including entrustability, entrustment, and independence anchors. Although the specific wording of the behaviorally descriptive anchors varies, they are all conceptually based around the ordinal assessment of a progression to competence.19 In this article we will refer to them as entrustment anchors.
The Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) was developed to assess the performance of a trainee on a single surgical procedure.14 The rating scale developed for this tool uses entrustment anchors. Multiple sources of validity evidence, including high reliability, were demonstrated for this tool.14,15 Based on these results, the O-SCORE entrustment anchors were used with 2 other WBA tools: the Ottawa Clinic Assessment Tool (OCAT) and the Ontario Bronchoscopy Assessment Tool (OBAT).20–22 Given the success of these entrustment anchors, national assessment programs are using them for various WBA tools.23
It is important to note that, while WBA tools with entrustment anchors have seemed to improve on previous WBA tool anchors, they are not perfect. A dearth of evidence exists for how, when, and why entrustment anchors work or do not work. The objective of this qualitative study was to better understand the experience of residents and faculty with respect to traditional WBA anchors and O-SCORE entrustment anchors. This understanding is key to assisting medical educators in optimizing the use of entrustment anchors in WBA.
Methods
This study was conducted at a single Canadian institution where some residency training programs have adopted WBA tools using the O-SCORE entrustment anchors (O-SCORE, OCAT, OBAT, and a daily assessment tool in anesthesia that has not been published). These programs continue to use some WBA tools with traditional anchors as well. Other programs only use traditional WBA tools. We invited physicians involved in residency education, as well as residents (via the postgraduate office), to participate in 1-hour semistructured interviews exploring their experiences with WBA tool anchors (March–December 2017). We used constructivist grounded theory because it is well suited to generating a social theory that operates as a relevant explanation for a phenomenon, based on iterative analysis and generation of themes.24
All participants completed a short questionnaire to help establish whether they were from specialties using traditional anchors only (phase 1) or specialties using traditional and entrustment anchors (phase 2). Twelve participants agreed to an interview in phase 1 and 10 agreed in phase 2. The 22 participants were from 12 different specialties (see table 1 for participant demographics).
A research assistant trained in qualitative data collection techniques25 led participants through the phase 1 and phase 2 semistructured interview scripts (each with minor variations for resident versus faculty participants; provided as online supplemental material). Both phase 1 and phase 2 interviews began with a discussion of traditional anchors used in WBA tools (box 1, scales A and B), followed by a discussion of entrustment anchors (box 1, scale C). The anchors used for this study have been used previously at this institution; thus, all participants were familiar with the 2 traditional anchor examples, and phase 2 participants had used WBA tools with these specific entrustment anchors.
Box 1 Rating Scales
Scale A (Traditional)
1. Rarely meets expectations
2. Inconsistently meets expectations
3. Meets expectations
4. Sometimes exceeds expectations
5. Consistently exceeds expectations
Scale B (Traditional)
1. Poor
2. Fair
3. Good
4. Very good
5. Excellent
Scale C (Entrustment)
1. Requires complete guidance: “I had to do.”
2. Able to perform but requires repeated direction: “I had to talk them through.”
3. Some independence but intermittent prompting required: “I had to direct them from time to time.”
4. Independent for most things but requires assistance for nuances: “I had to be there just in case.”
5. Complete independence: “I did not need to be there.”
The research team analyzed the data using constructivist grounded theory methodology.26 The study's primary investigator (N.D.) and a qualitative methodologist (A.M.) analyzed data with the support of a trained qualitative research assistant (R.A.). Our analytic process featured 6 steps: (1) complete independent reading of each transcript (N.D., A.M., R.A.); (2) open/line-by-line coding (N.D., A.M., R.A.); (3) thematic coding (N.D., A.M., R.A.); (4) iterative thematic review (N.D., A.M., R.A.); (5) peer debriefing (co-investigators W.G. and J.R.); and (6) producing findings (N.D., A.M., W.G., J.R.).27,28 Data were analyzed iteratively over the course of 1 year using NVivo 11 qualitative data analysis software.29 The team met repeatedly to discuss progressive analytic insights throughout the year.
Rigorous grounded theory requires a documented reflective process for data analysts.28 The 3 data analysts also captured reflective memos throughout the analytic process. One analyst (N.D.) is actively involved with research on assessment anchors. We frequently discussed her interpretive lens as an expert on entrustment-based assessment in medical education versus the other 2 analysts' experiences making entrustment-based decisions outside of a medical education context. We ceased recruitment when participants' reflections sufficiently informed our initial inquiry.
The Ottawa Health Science Network Research Ethics Board approved the study.
Results
Pragmatic Language
Participants expressed that entrustment anchors work because they use pragmatic language, which expresses clinically contextual judgments. Because entrustment anchors are grounded in clinical judgments, they are (1) concrete, making them more justifiable to trainees; (2) transparent, making explicit the link between clinical assessment and learning progress; and (3) aligned with training outcomes, enabling better feedback conversations (table 2).
All participants agreed that the vague language of traditional anchors challenged raters to interpret the true meaning of the scale, which in turn made it difficult to express their assessments accurately. On the other hand, entrustment anchors were described as more concrete, because they are anchored in clinical judgments (table 2). Phase 2 participants felt that the transparency of entrustment anchors led to WBA tools that were easier to understand as evaluations of actual clinical competence, which makes low scores more acceptable to trainees. These sentiments were echoed by faculty who expressed that they did not fear residents' reactions to receiving a lower score (table 2).
Phase 1 participants who had not been exposed to entrustment anchors felt that these anchors had the potential to provide more useful feedback to learners because they describe their current performance relative to the professional goal of independent practice. Phase 2 residents agreed and expressed that the feedback was in fact more useful because it is aligned with training outcomes. Phase 2 faculty also suggested that the conversations they were having with trainees based on these anchors were more productive, given that trainees accepted where they had been rated on the scale (table 2).
Contextual Concerns
Phase 1 and phase 2 participants expressed different opinions on the extent to which entrustment anchors would productively operate across 3 distinct assessment contexts: (1) procedural versus non-procedural; (2) junior versus senior trainees; and (3) direct versus indirect observation. Table 3 illustrates these varying perspectives.
A common sentiment among phase 1 participants was that entrustment anchors would work well in procedural contexts but not in other clinical settings. However, participants who had been using entrustment anchors in outpatient clinics reported that these anchors worked well. Faculty described how they operationalized the anchors in a clinic setting (table 3).
Another concern raised by phase 1 participants was that junior residents would feel frustrated at consistently receiving lower scores, as they would not be expected to achieve “top scores” until later in their training. They acknowledged that, while residents would quickly become independent in some clinical activities, many activities would take more time before residents achieved scores on the higher end of the scale. They thought that residents would feel bad when they received a score on the low end of the scale. However, phase 2 residents expressed that because the lower scores made sense, they were not disheartening: “… It's preposterous that a first-year resident would be able to do all this independently” (phase 2 resident). Given that, they did not feel bad when receiving a score of 1 or 2. Notably, phase 2 participants commented that entrustment anchors represent a substantial change from their prior assessment experience where “low scores” had a negative connotation, but given the clarity afforded by the entrustment anchors, they readily adapted to receiving lower scores (table 3).
The question of whether entrustment anchors require more direct observation was raised by phase 1 participants with the majority indicating that “… to use [entrustment anchors] it's almost like you have to actually observe the whole interaction” (phase 1 faculty). However, phase 2 faculty did not seem to identify any difference with regard to the amount of direct observation required to use either type of anchor (table 3).
Unforeseen Gaps
When comparing entrustment anchors to traditional anchors, participants noted that traditional anchors provide information about how a trainee is progressing relative to their peers while entrustment anchors do not. Many residents in this study said they desire this type of information. Faculty in this study placed far less value on peer comparison and seemed to suggest that we should not be promoting this type of comparison (box 2).
Box 2 Perspectives on the Role of Normative Data
… You don't want to be below your peers, I think that for me is more important than being above my peers… I want to be in line with everyone else moving forward, because if there's clear gaps… that's what I want to work on. So, yeah definitely that part is missing from entrustability scales. (phase 2 resident)
People say what's the typical PGY-2 supposed to be, well I don't think that there's anyone who's typical truthfully, so I think people have to stop worrying about that. (phase 2 faculty)
… You don't have a real sense of where you should be and whether or not you're doing well or not doing well. (phase 1 resident)
… If I feel like my evaluation was less than I expected, that would be an opportunity that I would probably speak to them and say, “I thought I was doing better, but is my performance appropriate for my level?” (phase 2 resident)
… In the last 6 months… you should be 4 and 5. If you're below that, then you're below where you should be… (phase 2 faculty)
In addition to lacking peer comparison, participants also expressed that entrustment anchors do not provide residents with information about their expected rate of progress. Residents noted that this lack of information is a challenge. Residents and faculty with experience using these entrustment anchors identified that, while the culture should likely be diverted away from peer comparison, there is value in residents knowing whether they are progressing appropriately through their training. Residents described alternate approaches to obtaining this information (box 2). Some faculty suggested that they would provide context to their ratings by providing adjacent normative information (box 2).
Although the philosophy of entrustment is shifting trainees away from relativistic, individual assessment and toward competency-based assessment, resident participants expressed a persistent desire to know that they were progressing as expected. They felt that this information was provided when they were compared against a “typical” trainee with similar experience. This information is not provided in a WBA tool using entrustment anchors.
Discussion
This study provides valuable insight into why residents and faculty theorize that entrustment anchors are better than traditional anchors. The more pragmatic language of entrustment anchors makes them concrete, and therefore more easily justifiable, and more transparent, enabling use of the entire scale as both faculty and residents have a better understanding of how rating levels relate to clinical practice. The alignment of the language of entrustment anchors with training outcomes promotes better feedback conversations as it emphasizes progress toward independence over personal evaluation. These 3 key advantages may help explain the improved reliability of WBA tools that use entrustment anchors, as noted in the literature.11–18,20–22
Recent research supports our study participants' perceptions that entrustment anchors promote use of the entire scale and better feedback. In a study examining the impact of adding entrustment language to a rating scale, Dolan and colleagues noted that raters used the lower end of the scale, whereas they previously had noted a restriction to using only the “higher end” of the scale.30 This research also noted an increase in the written feedback once entrustment language was incorporated.30 There is a concern that resistance to giving and receiving low scores in medical education can lead to biased assessments (ie, use of only the higher end of the scale),31 potentially contributing to a “failure to fail.”32 It is important to find approaches to mitigate this impact. The combined results of the study by Dolan et al30 and our current study suggest that entrustment anchors promote use of the entire scale, and therefore, may offer a piece of the solution.
Given the improved reliability of WBA tools using entrustment anchors, many residency training programs are moving to adopt these tools. Our study provides insight into issues that faculty and residents who are unfamiliar with these new anchors may raise. In this study, there were concerns raised by participants without experience using entrustment anchors about the context in which these scales can be effectively used (procedural versus non-procedural, junior versus senior trainees, and direct versus indirect observation). However, those with experience provided contrasting perspectives for these contextual concerns. Our study results suggest that these concerns need to be acknowledged and addressed. Training may be helpful for both faculty and residents prior to the introduction of WBA tools using entrustment anchors to maximize their effectiveness.
While the majority of data from this study provides evidence to support the use of entrustment anchors, participants did express that a lack of normative data is a significant unforeseen gap. Residents with and without experience being assessed using WBA tools with entrustment anchors expressed a desire for peer comparison, which they felt was lost when using entrustment anchors. It is interesting to note that residents seem to believe that when they receive an assessment of “above expectations,” it is an accurate depiction of their performance, indicating they are progressing beyond their peers. Most faculty in the study questioned the reliability of assessments using traditional anchors of this type, as their personal experience was consistent with the literature, which demonstrates a significant range restriction using only the high end of the scale.9,10,32
The majority of residents also expressed a desire to know if they were progressing appropriately. Residents exposed to entrustment anchors discussed alternate methods of receiving this information, such as asking faculty to comment as to whether or not their performance was appropriate for their level of experience. Faculty provided examples explaining why a learner received a certain score, but framed it for the learner with a verbal statement that reassured them that their performance was appropriate for their trainee level. Although competency-based medical education curricula decrease the emphasis on the time spent acquiring abilities, there is clearly still a time-based element to residency training. Residents in this study expressed a strong desire to receive the information that either their progress is appropriate, or that they should spend more time and effort on this milestone. Thus, it would seem imperative for programs using WBA tools with entrustment anchors to explicitly identify where and how residents will receive information about their overall rate of progression.
This study has limitations. It was conducted at a single site, and the assessment culture at this institution does not necessarily represent that of other institutions, limiting the generalizability of these findings. This study also focused on the specific anchors that were used at our institution; therefore, the results may not generalize to WBA tools that use another specific type of entrustment anchor such as milestone-based rating scales. Finally, the sample size was small.
This research presents an opportunity for further exploration into the experience of different types of entrustment anchors, such as milestone-based rating scales describing the specific performance required at each level on the way toward competent independent practice. Although the levels for these scales are often described in greater detail, the rater must still make a judgment that the trainee has performed at the particular level described. Our core themes may offer a framework for survey-based approaches aimed at larger sample populations to assist in deepening the understanding of how entrustment anchors are used by faculty and residents and for comparison studies at institutions using a different type of entrustment anchor.
Conclusion
Insights gained from this analysis contribute to a theoretical framework of benefits and challenges related to the adoption of entrustment anchors. A richer understanding of faculty and resident perspectives on entrustment anchors can assist WBA developers in creating more acceptable tools, including optimizing the language for entrustment anchor descriptors. Faculty and residents who have not used WBA tools with entrustment anchors expressed concerns that those with experience using these tools do not have. As residents in this study expressed concerns regarding the lack of normative data for the expected rate of progression, there may be a need for training programs to ensure that this information continues to be provided.
References
Author notes
Editor's Note: The online version of this article contains the interview guides used in the study.
Funding: This study was funded by a Physicians' Services Foundation Inc Health Research Grant.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
Research was presented in part at the International Conference on Residency Education, Halifax, Nova Scotia, Canada, October 18–20, 2018.
The authors would like to thank Katherine Scowcroft for her excellent work on this project as a research assistant, Tanya Horsley and the Research Unit of the Royal College of Physicians and Surgeons of Canada for their early guidance, and Tyson Gofton for his thoughtful review and suggestions regarding the manuscript.