ABSTRACT
In 2018, Canadian postgraduate emergency medicine (EM) programs began implementing a competency-based medical education (CBME) assessment program. Studies evaluating these programs have focused on broad outcomes using data from national bodies and lack data to support program-specific improvement.
We evaluated the implementation of a CBME assessment program within and across programs to identify successes and opportunities for improvement at the local and national levels.
Program-level data from the 2018 resident cohort were amalgamated and analyzed. The number of entrustable professional activity (EPA) assessments (overall and for each EPA) and the timing of resident promotion through program stages were compared between programs and to the guidelines provided by the national EM specialty committee. Total EPA observations from each program were correlated with the number of EM and pediatric EM rotations.
Data from 15 of 17 (88%) programs containing 9842 EPA observations from 68 of 77 (88%) EM residents in the 2018 cohort were analyzed. Average numbers of EPAs observed per resident in each program varied from 92.5 to 229.6, correlating with the number of blocks spent on EM and pediatric EM (r = 0.83, P < .001). Relative to the specialty committee's guidelines, residents were promoted later than expected (eg, one-third of residents had a 2-month delay to promotion from the first to second stage) and with fewer EPA observations than suggested.
There was demonstrable variation in EPA-based assessment numbers and promotion timelines between programs and with national guidelines.
What was known and gap
Studies evaluating competency-based medical education (CBME) assessment for postgraduate emergency medicine programs in Canada have focused on broad outcomes using data from national bodies and lack data to support program-specific improvement.
What is new
An evaluation of the implementation of a CBME assessment program within and across programs to identify successes and opportunities for improvement at the local and national levels.
Limitations
The study includes only the initial quantitative data for the first year of our implementation. The small sample size reduces generalizability.
Bottom line
Involving and engaging program-level educational leaders to collect and aggregate data can yield unique analytics that are useful to both local and national stakeholders and leaders.
Introduction
As competency-based medical education (CBME) is being implemented around the world,1 it is also being evaluated to quantify its impact and support its improvement. Evaluation studies published to date focus on broad outcomes using data from national bodies such as the Accreditation Council for Graduate Medical Education (ACGME)2–4 or emphasize the outcomes from local5–9 and regional10,11 implementation. While national analyses can inform the evolution of an overall assessment program, they provide insufficient data to support program-specific improvement.2–4 Conversely, local or regional initiatives reveal insights within their context, but it is unclear whether they represent a broader systemic challenge.2,10,11 Neither type of database is able to detect variability or fidelity of implementation7,12,13 across individual programs, an essential first step in evaluating higher-level educational and clinical outcomes.14 Regardless of the specialty, this is a problem that any program must face when implementing CBME.
Emergency medicine (EM) residency programs accredited by the Royal College of Physicians and Surgeons of Canada (RCPSC) officially implemented their CBME assessment program for the cohort of residents beginning postgraduate training in July 2018 (the 2018 cohort).15 This assessment program consists of 28 entrustable professional activities (EPAs) assessed on a 5-point entrustment scale16,17 that are organized sequentially into 4 stages (Transition to Discipline, Foundations of Discipline, Core of Discipline, and Transition to Practice) spread across 5 years of training (Table 1), all of which were predetermined centrally by the RCPSC EM specialty committee.15 The specialty committee members also determined a suggested target number of assessments for each EPA.18 While the EM CBME assessment program has a consistent design across sites, the roll-out of the program was site-specific.
List of Entrustable Professional Activities (EPAs), Suggested Number of Observations for Each, and Stage Length

We evaluated the short-term outcomes of the national implementation of this assessment program for Canadian RCPSC EM training programs through the creation of a specialty-specific database of program-level assessment data.14 This evaluation aimed to identify successes and opportunities for improvement at local and national levels, investigate the fidelity of implementation13,19 of the new program of assessment, evaluate the variability of implementation between training programs and the fidelity of the implementation relative to the national design, and present analyses that support the improvement of local programs and the national assessment program.
Methods
The RCPSC has directed the implementation of CBME20 sequentially by specialty in concert with national specialty committees.15 As required by the RCPSC for each specialty, the EM specialty committee was founded in the early 1980s when EM was established as a training program. It consists of an executive (chair, vice-chair), representatives from 5 geographic constituencies across Canada, and the program directors from all institutions.
As part of the CBME rollout, each program established a competence committee charged with making decisions regarding promotion between stages by aggregating, analyzing, and reviewing each resident's assessment data. The RCPSC competence committees are structurally similar to the clinical competency committees used by the ACGME.21–23 The methods the committees used to arrive at their decisions are idiosyncratic and locally derived.24
Enrollment of Programs
The program director or CBME faculty lead of each of the 14 Canadian institutions that host specialty EM residency programs was contacted and asked to participate. Representatives from 12 institutions overseeing 15 of the 17 programs agreed to participate. The 4 training sites of the University of British Columbia (UBC) were considered independent residency programs for the purpose of the analyses because their schedules differ and their promotion decisions are made by independent competence committees.
Data Collection
Deidentified EPA assessment data were collected for residents who began residency in the 2018 cohort. We designed a 3-tab data extraction spreadsheet (provided as online supplemental material) to collect CBME data and relevant program characteristics from each program lead. The first tab contained the details of EPA observations (the number of observations of each EPA that occurred at each level of the 5-point entrustment scale16,17) from the included residents that were collected between July 1, 2018, and June 30, 2019. The second tab amalgamated data from the first tab into program-level metrics, including the total and mean (standard deviation [SD]) number of each EPA observed at each level of the entrustment scale. The third tab contained program characteristics, including the number of eligible residents in the 2018 cohort, the number of EM and pediatric EM training 4-week blocks within the first year, the number of shifts per EM training block, the number of residents in each stage of training as of the first day of each month (July 1, 2018, to July 1, 2019), and any additional information that each program lead felt was important to contextualize the data.
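The amalgamation step from the first tab to the second can be illustrated with a minimal sketch. The resident identifiers, EPA labels, and counts below are hypothetical, not the actual spreadsheet contents; the sketch only shows how per-resident observation counts roll up into the program-level total and mean (SD) metrics described above.

```python
# Hypothetical sketch of amalgamating tab 1 (per-resident EPA observation
# counts) into tab 2 (program-level total and mean [SD] per EPA).
# All identifiers and numbers are illustrative.
from statistics import mean, pstdev

# tab 1: for each (deidentified) resident, number of observations of each EPA
tab1 = {
    "resident_A": {"TTD1": 4, "F1": 10},
    "resident_B": {"TTD1": 6, "F1": 8},
    "resident_C": {"TTD1": 5, "F1": 0},
}

def amalgamate(per_resident):
    """Compute program-level total and mean (SD) observations for each EPA."""
    epa_ids = sorted({epa for obs in per_resident.values() for epa in obs})
    out = {}
    for epa in epa_ids:
        # Residents with no observation of this EPA count as 0
        counts = [obs.get(epa, 0) for obs in per_resident.values()]
        out[epa] = {"total": sum(counts), "mean": mean(counts), "sd": pstdev(counts)}
    return out

tab2 = amalgamate(tab1)
```

For the hypothetical data above, `tab2["TTD1"]` would report a total of 15 observations with a mean of 5 per resident.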
Ethics and Confidentiality
Our protocol was submitted to the Research Ethics Boards at the 12 institutions and deemed exempt by each as a program evaluation activity under article 2.5 of the national Tri-Council Policy Statement.25 All data were deidentified by home program, and only program-level data were analyzed. One contact (K.C.) extracted data from all 4 UBC programs.
Data Analysis
Stage-specific analyses and visualizations excluded the final stage of residency (Transition to Practice) because it contained minimal data. Descriptive statistics were calculated using Microsoft Excel 14.7.0 (Microsoft Corp, Redmond, WA) and SPSS Statistics 25.0 (IBM Corp, Armonk, NY). Graphs were created using Microsoft Excel 16.0.1 (Microsoft Corp, Redmond, WA). The relationship between the average number of EPA observations per resident within each program and the number of training blocks spent on EM and pediatric EM was evaluated with a Pearson correlation.
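The Pearson correlation described above can be sketched in a few lines. The program-level numbers below are invented for illustration (they are not the study data); the sketch simply pairs each program's EM plus pediatric EM block count with its mean EPA observations per resident and computes r.

```python
# Illustrative Pearson correlation between program-level training blocks and
# mean EPA observations per resident. All data values are hypothetical.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-program values: EM + pediatric EM blocks, and
# mean EPA observations per resident
blocks = [5, 6, 6, 7, 7, 8, 8, 9]
mean_epas = [95.0, 120.5, 132.0, 150.2, 161.8, 180.0, 195.5, 225.0]

r = pearson_r(blocks, mean_epas)
```

In practice this is equivalent to `scipy.stats.pearsonr`, which also returns the P value reported in the Results.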
Results
Descriptive Data on Program Sites
Data from 15 of 17 (88%) RCPSC EM programs containing 68 of the 77 (88%) residents in the 2018 cohort were analyzed. Combined, the residents received 9842 EPA observations in the study period. Table 2 outlines the characteristics of each of the programs, which varied in the number of EM blocks (mean 6.2, SD 1.5), pediatric EM blocks (mean 1.4, SD 0.5), and shifts per EM block (mean 16.0, SD 1.2).
Program-Level Data Analysis
Figure 1 demonstrates the variability in the average number of EPA observations across the 15 programs, with a range of 92.5 to 229.5 EPA observations per resident. The variability in the average number of EPA observations completed within each stage is also represented within each bar of this figure. The average (SD) values across the 15 programs were 45.6 (SD 8.7) Transition to Discipline EPA observations per resident, 70.4 (SD 25.8) Foundations of Discipline EPA observations per resident, and 29.0 (SD 23.2) Core of Discipline EPA observations per resident.
Modified Stack Chart Demonstrating Average Number of EPA Observations per Resident Within Each Program (Total and Each Stage of Training)
Figure 2 is a stack chart representing the proportion of the 68-resident cohort in each stage of training on the first day of each month of the year. Although the specialty committee estimated that the Transition to Discipline stage would take approximately 3 months, one-third of residents were not promoted to the Foundations of Discipline stage for at least 5 months. Similarly, it was anticipated that the Foundations of Discipline stage would last until the end of the first year of residency, but over 60% of residents were not promoted to the Core of Discipline stage by the end of the year.
Stack Chart Demonstrating Percentage of First-Year Residents in Each Stage on First Day of Each Month (July 1, 2018–July 1, 2019)
Aggregate Performance Analytic
Figure 3 outlines the average number of EPA observations per resident within each stage of training compared to the provided guidelines. All residents were promoted to the Foundations of Discipline stage, yet the average number of observations of the Transition to Discipline EPAs was less than the number recommended by the specialty committee. The average number of EPA observations prior to promotion to the Core of Discipline could not be assessed, as most residents did not enter this stage before the end of the data collection period.
Bar Chart Demonstrating Average Number of EPAs Observed per Resident After 1 Year of Assessment Relative to Targeted Number Required for Promotion to Next Stage
Note: Descriptions of each EPA are shown in Table 1.
As individual resident assessment data were not obtained, we were unable to report traditional learning curves for individual EPAs. In lieu of a learning curve, Figure 4 represents the relative difficulty of each of the EPAs by presenting the proportion of all assessments that were scored at each level of the 5-point entrustment scale (provided as online supplemental material). A small number (< 10%) of EPA observations within the Transition to Discipline and Foundations of Discipline stages were rated “I had to do” (1 of 5) or “I had to talk them through” (2 of 5). Most (> 60%) of the EPAs observed at these stages were rated as “I had to be there just in case” (4 of 5) or “I didn't need to be there” (5 of 5). Substantially fewer data were available for the Core of Discipline stage, but the pattern was similar.
Stack Chart Demonstrating Percentage of Observations of Each EPA Rated at Each Level of Entrustment on the Ottawa Score
Correlation Data
The number of EM and pediatric EM rotations within each program demonstrated a strong correlation (r = 0.83, P < .001) with the average number of EPAs observed per resident.
Discussion
This article describes the first Canadian dataset representative of the national CBME rollout in any RCPSC specialty. Key findings include substantial variability in the number of EPA observations and promotion timelines across programs, the promotion of most residents prior to achieving the recommended number of EPA observations, few ratings at the low end of the entrustment scale, and a strong correlation between the average number of EPA observations per resident and time spent on EM rotations.
Our findings may inform individual program improvement and the modification of our national assessment framework. For example, local implementation leaders with lower-than-expected EPA observations may identify ways to increase observation frequency by seeking advice from other programs. Simultaneously, programs may identify practical obstacles that will inform modifications of national standards. Overall, the frequency with which individual EPAs are assessed will have important implications for the operational aspects of this new assessment program.
The variability that we have identified highlights the possibility that trainee experience is highly heterogeneous. There could be numerous explanations for this (eg, varying levels of engagement, differences in teaching skillsets, amount of faculty development), but compared to the previous time-based model, where this variability was largely an undocumented problem, this new system allows us to quantify trainee experiences and work toward greater standardization across programs.3,26 Because this article outlines a single year of data from a single specialty, it is a starting point from which to evolve the assessment program, rather than an indictment of the fidelity of implementation of CBME in general.
Our data collection approach was different from those described elsewhere,2–4,10,11 due to limitations in our ability to access the assessment data and the engagement of members of each program's leadership in the research. Direct involvement of these key stakeholders in this process is likely to have focused our analysis on program-level metrics that are of relevance to them26–28 and increased buy-in in the program evaluation process.27–29 This will increase the likelihood that the results will be used by stakeholders as intended—to support the improvement of the participating programs.20,22
Our findings are also unique in that they incorporate unprocessed program-level assessment data (ie, EPA observation numbers and scores) and trainee progression data (ie, when trainees were promoted between levels). Previous literature from the ACGME utilized national data that were amalgamated from the reports of individual clinical competency committees after they had determined achievement for trainees.2–4 As demonstrated recently in a subset of EM programs in the United States, there are discrepancies between reported data regarding trainee promotion2 and the data acquired for local decision making.10,11 This may suggest that human judgement allows for better representation of performance, adjusting for local culture and nuances. We feel that by monitoring both sets of data in tandem, broader questions about idiosyncratic or systemic biases could be elucidated.
The collection of unprocessed data also demonstrated a substantial amount of program-level variation. While some variability in EPA numbers is expected given local contexts, a 2-fold difference in the number of EPAs observed suggests substantial heterogeneity. This may be due to local engagement with CBME, or other factors may be at play as well (eg, in our analysis, the number of EM rotations in the first year was a key factor). Additional variability may have also resulted from piloting the assessment program, previous use of a workplace-based assessment program (3 sites), an earlier rollout date of the assessment program (2 sites), and technical difficulties with various learning management systems (reported by several programs). The use of a modified 5-point entrustment score at the University of Toronto (provided as online supplemental material) may have impacted EPA observation metrics from that site.
Similar to the work of Conforti and colleagues,4 these early analyses may inform our specialty committee's evolution of our assessment program (eg, modify the EPA observation suggestions). However, with the additional context provided by seeing other programs' data and structural elements, this report may also inform local program-level reflections and changes to explore what program facets have positive or negative effects on EPA observations. For instance, data sharing and comparisons may help to identify successful local innovations that can be scaled nationally.
Our results raise additional questions. For example, there was a substantial delay in promotion for many residents. While variability in promotion timelines is a feature of CBME,15,30 the observed degree of variability suggests that either the assessment program is identifying residents who are falling behind early, or, perhaps more likely, variability in competence committee practices or promotion standards is impacting the rate of resident progress at this early stage. Promotions occurred more often in September, December, March, and June, suggesting that the timing of competence committee meetings may have impacted resident promotion timelines. Notably, very few EPAs were scored at low levels of the entrustment scale. This could be due to leniency or range restriction by assessors,31 resident “gaming” of assessments to avoid low scores,32,33 excellent preparation of learners by undergraduate medical training programs, or the assessment culture.34,35
Limitations
Our study contained only the initial quantitative data for the first year of our implementation. Moreover, manual data extraction can be error prone despite the efforts taken to ensure that it was checked locally prior to compilation. We also anticipate that our relatively small sample size, advances in faculty development,36 and increasing comfort with the program of assessment may reduce the generalizability of our results over time. Another issue surrounded learning management systems: due to database interface issues, the ultrasound EPA observations (Core of Discipline EPA 14) recorded by 2 programs were inaccessible to us at the time of this analysis. Inclusion of these items would have slightly increased the total number of Core of Discipline EPAs and EPAs per resident observed in these programs. Finally, 2 programs declined to participate—one due to philosophical differences surrounding data governance and another due to a transition in leadership (ie, no site lead was available to participate at the time of data collection). The total number of trainees within this group of nonparticipating programs was low (n = 9, or 11.5% of the total number of trainees nationally), and we believe it is unlikely that their inclusion would change our analyses.
Next Steps
The collection and analysis of program- and national-level assessment data is an important first step in evaluating the impact of our assessment program on training. While the investigation of higher-order outcomes in the educational (eg, pursuit of fellowships, etc) and clinical (eg, clinical competence, attending practice metrics, etc) realms has been proposed,14 substantive variation in the fidelity of the implementation of CBME programs may make it difficult to attribute outcome differences to the assessment program.7,12 The defining of educationally important and measurable outcomes will be critical for establishing a robust plan for evaluating CBME systems and has been initiated in parallel to this work.14
Moving forward, we hope to analyze person-level and narrative data. Person-level data could allow the evaluation of systemic biases (eg, race or gender bias) in the assessment data, determine the number of promotion data points that competency committees use to promote trainees, or evaluate the effects of curricular differences on EPA observations. The narrative data generated from a national assessment system may offer additional insights.37–40 We anticipate that other specialties may utilize our data amalgamation methods to evaluate their own CBME assessment programs. Beyond program evaluation, the collected dataset could have significant research value, especially if linked to other datasets (eg, medical school training records, clinical outcome databases).41,42
Conclusions
In efforts to improve both program- and national-level CBME assessment programs, we have shown that involving and engaging program-level educational leaders to collect and aggregate data can yield unique analytics that are useful to both local and national stakeholders and leaders. The findings in our evaluation study represent a new approach to integrating national and local program data to allow for improvement processes at both levels.
References
Author notes
Editor's Note: The online version of this article contains a data extraction spreadsheet completed by each of the 15 participating Canadian emergency medicine residency training programs and the O-SCORE and University of Toronto entrustment scales used to assess each entrustable professional activity.
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
This work was previously presented as part of a podium presentation at the International Conference of Residency Education, Ottawa, Canada, September 26–28, 2019.