Competency-based medical education (CBME) is changing the way we teach and assess residents, yet many educators feel ill-prepared to function in this new environment. The daunting task of developing entrustable professional activities (EPAs) and milestones is arguably surpassed only by our trepidation about the need for new assessment strategies.1 As a group of MD and PhD medical educators who develop local- and national-level examinations and teach a national-level assessment course, we have identified the need for basic principles to guide clinicians who have been tasked with developing assessment strategies in CBME. The 10 practical "assessment pearls" we offer in this perspective are grounded in evidence and are intended as a user-friendly guide for clinician educators. A glossary of terms is provided in the box for readers new to the field.
Clinical Competency Committee (CCC): In the context of competency-based medical education, the CCC is a committee that includes members of the faculty who will use a combination of assessment data gathered from multiple sources to evaluate learners' progress and make high-stakes decisions.
Competence: An array of abilities that enables the trainee or physician to do all tasks of practice effectively and consistently. It is considered a complex construct.
Competency: An observable ability of a trainee or physician. Example: perform a complete and accurate physical examination.
Competency-Based Medical Education: An approach to educating physicians that is oriented to outcome abilities and organized around competencies. It de-emphasizes time-based training and promises more flexibility and learner-centeredness.16
Construct: An intangible collection of abstract concepts that are inferred from behavior.5 For example, "clinical competence" and "professionalism" are constructs that may be of interest to assess but are inferred from the trainee's behavior in the workplace.
Entrustable Professional Activity (EPA): A unit of professional practice, defined as a real-life task that is essential to a particular specialty and that can be entrusted to a trainee once competence has been attained.3 Example of an EPA: "Manage care of patients with acute common diseases across multiple care settings."17
Low-Stakes and High-Stakes Assessments: Low-stakes assessments have limited consequences for the trainee in terms of promotion, selection, or certification, whereas high-stakes assessments can have far-reaching consequences such as failure to become certified.11
Milestones: The expected ability of a trainee at a given stage as he or she moves from novice to expert. Example from a neurology physical examination milestone: level 1, performs a complete neurological examination; level 4, efficiently performs a relevant neurological examination, accurately incorporating all additional appropriate maneuvers.18
Programmatic Approach to Assessment: The use of several assessment methods arranged longitudinally and constructed deliberately to optimize learning and assessment. The program would include several low-stakes assessment data points that are aggregated for higher-stakes pass/fail decisions.11
Rater Cognition: The mental processes that occur during scoring, at either a conscious or unconscious level.
1. All Assessments Are Samples
It is not possible to assess everything residents are expected to demonstrate, so we must deliberately sample representative knowledge and skills using a carefully constructed blueprint.2 A blueprint defines what is being assessed; it also supports the validity of the assessment strategy by ensuring sufficient and appropriate sampling. For example, 1 EPA for internal medicine is "admit and manage a medical inpatient with a new acute problem on a medical floor."3,4 Several competencies are required for a resident to complete this EPA: knowledge of basic science, clinical features, and management strategies, as well as communication skills and the ability to perform an appropriately focused physical examination. For each competency, a sampling strategy is required. One obviously cannot assess a resident on every type of patient who requires a focused physical examination, but one can sample deliberately, for example by the primary system involved, such as a patient presenting with a respiratory, cardiovascular, or musculoskeletal problem.
2. The Higher the Stakes, the More Samples Are Needed
All measurements have error, and the higher the stakes, the more assessment points (or samples) are required.5 For example, if the purpose of the assessment is to provide feedback to residents on the management of patients in an ambulatory clinic, a single assessment would be appropriate. On the other hand, if the Clinical Competency Committee (CCC) wanted to make decisions about learner promotion (ie, the assessment is for higher-stakes pass/fail decisions), then relying on only 1 faculty assessment would not be defensible: such a small sample of observed behavior cannot support a meaningful interpretation of the resident's performance for such an important decision. Defining the purpose (high or low stakes) of the assessment helps determine how many samples are needed.
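As a supplementary illustration from classical test theory (offered here as a general psychometric point rather than a finding of the cited studies), the Spearman-Brown formula estimates the reliability R_k of a composite of k comparable, independent observations from the reliability r of a single observation:

$$R_k = \frac{k\,r}{1 + (k - 1)\,r}$$

For illustration, if a single faculty rating is assumed to have a reliability of 0.3, averaging 4 independent ratings yields a composite reliability of approximately 0.63, and 10 ratings approximately 0.81, a level far more defensible for a promotion decision.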
3. Assess What Is Important, Not Just What Is Easy
Educators often default to assessing what is easy rather than what is important. We know that a written examination alone cannot adequately assess all aspects of clinical competence. The intrinsic (nonmedical expert) roles in frameworks such as CanMEDS 20156 and the Accreditation Council for Graduate Medical Education (ACGME) competencies7 can be difficult to assess (eg, professionalism), yet they are important elements of physician competence. Newer tools have been developed to help meet these challenges, such as multi-source feedback for team skills and communication,8 narrative feedback for patient-centered care, communication, and professionalism,9 and the Ottawa Surgical Competency Operating Room Evaluation for operative competency.10 Clearly, medical educators have made progress in developing tools to assess different areas of competence, not just those that are easier to assess, such as medical knowledge (ACGME) or the CanMEDS Medical Expert role.
4. All Assessment Involves Judgment
Perfect objectivity and standardization are neither possible nor desirable. For a test score to be generated, a performance must be observed, converted into a score, and then interpreted; every step involves judgment. At an individual level, judgment occurs when the preceptor (or rater) assesses a resident's performance in a particular context and determines what feedback to provide. Rater cognition has received increasing attention, and rater judgment can be a strength: multiple raters provide different perspectives and have the potential to yield richer data about a resident's performance.11 On the other hand, excessive variability in ratings across raters has raised concerns about reliability and validity.12 Training raters may improve assessment quality, but results are mixed.13 Judgment also occurs collectively when CCC members review a resident's entire portfolio. Judgment in that context determines which elements of the portfolio are provided to the committee for review and how individual elements are weighed. For example, the committee may judge a professionalism concern to be so egregious that it overrides excellent knowledge assessments and delays promotion to the next stage. Determining the threshold or standard for promotion clearly requires collective judgment and the consensus of experts.11,14
5. Quantitative and Qualitative Methods Complement One Another
Quantitative data have traditionally been considered more desirable for their supposed objectivity, yet a limitation of numbers is that they do not tell learners specifically how to improve. Narratives have been shown to capture elements of performance that an accumulation of numbers may mask15; furthermore, in unstandardized situations, such as most workplace-based assessment, narratives provide much better data for feedback and learning.11
6. No Single Assessment Tool Can Capture All Aspects of Clinical Competence
Clinical competence is a complex construct that necessitates a diverse set of assessment tools and strategies. To illustrate, competence involves knowledge, which may be best assessed with written examinations; clinical skills require direct observation, such as in an objective structured clinical examination; managing patients on an inpatient unit calls for workplace-based assessment tools, such as 360-degree feedback; and assessing diagnostic reasoning may require multiple tools, such as retrospective case analysis in the clinical setting and computer-based case simulations for rare events.19
7. Feedback Is an Essential Element of Assessment
Assessment should inform learners how they are progressing toward becoming experts, and formative feedback is an essential part of that process. Van der Vleuten and colleagues,11,20 in describing programmatic assessment, indicated that best assessment practice is not only about doing well enough to pass a unit of instruction but also about providing formative feedback that contributes to improved performance.
8. Assessment Drives Learning
Learners will "study to the test" whenever possible, focusing their study strategies on concepts they know will be examined. Test-enhanced learning occurs as trainees prepare for a test, complete it, and then receive feedback: individuals tested on material have better recall than those who simply studied it.21 Assessment similarly drives learning in the workplace: when learners see that all aspects of being a physician are assessed, including the nonmedical expert competencies, the importance of mastering those competencies becomes apparent. Assessment strategies should therefore be designed with this in mind.
9. Validity Is the Most Important Characteristic of Assessment Data
Simply put, validity is the overall judgment of the degree to which theory and evidence support the interpretation of assessment scores for a specific purpose.5,22 If a resident scores perfectly on a multiple-choice examination of knowledge, can we conclude that the resident is ready to take first call for all consultations coming to internal medicine? Although 1 interpretation of this learner's multiple-choice score is "yes, call ready," many would argue that demonstrating knowledge is not enough. We also need to know the resident's abilities in history taking, physical examination, management, and procedural skills. Thus, interpreting the multiple-choice score as proof of readiness to take first call is not valid: the evidence (knowledge testing) does not support the intended purpose (judging overall competence). This highlights an important concept: there is no such thing as a valid or invalid test. Validity always refers to the appropriateness of inferences or judgments based on test scores for a specific purpose. It is beyond the scope of this article to elaborate further, but validity is a unitary concept and requires multiple sources of evidence to support or refute meaningful score interpretation.5
10. Perfect Assessment Is an Illusion
Many criteria are relevant to any assessment: validity, reproducibility, equivalence, feasibility, educational effect, catalytic effect, and acceptability.23,24 Ultimately, assessment always involves some degree of compromise. Consider a low-stakes assessment whose purpose is to provide residents with progress data and feedback. In this instance, feasibility (ease of administration), acceptability (for residents and faculty), educational effect (facilitates feedback), and catalytic effect (provides results that enhance education) would all be considered important. If this were a high-stakes assessment with significant consequences, then reproducibility (a statistically reliable test) and equivalence (every resident is tested in the same way) would be paramount to ensure defensible results.
We have outlined 10 assessment pearls to help guide clinician educators and program directors tasked with transforming their programs of assessment to meet the new requirements of CBME. Important concepts in CBME have direct implications for assessment: EPAs will demand more rigorous workplace assessments; milestones will necessitate ongoing direct observation and feedback as well as a continuous program of assessment; and CCCs will be challenged to determine how to integrate data for decision making. We hope these basic principles serve as guideposts to ease the transition during this exciting time in medical education.
Author notes
The authors would like to thank the Department of Innovation in Medical Education at the University of Ottawa for ongoing research assistant support.