ABSTRACT
Point-of-care ultrasound (POCUS) is increasingly used in a number of medical specialties. To support competency-based POCUS education, workplace-based assessments are essential.
We developed a consensus-based assessment tool for POCUS skills and determined which items are critical for competence. We then performed standards setting to set cut scores for the tool.
Using a modified Delphi technique, 25 experts voted on 32 items over 3 rounds between August and December 2016. Consensus was defined as agreement by at least 80% of the experts. Twelve experts then performed 3 rounds of a standards setting procedure in March 2017 to establish cut scores.
Experts reached consensus on including 31 items in the tool and agreed that 16 of those items were critically important. A final cut score for the tool was established at 65.2% (SD 17.0%). Cut scores for critical items were significantly higher than those for noncritical items (76.5% ± SD 12.4% versus 53.1% ± SD 12.2%, P < .0001).
We reached consensus on a 31-item workplace-based assessment tool for identifying competence in POCUS. Of those items, 16 were considered critically important. Their importance is further supported by higher cut scores compared with noncritical items.
Point-of-care ultrasound (POCUS) is increasingly used in a number of medical specialties. Workplace-based assessments are essential, and there is a need to establish what checklist items are critical when assessing POCUS skills.
A consensus-based assessment tool for POCUS skills was developed.
The tool provides guidance on which assessment items are critically important; it does not specify to educators how a learner must successfully complete those items.
Consensus was reached on a 31-item workplace-based assessment tool for identifying competence in POCUS, with 16 items considered critically important.
Introduction
Point-of-care ultrasound (POCUS) is increasingly being integrated into patient care in many specialties, such as emergency medicine,1,2 critical care,3–5 anesthesiology,6–8 and internal medicine.9,10 To support competency-based education,11 training programs need to establish a programmatic approach to assessments.12 Recurrent workplace-based observations are essential to help trainees achieve competence and to support judgments and decisions about that competence.13,14 To date, multiple assessment tools for POCUS skills have been published, with varying amounts of validity evidence to support the interpretation of scores.15–23 Assessment tools are primarily checklists, global rating scales, or a combination of both. While data suggest that global rating scales may offer higher reliability and greater sensitivity to expertise,24,25 checklists may be easier and more intuitive for untrained raters to use.26,27 However, checklists risk “rewarding thoroughness,” allowing the successful completion of multiple trivial items to mask the commission of a single serious error.27–31 As such, there is a need to establish which checklist items are critical in POCUS, such that incompetent performances are appropriately identified.
This study sought to develop a consensus-based assessment tool for POCUS skills and to determine which items are critical for competence.
Methods
Assessment Tool Construction
Draft assessment items were collated by 2 authors (I.W.Y.M. and V.E.N.) based on a review of the relevant literature on directly observed POCUS assessments.16,19,32–40 Items were then grouped according to key domains (introduction/patient interactions, use of the ultrasound machine, choice of scans, image acquisition, image interpretation, and clinical integration). For each item, respondents were asked to rate its importance for inclusion in a rating tool and whether learners must successfully complete that item to be considered competent in POCUS (yes, critical; no, noncritical). Importance was rated on a 3-point Likert scale (1, marginal; 2, important; 3, essential to include). This draft survey was then reviewed by all coauthors for item relevance and completeness. It was subsequently piloted for content, clarity, and flow with 5 faculty members who taught POCUS in an educational setting (1 emergency physician, 1 general internist, 2 surgeons, and 1 anatomist) and 2 postgraduate year-5 internal medicine residents who had completed 1 month of POCUS training. The piloted survey became the instrument used in the first round of the consensus process.
Consensus Process
Between August and December 2016, using a modified Delphi technique,41 we conducted 3 rounds of an online survey to establish consensus among an expert panel of diverse POCUS specialists on the draft assessment items identified in the construction stage. Specifically, we sought to achieve consensus on which items should be included in a POCUS assessment tool and which should be considered critical.
The POCUS experts were identified using nonprobability convenience sampling based on international reputation and were recruited via e-mail invitation. Inclusion criteria were completion of at least 1 year of POCUS fellowship training and/or a minimum of 3 years of experience teaching POCUS.
Consensus to include was defined as 80% or more of the experts agreeing that an item was essential or important to include in the tool; consensus to exclude was defined as 80% or more agreeing that the item was marginal. Similarly, consensus for a critical item was defined as 80% or more agreeing that the item must be successfully completed for the learner to be considered competent. Items for which the experts had not reached consensus but had ≥ 70% agreement were readdressed in subsequent rounds, in which items were rated in a binary fashion (yes, should include, versus no, should not include).
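For illustration, these consensus rules can be expressed as a short decision procedure. The following sketch is a minimal rendering of the thresholds described above; the function name, tallies, and example votes are hypothetical and were not part of the study protocol.

```python
# Minimal sketch of the consensus rules above; vote counts are hypothetical.

def classify_item(n_essential_or_important: int, n_marginal: int, n_experts: int) -> str:
    """Apply the 80% include/exclude rule, with >= 70% agreement flagged for rerating."""
    if n_essential_or_important / n_experts >= 0.80:
        return "consensus to include"
    if n_marginal / n_experts >= 0.80:
        return "consensus to exclude"
    if n_essential_or_important / n_experts >= 0.70:
        return "readdress in next round (binary yes/no rating)"
    return "no consensus"

# Example: 21 of 25 experts (84%) rate an item essential or important.
print(classify_item(n_essential_or_important=21, n_marginal=2, n_experts=25))
```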
Standards Setting
To set cut scores that distinguish competent POCUS performances from those that are not competent, we invited 12 experts to attend a 3-hour standards setting meeting on March 6, 2017, either in person or via teleconferencing. For this meeting, ≥ 50% of the subject matter experts had to be new (ie, had not participated in the initial expert panel).
At the start of the meeting, we oriented experts to the standards-setting task involved (a modified, iterative Angoff method).42,43 Experts then discussed the behaviors of a borderline POCUS performance to establish a shared mental model of minimally competent performances, defined as those performed unsupervised and considered minimally acceptable. For each item, experts anonymously estimated the percentage of minimally competent POCUS learners who would complete the item successfully. In other words, at the item level, experts were asked to consider a group of 100 borderline learners and estimate how many would successfully complete the item. Experts were blinded to whether the item had previously been determined by the consensus process to be critically important. Modifications to the Angoff method included the use of an iterative process: items with large variances (SD ≥ 25%) were discussed and readdressed in subsequent rounds.44 We decided in advance that no more than 3 rounds of standards setting would be conducted. The final cut score for the entire tool was then derived from the mean of the individual-item expert estimates; the final cut score for the critical items was derived from the mean of the critical-item estimates.
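As a worked illustration of this aggregation, the sketch below computes item-level cut scores as the mean of expert estimates, flags high-variance items (SD ≥ 25%) for rediscussion, and derives the tool-level cut score as the mean across items. The expert estimates shown are invented; only the thresholds and aggregation rules come from the procedure described above.

```python
# Hedged sketch of the modified, iterative Angoff aggregation described above.
# Expert estimates (percentage of 100 borderline learners expected to succeed)
# are fabricated for illustration; the item names mirror 2 items from the tool.
import statistics

item_estimates = {
    "Washes hands": [95, 30, 90, 25, 98, 70, 55, 92, 28, 96, 75, 85],
    "Appropriately cleans machine and transducers": [70, 65, 72, 68, 75, 60, 66, 71, 69, 64, 73, 67],
}

for item, estimates in item_estimates.items():
    cut = statistics.mean(estimates)      # item-level cut score
    spread = statistics.stdev(estimates)  # sample SD across the 12 experts
    status = "discuss and rerate next round" if spread >= 25 else "accept"
    print(f"{item}: cut {cut:.1f}%, SD {spread:.1f}% -> {status}")

# Tool-level cut score: mean over the item-level estimates, as described above.
tool_cut = statistics.mean(statistics.mean(e) for e in item_estimates.values())
print(f"Tool cut score: {tool_cut:.1f}%")
```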
This study was approved by the University of Calgary Conjoint Health Research Ethics Board.
Statistical Analysis
Standard descriptive statistics were used in this study. Comparisons of measures between groups were performed using Student's t tests. A 2-sided P value of .05 or less was considered to indicate statistical significance. All analyses were conducted using SAS version 9.4 (SAS Institute Inc, Cary, NC).
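Analyses were conducted in SAS, but the group comparison can be reproduced with any standard statistics package. The following Python sketch shows an equivalent Student's t test; the cut-score values are placeholders rather than study data.

```python
# Illustrative 2-sample comparison of critical versus noncritical item cut
# scores (Student's t test, 2-sided alpha = .05). Values are placeholders.
from scipy import stats

critical_cuts = [78.0, 81.5, 72.0, 76.0, 80.0]     # hypothetical critical-item cut scores (%)
noncritical_cuts = [55.0, 50.5, 58.0, 49.0, 53.0]  # hypothetical noncritical-item cut scores (%)

# equal_var=True requests the classic Student's t test rather than Welch's.
t_stat, p_value = stats.ttest_ind(critical_cuts, noncritical_cuts, equal_var=True)
print(f"t = {t_stat:.2f}, 2-sided P = {p_value:.4f}")
print("statistically significant" if p_value <= 0.05 else "not statistically significant")
```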
Results
Of the 27 experts invited to the panel, 25 (93%) agreed to participate. Their baseline characteristics are presented in table 1.
Assessment Tool
All 25 experts (100%) completed round 1. Experts reached consensus to include 31 of the 32 items (97%). The remaining item, “Ensures machine charged when not in use,” was readdressed in round 2.
In round 1, the experts reached consensus that 14 of the 32 items (44%) were critically important. The group also reached consensus that 2 additional items were not critical (“Ensures machine charged when not in use” and “Scans with efficiency of hand motion”). Experts did not reach consensus on the critical importance of the remaining 16 of 32 items (50%).
Round 2 was completed by 24 of the experts (96%). For the item “Ensures machine charged when not in use,” only 10 of the 24 (42%) felt it should be included in the tool. That item was dropped and not considered further.
In round 2, consensus was achieved on the critical importance of 1 of the 15 items (7%) that the group had not reached consensus on in round 1: 20 of the 24 experts (83%) would fail a learner who did not “appropriately clean the machine and transducers.” The 2 items that had ≥ 70% agreement for being critical (“Able to undertake appropriate next steps in the setting of unexpected or incidental findings” and “Explains procedure—explain ultrasound, its role, and images—where applicable”) were readdressed in round 3.
Round 3 was completed by 22 of the 25 experts (88%). They reached consensus that the item “Able to undertake appropriate next steps in the setting of unexpected or incidental findings” was critically important (18 of 22, 82%) but did not achieve consensus on the item “Explains procedure—explain ultrasound, its role, and images—where applicable” (16 of 22, 73%).
The final 31 items included in the assessment tool and the 16 determined to be critical are listed in table 2.
Standards Setting
Twelve experts participated in the standards-setting exercise (table 1). Of those, 6 (50%) had served on the initial expert panel for tool construction.
In round 1, cut scores were established for 27 of the 31 items (87%). Four items with an SD ≥ 25% were discussed and readdressed in round 2 (“Washes hands,” “Appropriately enters patient identifier,” “Appropriately cleans machine and transducers,” “Able to ensure safety of transducers”). After discussion and rerating of those 4 items in round 2, only 1 item continued to have an SD ≥ 25% (“Able to ensure safety of transducers”). In round 3, after further discussion, that item achieved an SD < 25% (mean 42.8% ± SD 24.1%).
The final cut score for the tool was established at 65.2% ± SD 17.0% (table 2). Cut scores for critical items were significantly higher than those for noncritical items (76.5% ± SD 12.4% versus 53.1% ± SD 12.2%, P < .0001). Cut scores for critical items were also significantly higher than the cut score for the full assessment tool (P = .022).
Discussion
In this study, using consensus group methods,45 our experts agreed on 31 items to be included in the workplace-based POCUS assessment tool. POCUS is a complex skill, involving image acquisition, image interpretation, and clinical integration of findings at the bedside.46 Our tool included items covering those domains,16,46 as well as items emphasizing the importance of appropriate patient interactions as part of POCUS competence,47 serving to articulate for educators the breadth of key tasks relevant to the assessment of bedside POCUS skills.
Of the 31 items on the tool, only 16 (52%) were felt to be critically important. Although critical items for clinical and procedural skills have previously been published,30,48–51 to our knowledge, they have not been established for general POCUS skills. Delineating which items are critical is important for POCUS for 2 reasons. First, POCUS is a relatively new skill. For general medicine, its role has only recently been officially recognized.9 A shortage of faculty trained in this skill continues to be the most significant barrier to curriculum implementation in general medicine.52,53 In Canada, only approximately 7% of internal medicine faculty54 and 30% of family medicine physicians are trained in POCUS.55 Without trained faculty, appropriate assessment of trainee skills is highly challenging. Critical items can help guide faculty development efforts by focusing faculty on essential tasks, thereby managing rater workload more effectively56 and improving rater performance.57 Second, using key items in assessments may result in higher diagnostic accuracy,30,51 better reliability,58 and improved training and patient safety.29
In the era of competency-based medical education,11 mastery-based learning is associated with improved clinical outcomes.59,60 Achievement of minimum passing scores set by an expert panel is associated with superior skills and patient outcomes.61–63 While expert panel cut scores are commonly used for standards setting, others have argued that traditional standards-setting methods allow learners to miss a fixed percentage of assessment items, regardless of which items are missed, raising patient safety concerns.29 We have noted similar concerns in procedural skills assessments, in which learners may achieve very high checklist scores despite having committed serious procedural errors.27,31 In our present study, we first established which items were considered critical by consensus group methods. We then applied standards-setting procedures to evaluate cut scores. Although our expert panel was blinded to whether an item was considered critical, the cut scores it established for critical items were significantly higher than those for noncritical items, suggesting those items may indeed be qualitatively different. Specifically, critical items dealt with key skills in image acquisition (items 7, 9, 14, and 16; table 2), interpretation (items 17, 20, 24, 25, and 26), and safe patient management, such as clinical integration (items 27, 28, 30, and 31), communication of findings (items 5 and 11), and infection control (item 12).
Our study has some limitations. While our tool provides guidance on which assessment items are critically important, it does not specify to educators how a learner must successfully complete those items. For example, the item “Attains minimal criteria” still requires that faculty be able to recognize which images are of sufficient quality for interpretation to be possible at all. Therefore, faculty training will continue to be an important part of trainee assessments. Further, even knowing which items are critical, there is at present no clear guidance on how to score those items; 3 options have been proposed. First, from a patient safety perspective, many feel that learners should be required to successfully complete all critical items to be considered competent.64 While appealing, this approach carries greater consequences for the learner, so its defensibility will require additional validity evidence, for example, evidence that raters can rate those items with high interrater reliability.65 A second approach involves setting separate cut scores for critical and noncritical items (as in our present study).64 A third approach involves applying item weights,65 which may be challenging because experts may not agree on what weights to apply. Indeed, within our study, despite iterative discussions, the final variance on some items remained wide, suggesting persistent disagreement among experts. Future studies should determine which of these 3 methods best delineates competent performances from incompetent ones.
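To make the first 2 options concrete, the sketch below contrasts an “all critical items must pass” rule with the separate-cut-score rule. The decision functions are hypothetical and are ours, not part of the published tool; the thresholds echo the cut scores reported above (76.5% critical, 53.1% noncritical).

```python
# Hypothetical decision rules contrasting 2 of the proposed options. The
# thresholds echo the cut scores reported above, but these functions are
# illustrative only and not part of the assessment tool.

def passes_all_critical(critical_results: list) -> bool:
    """Option 1: the learner must successfully complete every critical item."""
    return all(critical_results)

def passes_separate_cuts(critical_pct: float, noncritical_pct: float,
                         critical_cut: float = 76.5, noncritical_cut: float = 53.1) -> bool:
    """Option 2: separate cut scores for the critical and noncritical subsets."""
    return critical_pct >= critical_cut and noncritical_pct >= noncritical_cut

# A learner completing 15 of 16 critical items (93.8%) and 12 of 15
# noncritical items (80.0%) fails under option 1 but passes under option 2.
print(passes_all_critical([True] * 15 + [False]))         # False
print(passes_separate_cuts(15 / 16 * 100, 12 / 15 * 100)) # True
```

The contrast illustrates why option 1, while attractive for patient safety, carries higher stakes for learners: a single missed critical item fails the performance outright.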
Conclusions
Our experts agreed on 31 items for inclusion in a workplace-based assessment tool for POCUS. Of those, 16 (52%) were felt to be critical in nature, with significantly higher cut scores than those for noncritical items. For determining competency in directly observed POCUS skills, faculty should pay particular attention to those items and ensure that they are completed successfully.
References
Author notes
Funding: This work was funded by a Medical Council of Canada Research in Clinical Assessment grant. The funding source had no role in the design or conduct of the study, its analyses, interpretation of the data, or decision to submit results.
Competing Interests
Conflict of interest: Dr Ma is funded as the John A. Buchanan Chair in General Internal Medicine at the University of Calgary. Dr Kirkpatrick has consulted for the Innovative Trauma Care and Acelity companies.
The authors would like to thank the Medical Council of Canada for funding this work, the experts who participated in this study, and Sydney Haubrich, BSc, Julie Babione, MSc, and the W21C for their assistance.