ABSTRACT
Background: Residency programs are expected to educate residents in quality improvement (QI). Effective assessments are needed to ensure residents gain QI knowledge and skills. Limitations of current tools include poor interrater reliability and a requirement for scorer training.
Objective: To provide evidence for the validity of the Assessment of Quality Improvement Knowledge and Skills (AQIKS), a new tool that provides a summative assessment of pediatrics residents' ability to recall QI concepts and apply them to a clinical scenario.
Methods: We conducted a quasi-experimental study to measure AQIKS performance in 2 groups of pediatrics residents: postgraduate year (PGY) 2 residents who participated in a 1-year longitudinal QI curriculum, and a concurrent control group of PGY-1 residents who received no formal QI training. The curriculum included 20 hours of didactics and participation in a resident-led QI project. Three faculty members with clinical QI experience, who were not involved in the curriculum and received no additional training, scored the AQIKS.
Results: Complete data were obtained for 30 of 37 residents (81%) in the intervention group and 36 of 40 residents (90%) in the control group. After completing the QI curriculum, the intervention group's mean score was 42% higher than at baseline (P < .001), while the control group showed no improvement (P = .29). Interrater reliability was substantial (κ = 0.74).
Conclusions: The AQIKS detects an increase in QI knowledge and skills among pediatrics residents who participated in a QI curriculum, with better interrater reliability than currently available assessment tools.
Quality improvement (QI) skills are important for physicians, and their development is hampered by a dearth of reliable, easy-to-use QI assessment tools.
The Assessment of Quality Improvement Knowledge and Skills (AQIKS) assesses residents' understanding and application of QI concepts.
A single-site, single-specialty study reduces generalizability.
The AQIKS detected increases in QI knowledge and skills in pediatrics residents, with improved interrater reliability over existing tools.
Introduction
The Institute of Medicine recommends that residents receive training in patient safety and quality,1 and the Accreditation Council for Graduate Medical Education has established expectations for quality improvement (QI) training in graduate medical education.2–4 Maintenance of certification requirements for practicing physicians includes ongoing development and assessment of QI skills.5 With these national efforts, reliable and useful tools for assessing trainees' QI skills and knowledge are needed.
Drawbacks of existing approaches to QI assessment include reliance on self-reports,6,7 a requirement for faculty training or expertise in QI,8,9 evaluation of only a limited subset of the skills necessary to engage in QI,10,11 and limited validity evidence for the instruments themselves.12–15 Establishing strategies to measure QI skills and knowledge can help ensure that residency training programs prepare physicians to participate in and lead QI efforts. In pursuit of this goal, experts have called for more robust assessment strategies for QI curricula.16
The objective of this study was to provide validity evidence for the Assessment of Quality Improvement Knowledge and Skills (AQIKS), a tool that generates a summative assessment of residents' ability to recall QI concepts and apply them to a clinical scenario. We describe the AQIKS and its performance in assessing pediatrics trainees when scored by junior faculty with limited experience in QI. We assessed the instrument's validity evidence in 3 domains: (1) content validity; (2) internal structure, measured by interrater reliability; and (3) the impact of learner participation in a formal QI curriculum. AQIKS cases, questions, and scoring rubric are available at MedEdPORTAL.17
Methods
Instrument Development
The AQIKS cases and questions address the Institute of Medicine quality and safety aims—care should be safe, timely, effective, efficient, equitable, and patient centered (STEEEP).1 The AQIKS was developed by a multidisciplinary team, including a survey methodologist and 2 pediatrics attending physicians with roles in clinical QI and education.
Glissmeyer et al15 previously described the development of pediatrics cases adapted from the QI Knowledge Assessment Tool (QIKAT). However, use of the QIKAT questions with pediatrics cases resulted in low interrater reliability and did not discriminate well between learners with greater and lesser QI knowledge.15 We designed a new question set that, together with the cases developed by Glissmeyer et al,15 comprises the AQIKS. Using the “Model for Improvement” framework18 as a guide, we developed 9 questions, each testing a unique concept or skill central to the application of the model. Four questions are generally applicable to QI methods, testing learner conceptual understanding of Institute of Medicine quality aims (No. 1), aim statements (No. 2), key stakeholders (No. 6), and interpretation of a run chart (No. 9). Five questions are specific to the proposed QI intervention, testing learner ability to generate a driver diagram (No. 3), describe a family of measures (No. 4), design a QI intervention (No. 5), test a QI intervention (No. 7), and develop a run chart (No. 8). All questions were pilot tested with 10 pediatrics residents.
Once the 9 AQIKS questions were selected, we developed a scoring rubric. The point total assigned to each question reflects the complexity of the concept or skill tested. The figure displays a sample question, scoring instructions, and sample responses with appropriate point assignments provided to scorers.
The AQIKS cases, questions, and scoring rubric were reviewed by a panel of 5 national QI and education experts (separate from the study team), who provided feedback to refine the instrument. The panel deemed that the final AQIKS instrument tests QI skills and knowledge used in the Model for Improvement framework.
Instrument Testing
We conducted a quasi-experimental study using precurriculum and postcurriculum assessment of a QI curriculum taught in a large, urban, pediatrics residency program with clinical sites at a safety-net hospital and a quaternary hospital. The intervention group included 37 postgraduate year (PGY) 2 pediatrics residents participating in a longitudinal QI curriculum, and the concurrent control group included 40 PGY-1 pediatrics residents not exposed to a QI curriculum. Residents who participated in pilot testing were not included. Before delivery of the QI curriculum to the intervention group, each participant completed the questions for 2 cases randomly selected from the 6 pediatrics case scenarios, and after delivery of the curriculum, each participant completed 2 different randomly selected cases. No participant received any case more than once.
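This assignment scheme amounts to sampling without replacement across the 2 assessments. The following minimal sketch illustrates one way to implement it (this is not the authors' procedure, and the case names are hypothetical): each resident draws 4 distinct cases from the pool of 6, using 2 at baseline and the other 2 at follow-up.

```python
# Illustrative sketch of the case-assignment scheme (not the authors'
# code; case names are hypothetical). Each resident receives 2 of the
# 6 cases at baseline and 2 different cases at follow-up, so no case
# is ever seen twice.
import random

CASES = ["case_1", "case_2", "case_3", "case_4", "case_5", "case_6"]

def assign_cases(rng=random):
    drawn = rng.sample(CASES, 4)      # 4 distinct cases per resident
    return drawn[:2], drawn[2:]       # (baseline pair, follow-up pair)

baseline_cases, followup_cases = assign_cases()
print("baseline:", baseline_cases, "follow-up:", followup_cases)
```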
Three raters from different institutions and specialties (neonatology, infectious diseases, and general pediatrics) scored responses to the AQIKS. Raters were junior faculty members with 2 to 5 years of experience in clinical QI who were not involved in delivering the QI curriculum, designing the AQIKS, or designing the study. Raters were instructed to score learners' responses to all 9 questions for each case according to the AQIKS scoring rubric. They received no additional training or scoring instructions and were blinded to group assignment (intervention or control) and assessment time (preintervention or postintervention).
QI Curriculum
Residents in the intervention group participated in a 12-month longitudinal QI curriculum based on the Model for Improvement,18 including 20 hours of didactics and participation in a faculty-mentored group project. Over the academic year, each resident had approximately 20 hours of protected time away from clinical duties to work on a QI project with a group of 5 to 6 other residents. Projects included developing an electronic tablet–based asthma education module, improving emergency department handoff procedures, and decreasing outpatient clinic patient wait times. One group presented results from a project they conceptualized during this curriculum at a national conference,19 and another group received external grant support to expand the QI effort piloted during the curriculum.
The Institutional Review Board of Boston Children's Hospital approved this study and granted a waiver of informed consent.
Statistical Analyses
Statistical analyses addressed internal structure, measured by interrater reliability in scoring and by the performance of individual questions, and overall test performance, including the influence of completing a QI curriculum on AQIKS scores over time.
Cohen's kappa measures interrater reliability, but is known to show paradoxically low kappa values if the marginal score distributions of raters are unbalanced.20 We measured interrater reliability using Brennan-Prediger's kappa, which is less influenced by unbalanced score distributions.21 The cutoff for acceptable interrater reliability was set at κ = 0.21, denoting at minimum “fair” interrater reliability.22,23
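For a pair of raters, Brennan-Prediger's kappa replaces Cohen's marginal-based chance-agreement term with a uniform term of 1/q, where q is the number of score categories, so κ = (Pₒ − 1/q)/(1 − 1/q). The following minimal Python sketch illustrates the calculation (illustrative only; the study scored with 3 raters and ran its analyses in Stata, and the function name and data here are our own).

```python
# Minimal sketch of Brennan-Prediger's kappa for two raters (illustrative
# only; the study used 3 raters and performed analyses in Stata).
def brennan_prediger_kappa(scores_a, scores_b, n_categories):
    """Chance-corrected agreement using a uniform chance term (1/q),
    which avoids the low-kappa paradox under unbalanced marginals."""
    n = len(scores_a)
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    chance = 1.0 / n_categories       # uniform chance agreement
    return (observed - chance) / (1.0 - chance)

# Hypothetical example: a question scored 0-3 points (q = 4 categories)
rater_1 = [0, 1, 2, 2, 3, 1, 0, 2]
rater_2 = [0, 1, 2, 3, 3, 1, 0, 2]
print(brennan_prediger_kappa(rater_1, rater_2, n_categories=4))  # ~0.83
```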
We used summary statistics to describe individual question performance among subjects who had participated in a QI curriculum compared with subjects who had not. We also used repeated-measures linear mixed models to assess the efficacy of the intervention, for each of the 9 questions separately and for the summary scores of the cases, using the arithmetic mean of the scores of the 3 raters. We chose this method for its flexibility to include a fixed effect to account for repeated measures within 1 group of trainees (preintervention versus postintervention assessment), as well as a fixed effect for membership in 1 of 2 groups (intervention versus control). In addition, the models allowed for 2 random effects, 1 associated with the intercept for each subject and 1 with the intercept for the intervention. The covariance structure of the random effects was assumed to be independent. We calculated interitem correlations using Spearman rank correlation coefficients, with Bonferroni adjustment for multiple testing.
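As an illustrative sketch of this model specification (not the authors' code; the study used Stata 12.1, and all column names below, including subject, intervention, post, score, and q1 through q9, are hypothetical), the same structure can be expressed with Python's statsmodels: fixed effects for time, group, and their interaction; a random intercept per subject; and an independent variance component for the time effect.

```python
# Sketch of the repeated-measures mixed model and interitem correlations
# described above (not the authors' code; the study used Stata 12.1, and
# every column name here is hypothetical).
from itertools import combinations

import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import spearmanr

# Long format: one row per resident per time point, with per-question
# scores q1..q9 and the mean-of-raters total case score.
df = pd.read_csv("aqiks_scores.csv")

# Fixed effects: time (post: 0 = baseline, 1 = follow-up), group
# (intervention: 0 = control, 1 = intervention), and their interaction.
# Random effects: an intercept per subject plus a variance component for
# the time effect; vc_formula components are independent of the random
# intercept by construction, matching the independence assumption.
model = smf.mixedlm(
    "score ~ post * intervention",
    data=df,
    groups="subject",
    re_formula="1",
    vc_formula={"post": "0 + post"},
)
print(model.fit(reml=True).summary())

# Interitem Spearman correlations with Bonferroni adjustment.
items = [f"q{i}" for i in range(1, 10)]
pairs = list(combinations(items, 2))
for a, b in pairs:
    rho, p = spearmanr(df[a], df[b])
    print(a, b, round(rho, 3), min(p * len(pairs), 1.0))
```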
All analyses were performed using Stata version 12.1 (StataCorp LP, College Station, TX). For all tests, P ≤ .05 was considered significant.
Results
In total, 30 of 37 residents (81%) in the intervention group and 36 of 40 (90%) in the control group completed the AQIKS both at baseline and after the curriculum was delivered to the intervention group. The remaining residents were excluded because they did not complete the AQIKS at baseline, at follow-up, or both. All residents in the intervention group completed all required didactic elements of the QI curriculum and participated in a group QI effort.
Internal Structure: Interrater Reliability
Table 1 displays Brennan-Prediger's kappa values for interrater reliability of the 3 independent raters for each question and for the overall AQIKS score, the sum of point totals across all questions. Interrater reliability was moderate or better for each question. For the overall AQIKS score, interrater reliability was substantial (κ = 0.74).
Individual Question and Overall Test Performance
Question performance is described in table 2. Few residents earned full points on any individual question. The intervention group had significantly higher scores after participating in the QI curriculum, both for the total case score (P < .001) and for 8 of 9 questions (P values ranging from < .001 to .046). Spearman rank correlation coefficients were low (range 0.009–0.37), suggesting that the questions address different knowledge areas.
Relation to QI Curriculum Completion
Table 3 presents a comparison of baseline and postcurriculum mean AQIKS scores with 95% confidence intervals. There was no significant difference in baseline mean AQIKS scores between the intervention and control groups. The mean score of the intervention group increased by 42% after participation in the QI curriculum (P < .001). The control group's scores did not differ between baseline and follow-up (P = .29).
Discussion
We found evidence for validity of the content and internal structure of the AQIKS, and evidence that AQIKS scores were higher in learners who had participated in a QI curriculum.
The AQIKS has several advantages compared to QI assessment tools currently in use. First, it tests the ability to design a hypothetical QI intervention, drawing on skills and knowledge across multiple QI domains. This assessment strategy balances the need for an assessment to be rapidly administered in a training environment with the need for a thorough assessment of the skills expected after a learner leaves the training environment. Second, it performs well when scored by junior faculty raters with 5 or fewer years of clinical QI experience who have completed no training related to administering or scoring the assessment. Ease of administration without a requirement for scorer training may facilitate use of the assessment tool in training programs where lack of faculty expertise is a barrier to QI education.24
This study has limitations. As a single-specialty, single-center study, its findings may not be generalizable to other groups of learners or other specialties. An additional limitation common to many written assessments of applied skills is that performance on an assessment tool alone does not offer a comprehensive assessment of the outcomes of an education program. For QI education programs, other important outcomes include participation in QI initiatives after graduation and production of scholarly activity in QI. Areas for further study include application of the AQIKS questions and scoring rubric to cases relevant to other clinical disciplines (eg, QIKAT-R11 cases) and generalizability studies with different populations of learners and scorers. A larger, fully crossed, experimental study, in which all cases are administered to each subject and rated by all raters, would facilitate the use of generalizability theory to evaluate the reliability of the AQIKS.
Conclusion
The AQIKS is a promising new tool with good discriminatory capacity and good interrater reliability. Its advantages include open-ended questions, adaptability to different clinical scenarios, and an assessment of a learner's ability to design a hypothetical clinical QI intervention as a proxy for real-world QI activities.
References
Author notes
Funding: Resident quality improvement efforts were supported by the Fred Lovejoy Research and Education Fund of the Boston Combined Residency Program and a grant from the Program for Patient Safety and Quality, Boston Children's Hospital. Dr. Doupnik was supported by a Ruth L. Kirschstein National Research Service Award institutional training grant (T32-HP010026-11) funded by the National Institutes of Health.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
Preliminary results from this study were presented at the Pediatric Academic Societies Meeting, Vancouver, British Columbia, Canada, May 3–6, 2014.