The flipped classroom model for didactic education has recently gained popularity in medical education; however, there is a paucity of performance data showing its effectiveness for knowledge gain in graduate medical education.
We assessed whether a flipped classroom module improves knowledge gain compared with a standard lecture.
We conducted a randomized crossover study in 3 emergency medicine residency programs. Participants were randomized to receive a 50-minute lecture from an expert educator on one subject and a flipped classroom module on the other. The flipped classroom included a 20-minute at-home video and 30 minutes of in-class case discussion. The 2 subjects addressed were headache and acute low back pain. A pretest, immediate posttest, and 90-day retention test were given for each subject.
Of 82 eligible residents, 73 completed both modules. For the low back pain module, mean test scores were not significantly different between the lecture and flipped classroom formats. For the headache module, there were significant differences in performance for a given test date between the flipped classroom and the lecture format. However, differences between groups were less than 1 of 10 examination items, making it difficult to assign educational importance to the differences.
In this crossover study comparing a single flipped classroom module with a standard lecture, we found mixed statistical results for performance measured by multiple-choice questions. As the differences were small, the flipped classroom and lecture were essentially equivalent.
The flipped classroom has attracted a good deal of attention, yet there is a lack of information on added learning benefits compared with a traditional lecture format.
A randomized crossover study in emergency medicine compared a 50-minute lecture from an expert educator with a flipped classroom module.
Single specialty study limits generalizability; a single lecture may be insufficient to detect differences.
Differences between the 2 formats were small; the flipped classroom and traditional lecture were essentially equivalent.
The flipped classroom model has shown benefits to knowledge gain in undergraduate education.1 In this model, lecture material is consumed at home, and in-class time is focused on application, simulation, case-based discussion, or problem solving. Medical education leaders have recently supported the flipped classroom,1,2 leading health professions schools to adopt this approach in preclinical, clinical, and graduate medical education.3–9
The theoretical benefits of a flipped classroom are rooted in social constructivism and active learning.10,11 Social collaboration enables modeling, scaffolding, and feedback that engage students' preconceptions and build on their existing understanding of a topic. This focuses learning toward Bloom's higher levels of analysis, synthesis, and evaluation.12,13
The Accreditation Council for Graduate Medical Education Review Committee for Emergency Medicine recently changed the weekly educational requirements to allow 1 hour of conference credit for asynchronous learning.14 This shift helped pave the way for emergency medicine residencies in the United States to adopt a flipped classroom approach. Though it has been widely embraced and shows theoretical promise in graduate medical education,15–17 data regarding knowledge gain performance remain sparse. The objective of this study was to explore the difference in knowledge gain performance between a flipped classroom and a traditional lecture module for emergency medicine trainees.
Setting and Sample
All emergency medicine residents at the University of California, San Francisco–Fresno; Los Angeles County/University of Southern California; and University of California, San Francisco/San Francisco General Hospital were eligible to participate. Each program has a half-day of protected didactic time once a week. Only those who could attend conference on the intervention day were included. The interventions occurred on a different day at each site in October 2014, November 2014, and January 2015.
We selected 2 commonly encountered emergency department chief complaints that are a part of the standard-setting American Board of Emergency Medicine Model of the Clinical Practice of Emergency Medicine: acute low back pain and acute headache.18 A group of experienced residency faculty (2 program directors and 2 assistant program directors) used a modified Delphi technique to develop learning objectives for each subject. The lecturer (S.P.S.), module developers (J.C., S.S., R.T., J.S.), and local facilitators had no prior knowledge of the assessment instruments.
Flipped Classroom Modules
This curriculum consisted of a 20-minute preparatory video and 30 minutes of in-class discussion for each subject. The videos were professionally recorded and edited by Hippo Education and consisted of a combination of audiovisual formats: (1) full screen of lecturer; (2) split screen with lecturer and PowerPoint (Microsoft, Redmond, WA) presentation; and (3) full screen of PowerPoint with audio voiceover. The videos were hosted on YouTube.com.
The in-class curriculum consisted of faculty-led, case-based discussions written by experienced program faculty (J.C., S.S., R.T., J.S.) with questions on clinical presentation, evaluation, diagnosis, management, and disposition. Faculty members led small groups of 5 to 12 learners through outlined discussions of common cases. Facilitators received no special training apart from a detailed instructor handout.
Standard Lecture Modules
Based on consensus objectives, an author (S.P.S.) prepared a 50-minute PowerPoint-based live lecture for each subject, recorded the short video lectures for the flipped classroom modules, and gave the live lectures at all 3 sites. The sessions included PowerPoint slides, audience interaction, humor, and the Socratic teaching method. To ensure standardization across sites, the lectures were based on the same PowerPoint presentation.
We randomized participants to receive the flipped classroom model for 1 subject and the standard lecture for the other. Each participant received an e-mail with a link to the 20-minute online video for their designated flipped classroom subject. The e-mail was sent twice: 5 days before the conference date and again 1 to 3 days before it.
On the day of the interventions, participants received either a standard lecture or the flipped classroom faculty-led discussion for the first subject. The following hour they crossed over and received the other format for the second subject (see figure). Pretests and immediate posttests were administered on paper on the day of the intervention, and retention tests were administered on paper or electronically 90 to 120 days later.
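The crossover allocation described above can be illustrated with a minimal sketch. The article does not report how the random sequence was generated or how the arms were balanced, so the shuffle-and-split approach, the function name `assign_crossover`, the seed, and the integer resident identifiers are all assumptions made for illustration only.

```python
import random

SUBJECTS = ("headache", "low back pain")


def assign_crossover(resident_ids, seed=0):
    """Split residents at random into 2 arms of near-equal size (assumed scheme).

    Arm A: flipped classroom for headache, lecture for low back pain.
    Arm B: lecture for headache, flipped classroom for low back pain.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(resident_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    assignment = {}
    for position, resident in enumerate(shuffled):
        in_arm_a = position < half
        assignment[resident] = {
            "flipped": SUBJECTS[0] if in_arm_a else SUBJECTS[1],
            "lecture": SUBJECTS[1] if in_arm_a else SUBJECTS[0],
        }
    return assignment


# Hypothetical identifiers 0..81 standing in for the 82 enrolled residents.
allocation = assign_crossover(range(82), seed=1)
flipped_headache = sum(a["flipped"] == "headache" for a in allocation.values())
print(flipped_headache, len(allocation) - flipped_headache)
```

Every resident receives both subjects and both formats, so each person contributes data to both teaching conditions.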
Validity Evidence for Interpretation of Assessment
Based on feasibility and current practice in formative evaluation of residents, 4 unique 10-item tests were developed: 1 pretest and 1 posttest for each of the 2 subjects. The same posttest was used for both the immediate and 90-day retention tests for each subject. The assessments were developed according to the Standards for Educational and Psychological Testing.19 To strengthen evidence of content validity, 2 authors (P.J. and D.J.) with extensive item-writing experience for emergency medicine board review courses developed 60 multiple-choice, National Board of Medical Examiners–style examination questions (30 for each subject). The item writers had access to and mapped their questions to the module objectives, but were blinded to the lecture content, video material, and discussion guides in order to minimize potential bias. We pilot tested the items with 17 second-, third-, and fourth-year residents, some of whom participated in the final study. Cronbach's alpha was used as a measure of internal consistency of each parallel test form to strengthen internal structure evidence.
From the 30 items for each subject, 2 parallel forms of 10-item examinations were assembled, with resulting alpha coefficients of 0.54 and 0.44 for the headache examinations and 0.86 and 0.80 for the back pain examinations, respectively. The 2 parallel forms possessed equivalent difficulty levels for headache (0.76 and 0.82) and for back pain (0.62 and 0.68) by averaging the individual item difficulty on each form. The validity of the 2 forms was established by matching the objectives covered and the diagnoses as described by the content experts who developed the items.
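For readers unfamiliar with the two statistics reported here, the following sketch shows how Cronbach's alpha and mean item difficulty (proportion correct) are conventionally computed from dichotomously scored items. The score matrix is a toy example invented for illustration and is not study data; the function names are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (one row per examinee, one column per item).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Assumes total scores are not all identical (otherwise the denominator is zero).
    """
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def form_difficulty(scores):
    """Average of the per-item proportions correct on one test form."""
    k = len(scores[0])
    item_proportions = [mean(row[j] for row in scores) for j in range(k)]
    return mean(item_proportions)


# Toy data: 4 examinees x 3 items, 1 = correct, 0 = incorrect (not study data).
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(toy_scores))   # 0.75
print(form_difficulty(toy_scores))  # 0.5
```

Higher difficulty values indicate easier forms, because the index is the proportion of examinees answering correctly.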
The Institutional Review Board at each site approved the study.
Measures and Analysis
The primary outcome, knowledge gain performance, was determined by change in test scores.
Mean scores for each module under each condition were collected at 3 times: preintervention, immediately postintervention, and 90 to 120 days postintervention. The immediate posttest and 90- to 120-day retention examinations were identical for each subject.
A repeated-measures analysis of variance with a 2 × 3 design (2 teaching formats × 3 test dates) was used to assess the difference between the 2 teaching modalities in performance on the knowledge tests.
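As a rough illustration of what the 2 × 3 layout looks like, the sketch below tabulates cell means by format and test date and computes the difference in gains between formats, which is the quantity the interaction term informally reflects. It is a descriptive summary only and does not perform the analysis of variance itself. The records are invented toy values, not study data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cell_means(records):
    """Return {(format, test_date): mean score} from (format, test_date, score) tuples."""
    cells = defaultdict(list)
    for fmt, test_date, score in records:
        cells[(fmt, test_date)].append(score)
    return {key: mean(values) for key, values in cells.items()}


def gain_difference(means, start="pre", end="post"):
    """Gain in the flipped format minus gain in the lecture format between 2 test dates."""
    flipped_gain = means[("flipped", end)] - means[("flipped", start)]
    lecture_gain = means[("lecture", end)] - means[("lecture", start)]
    return flipped_gain - lecture_gain


# Toy records on a 10-item scale (not study data).
toy_records = [
    ("lecture", "pre", 5), ("lecture", "pre", 7),
    ("lecture", "post", 7), ("lecture", "post", 9),
    ("lecture", "retention", 6), ("lecture", "retention", 8),
    ("flipped", "pre", 6), ("flipped", "pre", 6),
    ("flipped", "post", 8), ("flipped", "post", 9),
    ("flipped", "retention", 8), ("flipped", "retention", 8),
]
means = cell_means(toy_records)
print(gain_difference(means))  # 0.5
```

A gain difference near zero corresponds to roughly parallel score trajectories in the 2 formats, the pattern expected when the interaction is not significant.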
A prospective power analysis indicated that detecting a moderately high effect size (f = 0.3) with 85% power at an alpha level of .05 required a total sample size of 70 subjects, with 35 subjects in each study group (flipped versus lecture).
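To give the effect size some context, Cohen's f can be converted to the proportion of variance explained (eta squared) through the standard relation f² = η² / (1 − η²). The sketch below performs only this conversion and does not reproduce the power calculation, whose software and settings are not reported in the article. The function names are hypothetical.

```python
def eta_squared_from_f(f):
    """Convert Cohen's f to eta squared: eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


def f_from_eta_squared(eta_squared):
    """Convert eta squared back to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    return (eta_squared / (1 - eta_squared)) ** 0.5


# f = 0.3 corresponds to roughly 8% of score variance explained.
print(round(eta_squared_from_f(0.3), 4))  # 0.0826
```

By Cohen's conventional benchmarks (f = 0.10 small, 0.25 medium, 0.40 large), f = 0.3 falls between a medium and a large effect.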
A total of 82 residents enrolled (participated in a module and completed a pretest). Only participants completing the pretest, posttest, and retention test were included in the final analysis (n = 73, 89%). Participant demographics are shown in table 1.
Descriptive information is displayed in table 2. For the back pain module, there was a significant effect of testing date (ie, preintervention, postintervention, or 90-day), but the interaction effect did not reach statistical significance. Overall mean test scores were significantly different for the pretest, immediate posttest, and 90-day test, but there were no significant differences in performance between the lecture and flipped classroom format.
For the headache module, both the effect of testing date and the interaction effect reached statistical significance (table 2). Overall mean test scores were significantly different for pretest, immediate posttest, and 90-day test administrations, and there were significant differences in performance for a given test date between the flipped classroom and lecture formats. Test performance steadily increased across the 3 testing dates for residents in the flipped classroom format. The difference was less than 1 of the 10 examination items.
In a crossover trial comparing a single flipped classroom module with a standard lecture for knowledge gain performance and retention, we found mixed statistical results, and the 2 formats were essentially equivalent. Mean test score variation could be attributed to differences in teaching method only for the headache module. The differences between the groups on both modules were less than a single examination item, making it difficult to assign practical significance. At a minimum, performance in the flipped classroom was no worse than that in the standard lecture. This finding may be useful in light of other potential theoretical benefits of the flipped classroom.
Our study fits into a landscape of flipped classroom studies with mixed results. Studies in undergraduate education have shown a significant effect,2 while studies of the flipped classroom in health professions education have shown mixed results.5,20–22 An extensive search of the PubMed and ERIC databases found no studies describing knowledge gain data in graduate medical education. Several possibilities exist to explain why the flipped classroom format we studied showed these results. A recent study found that the flipped classroom does not result in greater learning gains compared with a traditional classroom when both use an active learning, constructivist approach. Learning gains in either format may be a result of the active learning style of instruction rather than the order in which the instructor participated in the learning process.23
Most studies that demonstrate benefit involve implementation of the flipped classroom over an entire course rather than single days of instruction. It may be that the effect of the flipped classroom is too small to see on any given day, but there may be a cumulative effect to improve learning sustainability.24
It may be difficult to detect differences in knowledge gain performance with high-performing learners who have high academic achievement regardless of classroom design.25 Medical residents generally perform well on multiple-choice tests regardless of instructional conditions, thus obscuring and overruling most of the differences in effects of educational interventions.26
Our study has several limitations. While 10-question multiple-choice tests are common in graduate medical education to evaluate curricular goals and are a feasible way to rapidly test, the small number of items on each assessment limits its reliability and generalizability. Our study was not powered for subgroup analysis, and it is unclear whether certain levels of trainees fared better in the flipped classroom modules. Comparisons between formats could be confounded by differences not related to the flipped classroom format itself. Though the same lecturer gave all live lectures based on identical PowerPoint slide decks, it is possible that material was covered in varying depth at different sites. Although some participants were lost to follow-up, almost 90% of enrolled participants completed the retention test, a rate comparable to that of other published studies.27
There may be other benefits to a flipped classroom related to social interaction, self-regulation, and scheduling flexibility that we did not measure. Also important are studies that determine the parts of in-class discussion that are highest yield for different stages of learners, the ideal length and importance of the at-home videos, and the most effective design, features, and layout of the videos.28
In a crossover study comparing a single flipped classroom module with a standard lecture in emergency medicine programs, the differences found were small, and the flipped classroom and standard lecture were essentially equivalent.
Funding: This study was funded by a grant from the Western Group on Educational Affairs of the Association of American Medical Colleges.
Conflict of interest: Dr Jhun is a salaried content director for Hippo Education.
Preliminary findings of this study were presented as an abstract poster presentation at the Society for Academic Emergency Medicine Annual Meeting, San Diego, California, May 15, 2015, and at the Western Regional Society for Academic Emergency Medicine, Tucson, Arizona, March 28, 2015.
The authors would like to thank Svetlana Bagdasarov and Gregory Hendey, MD, for their invaluable support, and Joshua Jauregui, MD, and Jonathan Ilgen, MD, MCR, for their tremendous feedback on later versions of the manuscript.