ABSTRACT
In 2003, the Accreditation Council for Graduate Medical Education (ACGME) instituted requirements that limited the number of hours residents could spend on duty, and in 2011, it revised these requirements.
This study explored whether the implementation of the 2003 and 2011 duty hour limits was associated with a change in emergency medicine residents' performance on the American Board of Emergency Medicine (ABEM) Qualifying Examination (QE).
Beginning with the 1999 QE and ending with the 2014 QE, candidates whose training occurred entirely without duty hour requirements (Group A), candidates trained entirely under the first set of duty hour requirements (Group C), and candidates trained entirely under the second set of duty hour requirements (Group E) were compared. Comparisons included mean scores and pass rates.
In Group A, 5690 candidates completed the examination, with a mean score of 82.8 and a 90.2% pass rate. In Group C, 8333 candidates had a mean score of 82.4 and a 90.5% pass rate. In Group E, there were 1269 candidates, with a mean score of 82.5 and an 89.4% pass rate. There was a small but statistically significant decrease in mean scores (0.4, P < .001) after implementation of the first duty hour requirements, but no such difference occurred after implementation of the 2011 standards. There was no difference in pass rates among the study groups (χ2 = 1.68, P = .43).
We did not identify an association between the 2003 and 2011 ACGME duty hour requirements and performance of test takers on the ABEM QE.
Studies have found varying impacts of duty hour limits on graduates' performance on board certification examinations, a key educational outcomes measure.
A study explored whether emergency medicine graduates' performance on the American Board of Emergency Medicine qualifying examination changed after implementation of the 2003 and 2011 duty hour standards.
This was a single specialty study; other factors may have affected examination performance.
Neither set of duty hour standards was associated with a practically significant change in examination performance.
Introduction
On July 1, 2003, the Accreditation Council for Graduate Medical Education (ACGME) instituted new requirements that limited resident duty hours in all ACGME-accredited residency programs in the United States (table 1).
Proponents of these requirements stated they would enhance patient safety and improve the working conditions and education of resident physicians.1,2 Since their implementation, there have been many reports on the effects of these limits on patient care and resident education.2–7 A systematic review of the effects of the 2003 duty hour requirements on patient safety, resident well-being, and resident education found no impact on patient care or on resident well-being.8 The authors did find an unintended negative impact on resident education, including less time spent with attending physicians and decreased attendance at teaching sessions.8 Several studies have examined the effect of the ACGME duty hour requirements on board certification examination performance.4,9–11 The findings have been inconsistent, with improved performance in some specialties, declines in others, and no measurable impact in a third group. The relationship between the ACGME duty hour requirements and performance on the American Board of Emergency Medicine (ABEM) Qualifying Examination (QE) has not been examined.
On July 1, 2011, revised ACGME duty hour requirements were implemented for all US residency programs (table 1).12
Since the implementation of the 2011 ACGME duty hour requirements, there have been limited data published regarding their impact on educational or patient safety outcomes.
Emergency medicine (EM) established specialty-specific duty hour requirements in 1990, which were revised in 1995 and again in 2003.13 However, these duty hour restrictions applied only to EM rotations, not to off-service rotations. Under the current ACGME EM program requirements, up to 40% of an EM resident's training may take place off-service (eg, obstetrics-gynecology, medical intensive care, pediatric intensive care). It is during these off-service rotations that the 2003 and 2011 ACGME duty hour requirements would have the greatest impact on EM resident education.
This study sought to identify any relationship between ABEM QE performance by graduates of ACGME-accredited EM residency programs and the 2003 and 2011 ACGME duty hour requirements.
Methods
This retrospective study used performance data for first-time candidates taking the ABEM QE administered from 1999 through 2014. Data were accessible to the investigators only in deidentified, aggregate reports. A total of 5 groups were identified, 3 of which were studied. Group A comprised test takers whose entire training preceded the ACGME duty hour requirements; this involved QE results from 1999 to 2003 (group definitions are provided as online supplemental material). Group C comprised test takers who completed all of their training under the first set of duty hour requirements: QE results from 2006 to 2011 for EM 1–3 program graduates and 2007 to 2011 for EM 1–4 program graduates. Group E comprised test takers from EM 1–3 programs who completed all of their training under the second set of duty hour requirements (2014). Groups B and D were not studied because their members trained under varying combinations of duty hour standards.
Since the first administration of the QE, the ABEM has used a strictly standardized approach to test design, item (question) development, standard setting (the establishment of a passing score), administration, and criterion-referenced scoring. Beginning in 2004, the ABEM began to equate the examinations, a process that adjusts for most year-to-year differences in examination difficulty. Prior analysis has shown that the ABEM QE was psychometrically stable before 2004 and has remained so.14
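The article does not specify which equating method the ABEM uses; as a rough illustration of the general idea, the sketch below applies a simple linear (mean-sigma) equating under a randomly-equivalent-groups assumption. The score arrays old_form and new_form are hypothetical, not ABEM data.

    import numpy as np

    # Hypothetical raw percent-correct scores from two examination forms.
    # In practice these would come from large candidate cohorts assumed to
    # be randomly equivalent across administrations.
    rng = np.random.default_rng(0)
    old_form = rng.normal(loc=82.8, scale=6.0, size=5000)   # reference form
    new_form = rng.normal(loc=81.9, scale=6.5, size=5000)   # slightly harder form

    def linear_equate(y, ref, new):
        # Place a score y from the new form onto the reference form's scale
        # using mean-sigma (linear) equating: match means and standard deviations.
        return ref.std() / new.std() * (y - new.mean()) + ref.mean()

    # A raw 80 on the harder new form maps to a slightly higher equated score.
    print(round(linear_equate(80.0, old_form, new_form), 1))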
Eligible participants included all graduates of ACGME-accredited EM residency programs taking the ABEM QE for the first time from 1999 through 2014. Repeat test takers were excluded to avoid potential practice effect. Only physicians who graduated from programs formatted as postgraduate years 1–3 (PGY-1 to PGY-3) and PGY-1 to PGY-4 were included, because these program types were consistently represented throughout the years studied. Primary outcome measures included the mean QE scores with 95% confidence interval (CI) and passing rates.
This study was granted a waiver for human subject research by the Eastern Virginia Medical School Institutional Review Board.
We performed a 1-way analysis of variance (ANOVA) to compare mean scores. To further define any statistical difference among the study groups, Tukey's studentized range test was performed. For passing rates, a 2 × 3 chi-square test was used. The level of significance was set a priori at α = .01 for all analyses.
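Individual candidate scores are not published, so the following is only a minimal sketch of this analysis plan in Python; scores_a, scores_c, and scores_e are simulated stand-ins for the three study groups, and the output will not reproduce the published statistics.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Hypothetical per-candidate QE scores for the three study groups.
    rng = np.random.default_rng(1)
    scores_a = rng.normal(82.8, 6.0, 5690)
    scores_c = rng.normal(82.4, 6.0, 8333)
    scores_e = rng.normal(82.5, 6.0, 1269)

    # 1-way ANOVA comparing mean scores across the three groups.
    f_stat, p_value = stats.f_oneway(scores_a, scores_c, scores_e)
    print(f"ANOVA: F = {f_stat:.2f}, P = {p_value:.3f}")

    # Tukey's studentized range (HSD) test for pairwise comparisons,
    # using the a priori significance level of .01.
    scores = np.concatenate([scores_a, scores_c, scores_e])
    groups = ["A"] * len(scores_a) + ["C"] * len(scores_c) + ["E"] * len(scores_e)
    print(pairwise_tukeyhsd(scores, groups, alpha=0.01))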
Results
For the period between 1999 and 2014, there were 26 753 test takers. Groups A through E consisted of 20 189 (75.5%) first-time test takers. For the groups studied, the numbers of first-time test takers were: Group A (5690), Group C (8333), and Group E (1269).
Group A had a mean score of 82.8 (95% CI 82.7–83.0), with a 90.2% pass rate; Group C had a mean score of 82.4 (95% CI 82.2–82.5), with a 90.5% pass rate; and Group E had a mean score of 82.5 (95% CI 82.1–82.8), with an 89.4% pass rate.
One-way ANOVA comparing mean scores among the 3 groups was significant (P ≤ .001). Tukey's studentized range test demonstrated a statistically significant difference between groups A and C (P < .001). The Tukey comparisons for groups A and E (P = .10) and for groups C and E (P = .83) were not significantly different. A 2 × 3 chi-square test demonstrated no statistically significant difference in pass rates (P = .43).
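The pass-rate comparison can be approximately reproduced from the published group sizes and pass rates. The counts below are reconstructed by rounding (they are not the exact frequency table), so the statistic only roughly matches the reported χ2 = 1.68, P = .43.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Pass/fail counts reconstructed from the reported group sizes and pass rates
    # (5690 x 90.2%, 8333 x 90.5%, 1269 x 89.4%); rounding makes these approximate.
    observed = np.array([
        [5132, 558],   # Group A: pass, fail
        [7541, 792],   # Group C
        [1134, 135],   # Group E
    ])

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, df = {dof}, P = {p:.2f}")  # roughly chi2 ~ 1.7, P ~ .43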
Discussion
Our study demonstrated no significant difference in pass rates on the ABEM QE for first-time test takers before and after implementation of the 2003 ACGME duty hour standards, despite a small but statistically significant difference in mean scores (−0.4) between the pre-2003 and post-2003 cohorts. While statistically significant, this difference has no practical importance in the context of a passing rate or examination score, since it amounts to less than half a point on a 0 to 100 scale. There was no statistically significant difference in pass rates or mean scores on the ABEM QE for the cohorts before and after the 2011 ACGME duty hour requirements. However, the complete impact of the 2011 ACGME duty hour requirements may not be known until the 2015 QE, the first examination for graduates fully trained under these requirements in a PGY-1 to PGY-4 program format.
The effect of the ACGME duty hour requirements on board certification performance has been variable across specialties. On the American Board of Internal Medicine board certification examination, there was no significant difference in scores before and after implementation of the 2003 ACGME duty hour standards.9 This is in contrast to the improving trends on the American Board of Orthopaedic Surgery, Part I (written) Certification Examination.11 Candidates taking the American Board of Urology QE demonstrated similar improvement,11 while performance on the American Board of Obstetrics and Gynecology written examination had a downward trend in the pass rate.11
Many factors may affect performance on board certification examinations, including the residency curriculum, clinical experience, innate ability, self-study, residency size, and individual motivation. All of the studies discussed above, including our own, were designed to look at associations and could not demonstrate a cause and effect relationship between the implementation of the ACGME duty hour standards and performance on board certification examinations.
This study examined only results of the ABEM QE. The ABEM QE is a 305-item, single-best-answer, multiple-choice examination designed to measure medical knowledge recall (approximately 33% of items) and clinical synthesis and diagnostic reasoning (approximately 66% of items).19,20 Between 10% and 15% of items include pictorial stimuli. The ABEM QE is a criterion-referenced examination; it does not use a performance curve or quota to determine passing or failing scores. The passing score is determined by ABEM directors, who are informed by a modified Angoff standard-setting process. We did not examine performance on the ABEM Oral Certifying Examination because not all boards require an oral examination. One potential reason EM did not see a significant change in performance on the ABEM QE could be the specialty-specific duty hour limits that predate the ACGME common standards (table 2). Compared with other specialties, EM training programs have had longer experience under duty hour limits, and the most significant impact of the 2003 and 2011 ACGME standards on EM occurred in the off-service rotations. Finally, there could be important effects of the duty hour requirements that are not captured by a multiple-choice question examination.
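As background on how an Angoff-style cut score is typically derived (the specific ABEM procedure is not described in this article), the sketch below averages hypothetical panelist ratings of the probability that a minimally qualified candidate answers each item correctly; the panel size, item count, and ratings are invented for illustration only.

    import numpy as np

    # Hypothetical Angoff-style ratings: each of 8 panelists estimates, for each of
    # 305 items, the probability that a minimally qualified candidate answers correctly.
    rng = np.random.default_rng(2)
    ratings = rng.uniform(0.55, 0.90, size=(8, 305))

    # Average across panelists for each item, then across items, to obtain a
    # recommended cut score; directors may then adjust this recommendation.
    item_expectations = ratings.mean(axis=0)          # per-item expected performance
    cut_score_percent = item_expectations.mean() * 100
    print(f"Recommended passing score: {cut_score_percent:.1f}% of items")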
Table 2. Self-Imposed, Specialty-Specific Duty Hour Restrictions by the Residency Review Committee for Emergency Medicine (RRC-EM)
There are several limitations to this study. First, an analysis of means must be interpreted cautiously because the QE results were not equated prior to 2004. Equating is a psychometric process that accounts for year-to-year variation in the intrinsic difficulty of an examination. The size of the study cohorts, the adherence to best practices for examination development, and the general trends in physician performance over decades of test administration should offer some assurance that the use of non-equated examinations did not greatly affect the study results. Second, the cohort of first-time candidates following the 2011 ACGME duty hour requirements includes only graduates of EM 1–3 programs, not graduates of 4-year programs. A post hoc internal analysis of EM 1–3 program graduates in groups A, C, and E demonstrated similar results. Third, the ABEM QE changed from a paper-and-pencil to an electronic format in 2004. Fourth, it is not known to what degree individual programs complied with the ACGME duty hour requirements. Finally, other variables may have affected QE scores, such as the number of EM programs, the number of EM residents, changes to the Model of the Clinical Practice of Emergency Medicine, changes in residency program leadership, or other factors related to residency training.
Conclusion
We did not identify an association between the 2003 and 2011 ACGME duty hour requirements and the performance of first-time test takers on the ABEM QE.
References
Author notes
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
Editor's Note: The online version of this article contains a table of definitions of each group based on the year in which the qualifying examination was administered.