ABSTRACT
Background: The fourth year of medical school has come under recent scrutiny for its lack of structure, its cost- and time-effectiveness, and the quality of education it provides. Some have advocated for increasing the clinical burden in the fourth year, while others have suggested it be abolished.
Objective: To assess the relationship between fourth-year course load and success during internship.
Methods: We reviewed transcripts of 78 internal medicine interns from 2011–2013 and compared the number of intensive courses (defined as subinternships, intensive care, surgical clerkships, and emergency medicine rotations) with multi-source performance evaluations from the internship. We assessed relative risk (RR) and 95% confidence interval (CI) of achieving excellent scores according to the number of intensive courses taken, using generalized estimating equations, adjusting for demographics, US Medical Licensing Examination (USMLE) Step 1 board scores, and other measures of medical school performance.
Results: For each additional intensive course taken, the RR of obtaining an excellent score was 1.05 (95% CI 1.03–1.07, P < .001), whereas the RR per nonintensive course taken was 0.99 (95% CI 0.98–1.00, P = .03). An association of intensive course work with a higher likelihood of excellent performance was seen across multiple clinical competencies, including medical knowledge (RR 1.08, 95% CI 1.04–1.11); patient care (RR 1.07, 95% CI 1.04–1.10); and practice-based learning (RR 1.05, 95% CI 1.03–1.09).
Conclusions: For this single institution's cohort of medical interns, increased exposure to intensive course work during the fourth year of medical school was associated with better clinical evaluations during internship.
There is interest in enhancing the relevance of the fourth year of medical school for residency and practice.
Intensive clinical course work taken during the fourth year was associated with enhanced excellent performance in the domains of medical knowledge, patient care, and practice-based learning.
Single specialty, single elite institution sample reduces generalizability.
Increased exposure to intensive clinical course work during the fourth year was associated with better clinical evaluations during internship.
Introduction
Many undergraduate medical education programs are redesigning their curriculum and assessment methods to meet a changing practice landscape.1 Educators largely have focused on the first 3 years of the traditional 4-year undergraduate curriculum to address concerns about the cost of medical education and declining interest in primary care.2,3 Some medical schools have eliminated the fourth year entirely, while others believe it is critical to professional development and, in support, cite declining board certification performance.4–6 Residency program directors also raise concerns that interns lack self-reflective skills, leading to underdeveloped professionalism, weak medical knowledge, and lack of preparedness to manage medical emergencies.7,8 To address these issues, some advocate for a more rigorous undergraduate experience.9
At the same time, little attention has been paid to the composition and quality of experiences during a fourth year of medical school, which represents the last opportunity to expand clinical skills and knowledge before learners become residents.10,11 The subinternship experience, considered the cornerstone of the fourth year, lacks national standards for content and assessment.2,11 For many medical schools, the medical subinternship is the only requisite course in the fourth-year curriculum. At a majority of schools, the remainder of the year is largely unstructured, with students choosing from a variety of clinical and nonclinical electives.12
We sought to assess the impact of the fourth year on clinical performance during internal medicine internship. We examined 2 consecutive classes of interns to determine how their fourth-year experiences, including the number and intensity of courses, related to their multi-source assessments of performance based on the Accreditation Council for Graduate Medical Education (ACGME) educational milestones.13
Methods
Design and Participants
We conducted a single center study of interns enrolled in the internal medicine residency program (IMRP) at Beth Israel Deaconess Medical Center (BIDMC) from 2011 until 2013. BIDMC is a 600-bed academic hospital; the IMRP has approximately 47 categorical and 13 preliminary interns each year.
Transcript Collection and Coding
The program receives final medical school transcripts for most interns. One author (N.D.) deidentified the transcripts and assigned identification numbers to protect anonymity prior to coding. The transcripts were coded and entered into a REDCap (Research Electronic Data Capture, Vanderbilt University, Nashville, TN) database. The authors a priori defined intensive clinical courses as experiences with a higher order of clinical responsibility and knowledge than the average course. These included subinternships of any variety, intensive care, and surgical and emergency medicine rotations. Research, less relevant patient care specialties (pathology and radiology), didactic courses, and language courses were defined as not clinically intensive. The authors reviewed all categorizations; the 3 instances of difficulty interpreting transcript information were resolved by consensus.
Multi-Source Assessment
The IMRP maintains assessments of all residents using New Innovations, a confidential online assessment tool. Evaluations are based on the ACGME's 6 competencies and milestones.13 After most clinical rotations, attendings, residents, fellows, medical students, and nurses evaluate interns using questionnaires specific for the rotation and evaluator.
Evaluations use a 5- or 9-point scale, depending on type and specific question (see online supplemental material for sample evaluations). Based on a strong ceiling effect of the total distribution of scores with clustering at maximal values, we defined “excellent scores” as an 8 or 9 on the 9-point scale and 5 on the 5-point scale. For robustness and ease of interpretation, we also established an outcome of a “poor score” on any individual item as 6 or less (9-point scale) and 3 or less (5-point scale).
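The outcome definitions above can be written as a small classification rule. The following is a minimal sketch, not the authors' code: the function name, the "intermediate" label for scores that are neither excellent nor poor, and the assumption that both scales start at 1 are illustrative only.

```python
def classify_score(score: int, scale_max: int) -> str:
    """Label one evaluation item using the cutoffs described in the text.

    9-point scale: excellent = 8 or 9, poor = 6 or less.
    5-point scale: excellent = 5, poor = 3 or less.
    Anything in between is labelled "intermediate" (a hypothetical label).
    """
    if scale_max == 9:
        excellent_min, poor_max = 8, 6
    elif scale_max == 5:
        excellent_min, poor_max = 5, 3
    else:
        raise ValueError("only 5- and 9-point scales are described")

    # Assumption: scales run from 1 to scale_max.
    if not 1 <= score <= scale_max:
        raise ValueError("score outside the scale")

    if score >= excellent_min:
        return "excellent"
    if score <= poor_max:
        return "poor"
    return "intermediate"


# A 7 on the 9-point scale and a 4 on the 5-point scale meet neither definition.
labels = [classify_score(7, 9), classify_score(4, 5)]
```

Under these cutoffs the two outcomes are not complements of each other, which is why poor scores serve as a separate robustness check rather than a restatement of the primary outcome.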
Covariates
We collected demographic information (age, sex, race, and categorical versus preliminary status) and metrics of medical school performance (US Medical Licensing Examination [USMLE] Step 1 score, Alpha Omega Alpha [AOA] membership, and additional degrees) from residency applications. We a priori grouped medical student performance evaluations and medical school ranking into 3 categories based on previous department determinations.
This project was reviewed by the Committee on Clinical Investigations at BIDMC and was approved with exemption from full review.
Statistical Methods
To account for the multiple questionnaires within-intern and within-rater, we performed all analyses using generalized estimating equations, with the individual item as the unit of analysis. We estimated relative risks (RRs) for the primary outcome, excellent scores, using binomial error structures, a log link, and an exchangeable correlation matrix, with hierarchical clustering by both intern and rater. For robustness, we estimated odds ratios (ORs) for poor scores similarly, using a logit rather than a log link. In all cases, we constructed both models that included only the number of intensive and nonintensive courses and models that further adjusted for the outlined covariates.
We categorized the intensity of course loads in multiple complementary ways by examining the proportion of time spent in intensive or nonintensive activities. We also assessed the number of courses taken, adjusting for intensive and nonintensive coursework. We examined the individual course type as described above, and treated the proportion and number of intensive courses as linear variables (tests of curvature using quadratic terms were not significant). We also present deciles for illustrative purposes.
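The two exposure measures described above, the number of intensive and nonintensive courses and the proportion of time spent in intensive course work, can be illustrated with a short sketch. The category labels, the data layout, and the example transcript are hypothetical; only the grouping of subinternships, intensive care, surgical, and emergency medicine rotations as intensive comes from the text.

```python
# Hypothetical category labels for courses the text defines as intensive.
INTENSIVE = {"subinternship", "intensive_care", "surgery", "emergency_medicine"}


def course_load(courses):
    """Summarize one transcript.

    courses: list of (category, weeks) tuples.
    Returns (n_intensive, n_nonintensive, proportion_of_time_intensive).
    """
    n_intensive = sum(1 for category, _ in courses if category in INTENSIVE)
    n_nonintensive = len(courses) - n_intensive
    weeks_intensive = sum(w for category, w in courses if category in INTENSIVE)
    weeks_total = sum(w for _, w in courses)
    proportion = weeks_intensive / weeks_total if weeks_total else 0.0
    return n_intensive, n_nonintensive, proportion


# Made-up transcript: two intensive and two nonintensive courses.
example = [
    ("subinternship", 4),
    ("intensive_care", 4),
    ("radiology", 2),
    ("research", 6),
]
summary = course_load(example)  # (2, 2, 0.5)
```

Weighting by weeks rather than by course count matters when course lengths vary, which is the reason for the complementary time-based analysis.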
Results
Demographics, Course Load, and Evaluations
Of 115 interns eligible for participation in the study, we were able to obtain 83 medical school transcripts, of which 5 were not interpretable and were excluded, leaving 78 for analysis. The demographics are summarized in table 1; 3 interns held additional degrees (PhD/MS). A summary of total completed fourth-year courses and breakdown of course type is found in table 2. A total of 69 641 individual points from 2350 completed evaluations were available, with a median of 30 evaluations per intern (range 19–56). Of these, 42 203 (61%) assessments met the criteria for excellent and 5724 (8%) for poor.
Note: Mean values with either percentage or standard deviations are listed above. The group's average Step 1 score was 243 (SD = 14) compared to the national average of 228.
Continuous variables by Wilcoxon rank sum test.
Categorical variables by Fisher's exact test.
Abbreviations: sub-I, subinternship; ICU, intensive care unit.
Note: The majority of interns completed at least 1 subinternship, but further intensive course work was less common; interns completed a median of 2 intensive courses (range 0–6). Categorical and preliminary interns did not differ in the number of intensive courses or medical subinternships they took.
Relative Risks and OR of Excellent and Poor Scores
When examined continuously, the relative risk (RR) of an excellent score per intensive course was 1.05 (95% CI 1.03–1.07, P < .001), while the corresponding RR per nonintensive course was 0.99 (95% CI 0.98–1.00, P = .03); these RRs differed significantly (P < .001). When adjusted for demographics, the RR of an excellent score was 1.05 (95% CI 1.03–1.08, P < .001) per intensive course and 1.00 (95% CI 0.98–1.01, P = .40) per nonintensive course; these again differed significantly from each other (P < .001).
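Because the number of intensive courses entered the log-link model as a linear term, a per-course RR implies a multiplicative RR across several courses. The sketch below works through that arithmetic and recovers an approximate z statistic from the reported 95% CI. It is an illustration under stated assumptions, not a reanalysis: the function names are hypothetical, the inputs are the rounded published values, and the CI is assumed to be a symmetric Wald interval on the log scale.

```python
import math


def rr_for_k_courses(rr_per_course: float, k: int) -> float:
    """RR implied for k courses versus none under a log-linear model."""
    return rr_per_course ** k


def z_from_ci(rr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate z statistic, assuming a Wald CI on the log scale."""
    se_log_rr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(rr) / se_log_rr


# Unadjusted estimate for intensive courses: RR 1.05 (95% CI 1.03-1.07).
rr_two = round(rr_for_k_courses(1.05, 2), 4)  # 1.1025
rr_six = rr_for_k_courses(1.05, 6)            # top of the observed range (0-6)
z = z_from_ci(1.05, 1.03, 1.07)               # roughly 5, in line with P < .001
```

The compounded values apply only within the observed range of 0 to 6 intensive courses, and rounding in the published estimates makes the z statistic approximate.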
A second analysis accounting for variable course lengths (median 4 weeks) assessed the relationship of percentage time in intensive courses with evaluations. The upper chart in the figure depicts the adjusted RR of obtaining an excellent score by decile of time spent in intensive course work, with the lowest decile (decile 1) as the referent. To determine whether the positive association of intensive course work with performance was driven by any of the individual components, we determined the adjusted RR of excellent evaluations based on individual course types. No single type of intensive course work accounted for our findings (table 3).
Adjusted for age, sex, minority status, dean's rank, medical school tier, intern year, categorical, and Step 1 score.
The positive association of intensive course work with performance was seen in all competencies except professionalism, and also in our measure of global assessment, which is independent of the ACGME Milestones (table 4).
Abbreviation: ACGME, Accreditation Council for Graduate Medical Education.
Adjusted for age, sex, minority status, dean's rank, medical school tier, intern year, categorical, Step 1 score, and number of intensive courses.
To determine the robustness of these associations, we performed a sensitivity analysis using poor scores. The unadjusted OR for obtaining a poor score per intensive course was 0.92 (95% CI 0.84–1.01, P = .07), whereas the OR per nonintensive course was 1.04 (95% CI 1.00–1.08, P = .04). These 2 ORs were significantly different from one another (P = .02); these differences were similar but were no longer statistically significant after adjustment for demographics (P = .12). Similarly, there was a persistent decrease in the OR of a poor evaluation with increasing time spent taking intensive course work (P < .001; figure).
We performed an additional sensitivity analysis of the 2350 evaluations; 532 (23%) were uniformly excellent. The RR of such an evaluation for each additional intensive course was 1.13 (95% CI 1.05–1.21, P = .001). The corresponding RR for each nonintensive course was 0.97 (95% CI 0.94–1.01, P = .01). These 2 estimates differed significantly from each other (P < .001).
Discussion
In this study of medical interns in 1 large residency program, the quantity and intensity of medical school courses taken in the fourth year had a small, but significant and dose-dependent, association with clinical performance during internship. This effect was seen across all ACGME competencies except professionalism, and persisted when corrected for potential confounders. The association of intensive courses with better performance differed significantly from the corresponding association with nonintensive courses and strengthens the plausibility that intensive course work has a measurable impact on intern performance.
The observed effect from intensive courses was robust and seen for all types of evaluations, for most ACGME clinical competencies, and for global performance, where the relationship was the strongest. The 1 exception was professionalism. Other studies suggest a fundamental difference between professionalism and other competencies, theorizing it is more difficult to teach, correct, and change over time.14–17
Our program uses robust assessment tools based on the ACGME core competencies that consider input from evaluators ranging from students to attending physicians. We analyzed nearly 70 000 points of assessment, which afforded the power to detect subtle differences in the performance of high-performing medical interns. Our data support not only the argument that the fourth year should be maintained, but also the argument that its clinical intensity should be strengthened to produce “clinically ready” graduates.
We do not believe that nonintensive course work, as we have defined here, is without value. There is certainly benefit to research and nonclinical specialty exposure. In this regard, our performance measures may not capture the value of such courses, as we focused on intern clinical performance—which has the most direct relationship to courses—but not on long-term success or satisfaction. Nonetheless, medical students could reasonably be advised to take intensive courses during their fourth year to improve their clinical performance during internship.
Our study has limitations. We studied a single academic residency program, and results may not generalize to other programs. However, the interns in this study represented 46 different medical schools, enhancing our generalizability. In our analysis, poor scores were rarely given for intern performance. We observed a low median number of fourth-year courses taken by our intern classes. Without comparable information from other programs, we cannot necessarily extrapolate our results to the incremental value of intensive courses when students take a larger number of courses.
Interns at BIDMC are academically talented, with high levels of AOA membership and above-average USMLE Step 1 scores.18 This had the expected consequence of a ceiling effect in evaluations, minimizing the variability of performance within the cohort. This tends to reduce our ability to detect differences among interns, leading to a possible underestimate of benefit.
Another limitation of observational studies like ours is the limited ability to infer causality in the presence of confounding. Students with strong clinical backgrounds may disproportionately select demanding fourth-year programs. We controlled for several potential markers of performance (AOA, USMLE Step 1 score, and reputation of medical school), and none of these factors, combined or individually, materially confounded our primary estimates of association. However, we were limited to these proxy markers of achievement; we did not have access to USMLE Step 2 scores, and the heterogeneity of medical school grading precluded the use of honors. Even more subjective concepts, such as medical student motivation, are not readily measured by any routinely used instrument. Ultimately, the only way to fully control for all forms of identified and unidentified bias would be to perform a randomized trial, which, in this setting, is unlikely to occur. Of note, our results do identify variables of potential importance for clinical training that would be helpful to program directors.
Conclusion
In a single institution's cohort of medical interns, the selection of clinically intensive course work during the fourth year of medical school had a small, but significant, dose-dependent, and wide-ranging impact on clinical evaluations of performance. This association was not influenced by other potential predictors of high performance, and was not matched by improved performance with less intensive courses.
References
Author notes
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: The authors declare they have no competing interests.
Editor's Note: The online version of this article contains sample questions taken from the evaluation tools.