ABSTRACT
Residency programs and the Accreditation Council for Graduate Medical Education (ACGME) use survey data for program evaluation. Improving resident wellness is a priority for many programs, which often rely on self-reported surveys to drive interventions.
We tested whether wellness survey results differed when collected through different survey methodologies and identified potential causes for those differences.
Aggregated results on the resident wellness scale at a single institution were compared between 2 administrations: one collected electronically through the ACGME Resident Survey, immediately following the program evaluation items used for accreditation, and one collected anonymously through an internal survey aimed at program improvement.
Across 18 residency programs, 293 of 404 residents (73%) responded to the internal survey, and 383 of 398 residents (96%) responded to the 2018 ACGME survey. There was a significant difference (P < .001, Cohen's d = 1.22) between the composite wellness score from our internal survey (3.69 ± 0.34) and that from the ACGME survey (4.08 ± 0.30), indicating reports of more positive wellness on the national accreditation survey. ACGME results were also significantly more favorable than internal results for all 10 individual scale items.
Potential causes for differences in wellness scores between internal and ACGME collected surveys include poor test-retest reliability, nonresponse bias, coaching responses, social desirability bias, different modes for data collection, and differences in survey response options. Triangulation of data through multiple methodologies and tools may be one approach to accurately gauge resident wellness.
Residency programs and the Accreditation Council for Graduate Medical Education use survey data to improve education and gain insight into resident wellness. A clearer understanding of the validity and reliability problems that result from different survey methodologies can help improve the interventions that follow from such surveys.
A comparison of results from a single institution's ACGME Resident Survey wellness scale with those of an internal survey on resident wellness.
Both surveys were administered at a single institution, limiting generalizability.
Results from one institution's wellness scale collected as part of the ACGME Resident Survey and an internal survey on wellness varied considerably.
Introduction
Residency training programs rely on survey data to continuously improve medical education. Starting in 2004, the Accreditation Council for Graduate Medical Education (ACGME) has annually distributed a web-based survey to accredited residency programs to systematically assess issues such as work hour compliance and adequacy of clinical supervision.1 For the 2018 distribution of the ACGME Resident Survey, coinciding with the revised Section VI of the Common Program Requirements2 and reflecting growing concern about physician wellness, questions regarding resident wellness were added. Response data from these wellness items were provided to program and institutional leadership as a measure of resident wellness in their training programs and as a signaling mechanism for targeted intervention. As illustrated by the annual ACGME survey, graduate medical education uses surveys to assess the values, beliefs, and perspectives of trainees. However, survey methodology is vulnerable to bias arising from a variety of methodological and psychological factors.3,4
Prior to the original 2004 distribution of the ACGME survey, a number of researchers surveyed residents to gain insights into wellness, duty hours, retention, and learner and faculty perspectives on residency training.5–8 More recently, many sponsoring institutions of residency programs have surveyed their residents on wellness, morale, burnout, and other related constructs.9 Existing evidence links resident burnout to poor patient outcomes, including self-reported medical errors,10 self-reported suboptimal patient care,11 and changes in brain activity during clinical reasoning.12 Approaches to improving physician wellness include organizational-level (eg, work hour restrictions) and individual-level (eg, mindfulness training, stress management, small group discussions) interventions.13 However, it remains unclear when certain interventions are applicable, for which groups, and in what combination.13 Accurate institutional-level and program-level data serve as a foundation for designing effective interventions to improve resident wellness.
The objective of this study was to determine whether responses on the ACGME Resident Survey wellness items differ from those on an internally administered wellness survey and to examine potential threats to validity and reliability that can affect survey responses.
Methods
Internal Survey
Between May 1 and June 30, 2018, residents across 18 residency programs were internally surveyed at Virginia Commonwealth University Health, a large academic medical center in Central Virginia, to assess facets of culture and context. One of the scales on the internal survey was the 10-item Resident Wellness Scale (RWS),14,15 which was also the scale used by the ACGME to assess wellness in 2018. Respondents indicated the frequency of positive indicators of wellness over the past 3 weeks on a 5-point Likert scale (1, never; 2, seldom; 3, sometimes; 4, often; 5, very often). The ACGME survey used the same 5-point scale but changed the second anchor from "seldom" to "rarely" to remain consistent with its other program evaluation response options, whereas the internal survey retained the original "seldom" anchor.
Residents were recruited to complete the voluntary internal survey via paper format during a regularly scheduled meeting (eg, didactic conference) primarily restricted to residents. A medical education researcher with no supervisory duties or oversight of trainees read aloud a script detailing key information (eg, anonymity in responses, improvement purpose for surveying, protections during data reporting to ensure confidentiality) to recruit participation. Residents submitted completed or blank paper surveys to a large box to allow anonymity in response and participation. Residents were also e-mailed an anonymous electronic Qualtrics (Qualtrics LLC, Provo, UT) survey link, allowing participation from absent members.
ACGME Resident Survey
Between January 15 and April 15, 2018, the ACGME opened a mandatory national survey to all residents, including those who participated in our internal survey, for the purpose of annual program evaluation. Residency training programs were scheduled by the ACGME in a staggered manner for data collection within 5-week windows during this time period. Program directors (PDs) were responsible for e-mailing their residents during the 5 weeks that their survey was open for completion and for securing a minimum response rate of 70% to 100%, depending on program size. PDs did not have access to the survey questions or individual resident responses. Residents had discretion over where they completed the ACGME survey.
Following the accreditation-based electronic survey questions on the ACGME Resident Survey, introductory text was displayed explaining that data from the wellness items would not be provided to any ACGME residency review committees and would not be used to make accreditation decisions. Responses to the RWS questions were mandatory to progress through and complete the ACGME survey. Items 9 and 10 included a "not applicable" response option on the ACGME survey, which was not offered on the internal survey; the ACGME included this option because some residents may not have interacted with a patient or experienced a tragic work incident within the past 3 weeks. Responses of "not applicable" were not included in mean calculations for ACGME scoring.
Two survey administrations allowed the opportunity to examine potential threats to validity and reliability that affect survey responses on resident wellness. Despite both surveys being anonymous, residents may judge wellness items differently when administered through an internal source rather than the ACGME. In addition, variations between the survey administrations may also influence findings and interpretations on the current state of resident wellness.
The Virginia Commonwealth University Institutional Review Board deemed our internal survey and our study to compare program-level data with ACGME data exempt from review.
Analysis
Internal survey data were aggregated to the program level to link with the ACGME RWS results. The results were reported in aggregate for each training program as raw counts for each response option on the 10 RWS items. Since program means were not directly reported, mean scores for each item and a composite score were calculated manually from the number of respondents choosing each response option and the total number of respondents to the survey. Scale reliability and principal components analyses were conducted using SPSS 25 (IBM Corp, Armonk, NY), along with 1-sample t tests to detect differences between the internal and ACGME data for both composite scores and each of the 10 scale items. Differences between paper and electronic results on the internal survey were compared with the Mann-Whitney U test.
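To illustrate the manual mean calculation and the 1-sample t test described above, the following minimal Python sketch reconstructs an item mean from response-option counts and compares hypothetical program-level composite scores against a fixed ACGME composite value. The counts, composite values, and the use of scipy are illustrative assumptions for demonstration only, not the study's actual data or analysis software (which was SPSS 25).

```python
# A minimal sketch, assuming hypothetical counts and composites.
import numpy as np
from scipy import stats

# Hypothetical raw counts for one RWS item in one program:
# respondents choosing each option 1 (never) ... 5 (very often).
counts = np.array([2, 5, 10, 12, 6])
options = np.array([1, 2, 3, 4, 5])

# Item mean reconstructed from counts: sum(option * count) / total respondents.
item_mean = (options * counts).sum() / counts.sum()
print(f"Reconstructed item mean: {item_mean:.2f}")

# Hypothetical program-level composite RWS scores from the internal survey,
# tested against a fixed ACGME composite value with a 1-sample t test.
internal_composites = np.array([3.5, 3.7, 3.6, 3.9, 3.4, 3.8])
acgme_composite = 4.08  # reported ACGME composite used as the test value
t_stat, p_value = stats.ttest_1samp(internal_composites, acgme_composite)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
```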
Results
Of 404 residents, 293 (73%) responded to the internal survey (table 1). The 2018 ACGME Resident Survey on wellness had a higher response rate (96%, 383 of 398). Unlike the ACGME survey, our internal survey included residents in 1-year preliminary positions without categorical affiliation. The majority of internal survey responses were collected in paper format (82%, 240 of 293) rather than electronically (18%, 53 of 293). There was no significant difference in composite RWS scores across residencies between paper and electronic formats for the internal survey (P > .05).
For our internal survey RWS results, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.89, above the commonly recommended value of 0.60; Bartlett's test of sphericity was significant (χ2(45) = 1347.33; P < .001); and Cronbach's alpha was 0.88. Principal components analysis with varimax rotation (table 2) suggested adequate construct validity evidence. Similar analysis was not possible with the reported program-level ACGME wellness data.
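As an illustration of the internal consistency estimate reported above, the sketch below computes Cronbach's alpha from a hypothetical respondent-by-item matrix of 1 to 5 ratings. The simulated data are assumptions for demonstration only; the KMO measure and Bartlett's test are not shown here and would require additional tooling (eg, the factor_analyzer package) outside of SPSS.

```python
# A minimal sketch of Cronbach's alpha, assuming simulated 1-5 item ratings.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: rows are respondents, columns are the scale items."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 200 respondents x 10 items, generated from a shared
# latent trait so the items correlate (purely illustrative).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
ratings = np.clip(np.rint(3 + latent + rng.normal(scale=0.7, size=(200, 10))), 1, 5)
print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")
```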
There was a significant difference between the composite wellness scores collected through our internal survey and through the ACGME survey (table 3). The overall composite RWS score from the ACGME survey was higher (4.08 ± 0.30) than that from our internal survey (3.69 ± 0.34), indicating more positive wellness on the national accreditation survey. Resident wellness was also significantly more positive on the ACGME survey for all 10 individual RWS items compared with the internal survey results (table 3). Effect sizes ranged from medium to very large, indicating a substantial magnitude of difference between ACGME and internal survey results (table 3).16 Mean composite RWS scores broken out by program (deidentified) were also higher on the ACGME survey than on the internal survey for 15 of 18 programs (table 4). On the ACGME survey, 0% to 44% of residents within a program chose "not applicable" for item 9, and 8% to 53% chose "not applicable" for item 10.
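The composite effect size reported in the abstract (Cohen's d = 1.22) can be reproduced from the summary statistics above. The short calculation below uses a pooled standard deviation of the two composite score distributions and assumes equal weighting of the two groups, which is a simplification for illustration.

```python
# A worked check of the reported composite effect size, using pooled SD.
import math

mean_acgme, sd_acgme = 4.08, 0.30
mean_internal, sd_internal = 3.69, 0.34

pooled_sd = math.sqrt((sd_acgme**2 + sd_internal**2) / 2)
cohens_d = (mean_acgme - mean_internal) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")  # ~1.22, matching the reported value
```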
Discussion
We found large differences in wellness scores from 2 anonymous resident surveys using the RWS instrument with primarily the same subjects at a single institution; one survey was administered in winter/early spring and the other in late spring. The first administration of RWS was electronic through the ACGME Resident Survey, which produced significantly higher scores across all items in comparison with the second internally administered survey of RWS through paper and electronic formats.
Our study suggests a need for scrutiny when analyzing resident wellness data. Without a gold standard of measurement for wellness, it remains unknown which administration accurately represented the true state of wellness in our resident population. Graduate medical education leaders responsible for monitoring and addressing resident wellness should be aware of potential threats to validity and reliability of wellness data.
Factors that may explain our results include test-retest reliability, which refers to the degree to which results are stable when the same measurement tool is administered to the same sample at 2 different time points.17 There was a time lag between the ACGME and internal survey administrations, primarily to avoid survey fatigue; therefore, one explanation for our findings may be poor test-retest reliability. The scale instructions also asked respondents to reflect on the last 3 weeks when responding to items, so our findings may be influenced by instability in wellness during each administration's study period. In addition, 3 weeks may not be an adequate length of time to study the construct of wellness, which may be situation-dependent and fluctuate over time.18 Unfortunately, time effects were difficult to address in our study because we were unable to link ACGME and internal survey data at the individual level. Within-person studies of resident wellness would help determine whether external factors and time influence results.
Nonresponse bias, the extent to which nonrespondents differ from respondents, may also explain the study findings.19 Such bias is of concern when large portions of the population choose not to participate in surveying efforts; however, low response rates are not always indicative of nonresponse bias.19–21 Because response rates explain only a small amount of variance in nonresponse bias, direct measurement of nonresponse bias is advised (eg, interest-level or passive nonresponse analysis, wave analysis, benchmarking, replication).19–22 Nonresponse bias can also be assessed by comparing population and sample characteristics to detect representation.20 The ACGME survey did not provide respondent demographics, but respondents' training year data from our internal survey were similar to the distribution in our resident population (table 1). Nonresponse bias was also minimized with the internal survey because attendance during the data collection sessions was random, and absent members were able to complete the survey electronically.
Social desirability, a response bias, refers to the process by which survey responses are influenced by the tendency for respondents to present themselves in a favorable manner, unintentionally or intentionally adjusting answers to the perceived ideal or correct responses.23 This concept has been identified as a limitation of surveys and source of bias since the 1960s.24 Given that ACGME is an accrediting body, the responses to the wellness items may be particularly susceptible to social desirability bias as respondents attempt to represent themselves and their training programs in a favorable manner. Residents' responses may be perceived as a threat to the accreditation status of their respective programs because the wellness items were collected alongside the accreditation items. New validity evidence needs to be determined with each survey administration considering the potential for unique subjects, settings, and purpose. Even though residents were told the wellness items on the ACGME survey would not be used for accreditation, respondents may not have fully grasped this point. In comparison, the internal survey collected wellness data for the explicit purpose of program improvement.
In many programs, leadership meets with residents to help them understand the purpose of the ACGME survey. While “coaching” residents to respond favorably is not permitted, program directors have voiced concerns about the potential for ACGME survey items to be misinterpreted.25 Program leadership may, intentionally or unintentionally, exacerbate social desirability bias by discussing the potential consequences to accreditation status based on certain response patterns. By clarifying the intent for surveying wellness with residents for program improvement, leadership can more accurately gauge wellness among their trainees. Emphasizing the purpose and scope of the ACGME wellness items may also moderate social desirability bias and reframe a perceived threat to accreditation.
Finally, and most likely the most influential drivers of our findings, differences in formatting, mode of data collection, and response options can affect the reliability and validity of survey findings. Psychologists and social scientists have long studied the impact that wording, formatting, and response options have on findings through the cognitive and communicative processes involved in responding to survey items.26 For example, optimizing refers to the series of complex cognitive steps enacted to respond to survey items: (1) interpret the question and infer its intent; (2) recall relevant information from memory; (3) integrate recalled information to form a judgment; and (4) translate the judgment by selecting a response option.27 Accordingly, cognitive judgments can depend largely on what is literally presented to the respondent in the survey. Different modes of data collection and administration (eg, paper, electronic, in-person, voluntary/mandatory participation) can also influence survey responses.3,4 While our results showed no significant difference in composite RWS scores between our internal paper and electronic survey results, such differences still have the potential to influence findings. There is also evidence that anchor wording and response choices can influence survey results.28 The internal and ACGME surveys used different wording for one of the scale anchors, and the ACGME survey added a "not applicable" response option for 2 items, which could have influenced mean scores.29
Our results caution against reliance on single or limited sources of data on resident wellness. Cross-sectional survey designs provide a snapshot of resident perception, and surveys at other times of the year may produce differing results depending on various factors. Therefore, survey results are only one indicator of the health of a program. Areas of concern can receive additional clarification through follow-up efforts to collect data by anonymous or confidential methods. Program leadership can facilitate an open discussion with residents on wellness and an action plan for improvement based on multiple sources of data, including internal and ACGME survey results. A feedback loop of results with an action plan also signals an improvement-oriented reason for collecting such data, as opposed to passively collecting annual wellness data without effort to improve the lives of trainees.
Based on our results, other institutions may want to triangulate their own internal data sources on wellness with ACGME results. Other applications of multi-source, multi-method data30 for improving wellness include correlating program evaluation data with internal measures of the clinical learning environment31 and deconstructing external reports to determine group differences in wellness.32 Key next research steps include integrating qualitative methodologies (eg, cognitive interviewing, focus groups) to understand the motivations behind response patterns and interpretations of scale items, and examining the psychometric stability of the RWS by specialty type.33,34
Conclusions
Our study found large differences between responses on a wellness instrument collected through the ACGME compared to an internal survey, suggesting potential threats to validity and reliability in the measurement of resident wellness. Potential causes for differences include poor test-retest reliability, nonresponse bias, coaching responses, social desirability bias, different modes for data collection, and differences in survey response options.
References
Author notes
Funding: The authors report no external funding source for this study.
Competing Interests
Conflict of interest: Dr. Santen receives funding for evaluating Accelerating Change in Medicine Education from the American Medical Association. Mr. Yaghmour is a paid employee of the Accreditation Council for Graduate Medical Education.
The authors would like to thank the Virginia Commonwealth University Health's residency coordinators, program directors, and residents for participating and helping with our internal survey efforts.
This work was previously presented at the ACGME Educational Conference, Orlando, Florida, March 7–10, 2019.