Background 

Vague quantifiers used in the Accreditation Council for Graduate Medical Education–International (ACGME-I) resident survey are open to interpretation, raising concerns about the validity of survey scores. Residency programs may be unduly cited if survey responses are affected by differing judgments of vague quantifiers.

Objective 

By investigating overlap in frequency judgments, we assessed the validity of vague quantifiers, quantifying variation in residents' frequency judgments of the following response options: never, rarely, sometimes, very often, and extremely often.

Methods 

We conducted a cross-sectional survey of residents in 2 ACGME-I accredited institutions in Singapore. Participants assigned a frequency judgment to response options in 8 questions in the ACGME-I Resident Survey. Overlap in frequency judgment was computed using the minimum and maximum frequency judgment for each response option. Overlap was deemed to have occurred when the maximum frequency of a preceding category exceeded the minimum frequency of a downstream category. The percentage of participants whose frequency judgments overlapped was computed.

Results 

Of 652 residents, 289 (44%) responded; after exclusion of incomplete and careless responses, 119 responses (18%) were included in the study. Frequency judgment overlap was more frequent for adjacent vague quantifiers, ranging from 11% to 50% for questions in the faculty, evaluation, and resources domains. The percentage of frequency judgment overlap was greatest for duty hour questions, with overlap between 21% and 47% for adjacent categories.

Conclusions 

Residents demonstrated wide variation in frequency judgment of vague quantifiers, especially on the duty hour questions in the ACGME-I resident survey.

What was known and gap

The Accreditation Council for Graduate Medical Education–International (ACGME-I) Resident Survey is used as an important data item in the accreditation process in nations that use the ACGME-I accreditation framework.

What is new

Quantification of overlapping responses to vague quantifiers suggested some differences in resident interpretations of adjacent response categories.

Limitations

Small sample size and low response rate reduce generalizability.

Bottom line

Residents' responses demonstrated variation in frequency judgment of vague quantifiers, especially for responses to the duty hour questions.

The Accreditation Council for Graduate Medical Education–International (ACGME-I) Resident Survey is an important monitoring tool for evaluating residency programs and making accreditation-related decisions. An annual survey that gathers perceptions of clinical education and the learning environment has been conducted in the United States since 2004.1 Since 2010, the survey has been administered in English in Singapore and 4 other countries that have adopted the ACGME-I accreditation framework.2,3 Studies examining the reliability and validity of scores on the resident survey have yielded mixed results.4–7

For questions related to frequency and occurrences in the survey, residents respond by selecting 1 of the 5 following options: never, rarely, sometimes, very often, and extremely often. These response options have been termed vague quantifiers, as they denote quantification but lack concrete numerical quantities. Vague quantifiers have been found to be subject to a wide range of frequency judgments with considerable overlap, especially among those that are semantically adjacent.4,5 Internal medicine program directors reported that resident survey terms are “vague/ambiguous/misinterpreted by residents,”5(p3) and indicated that the response option sometimes can be problematic.5

The resident survey is used by ACGME-I as a screening tool to assess compliance; the importance of this screening is increasing with Singapore residency programs' move toward a new accreditation system with annual data screening and less frequent site visits. Concerned about the effect that residents' varied frequency judgment may have on survey results, we aimed to quantify the variation in residents' frequency judgment for vague quantifier response options.

Study Setting and Data Collection

We conducted an anonymous, cross-sectional survey with residents enrolled full time in ACGME-I accredited residency programs as of March 1, 2014, at 2 sponsoring institutions in Singapore. All residents are proficient in English, and English is the lingua franca and medium of instruction within educational institutions in Singapore. E-mail invitations with an anonymous electronic link to the survey platform Qualtrics (Qualtrics LLC, Provo, UT) were sent to residents via their respective program coordinators between March and May 2014. When residents clicked on the link, they were directed to a participant information sheet explaining study aims and details, with a link to the survey. Consent to participate was implied if they proceeded with the survey.

Eight questions from the domains of educational content, faculty, duty hours, and resources were taken from the ACGME-I Resident Survey (table 1). Participants were instructed that never refers to 0% of the time, and to provide their frequency judgment of rarely, sometimes, very often, and extremely often by moving a slider between numerical values of 0 to 100 on the survey interface.

table 1

Comparison of Frequency Judgment Across Vague Quantifiers


The National University of Singapore Institutional Review Board reviewed this study and determined it to be exempt.

Data Analysis

Previous studies sought to understand validity through expert validation,8–10 interviews, and focus group discussions.4,5,7 By asking participants to provide numerical frequency judgments corresponding to the various vague quantifiers, this study offers another way of examining validity through eliciting comprehension of the vague quantifiers. This is akin to a cognitive interview without additional probing.10–12

Data were first screened for logical consistency of the responses. Based on the phrasing of the study questions and option labels, the frequency judgment of rarely must be the smallest and that of extremely often the greatest. Participants who, for example, gave frequency judgments of rarely that were greater than those of sometimes, very often, and extremely often would be providing logically inconsistent responses. Logically inconsistent responses can result from inattentiveness or insufficient effort, and can be exacerbated by the anonymous nature of web-based surveys.13
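As a minimal sketch (illustrative Python, not the authors' actual code), this logical-consistency screen amounts to checking that each participant's judgments increase from the least to the most intense option; treating ties as inconsistent is our assumption:

```python
def is_careless(judgments):
    """Flag a response as careless when the frequency judgments,
    ordered (rarely, sometimes, very often, extremely often), do not
    increase strictly from the least to the most intense option.
    Ties are treated as inconsistent (an assumption)."""
    return any(a >= b for a, b in zip(judgments, judgments[1:]))

print(is_careless([10, 30, 60, 85]))  # False: strictly increasing
print(is_careless([40, 30, 60, 85]))  # True: rarely exceeds sometimes
```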

In this study, we treated response patterns13 that did not show an incremental increase in the frequency judgment of adjacent vague quantifiers (consistent with the measurement of event occurrence from the lowest to the highest intensity) as careless responses, and removed them from further analyses. Next, we tabulated the minimum and maximum frequency for each of the vague quantifier response options (table 1). Frequency judgment overlap was deemed to have occurred when the maximum frequency of the preceding vague quantifier exceeded the minimum frequency of a downstream vague quantifier. For example, overlap occurs when the maximum frequency for sometimes is 70 but the minimum frequency for very often is 50: values between 50 and 70 could mean either sometimes or very often. The percentage of participants whose frequency judgments fall within this overlapped region constitutes the percentage of overlap between the 2 vague quantifiers. To calculate the percentage of overlap, the number of participants in the overlapped region for the 2 vague quantifiers was divided by the total number of participants rating both quantifiers.
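The overlap computation described above can be sketched as follows (Python, illustrative only; the exact counting rule, here counting a participant if either of their two judgments falls in the overlapped region, is our assumption):

```python
def overlap_percentage(lower, higher):
    """Percentage of frequency judgment overlap between two adjacent
    vague quantifiers. `lower` and `higher` are paired lists of
    judgments (one per participant) for the less and more intense
    option. The overlapped region runs from the minimum of the
    higher-intensity option to the maximum of the lower-intensity one."""
    lo_max, hi_min = max(lower), min(higher)
    if lo_max <= hi_min:  # categories are cleanly separated
        return 0.0
    in_region = sum(
        1 for lo, hi in zip(lower, higher)
        if hi_min <= lo <= lo_max or hi_min <= hi <= lo_max
    )
    return 100.0 * in_region / len(lower)

# sometimes vs very often for 4 hypothetical participants:
print(overlap_percentage([20, 30, 60, 70], [55, 80, 85, 90]))  # 75.0
```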

Intraclass correlation was calculated to determine whether participants were consistent in their frequency judgment of vague quantifiers across all questions, for instance, whether a participant who equated a frequency judgment of 15 with rarely did so for every question. Descriptive statistics and a figure illustrating frequency judgment overlap without outliers were also included to understand how outliers affect frequency judgment overlap. Outliers were identified as the observations furthest from the mean, and they were replaced by the sample mean.
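The article does not state which form of intraclass correlation was used; as one hedged illustration, a one-way random-effects ICC(1,1), with participants as rows and their judgments of the same quantifier across questions as columns, could be computed like this:

```python
import numpy as np

def icc_oneway(data):
    """One-way random-effects ICC(1,1). Rows are participants; columns
    are their frequency judgments of the same vague quantifier across
    questions. (Which ICC form the study used is not stated; this one
    is an assumption for illustration.)"""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    # Mean squares between and within participants
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    msw = ((data - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Participants who give nearly the same value for "rarely" on every
# question yield an ICC close to 1:
ratings = [[10, 12, 11], [50, 48, 52], [90, 91, 89]]
print(round(icc_oneway(ratings), 3))
```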

figure

Illustration of Frequency Judgment Overlap Across Vague Quantifiers


Data were analyzed using RStudio 0.98.162 (RStudio, Boston, MA).

A total of 289 of 652 eligible participants (44%) responded to the study, and 186 (64%) completed all study questions. We excluded 67 (23%) participants due to careless responses. We included 119 surveys (18%) in the final analysis.

Of these 119, 66 (55%) were from medical residency programs, 28 (24%) were from surgical residency programs, and 13 (11%) were from all other residency programs. Twelve (10%) did not indicate their residency program. Sixty participants (50%) were in postgraduate year 1 (PGY-1) to PGY-3; 49 (41%) were in PGY-4 to PGY-6; and 10 (8%) were in PGY-7 and above. The final sample is comparable to the total population (652 residents) and the participants (289 residents) in terms of representation from the 2 sponsoring institutions as well as types of residency programs. In addition, the PGY breakdown was similar between the sample that responded to the survey and the final sample.

Table 1 summarizes the frequency judgment of the vague quantifiers for the 8 survey questions. There was a steady increase in the mean of frequency judgment from rarely to extremely often. The standard deviation tended to be smallest for rarely and larger for the other vague quantifiers. When compared across domains, the standard deviation for very often and extremely often was greater in the domains of resources and duty hours.

In general, frequency judgment overlap occurred at a higher percentage for vague quantifiers that are adjacent, with overlap between 38% and 82% for questions in the faculty, educational content, and resources domains (figure; also provided as online supplemental material). The percentage of frequency judgment overlap was higher still for duty hours questions, with overlap between 58% and 95% for adjacent vague quantifiers. In contrast, the percentage of frequency judgment overlap was considerably lower for nonadjacent vague quantifiers: for questions in the faculty, educational content, and resources domains, the overlap was between 1% and 49% for rarely and very often, between 1% and 16% for rarely and extremely often, and between 5% and 29% for sometimes and extremely often. As with adjacent quantifiers, the overlap for nonadjacent vague quantifiers was greater for duty hours questions: between 18% and 45% for rarely and very often, between 3% and 18% for rarely and extremely often, and between 53% and 56% for sometimes and extremely often.

After removing the outliers, frequency judgment overlaps between adjacent vague quantifiers, although reduced, were still substantial, with overlaps ranging from 21% to 72% for questions in the faculty, educational content, and resources domains (descriptive statistics and illustration are provided as online supplemental material). Good intraclass correlation within participants was found (table 2).

table 2

Intraclass Correlation of Participants' Frequency Judgment


Our results suggest considerable frequency judgment overlap of adjacent vague quantifiers in the ACGME-I Resident Survey, attributable to participants perceiving adjacent vague quantifiers to be similar in meaning. Participants were consistent in their frequency judgment of the vague quantifiers, as evidenced by the good intraclass correlation coefficients. Participants who gave a frequency judgment of 15 to rarely for question 1 were likely to give a similar frequency judgment for rarely in the other questions.

Disconcertingly, standard deviations and frequency judgment overlaps were greater for questions about duty hours. It is unclear why this is the case. One possibility is the confusing and difficult phrasing of questions in the duty hours domain. Questions in this domain require participants to recall various frequencies and perform various calculations in their heads, in contrast with questions from other domains, where they are asked only to recall instances. For instance, 1 question requires residents to think of instances when they broke the duty hour rules, which presupposes that they are guilty of breaking the rules; residents then have to recall these instances over a 4-week period. The effort it takes to process this question is likely to burden residents' working memory, with subsequently greater variability in their recall.14 This increase in cognitive effort, coupled with a wide range of interpretations for vague quantifiers, may have implications for interpreting the results of the questions in the ACGME-I Resident Survey, in particular with regard to duty hour violations. In our study, we found that the percentage of frequency judgment overlap was greater for rarely and sometimes (39.9% to 46.6%), rarely and very often (15.6% to 41.6%), and rarely and extremely often (1.7% to 15.9%) for questions in the duty hours domain. Residency programs would be flagged for noncompliance if a substantial number of residents answered sometimes, very often, or extremely often to these questions. The high percentage of frequency judgment overlap may result in the reported incidence of duty hour violations being higher than the actual incidence, which could lead to residency programs being unduly flagged for noncompliance.

Our findings are similar to those of other studies on frequency judgment. While the other studies set out to understand the average frequency for each vague quantifier, we went a step further to understand the percentage of frequency judgment overlap, which allowed us to quantify variation in residents' frequency judgment for vague quantifier response options in the ACGME-I Resident Survey.

Our study has limitations, including the low response rate and the small final sample due to nonresponse and the exclusion of careless responses, which reduce the generalizability and validity of the results. While the final sample is comparable to the population, we cannot rule out systematic differences between those who were included in the final sample and those who were not. The substantial proportion of careless responses (23% of respondents) may suggest a larger problem with study fidelity. Including the careless responses in the analysis would have increased the percentage of frequency judgment overlap, as the majority of careless respondents gave frequency judgments that were greatest for rarely and smallest for extremely often.

A larger follow-up study is needed to ascertain whether the phrasing of survey questions in the ACGME-I Resident Survey or the vague quantifiers themselves lead to variation in frequency judgment. Future studies could replace vague quantifiers with response options that are more specific, for example, less than once a week for rarely. This way, a reference period and actual numerical benchmark of event occurrence could be established.15 

In this study, residents were asked to give their frequency judgment of the vague quantifiers response options used in the ACGME-I Resident Survey. Considerable variation in residents' interpretation of frequency judgment was found, which could affect the validity of survey results.

1. Holt KD, Miller RS. The ACGME Resident Survey aggregate reports: an analysis and assessment of overall program compliance. J Grad Med Educ. 2009;1(2):327–333.
2. Huggan PJ, Samarasekara DD, Archuleta S, et al. The successful, rapid transition to a new model of graduate medical education in Singapore. Acad Med. 2012;87(9):1268–1273.
3. ACGME International. Where we are. 2017.
4. Sticca RP, MacGregor JM, Szlabick RE. Is the Accreditation Council for Graduate Medical Education (ACGME) Resident/Fellow Survey a valid tool to assess general surgery residency programs compliance with work hours regulations? J Surg Educ. 2010;67(6):406–411.
5. Adams M, Willett LL, Wahi-Gururaj S, et al. Usefulness of the ACGME Resident Survey: a view from internal medicine program directors. Am J Med. 2014;127(4):351–355.
6. Holt KD, Miller RS, Philibert I, et al. Residents' perspectives on the learning environment: data from the Accreditation Council for Graduate Medical Education Resident Survey. Acad Med. 2010;85(3):512–518.
7. Ibrahim H, Lindeman B, Matarelli SA, et al. International residency program evaluation: assessing the reliability and initial validity of the ACGME-I Resident Survey in Abu Dhabi, United Arab Emirates. J Grad Med Educ. 2014;6(3):517–520.
8. Schwarz N. What respondents learn from questionnaires: the survey interview and the logic of conversation. Int Stat Rev. 1995;63:153–168.
9. Artino AR Jr, La Rochelle JS, Dezee KJ, et al. Developing questionnaires for educational research: AMEE Guide No. 87. Med Teach. 2014;36(6):463–474.
10. Bocklisch F, Bocklisch SF, Krems JF. Sometimes, often, and always: exploring the vague meanings of frequency expressions. Behav Res Methods. 2012;44(1):144–157.
11. Curran PG. Methods for the detection of carelessly invalid responses in survey data. J Exp Soc Psychol. 2016;66:4–19.
12. Sudman S. Mail surveys of reluctant professionals. Eval Rev. 1985;9(3):349–360.
13. Johnson JA. Ascertaining the validity of individual protocols from web-based personality inventories. J Res Personal. 2005;39(1):103–129.
14. Tourangeau R, Rips LJ, Rasinski K. The Psychology of Survey Response. Cambridge, UK: Cambridge University Press; 2000.
15. Lietz P. Research into questionnaire design. Intl J Market Res. 2010;52(2):249–272.

Author notes

Funding: The authors report no external funding source for this study.

Competing Interests

Conflict of interest: Dr Archuleta is a member of the International Review Committee, Accreditation Council for Graduate Medical Education–International.

Results of this survey were presented as a poster at the Association for Medical Education in Europe Conference, Glasgow, Scotland, United Kingdom, September 5–9, 2015.

The authors would like to thank Loo May Eng and Kenneth Lim for their invaluable support and their reading of the manuscript.

Editor's Note: The online version of this article contains a table of postgraduate year breakdown between the sample that responded to the survey and the final sample, as well as descriptive statistics and illustration of frequency judgment overlaps across vague quantifiers.

Supplementary data