ABSTRACT
The objective was to compare the average number of mistakes made on multiple-choice (MCQ) and fill-in-the-blank (FIB) questions in anatomy lab exams.
The study was conducted retrospectively; every exam had both MCQs and FIBs. The study cohorts were divided into 3 tiers based on the number and percentage of mistakes on the answer sheets: low (21–32, >40%), middle (11–20, 20%–40%), and high (1–9, <20%) tiers. The study used an independent 2-sample t test to compare the number of mistakes between MCQs and FIBs overall and per tier and a 1-way analysis of variance to compare the number of mistakes in both formats across the 3 tiers.
The results show that there was a significant difference in the number of mistakes between the 2 formats overall, with more mistakes found on FIBs (p < .001). In the high and middle tiers, the number of mistakes differed significantly between formats, being higher on FIBs (p < .001). There was no significant difference in the number of mistakes made in the low tier between formats (p > .05). Furthermore, the study found significant differences in the number of mistakes made on MCQs and FIBs across the 3 tiers, being highest in the low-tier group (p < .001).
There were fewer mistakes on the MCQ than the FIB format in exams. The findings also suggest that, in low-tier answer sheets, both formats could be used to identify students at academic risk who need more attention.
INTRODUCTION
Most of the time, students’ academic achievement on exams is used to evaluate their accumulation of knowledge and competencies.1 Medical education assessment is a crucial component and a big challenge. If it is well designed, it can aid in achieving the curriculum’s main objectives.2,3 Any medical school’s core competency has always been its testing methodology.4 A variety of long-established assessment techniques are employed by different instructors, such as true/false, multiple-choice questions (MCQs), matching, and fill in the blank (FIB).4 While evaluating all areas of medical education, instructors might not completely investigate whether the findings from 1 type of evaluation approach are comparable to those from other methods. Because of this, teachers frequently apply strategies that are familiar to them without realizing how much the method of assessment used by faculty may influence both the learning of students and the outcomes of their evaluations.
Students with different personalities may approach tests in various ways even though most students believe that some exam styles are “trickery” and they tend to favor the test format with specific choices, such as single-response MCQs, over FIB.5 Both MCQs and FIB are objective assessments. They are designed so that each question has only 1 correct answer.6 Both of these are typical assessment questions, but they can vary in several ways. MCQs typically offer a stem or question prompt and then a list of potential distractors.7,8 The examinees must choose the best response from the available options. MCQs are frequently used to evaluate a wide variety of knowledge and are effective for evaluating a sizable number of examinees in medical and chiropractic schools, for example, on the U.S. Medical Licensing Examination and by the National Board of Chiropractic Examiners. Additionally, they can be used to assess particular knowledge domains such as recognition, understanding, and recall. Contrarily, FIBs demand that examinees give a specific response. The test participants are frequently given a statement, a sentence, or a specific vocabulary item. FIBs are frequently used to evaluate more detailed information, such as recall or comprehension of particular topics.9,10
As evaluation techniques, MCQs and FIBs have been the subject of numerous studies, each of which produces a different set of findings. For instance, 1 study indicates that, when assessing for both recognition and recall of knowledge, MCQs are a more valid and reliable assessment instrument than FIBs.11 Researchers discovered that MCQs are less susceptible to guessing than FIBs and are more reliable and discriminating in terms of gauging student performance.11 However, a different study reveals that FIBs are superior to MCQs for assessing a student’s capacity to remember specific knowledge.12 The researchers discovered that FIBs offer a more accurate reflection of student knowledge and are more responsive to their level of understanding.
Studies comparing student performance using MCQs and FIBs also show mixed results depending on the specific context and the nature of the assessment. One published study compared student performance on MCQs and FIBs in an anatomy and physiology course and finds that students significantly preferred MCQs to other formats of tests.13 The researchers suggest that this may be because MCQs provide more cues to help students retrieve information from memory. It is worth noting that the effectiveness of different question formats can vary depending on various factors, such as the content being tested, the learning objectives, and the level of difficulty of the questions (eg, primary vs secondary questions, straight vs indirect answers). In some cases, FIB questions may be equally effective or even more effective than MCQs. For example, a study comparing the performance of medical students on MCQ and other types of questions in a pharmacology course finds there was a lack of correlation, suggesting that the performance in 1 of the testing formats had a strong influence on the final course.14 Overall, the research suggests that the effectiveness of MCQs and FIBs in assessing student performance may depend on the specific context and the nature of the assessment. MCQs may be more effective for assessing the recall of factual knowledge, whereas FIBs may be more effective for assessing students’ ability to apply their knowledge in a real-world context. However, further research is needed to fully understand the comparative effectiveness of these assessment methods.
As in the other medical schools, chiropractic schools also adopt the same assessment methods to evaluate students’ level of learning proficiency, including MCQs and FIBs.2 There are not any studies specifically focused on the use of MCQs and FIBs in chiropractic education though there are some studies that examine the use of these assessment formats in other health professions education programs, such as a study that examines the effectiveness of MCQs and short-answer questions in a continuing medical education program for practicing physicians.15 Another study compares the effectiveness of MCQs and FIBs in assessing learning in a medical program and concludes that both MCQs and FIBs could be used in medical education, but FIB could be more reliable to reflect students’ real competence.16 Another study compares the effectiveness of MCQs and short-answer questions in assessing medical and dental anatomy knowledge and finds that the final exam scores in the anatomy course are correlated to the exam format and the student’s academic year.17 Unfortunately, there has been no research on the comparison of the impact of MCQs and FIBs on student achievement at chiropractic educational institutes. Therefore, the study of the impact of MCQs and FIBs on student academic performance is warranted because such a study could help determine which format is superior in terms of accuracy, efficiency, and student engagement by examining the relative benefits of each question type in evaluating student learning in different subjects. Such a study could offer perceptions into how the usage of various types of questions influences student performance and assist educators in making well-informed judgments regarding the question types they employ in their exams. 
Additionally, to the authors’ knowledge, there is no research to study the impact of MCQs and FIBs on students with different levels of performance, so the study of the impact of MCQs and FIBs on students with different academic learning abilities might pinpoint any elements that might favor 1 question type over the other and investigate ways to boost the effectiveness and efficiency of both question formats. Therefore, this study was designed to address such questions: whether test format affects students’ academic performance between MCQs and FIBs in a given subject and its influence on a different group of students based on the academic grading. We hypothesized that, overall and within groups of students with varying levels of learning ability, MCQ-format lab examinations would have a lower number of mistakes than FIB-format lab examinations. The aims of the study were to examine and compare the number of mistakes in MCQ and FIB formats in anatomy lab test answer sheets as well as variances in the numbers of mistakes in the MCQ and FIB format in anatomy lab test answer sheets between various grading groups.
METHODS
Research Design
This was an observational retrospective study in which data were collected from examination records of anatomy lab practicals. A waiver was granted by the institutional review board of Palmer College of Chiropractic due to minimal risk to participants or other justifiable reasons in accordance with ethical standards.
Study Group
A convenience sample of 218 anatomy laboratory exam answer sheets from classes of quarters 1 and 5 of Palmer College of Chiropractic Florida was obtained. Of these, 72 answer sheets were excluded because they contained no mistakes, and the remaining 146 were used for the study’s analysis. These answer sheets contained between 1 and 32 mistakes each. To determine which group of answer sheets was more likely to be affected by exam format, the study cohorts were further divided into 3 tiers based on the number and percentage of mistakes: low (21–32, >40%), middle (11–20, 20%–40%), and high (1–9, <20%) tiers.
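The tier assignment described above can be sketched as a small function. This is a minimal illustration only: the name `classify_tier` is hypothetical, and the sketch follows the count ranges exactly as stated in the text, so a count that falls outside those ranges (0, which was excluded, or 10, which the stated ranges do not cover) is returned as unclassified rather than guessed.

```python
from typing import Optional


def classify_tier(mistakes: int) -> Optional[str]:
    """Map a sheet's total number of mistakes (out of 50 questions) to a tier.

    Ranges are taken verbatim from the text: high (1-9), middle (11-20),
    low (21-32). Any other count is returned as None (unclassified).
    """
    if 1 <= mistakes <= 9:
        return "high"
    if 11 <= mistakes <= 20:
        return "middle"
    if 21 <= mistakes <= 32:
        return "low"
    # 0 mistakes (excluded sheets) or a count not covered by the stated ranges
    return None


# Hypothetical mistake counts, not the study's data
example_counts = [0, 4, 15, 27]
example_tiers = [classify_tier(m) for m in example_counts]
```

Returning `None` for uncovered counts is a design choice of this sketch; it keeps the ambiguity visible instead of silently assigning a tier.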
To avoid too much variation and for uniformity purposes, 2 major categories of variables, dependent and independent, were used. The dependent variables were the number of mistakes made in answering MCQs and the number of mistakes made in answering FIBs, which were used to determine the study’s findings. The independent variable was the grouping variable, that is, the low, middle, or high tier. The total number of MCQ and FIB mistakes across all tests as well as mistakes per tier were gathered and examined and served as the basis for comparison. To identify any significant trends, the respective mean numbers of mistakes for MCQs and FIBs were compared. In addition to the overall means, the mean number of mistakes per tier was calculated to determine whether there was a consistent pattern across tiers.
Data Collection
Exams from 2 distinct anatomy lab courses were used. One came from quarter 1 spinal anatomy and the other from quarter 5 thoracic anatomy. All anatomical test questions were presented in paper-and-pencil style and comprised 50% MCQs and 50% FIBs with an equal level of difficulty (such as only structural identifications and no secondary questions); hence, every exam answer sheet included both formats. The total number of questions was 50. Both MCQs and FIB questions had only 1 correct key. Examinees did not have access to sample questions for either the test format or the content. Prior to the test, the course director announced that wrong spellings would be counted as incorrect and that there would be no credit for partially correct answers. The lab practical was conducted in a rotational manner; each examinee accessed 1 question at a time with the cover sheet on top of the exam paper, so there was minimal opportunity for cheating. The lab exams were scored by the course instructor, who was experienced in the subjects and did not know the identity of the examinees. Therefore, the results of grading were reliable, objective, and comparable.
Data Analysis
To test our hypothesis, a 3-step analysis was used in this study. Step 1 of the analysis was to compare the total mean number of mistakes between MCQs and FIBs of all answer sheets by Student t test. Step 2 analysis was to compare the mean number of mistakes between MCQs and FIBs per tier by Student t test. Step 3 analysis was to compare the mean mistakes of MCQs and FIBs across all 3 tiers by 1-way analysis of variance. The statistical package used was SPSS (IBM, version 25). The number of answer sheets with mistakes (MCQ, FIB, or both) in 3 tiers was tallied to examine the distribution of mistakes in each tier.
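The Student t statistic used in steps 1 and 2 can be illustrated with a minimal sketch of the pooled-variance (equal-variance) form. The function name `student_t` and the mistake counts below are hypothetical and are not the study's data; the study itself used SPSS. The p value is not computed here; a library routine such as `scipy.stats.ttest_ind` would supply it.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t, degrees_of_freedom) for an independent 2-sample Student t test.

    Uses the pooled sample variance, which assumes equal variances in the
    2 groups. Each group needs at least 2 observations.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, df


# Hypothetical per-sheet mistake counts for each format (illustration only)
mcq_mistakes = [1, 2, 3, 2, 2]
fib_mistakes = [4, 5, 6, 5, 5]
t_value, dof = student_t(mcq_mistakes, fib_mistakes)
```

A negative t value in this arrangement indicates a lower mean number of mistakes in the first group (here, MCQs) than in the second.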
RESULTS
The number of answer sheets of different tiers was uneven with the low tier having the fewest (25, 17%) and the high tier having the most (76, 52%), indicating that only 17% of examinees had inadequate results (less than 60% of all right answers).
The results of the statistical comparison of the number of mistakes between MCQs and FIBs are presented in Table 1. Except for the low tier, in which the difference was not statistically significant, the data supported our hypothesis that there were fewer mistakes in MCQ format lab answer sheets than in FIB format lab answer sheets overall (p < .001) and in each of the high and middle tiers (p < .001). On MCQs, mistakes were expected to be limited to misidentifications and unanswered items, whereas on FIBs, mistakes were wrong identifications, unanswered items, incorrect spellings, or word displacement. In the low-tier group, the MCQ and FIB formats had similar numbers of mistakes (p > .05).
Table 1. The Mean Number of Mistakes for Fill-in-the-Blank (FIB) and Multiple-Choice Questions (MCQs) in All Groups

The distributions of answer sheets with mistakes on MCQs, FIBs, or both formats in the 3 tiers are shown in Figure 1. All answer sheets in the low tier contain mistakes in both formats, whereas in the middle and high tiers, mistakes are unevenly distributed among MCQs, FIBs, or both formats, with FIB-only mistakes being the most common.
Figure 1. Distribution of the number of answer sheets with mistakes in each tier. For the low tier, all 25 answer sheets contain mistakes in both multiple-choice and fill-in-the-blank formats. For the middle and high tiers, most answer sheets contain fill-in-the-blank mistakes only, followed by both formats, and then multiple-choice mistakes only.
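The tally behind such a distribution can be sketched as follows. This is an illustration under stated assumptions: the helper names `mistake_pattern` and `tally_patterns`, the record layout (tier, MCQ mistakes, FIB mistakes), and the sample records are all hypothetical and are not the study's data.

```python
from collections import Counter


def mistake_pattern(mcq_mistakes: int, fib_mistakes: int) -> str:
    """Label a sheet by which format(s) contain at least 1 mistake."""
    if mcq_mistakes > 0 and fib_mistakes > 0:
        return "both"
    if mcq_mistakes > 0:
        return "MCQ only"
    if fib_mistakes > 0:
        return "FIB only"
    return "none"


def tally_patterns(sheets):
    """Count sheets per (tier, pattern).

    `sheets` is an iterable of (tier, mcq_mistakes, fib_mistakes) tuples.
    """
    counts = Counter()
    for tier, mcq, fib in sheets:
        counts[(tier, mistake_pattern(mcq, fib))] += 1
    return counts


# Hypothetical records (illustration only)
sample_sheets = [
    ("low", 12, 13),
    ("high", 0, 3),
    ("high", 2, 0),
    ("high", 0, 1),
    ("middle", 5, 9),
]
pattern_counts = tally_patterns(sample_sheets)
```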
A 1-way analysis of variance was performed to compare the number of mistakes among the 3 tiers of exam answer sheets. It revealed a statistically significant difference in the mean number of mistakes of the 2 formats between at least 2 tiers (F(2, 140) = 25.847, p < .001, η² = .8). Tukey’s honestly significant difference test for multiple comparisons found that the mean number of mistakes was significantly different between tiers 1 and 2, 1 and 3, and 2 and 3 (p < .001, 95% confidence interval, −14.48 to −0.92).
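For readers less familiar with the quantities reported above, the F statistic, its degrees of freedom, and η² can be computed as in the following minimal sketch. The function name `one_way_anova` and the sample values are hypothetical and are not the study's data; the p value and the Tukey post hoc comparisons are omitted and would normally come from a statistical package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a 1-way ANOVA.

    `groups` is a list of lists, 1 list of observations per group.
    eta_squared is SS_between / SS_total.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Hypothetical mistake counts for 3 tiers (illustration only)
tiers = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat, df_b, df_w, eta_sq = one_way_anova(tiers)
```

The degrees of freedom follow the usual pattern: the number of groups minus 1 between groups, and the number of observations minus the number of groups within groups.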
DISCUSSION
Until now, no studies have specifically compared the use of MCQs and FIBs in chiropractic education. Therefore, this is the first study of its kind in chiropractic education. The findings of this investigation add new knowledge to the existing literature.
One of the study’s findings is that the numbers of answer sheets were uneven among the tiers, particularly in the low tier, in which there were only 25 compared with 46 in the middle tier and 72 in the high tier. This result could be explained by the observational retrospective nature of the study, which examined existing data from a convenience sample; it was expected that, for a class, the majority of examinees would master the learning materials well, and only a small percentage would receive unsatisfactory outcomes. This result reflected as closely as possible the examinees’ mastery of course materials; that is, examinees who could not master the course materials and could not prepare well would make more mistakes on their exams.
Some of the findings from this study support our hypothesis and are in agreement with other studies. Overall, there were significantly fewer mistakes on MCQs than on FIBs. One explanation is that examinees’ test-taking skills for answering MCQs may help them make fewer mistakes. Unlike answering FIBs, answering MCQs allows examinees to use skills such as exploiting cues. For example, examinees can use elimination to pinpoint the correct answer even if they do not know it, as long as they recognize the wrong answers; therefore, it is legitimate to use guessing skills during the exam.18,19 That could help overall outcomes. Also, the examinees do not have to worry about word spellings or word displacement. On the other hand, FIBs require examinees to recall information from memory and provide an exact answer,20–23 which may make it more challenging to get the right answers. Also, our FIBs required terminology that some examinees may find more challenging. This could increase the likelihood that examinees would make more mistakes, such as forgetting the right answer or making spelling mistakes.
Although no specific studies have examined whether students experience more stress when answering FIB questions than when answering MCQs, stress cannot be ruled out as 1 of the factors that affect performance on FIBs, because the authors consistently receive more questions from students regarding FIBs than MCQs. One study suggested that chiropractic students experience stress and test anxiety.24 Therefore, it is possible that some examinees find answering FIB questions more stressful than answering MCQs. FIB questions typically require deeper memory of both structure and spelling than MCQs, which can be more challenging and stressful for some students. FIB questions often have no cues in lab examinations, and this can be more anxiety-provoking for students who prefer cues and guidance in their exams. Also, FIB questions can be more time-consuming than MCQs, which can cause stress for students who are pressed for time during an exam.
On the other hand, the results of this study suggest that the statistical difference in the number of mistakes made between MCQs and FIBs is not absolute and can vary depending on the specific characteristics of the abilities of the examinees. Our study finds that the numbers of mistakes for both MCQs and FIBs in the low tier were the highest, but there were no statistical differences between them. This suggests that the examinees in this group performed equally poorly no matter what type of question format was used. This finding is not in agreement with our hypothesis or with other studies.
Studies to compare MCQs with FIBs for students with various degrees of learning ability are scarce. One such study examined the performance of high- and low-ability students on MCQs and FIBs in a college-level psychology course and found that, whereas high-ability students performed equally well on both question types, low-ability students performed better on MCQs than on FIBs. The authors suggest that this may be because MCQs provide more cues and feedback, which can help low-ability students to better understand the material.25 The findings of our investigation, however, reveal differences. In our study, the findings show that, in the high- and middle-tier groups, there was a statistically significant difference in the number of mistakes between MCQs and FIBs, but overall, the number of mistakes was low. It could be attributable to various scoring standards as mentioned above; our scoring criteria penalized some nonessential mistakes, such as word placement and spelling as well as incorrect answers. If nonessential mistakes were eliminated, the results could be comparable. On the other hand, the findings from the low-tier group are in opposition to the abovementioned study: there was no statistically significant difference in the number of mistakes made on both MCQs and FIBs in low-tier answer sheets. The difference between studies could be due to different study designs. Instead of only 2 groups in the abovementioned study, this study included 3 tiers based on the number of mistakes. In fact, there were only 25 answer sheets in the low tier, and we believe that our classification of learning ability may be more accurate than in the abovementioned study, which had only 2 groups of learning ability.
There are several implications from the results of our study. First, the equal distribution and high number of mistakes on both MCQs and FIBs in the low tier suggest that these examinees performed poorly on both formats. As a result, this pattern could be used as a sign of low examinee competency; it could also be used as 1 of the indicators to identify students who are at academic risk and need more attention, such as support to help them overcome challenges and achieve academic success. Second, it is worth discussing why spelling errors were counted as mistakes. Correct medical terminology is essential because it ensures clear communication between health care professionals, helps to avoid errors and misunderstandings, promotes patient safety, and helps to avoid potential legal issues. One study found that incorrect medical terminology can lead to miscommunication, which can result in errors and adverse events for patients.26 The study recommends that health care organizations take steps to improve the accuracy of medical terminology, such as providing training and education for health care professionals. Furthermore, medical malpractice claims often involve errors in medical documentation, including incorrect terminology and spelling. A study found that errors in medical documentation were a contributing factor in more than 25% of malpractice claims.27 All this evidence supports the importance of using accurate medical terminology to avoid legal issues and promote patient safety in health care. As a part of training, the examinees were required to pay attention to spellings.
Limitations
This was an observational, retrospective study. We have data only from the 1st and 5th quarters of a 13-quarter curriculum, and only 1 lab course was studied, which limits generalizability. Also, we did not compare all forms of MCQs, such as those that have more than 1 correct choice.
CONCLUSION
The current study supports our hypothesis that the overall number of mistakes was significantly lower on MCQs than on FIBs. However, when the samples were further divided into 3 tiers based on the number of mistakes, the low tier appeared to be less affected by the exam format. This could indicate that the examinees in this group performed equally poorly on both question formats. Further research in this area is, therefore, highly warranted in the realm of medical education.
FUNDING AND CONFLICTS OF INTEREST
This work was funded internally. The authors have no conflicts of interest to declare relevant to this work.
REFERENCES
Author notes
Xiaohua He (corresponding author) is a professor in the life sciences department at Palmer College of Chiropractic Florida (4777 City Center Pkwy., Port Orange, FL 32129; [email protected]). Niu Zhang is a professor in the life sciences department at Palmer College of Chiropractic Florida (4777 City Center Pkwy., Port Orange, FL 32129; [email protected])
Author Contributions Concept development: XH. Design: XH. Supervision: XH. Data collection/processing: XH, NZ. Analysis/interpretation: XH. Literature search: XH. Writing: XH. Critical review: XH, NZ.
This is an award-winning paper presented at the Chiropractic Educators Research Forum (CERF), December 3, 2022, conference, Rise of Faculty Scholars: Building Capacity for a Stronger Future. The CERF awards are funded in part by sponsorships from NCMIC, ChiroHealth USA, Activator Methods, Clinical Compass, World Federation of Chiropractic, and Brighthall. The contents are those of the author(s) and do not necessarily represent the official views of, nor an endorsement, by these sponsors. This paper was also selected for a 2023 National Board of Chiropractic Examiners Research Award at the Association of Chiropractic Colleges–Research Agenda Conference.