Despite wide use, the value of formative exams remains unclear. We evaluated the possible benefits of formative assessments in a physical examination course at our chiropractic college.
Three hypotheses were examined: (1) Receiving formative quizzes (FQs) will increase summative exam (SX) scores, (2) writing FQ questions will further increase SX scores, and (3) FQ scores can predict SX scores. Hypotheses were tested across 3 separate iterations of the class.
The SX scores of the control group (Class 3) were significantly lower than those of Classes 1 and 2, but writing quiz questions and taking FQs (Class 1) did not produce significantly higher SX scores than taking FQs alone (Class 2). FQ scores were significant predictors of SX scores; the formative quiz review scores administered before each SX were the strongest predictors, accounting for 52% of the variance in SX scores. Sex, age, academic degrees, and ethnicity were not significant copredictors.
Our results support the assertion that FQs can improve written SX performance, but having students write quiz questions did not further increase SX scores. We concluded that nonthreatening FQs may be used to enhance student learning and suggest that they also may serve to identify students who, without additional remediation, will perform poorly on subsequent summative written exams.
INTRODUCTION
Teaching faculties are interested in factors that predict academic success and facilitate learning. American College Testing (ACT) scores, grade point average, and language familiarity are widely recognized predictors of subsequent academic performance.1–3 It also is appreciated that student achievement improves when faculty members provide postassessment feedback.4–6 Therefore, it has become common for faculty to supplement traditional summative exams with formative quizzes.7,8 Faculty members also use quizzes to predict exam scores,9 and web-based self-assessments have been used to improve knowledge and test performance.10 While summative exams evaluate student knowledge or task performance at the end of instructional segments, formative assessments provide students with feedback about how they are doing along the way.8 Students must view formative assessments as relatively “low-stakes” or “nonthreatening” for them to be effective.8,11,12 Therefore, formative assessments usually are voluntary, with little or no impact on course credit.8,12 It is recommended that instructors follow formative assessments with specific guidance and remedial instruction for students who demonstrate serious deficiencies in understanding, knowledge, or competence.
Despite their wide use, the value of formative exams remains unclear. Haberyan7 and Brothen and Wambach13 reported that formative assessments did not enhance overall learning outcomes, while Kibble,11 Olson and McDonald,12 and Buchanan14 reported significant improvements. Although this issue has been examined in dental12 and medical15 curricula, we did not find similar studies in chiropractic education programs.
Therefore, we decided to evaluate the possible benefits of formative assessments in a physical examination course at our chiropractic college. This course is offered in the 3rd quarter of a 13-quarter program. We considered that student participation in developing multiple-choice quiz questions used in the formative assessments would be a valuable learning exercise. Therefore, we hypothesized: (1) The use of formative quizzes during the course would increase summative exam scores, (2) student participation in writing formative quiz questions (student generated quiz questions [SGQQs]) would produce additional increases in summative exam scores, and (3) formative quiz scores would predict subsequent summative exam performance.
METHODS
Student Participants
The institutional review board of Palmer College of Chiropractic granted this educational method study an exemption from formal review. Permission was obtained from all students to use de-identified performance assessments for this study and subsequent publications.
A total of 189 3rd-quarter students participated in the study across 3 separate iterations of a 3-credit physical examination class that addressed head and neck examination procedures and related health conditions (March 2012–March 2013). Students in Class 1 created a bank of multiple-choice SGQQs that subsequently were administered as 8 formative quizzes (FQs) in Class 1 and Class 2 (Table 1). The students in Class 2 were given the FQs but did not contribute to the quiz question bank. In addition, students in Class 1 and Class 2 were given formative quiz reviews (FQRs) 1 week before each of the summative exams (SXs), which were administered at the midpoint and end of the course, respectively (Table 1). Each FQR covered the 4 FQs that preceded it and was intended to prepare the students for the SX administered the following week. While the SGQQs covered the material that would be tested in the SXs, the SXs did not contain exact replicates of SGQQs. Class 3 students served as controls, taking the SXs but not producing SGQQs or taking FQs and FQRs.
Course materials, with the exception of FQs and FQRs, were equivalent for all 3 classes. In addition, the same instructor taught the 3 classes, taking care to cover the same material with equivalent class-time allocation. Demographic data (sex, age, academic degrees, and ethnicity) also were collected for the 3 classes. As shown in Table 1, FQs and FQRs were given at weeks 1–4 and 6–9, while SX1 and SX2 were administered at weeks 5 and 10, respectively.
Formative Quiz Question Bank
Class 1 students developed a bank of multiple-choice quiz questions addressing head and neck conditions and physical exam procedures. The SGQQs were required to adhere to 2 guidelines: (1) each question had to have 4 multiple-choice options with only 1 correct answer, and (2) questions had to be drawn from the required course textbook or notes. Each Class 1 student was asked to create 1 question for each of the 8 topic areas covered in the course (head, neck, ear, eye, nose, mouth, cranial nerves, and cerebellum). All SGQQs were reviewed carefully by the course instructor and validated by a knowledgeable faculty member not involved in teaching the course. Inaccurate or irrelevant questions either were modified or discarded. After evaluation and validation, all submitted SGQQs were subgrouped according to the 8 course topics to produce a question bank for the FQs.
Formative Quizzes and Summative Exams
A total of 8 FQs were administered to Class 1 and Class 2 over the 8 consecutive academic weeks of the term in which each class was enrolled (1 quiz per week, Table 1). Each quiz consisted of 6 to 26 SGQQs selected from the question bank and pertained to the topic covered in that course week. Class 1 and Class 2 students were allowed 1 minute per question. The FQs were administered during lab, and answers were announced after students submitted their answer sheets to the instructor. Students in Class 1 and Class 2 also were given FQRs in the week before each of the 2 SXs (Table 1). For these reviews, students completed an online summary quiz, composed of the preceding 4 FQs, via Blackboard Learn (Blackboard Inc, Indianapolis, IN), which allowed students easy off-campus access. Hardcopy answer sheets were submitted in lab, after which the instructor gave open feedback concerning the correct answers. Summative exams were administered the following week to all 3 classes during lecture hours.
Data Analysis
Data were summarized and analyzed using SPSS version 22 (IBM Corporation, Armonk, NY). Statistical test assumptions were verified, and P values less than .05 were considered significant. We applied a 1-way analysis of variance (ANOVA) with orthogonal contrasts to evaluate formative quiz effects across the 3 classes (hypothesis 1: receiving formative quizzes will increase summative exam scores; hypothesis 2: writing formative quiz questions will further increase summative exam scores). The ability of formative quizzes to predict summative exam scores (hypothesis 3) was evaluated with multiple linear regression using the forced entry method (copredictors of sex, age, academic degrees, and ethnicity). Age and ethnicity were collapsed to dichotomous variables for regression analysis (age, ≤30 or >30 years; ethnicity, Caucasian or minority).
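Although the analyses were run in SPSS, the same pipeline can be expressed in a few lines of open-source code. The sketch below is an illustrative assumption, not the authors' analysis script: the file name and column names (class_id, sx_total, fqr_total, sex, degree, age_le_30, minority) are hypothetical placeholders for a data set organized one row per student.

# Minimal sketch of the analyses described above (hypothetical data layout).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("formative_quiz_study.csv")  # hypothetical file

# Hypotheses 1 and 2: 1-way ANOVA on total summative exam scores across classes
groups = [g["sx_total"].to_numpy() for _, g in df.groupby("class_id")]
f_stat, p_val = stats.f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

# Planned orthogonal contrasts on the dummy-coded OLS fit
# (design columns: Intercept, Class 2 vs Class 1, Class 3 vs Class 1)
anova_fit = smf.ols("sx_total ~ C(class_id)", data=df).fit()
contrasts = [
    [0, -0.5, 1.0],  # contrast 1: control (Class 3) vs mean of Classes 1 and 2
    [0, -1.0, 0.0],  # contrast 2: Class 1 (SGQQs + FQs) vs Class 2 (FQs only)
]
print(anova_fit.t_test(contrasts))

# Hypothesis 3: forced-entry regression predicting total summative exam score
step1 = smf.ols("sx_total ~ fqr_total", data=df).fit()
step2 = smf.ols("sx_total ~ fqr_total + sex + degree + age_le_30 + minority",
                data=df).fit()
print(step1.rsquared, step1.rsquared_adj)  # R-squared and adjusted R-squared
print(sm.stats.anova_lm(step1, step2))     # does step 2 add predictive power?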
RESULTS
Demographic Data and Descriptive Statistics
A total of 189 students participated in this study; their demographic data are summarized in Table 2. The sex distribution was similar in Classes 1 and 2, with a slightly greater percentage of males, and was more heavily skewed toward males in Class 3. Academic degree, age, and ethnicity were markedly skewed within all classes in favor of bachelor's degrees, ages ≤30 years, and Caucasians. Mean assessment scores for the 3 classes are reported in Table 3.
Between-Class Comparisons of Summative Exam Scores
Of 295 SGQQs received, 162 were discarded due to irrelevance, inaccuracy, wrong format, or redundancy. Most discards were due to question redundancy. Therefore, the formative quiz question bank consisted of 133 SGQQs.
A 1-way independent ANOVA demonstrated a moderate (ω² = .054)16 and statistically significant difference in total SX scores (SX1 + SX2) among the 3 classes (Table 4). Planned linear contrasts revealed that total SX scores for the control group (Class 3) were significantly lower than those of Classes 1 and 2 (first contrast). However, writing quiz questions and taking formative quizzes (Class 1) did not produce significantly higher total SX scores than taking the formative quizzes alone (Class 2; second contrast).
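For reference, the ω² reported above is the standard omega-squared effect size for a 1-way ANOVA; its general formula (stated here for clarity, not reproduced from the study data) is

\omega^{2} = \frac{SS_{\text{between}} - df_{\text{between}}\, MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}},

where df_between = k − 1 for k groups. By commonly cited benchmarks of roughly .01, .06, and .14 for small, medium, and large effects, the observed value of .054 sits just below the medium threshold, consistent with its characterization here as a moderate effect.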
Formative Quizzes as Predictors of Summative Exam Scores
The capacity of total FQ scores (the sum of all 8 formative quizzes) to predict total SX scores was evaluated by multiple linear regression while accounting for sex, academic degree, age, and ethnicity (Table 5). Total FQ scores were statistically significant predictors of total SX scores, accounting for 11% of the variance in total SX scores (step 1 of the regression model). The addition of sex, academic degree, age, and ethnicity to the regression model did not significantly increase its predictive power (step 2 of the regression model).
The total FQR score (FQR1 + FQR2) also was examined as a predictor of the total SX score while accounting for sex, academic degree, age, and ethnicity (Table 6). This regression analysis revealed that the total FQR score was a better predictor than the total FQ score (compare R² values, step 1 of Tables 5 and 6). The total FQR score accounted for 52% of the variance in the total SX score (step 1 of Table 6). Again, the addition of sex, academic degree, age, and ethnicity did not significantly increase the predictive power of the model (step 2 of Table 6). In both regression models, R² shrinkage (the difference between the R² and adjusted R² values) was less than 5% for the initial regression step, suggesting good generalizability for these simple linear models. The relationship between total FQR and total SX scores is plotted in Figure 1.
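For readers unfamiliar with the shrinkage estimate, adjusted R² follows the standard correction (a general formula, not derived from this study's data):

R^{2}_{\text{adj}} = 1 - (1 - R^{2})\,\frac{n - 1}{n - k - 1},

where n is the number of students entering the model and k the number of predictors. With a single predictor in step 1 and a sample of this size, the correction factor is close to 1, which keeps shrinkage small, as reported above.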
Figure 1. Formative quiz review predicting summative exam performance. The total formative quiz review score (FQR1 + FQR2) was a good predictor of total summative exam performance, accounting for 52% of the variance in that performance (R² = .52). Specifically, our data suggested that a 1-unit change in the total FQR score is associated with a 0.94-unit change in the total summative exam score (B = 0.94; also see Table 6).
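Read concretely (and assuming, for illustration, that B is expressed in raw points on each assessment), a student whose combined FQR score is 10 points higher than a classmate's would be predicted to score roughly 0.94 × 10 ≈ 9.4 points higher on the combined summative exams.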
DISCUSSION
The primary purpose of this study was to determine if the use of formative quizzes and SGQQs would improve performance on subsequent summative exams. On first consideration, the putative benefit of formative quizzes and student participation in writing quiz questions might go unquestioned. Researchers have reported that formative assessments enhance summative exam performance in dental students,12 medical students,15 and a variety of undergraduate majors.14 However, other investigators have reported that formative assessments do not enhance summative exam scores.7,13,17
Our study results supported the notion that formative quizzes will improve performance on subsequent written summative exams. Both classes receiving formative quizzes (Classes 1 and 2) had significantly higher summative exam scores than Class 3, the class not receiving these quizzes.
We also anticipated that writing SGQQs would require greater study and understanding, and this would produce an additional increase in summative exam scores. This argument is consistent with the theory that instructional methods that promote learner interaction are more effective than less active methods.18 However, this hypothesis was not supported by our findings. The class that produced SGQQs and also received the formative quizzes (Class 1) did not have significantly higher summative exam scores than Class 2, which received only the formative quizzes.
Several studies have suggested that formative assessments can predict summative exam outcomes.4,9,11 Our study results supported this conclusion. Total FQ scores and total FQR scores were significant predictors of written summative exam scores. However, total FQR scores were substantially stronger predictors than total FQ scores. With either FQs or FQRs, sex, age, academic degrees, and ethnicity were not significant copredictors of summative exam scores.
In a recent comparative review, Dunlosky et al19 explored the efficacy of 10 learning techniques: elaborative interrogation, self-explanation, summarization, highlighting/underlining, keyword mnemonic, imagery for text, rereading, practice testing, distributed practice, and interleaved practice. They reported that practice testing, which they defined as “self-testing or taking practice tests over to-be-learned material,” has demonstrated effects across an impressive range of practice-test formats, kinds of material, learner ages, outcome measures, and retention intervals. Moreover, they noted that practice testing is not particularly time intensive relative to the other learning techniques, and it can be implemented with minimal training.
Dunlosky et al19 emphasized the importance of instructor feedback in association with practice testing. They noted that instructor feedback protects against perseveration errors when students respond incorrectly on a practice test. In addition, they commented that the corrective effect of feedback does not require that it be presented immediately after the practice test. In fact, Metcalfe et al20 found that final-test performance for initially incorrect responses was actually better when feedback had been delayed than when it had been immediate.
Finally, it is impressive that the beneficial effects of practice testing have been observed for substantial time periods after the exercise: 2–4 weeks,21–25 2–4 months,26–28 5–8 months,29,30 9–11 months,31 and even 1–5 years.32 These are exciting findings for students and educators. We seek long-lasting knowledge, not just temporary learning improvements.
One limitation relates to the study design: students in the experimental classes received both formative quizzes (FQ1–4 and FQ5–8) and quiz reviews (FQR1 and FQR2) before the 2 summative exams (SX1 and SX2), whereas students in the control class received traditional lectures on the same topic material and were assessed with similar summative exams but did not receive the formative quizzes or quiz reviews. This design did not allow us to parse the relative effect of the formative quizzes from that of the quiz reviews. Would FQs without FQRs be effective? A future study with separate experimental groups for each of these factor combinations is needed to make that determination. Another limitation is the relatively small sample size and restricted source. Ours was a “sample of convenience,” as all students came from a single chiropractic college. Future studies are needed that examine larger populations drawn from a representative sampling of chiropractic colleges. One also might wonder whether the observed effect would be substantially influenced by the topic of study. All students in this study were enrolled in a physical examination course, and only written assessments are reported here. However, other researchers have found that “practice testing” is robust, producing beneficial effects across many topics.19
CONCLUSION
It is reasonable to posit that the formative quizzes in this study enhanced and predicted summative written exam scores because these quizzes were similar in form and content to the written summative exams and they evaluated the same knowledge base. We concluded that nonthreatening formative quizzes, with faculty feedback and quiz reviews, may be used to enhance student learning and suggested that they also may serve to identify students who, without additional remediation, will perform poorly on subsequent summative written exams. Moreover, an extensive education literature suggests that the beneficial effects of active learning may be quite durable, lasting months or even years.
FUNDING AND CONFLICTS OF INTEREST
This work was funded internally. The authors have no conflicts of interest to declare relevant to this work.
REFERENCES
Author notes
Niu Zhang is an assistant professor at Palmer College of Chiropractic Florida (4777 City Center Parkway, Port Orange, FL 32129; [email protected]). Charles Henderson is a consultant with Henderson Technical Consulting (5961 Broken Bow Lane, Port Orange, FL 32127; [email protected]). Address correspondence to Niu Zhang, 4777 City Center Parkway, Port Orange, FL 32129; [email protected]. This article was received April 10, 2014, revised September 11, 2014, and accepted September 13, 2014.
This paper was selected as a 2014 Association of Chiropractic Colleges – Research Agenda Conference Prize Winning Paper; the award was funded by the National Board of Chiropractic Examiners.