Specifications grading is a student-centered assessment method that enables flexibility and opportunities for revision. Here, we describe the first known full implementation of specifications grading in an upper-division chemical biology course. Due to the rapid development of relevant knowledge in this discipline, the overarching goal of this class is to prepare students to interpret and communicate about current research. In the past, a conventional points-based assessment method made it challenging to ensure that satisfactory standards for student work were consistently met, particularly for comprehensive written assignments. Specifications grading was chosen because the core tenet requires students to demonstrate minimum learning objectives to achieve a passing grade and complete more content of increased cognitive complexity to achieve higher grades. This strict adherence to determining grades based on demonstrated skills is balanced by opportunities for revision or flexibility in assignment deadlines. These options are made manageable for the instructors through the use of a token economy with a limited number of tokens that students can choose to use when needed. Over the duration of the course, a validated survey on self-efficacy showed slight positive trends, student comprehension and demonstrated skills qualitatively improved, and final grade distributions were not negatively affected. Instructors noticed that discussions with students were more focused on course concepts and feedback, rather than grades, while overall grading time was reduced. Responses to university-administered student feedback surveys revealed some self-reported reduction in anxiety, as well as increased confidence in managing time and course material. Recommendations are provided on how to continue to improve the overall teaching and learning experience for both instructors and students.
I. INTRODUCTION
Introduction to Chemical Biology, or Chemistry 128 (Chem 128), is an upper-division course taken by third- and fourth-year undergraduates in a chemistry major at the University of California, Irvine (UCI). It is required both for the chemistry major and to meet the biochemistry requirement for the American Chemical Society (ACS) degree certification (1). Although the student demographic primarily includes chemistry majors, the course is also open to students from the School of Biological Sciences as an elective; typical enrollment is around 100 to 120 students. The course covers the fundamentals of chemical biology, specifically the application of chemical techniques and mechanisms to explain biologic phenomena at the scale of atoms and bonds. Topics include structures and reactivity, chemical mechanisms of enzyme catalysis, chemistry of signaling, biosynthesis, and metabolic pathways. The lectures provide background information and context required to connect fundamental principles from chemistry with key concepts governing living organisms. In practice, most of the material covered relates to the Central Dogma of Molecular Biology (2), following the flow of information from DNA to RNA to protein. The logic and interpretation of experiments are heavily emphasized in this course; “how do we know?” is at least as important as “what happens?”
Chemical biology has emerged as a recognized subdiscipline within the last several decades and bridges the gap between the molecular detail of chemistry and complex systems of biology. Despite being integral to several areas of transformative research, core competencies, such as those outlined for other subdisciplines by the ACS Committee on Professional Training guidelines or seminal texts on undergraduate biology education (3), have not similarly been established for chemical biology (4). This may be, in part, because the subject matter is evolving at a very rapid pace (5), making it challenging to develop an integrated curriculum suitable for multiple majors that is appreciable by students and achievable by instructors (6). For example, the textbook (7) used for this course is less than a decade old at present (a short timescale for many science, technology, engineering, and math [STEM] subjects); however, since the textbook was published in 2013, the genome-editing method, clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9 (8, 9), was developed and subsequently awarded a Nobel Prize, single-molecule benchtop nucleic acid sequencing (10, 11) has become commercially available at a price point allowing mass use, and messenger RNA vaccines (12) have been developed for commercial use. This flood of new information is potentially made even more problematic by the “tyranny of the textbook” (13), as textbooks are often the default learning tool for undergraduate education.
Undergraduate education in such an interdisciplinary subject would benefit greatly from activities or assignments that require students to apply knowledge to real-world research and mimic responsibilities in future careers. One such activity for upper-division students is the use of case studies that develop critical skills necessary to read literature, justify methods, analyze data, critique findings, and propose hypotheses (4). Assignments based on peer-reviewed literature need to be well planned so as not to be too complicated or time consuming and are therefore often underused in the classroom despite being essential to future education and careers. Addressing this issue not only has the potential to ameliorate employer dissatisfaction with the communication skills of recently graduated science majors (14) but also serves as a means to keep the course material up-to-date with relevant advances in the field.
The goal of Chem 128 in its most recent iterations (2019 to 2022) was, therefore, focused on providing students with a working foundation in chemical biology concepts, techniques, and applications, particularly filtered through the lens of reading the current literature. Central to this objective is the ability to effectively interpret, analyze, and critique scientific papers in writing. Students were assigned approximately one paper per week from relevant journals and submitted 2 minireview assignments during the academic term in which they critiqued a paper and discussed relevant background literature. The course was taught from 2019 to 2021 using a traditional points-based assessment system in which the 2 writing assignments accounted for a total of 20% of the students' final grade. Many students had no prior experience with scientific writing or reading current literature, generating stress for the students and frustration for the instructor. The majority of review papers submitted by students did not meet the expected standards and left the instructor with the unsatisfying choice to either grade the assignment accordingly, which would lower students' grades and be unintentionally discouraging, or give artificially high grades even though the standards were not met. Neither option felt appropriate for the most comprehensive assessments of the course objectives or supportive of student learning. This disconnect motivated the implementation of a simultaneously more rigorous and flexible grading policy.
Specifications grading is a student-centered assessment method focused on demonstration of learning objectives (15). It has been successfully used in the following courses: general chemistry lecture (16, 17), organic chemistry lecture (18–20), organic chemistry laboratory (21), biochemistry laboratory (22), cell biology lecture (23), and various other STEM courses (24). Inspired by these efforts, we developed a version of this system for the winter 2022 offering of Chem 128 at UCI. Here, we present, to the best of our knowledge, the first implementation of specifications grading in an upper-division chemical biology lecture. Further, we provide a reflective analysis of potential benefits and areas for improvement in future implementations, based on student and instructor perceptions, and offer considerations for future education research.
II. SCIENTIFIC AND PEDAGOGICAL BACKGROUND
Proficiency in quantitative analysis is often strongly prioritized in STEM education. However, numeric assessments can be satisfactorily completed without a rigorous conceptual understanding of the material, whereas vague or out-of-context responses to open-ended questions or essays highlight knowledge deficits (25). Further, memorization of equations or stand-alone facts does not support the broader goals of science education, which are enabling graduates to apply fundamental knowledge to make predictions, explain observable outcomes of an experiment, and assess new situations. To the greatest extent possible, information learned should be demonstrated through assessments that mimic real-world use to extend the utility of students' knowledge and skills beyond the classroom to independent scholarship (26).
Analytic writing has been demonstrated to enhance conceptual learning, especially when used in tandem with other assignments, to engage the students with material across the cognitive spectrum (27). Due to the nuanced understanding needed to achieve effective written communication in STEM and its importance to most career paths after graduation, students would likely benefit from pedagogic efforts to incorporate more frequent development of this critical skill (28). Consistent practice and feedback are most advantageous (22); however, written assignments tend to be among the most time-consuming types of assessments to complete and to grade, making them less favored by both students and instructors. For students, the reasons scientific writing poses a challenge are numerous and multifaceted. Writing experience gained through other courses, such as humanities, does not necessarily transfer well due to the distinct organization, specialized terminology, and different audience of lab reports and critiques of peer-reviewed work (29). More generally, students also tend to have difficulty connecting seemingly disparate knowledge (30), which is then further complicated by simultaneously processing and incorporating new course-specific knowledge, as this is among the highest level cognitive skills (31).
Simply incorporating more written assessments alone may still not yield the desired results without improved instruction. In order for students to learn content or writing, practice will ideally include elements such as providing a rationale for the design of an investigation, making sense of data, crafting an argument, and refining a text in light of a critique (32). Success in these abstract and high-order cognitive tasks is made more challenging by students' complicated relationship with feedback (33–35). On one hand, students are eager to receive feedback, and it is an essential tool for learning. Effective feedback is specific, understandable, and helpful for completing a future task such that a student is willing and able to use it (36). On the other hand, feedback can also be unintentionally problematic if it is not presented well. Poor-quality feedback is not useful when it is too authoritative, generic, or confusing or when it is unclear how to implement it in future assignments (37). Although these points may seem obvious, there are subtleties to successful execution. After receiving a grade, a student may have little motivation to actively engage with the feedback if assignments are viewed as modular (38) or stand-alone products, even if a similar task is assigned later in the course. This lack of incentive is further reinforced if the grade for the assignment has already been determined because students can no longer directly benefit from revision efforts (39). This contradiction of intent on both sides can be mitigated if the student and instructor use the feedback to create a dialogue such that students are able to incorporate it into the process of learning (36). It has been shown that when provided with the opportunity to perform iterative, reflective refinement, student views on feedback improve due to increased literacy and appreciation for the rationale (40). 
Proactive recipience, or active engagement in the feedback process (34), is one of the most important factors that increase overall performance (33).
Developing a more flexible and interactive mode of assigning grades also has compelling implications for student learning and inclusion. Traditional grading provides a static picture that is often misconstrued as aptitude, therefore minimizing the opportunity for feedback that could be beneficial to development of creative problem solving. This generally tends to increase anxiety and lower interest in learning, especially among students from minority demographics (41). Norm-referenced grading was developed because it was believed to be less subjective (42) and is often accepted as a meaningful way to communicate between institutions (43). However, these “standard” curves can be deceiving because they may represent a comparison of student work relative to each other (44), rather than actually conveying meaningful information about individual student understanding or retention of knowledge (45). In fact, it has been shown that competitive environments in which students feel the need to outperform peers lead to less retention (46). Academic performance may become motivated by extrinsic validation more than intrinsic curiosity, which can affect self-esteem (47) and how students perceive the educational experience in relation to themselves (48). This does a disservice to students as individuals by denying them effective opportunities to learn through reflection (49, 50) as they work toward the ultimate goal of becoming self-regulated learners (51). It also does a disservice to the broader scientific community if we are complicit in accepting the loss of talented underrepresented students (52, 53) for what, at best, amounts to tradition, given the problems and misconceptions that have been identified. This is particularly important in the wake of the COVID-19 pandemic, which disproportionately and negatively affected students from minoritized groups (54). 
The effect of the pandemic on student well-being will be unique to each individual in terms of its scope and duration (55); however, it can potentially be mitigated by efforts in the classroom to improve self-efficacy, a component of well-being that has been correlated to performance. Negative trends in interpersonal communication, problem solving, and grades have been reported in a recent study about the return to in-person teaching at institutes of higher education, with a proposed solution being to modify course content and delivery (56).
Specifications grading has the potential to provide several notable benefits for both instructors and students (15). A specifications grading system was used in a Writing for Chemists course developed at UCI, with the goal of providing students frequent opportunities to engage with feedback and submit revisions (28). This assessment method differs from the traditional points-based grading system in that students are required to demonstrate achievement of learning objectives at a satisfactory level or no credit is earned. To offset the higher stakes of removing partial credit, a key feature of this method is that instructors must provide very clear, detailed specifications for what is considered satisfactory. For instructors, this can result in less time spent grading, and for students, this shifts the focus from negotiating partial credit to improving understanding of course concepts to adequately demonstrate a learning objective (57). Also, one of the core tenets of specifications grading is the use of tokens to return a sense of ownership over the learning experience to the students. Tokens provide flexibility in submission deadlines and the opportunity to incorporate instructor feedback in the resubmission of revised course assessments, while also maintaining a sustainable workload for instructors. To earn higher course grades, students must demonstrate a mastery of more advanced or complex skills and content applied to more assignments. Requiring revisions, instead of awarding partial credit, motivates students to actively understand why the previous work did not meet learning objectives, which supports learning (49, 50). Students will not necessarily achieve all the possible learning outcomes, but the course grade will indicate which outcomes they have and have not achieved. 
Overall, this method enables instructors to adequately uphold high standards, while shifting agency for the overall grade to the student (58–61) by enabling them to revisit challenging concepts or skills in a productive way.
The major goals of the specifications grading redesign of Chem 128 were to promote (i) improvement of the writing assignment submissions, such that students could adequately demonstrate application of knowledge to new situations and engagement in scientific argumentation (32), and (ii) student self-efficacy, through the perceived ability to succeed in the course and confidence to effectively communicate about course concepts. These are both essential to advance research literacy and future career success. As we were unable to directly compare other results to previous versions of the course due to the COVID-19 pandemic, these metrics serve as a means of evaluating the effectiveness of this stand-alone implementation toward these goals.
III. MATERIALS AND METHODS
A. Course design
Specifications grading can be hybridized with points-based assessments in a partial implementation (17); however, we elected to use a full specifications grading option (no points component) in the most recent iteration of the course to simplify the assessment policy and to create maximum buy-in from the students. This required establishment of rules for using tokens, updates to assignment rubrics to reflect mastery criteria for meeting learning objectives, and creation of an overall grade tracker, based on demonstrated proficiency across the various course assessments. The course had several formal assessments over a range of cognitive levels designed to evaluate fundamental understanding of the application of chemical techniques and mechanisms to explain biologic phenomena at the scale of atoms and bonds. In previous course iterations, these included discussion section worksheets, problem sets, quizzes, a midterm, a final, and writing assignments. Minor changes to the grading schema included replacing the 2 exams with 4 quizzes because it is our interpretation that high-stakes, summative exams are philosophically contradictory to the intent of specifications grading (15) and eliminating 1 of 5 problem sets due to time constraints. Worksheets, problem sets, and quizzes were designed as assessments of fundamental knowledge and skills. The writing assignments were designed as minireviews of the protein and nucleic acid literature, requiring students to combine concepts learned in the course to critically analyze methods, results, and proposed future work. This work, which is classified as exempt (research involving normal education practices in an established educational setting), was carried out in accordance with the standards established by the UCI Institutional Review Board (protocol 264).
B. Token policy
In this course, students earned all tokens by completing small, course-related activities. Up to 7 tokens could be earned over the duration of the quarter, broken down as follows: precourse self-efficacy survey (2 tokens), syllabus assignment (1 token), chemical biology meme (1 token), attending a relevant department seminar (1 token), and postcourse self-efficacy survey (2 tokens). The precourse self-efficacy survey was due by the end of the first week of the class, and the postcourse survey was due by week 8 of the 10-week quarter to provide time to use the earned tokens. Mandatory participation in research-related surveys is prohibited in the classroom, so alternative assignments, such as reading a chemistry education research publication and writing a brief (2 to 3 sentences) summary, were also made available to students who chose not to participate in the surveys.
The Token Trade-In List provided to students through a page in the course learning management system (LMS) at the beginning of the quarter is provided in the Supplemental Material. This document detailed specific guidelines on how tokens could be used, which included resubmission of research papers (first paper, 2 maximum and second paper, 1 maximum), resubmission of a problem set (2 maximum), revision to 1 quiz question (1 per quiz, maximum 4), opt to take final to replace quiz score (1 per quiz, maximum 4), not attend a discussion section (1 maximum), and late assignment submission (3 maximum per assignment, 1 token per 24-h period, with 72-h maximum extension). Maxima that could be applied to any given assessment, a time limit of 1 week to complete revisions after each assignment, and a deadline to use tokens by week 9 of the quarter (except for the final exam) were established as a means to mitigate student and instructor workload. Each problem set and quiz resubmission also required a student reflection on the changes made to correct mistakes or incorporate feedback. Reflections were not required for resubmission of the writing assignments.
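The earning and late-submission rules above can be sketched as a small ledger. This is only an illustrative sketch, not the tracker used in the course: the names `EARNABLE`, `late_tokens_needed`, and `TokenLedger` are hypothetical, and the per-assessment maxima for resubmissions are deliberately left out.

```python
import math

# Token values per activity, as listed in the token policy (sum is 7).
EARNABLE = {
    "precourse_survey": 2,
    "syllabus_assignment": 1,
    "chemical_biology_meme": 1,
    "department_seminar": 1,
    "postcourse_survey": 2,
}

MAX_LATE_TOKENS = 3  # 1 token per 24-h period, 72-h maximum extension


def late_tokens_needed(hours_late: float) -> int:
    """Return the tokens needed for a late submission.

    Each started 24-h period costs 1 token; anything beyond 72 h
    is rejected because no further extension is available.
    """
    if hours_late <= 0:
        return 0
    tokens = math.ceil(hours_late / 24)
    if tokens > MAX_LATE_TOKENS:
        raise ValueError("extension exceeds the 72-h maximum")
    return tokens


class TokenLedger:
    """Hypothetical per-student record of tokens earned and used."""

    def __init__(self) -> None:
        self.earned = {}  # activity name -> tokens earned
        self.spent = 0

    def earn(self, activity: str) -> None:
        # Each activity can be credited once; unknown names raise KeyError.
        self.earned[activity] = EARNABLE[activity]

    @property
    def balance(self) -> int:
        return sum(self.earned.values()) - self.spent

    def spend(self, tokens: int) -> None:
        if tokens > self.balance:
            raise ValueError("not enough tokens available")
        self.spent += tokens


ledger = TokenLedger()
ledger.earn("precourse_survey")
ledger.earn("syllabus_assignment")
ledger.spend(late_tokens_needed(30))  # 30 h late costs 2 tokens
print(ledger.balance)
```

Rounding up to whole 24-h periods is an assumption about how a partial day would be charged; the policy text states only the per-period cost and the 72-h limit.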
The 2 teaching assistants (TAs) assigned to the course maintained a tracker of tokens earned and used for each student. Individual assignments were marked as either complete or incomplete. To monitor the number of tokens each student had available, TAs used a single, editable token assignment in the LMS, whose score increased when tokens were earned and decreased when they were used. Students were required to email TAs directly with the specific need (e.g., 24-h late submission) to request use of tokens. An external inventory recording how students earned and used tokens was kept in an Excel spreadsheet, version 2019 (Microsoft Corporation, Redmond, WA), accessible to both TAs.
C. Writing assignment rubric
The writing assignment rubric was adapted from grading criteria used in a writing course taught by KJM at Emory University and from previous iterations of the chemical biology course. Updates to and expansion of the rubric made feedback both more general, as it did not require the instructor to provide as many individual comments, and more detailed, because each criterion was written to be more specific and clear. Rubric criteria encompassed skills previously observed to be problematic in student scientific writing: scientific vocabulary, concision in writing, formatting and organization, flow, conventions of scientific writing (62), proper use of literature citations, presentation of data, and avoiding plagiarism (14). Eleven of the 24 criteria were designated as core (Table 1); all core criteria were required to be met, along with a cumulative total of 17 criteria for a low pass and 21 for a high pass. In line with the specifications grading method, criteria beyond those designated core were higher order cognitive tasks, such as justification of methods. If the minimum criteria to achieve a passing grade were not met, the assignment was marked as needs revision, and students were allowed the opportunity to apply a token to resubmit. Students who achieved a low pass were also permitted to resubmit to attempt to achieve a high pass.
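The rubric thresholds can be expressed as a short decision rule. The sketch below assumes that the evaluation depends only on how many core and non-core criteria were met; the function name `evaluate_writing` and its arguments are hypothetical and do not come from the course materials.

```python
TOTAL_CRITERIA = 24
CORE_CRITERIA = 11      # all core criteria must be met to pass
LOW_PASS_TOTAL = 17     # cumulative criteria met for a low pass
HIGH_PASS_TOTAL = 21    # cumulative criteria met for a high pass


def evaluate_writing(core_met: int, other_met: int) -> str:
    """Classify a submission from counts of rubric criteria met."""
    other_criteria = TOTAL_CRITERIA - CORE_CRITERIA
    if not 0 <= core_met <= CORE_CRITERIA:
        raise ValueError("core_met out of range")
    if not 0 <= other_met <= other_criteria:
        raise ValueError("other_met out of range")

    total_met = core_met + other_met
    # A missing core criterion means revision, regardless of the total.
    if core_met < CORE_CRITERIA or total_met < LOW_PASS_TOTAL:
        return "needs revision"
    if total_met >= HIGH_PASS_TOTAL:
        return "high pass"
    return "low pass"


print(evaluate_writing(core_met=11, other_met=6))   # low pass
print(evaluate_writing(core_met=10, other_met=13))  # needs revision
```

The second call illustrates the intent of the core designation: 23 of 24 criteria are met, but the one that is missing is core, so the submission is still returned for revision.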
D. Grade criteria
Ultimately, grade criteria are at the discretion of the instructor, which maintains academic freedom in applying this method. However, the general expectation in specifications grading is that students will need to demonstrate mastery of skills or concepts with higher cognitive demand or complete more work to earn higher final letter grades. We used Bloom's taxonomy (63, 64) to establish baseline skills for grade demarcations. Each question on a problem set or quiz was assigned a letter grade for the purpose of establishing performance thresholds on assignments. C-level questions were based on knowledge and understanding, requiring students to define, summarize, identify, and perform simple calculations. B-level questions were based on application and analysis, requiring students to make connections among different topics, apply principles to a new problem, draw structures, propose mechanisms, or deduce the correct equations to use. A-level questions were based on evaluation and creating, requiring students to explain how methods were used, justify methods and controls by assessing the effect on the results, generate hypotheses and describe an experimental design to test them, or make predictions. These general descriptions were made available to the students; however, the letter grade associated with each question was not released until afterward to promote maximum participation in the exercises. Minima for low pass and high pass scores were consistently applied to all assignments and quizzes. To earn a low pass, students were required to satisfactorily complete either all of the C-level questions or all but one of the C-level questions and at least one other question. To earn a high pass, students were required to satisfactorily complete at least all but one of the C-level questions and achieve at least 80% satisfactory completion of the assignment, which would necessitate demonstrated skills at both the B and A levels. 
If the criteria for low pass were not met, then the assignment or quiz would be returned as needs revision, and the student would be allowed to use a token to perform revisions and improve the score. The highest score achieved after allowed resubmissions was recorded.
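The low pass and high pass minima for problem sets and quizzes can be sketched as follows. This is an illustration under stated assumptions: "80% satisfactory completion" is read here as the fraction of questions answered satisfactorily, and the function name `score_assessment` and the `(level, satisfactory)` input format are hypothetical.

```python
from typing import List, Tuple


def score_assessment(answers: List[Tuple[str, bool]]) -> str:
    """Classify a problem set or quiz.

    Each entry is (level, satisfactory), where level is "C", "B", or "A".
    """
    if not answers:
        raise ValueError("no questions to score")

    c_total = sum(1 for level, _ in answers if level == "C")
    c_ok = sum(1 for level, ok in answers if level == "C" and ok)
    other_ok = sum(1 for level, ok in answers if level != "C" and ok)
    c_missed = c_total - c_ok

    # Integer arithmetic avoids float rounding at exactly 80%.
    meets_80_percent = (c_ok + other_ok) * 100 >= 80 * len(answers)

    # High pass: at most one C-level miss and at least 80% satisfactory.
    if c_missed <= 1 and meets_80_percent:
        return "high pass"
    # Low pass: every C-level question, or all but one plus one other.
    if c_missed == 0 or (c_missed == 1 and other_ok >= 1):
        return "low pass"
    return "needs revision"


# Example quiz with 4 C-level, 3 B-level, and 3 A-level questions.
quiz = (
    [("C", True)] * 3 + [("C", False)]
    + [("B", True)] * 1 + [("B", False)] * 2
    + [("A", False)] * 3
)
print(score_assessment(quiz))  # low pass
```

Because the recorded score is the highest achieved after allowed resubmissions, a caller would apply this function to each attempt and keep the best result.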
The overall grade determination matrix for the course is presented in Table 2. Students earned the highest grade for which they met all of the minimum requirements. To achieve a D, students were required to earn a low pass on all assessments and complete 6 discussion section worksheets. Plus and minus grades are used at UCI, so additional distinctions were made from the base grade requirements. For plus grades, students needed to complete at least one additional discussion section worksheet and earn a high pass on a research paper when a low pass was required. For minus grades, students were permitted to complete one fewer discussion section worksheet and to earn a low pass on a research paper when a high pass was required.
E. Self-efficacy survey
The 14-question self-efficacy survey used for this course (provided in the Supplemental Material) was modified from a validated survey designed to probe student confidence in learning biology, especially among nonmajors (65). There were 3 assessment factors addressed by the questions: methods of chemical biology (question 1), generalization to other chemical biology or science courses and analyzing data (questions 2 to 7), and application of chemical biology concepts and skills (questions 8 to 13). The survey questions were adapted very minimally to make the wording applicable to this course. Table 3 documents changes in wording from the original survey questions (boldface) to the survey used for this course (boldface and italic). Question 14 was the only question we added that was not adapted from the original survey but was deemed pertinent to assessing the goals of the course. The survey was made available through UCI's instance of Qualtrics, version March 2022 (Provo, UT), a cloud-based platform for distributing Web-based surveys. Participation was completely voluntary (an alternative assignment was provided for students who chose not to participate), and results were analyzed en masse to maintain anonymity.
IV. RESULTS AND DISCUSSION
A total of 107 students enrolled, and 99 students completed the winter 2022 iteration of the course described here. We judged the use of specifications grading to be an overall success, as there were no concerning differences in overall grade distribution, the mean results of the student self-efficacy survey improved slightly, and there were substantial improvements observed on several rubric metrics between the initial submission of writing assignment 1 (WA1) and writing assignment 2 (WA2). This is particularly significant because it was many students' first exposure to this grading method, which can initially cause anxiety (21, 66), and because it was the first implementation for this course, which can be challenging for a variety of reasons (67). We are encouraged by these results and hope that other educators in biophysics may be able to adapt this framework for the classroom.
A. Token economy
The token system should ideally be aligned to support demonstrated mastery of course objectives without allowing students to generate an unmanageable workload for themselves or the instructors (15, 68). Providing too few tokens causes students to hoard them, preventing them from revising the work, whereas providing too many allows students to mismanage the workload by pushing everything to the end of the class, which is a suboptimal learning experience and produces an unrealistic amount of grading for the instructors at the end of the course. We designed our token economy to be similar to the system implemented in the Writing for Chemists course (28).
Tracking tokens not only served as a means of accounting but also allowed for analysis of the overall way students used the tokens. Of the 7 total available, the average number of tokens earned and used was 6 and 4, respectively. Thirty-five of 99 students used fewer than half of the available tokens, and only 4 used all 7. As shown in Table 4, the largest shares of tokens were used on WA1 (124, 32.9%), quiz revisions (103, 27.3%), and WA2 (66, 17.5%). Although exact replication of this policy is not the only means to achieve these results, as administered, the token system adequately supported the goals of the course, as it was not detrimental to student performance or instructor workload.
B. Writing assignments
Using specifications rubrics for the writing assignments, in particular, enables students to learn from mistakes on this challenging and novel (for them) task in a low-stakes context. The nucleic acid minireview paper was assigned in week 4 of the 10-week quarter, and students were allowed to use tokens to resubmit up to 2 times. The protein minireview paper was assigned in week 8, and students were allowed to use tokens to resubmit once due to time constraints at the end of the quarter. Two students did not submit either assignment, despite having access to tokens that could have enabled a late submission. A detailed breakdown of the criteria marked as needs revision for the initial submission and any resubmissions for each writing assignment is provided in Table 5. Boldface values (negative) indicate more than 25% of the class did not adequately demonstrate the rubric line item. Boldface and italic values (positive) indicate criteria with the largest amount of improvement (less frequently marked as needs revision) between WA1 and WA2. Five overall criteria, including 4 core (citations format and placement, discussion, controls, and conclusions) and one other (clarity), were marked as needs revision for 25% or more of the class on initial submissions for both writing assignments. Criteria that showed the most improvement from the initial submission of WA1 to the initial submission of WA2 were discussion, controls, and technical writing, which improved by 28%, 32%, and 75%, respectively, indicating that learning improved between the 2 assignments. In total, 14 (11 not previously mentioned) of the 24 criteria yielded a decrease in the frequency of needs revision evaluations between the initial submissions of both assignments. 
Criteria on which students did not improve between the initial submissions of the writing assignments were relatively anomalous, affecting less than 10% of the students; however, this information could indicate areas to be emphasized, with additional practice or discussion, in future iterations of the course.
For both writing assignments, most students received an overall evaluation of needs revision on the first submission but achieved high pass by the final submission, as shown in Table 6. Slightly more students received a final grade of low pass on the second paper, likely due to having only one resubmission attempt and possibly other competing time requirements at the end of the quarter. We do not interpret this as declining performance, because roughly 20% of students improved their initial submission grade from WA1 to WA2, with needs revision evaluations dropping from 87 to 65. Students not only applied feedback to correct each individual assignment but also, as these results indicate, used feedback from WA1 to improve the initial submission of WA2. We interpret this finding to demonstrate that students learned new skills and knowledge throughout the revision process. Almost all students were able to achieve high pass on both writing assignments, and although not directly comparable to previous iterations of the course, student performance was qualitatively noted to be much more consistent and improved overall.
C. Grade distributions
This course was taught by the same instructor for 4 consecutive years beginning in the winter quarter of 2019. In 2019, students' final letter grades were determined by the total points accumulated over the duration of the course from the following assessments: quizzes and discussion problems (10%); problem sets (15%); writing assignments (20%); a midterm (25%); and a final exam (30%). The late policy for points-based grades permitted assignments to be accepted up to 1 h late with no penalty; assignments received later than that incurred a 10% reduction in score for each 24-h period beyond the original deadline. While using points-based assessments, students were not permitted to revise or resubmit work. Specifications grading was used in 2022 with the grade criteria and token policies previously described.
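The arithmetic of the 2019 points-based scheme can be illustrated with a minimal sketch. The category weights are those listed above; the function names, the example scores, and the reading of the late policy (every started 24-h period after the 1-h grace window removes 10 percentage points, floored at zero) are assumptions made for illustration and are not taken from the course materials.

```python
import math

# Category weights (percent) from the 2019 points-based scheme described in the text.
WEIGHTS = {
    "quizzes_and_discussion": 10,
    "problem_sets": 15,
    "writing_assignments": 20,
    "midterm": 25,
    "final_exam": 30,
}


def late_multiplier(hours_late):
    """Fraction of the earned score kept after the late penalty.

    Assumption: submissions up to 1 h late are not penalized; beyond that,
    every started 24-h period past the deadline removes 10 percentage
    points, and the multiplier never drops below zero.
    """
    if hours_late <= 1:
        return 1.0
    periods = math.ceil(hours_late / 24)
    return max(0, 100 - 10 * periods) / 100


def weighted_total(category_scores):
    """Course percentage computed from per-category percentages (0-100)."""
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS) / 100


# Hypothetical student, for illustration only.
example_scores = {
    "quizzes_and_discussion": 90,
    "problem_sets": 85,
    "writing_assignments": 80,
    "midterm": 75,
    "final_exam": 70,
}
example_total = weighted_total(example_scores)
print(example_total, late_multiplier(25))
```

Under this reading, a single low score on a heavily weighted exam moves the final percentage directly, and there is no mechanism for revision, which is the contrast with the specifications scheme used in 2022.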
Final grade distributions for the 2019 and 2022 courses are shown in Supplemental Figure S1. Winter 2020 and 2021 grades were omitted from the comparison because these iterations were substantially altered to accommodate remote instruction due to the COVID-19 pandemic. The 2019 points-based grade distribution was characteristically Gaussian, with a mode grade of B+ (typical for an upper-division course taken primarily by majors) for a class size of 108 students. In this implementation of specifications grading, significantly more students earned A+ and A final grades, yielding a unimodal distribution across the 99 students. The net workload and expectations for the course predominantly remained unchanged. Therefore, the grade shift is representative of more students demonstrating mastery of the learning objectives, in part, due to opportunities for revision. For example, make-up quizzes were written to be conceptually similar but with unique questions such that answers could not be memorized and learning must be demonstrated. The general shift to higher grades is consistent with some other implementations of specifications grading in undergraduate STEM education (16, 21, 69). We hypothesize that this may be, in part, because a student who would typically earn a B in a traditional points-based system is presented with the tools and awareness to achieve an A (16, 45, 70). The grade distributions are not directly comparable to each other in terms of changes in student learning because of adjustments in the course structure and the unknowable effect of the COVID-19 pandemic. However, we have included the grades to provide a baseline for evaluating whether we provided enough opportunities for rework and to demonstrate that this implementation did not lower students' grades on average, despite the more rigorous standards.
D. Survey results
We surveyed students at the beginning and end of the course to test whether student perceptions about the ability to succeed in this or related courses improved after exposure to the more self-directed learning approach offered in specifications grading, or alternatively, if it declined due to receiving detailed, critical feedback. As determined by the token tracker, one student did not complete either survey, 22 students (some of whom dropped the course) only completed the first survey, and 2 students only completed the second survey. Sixteen students submitted 2 entries for one or both of the surveys, possibly by mistake; therefore, we elected to include only the first response in the analysis. This was determined based on an Internet Protocol address alone, as names were used only for awarding token credit and were removed from the survey results prior to analysis. In total, 77 sets of surveys (∼78%) were used in this investigation.
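The filtering described above (keep only the first entry from a respondent who submitted twice, then retain only respondents with both a precourse and a postcourse survey) can be sketched as follows. The record layout, the anonymized respondent keys, and the function names are hypothetical and are not the authors' actual procedure or data.

```python
def keep_first_response(entries):
    """Keep only the first entry per respondent.

    `entries` is a list of (respondent_key, responses) tuples in submission
    order; the key is a hypothetical anonymized identifier.
    """
    first = {}
    for key, responses in entries:
        # setdefault leaves an existing (earlier) entry untouched.
        first.setdefault(key, responses)
    return first


def pair_surveys(pre_entries, post_entries):
    """Return {respondent_key: (pre, post)} for respondents with both surveys."""
    pre = keep_first_response(pre_entries)
    post = keep_first_response(post_entries)
    return {key: (pre[key], post[key]) for key in pre if key in post}


# Made-up entries for illustration: respondent "a" submitted the precourse survey twice.
pre_entries = [("a", [3, 4]), ("b", [2, 2]), ("a", [5, 5]), ("c", [4, 4])]
post_entries = [("a", [4, 4]), ("d", [3, 3]), ("c", [5, 4])]
paired = pair_surveys(pre_entries, post_entries)

# Share of the class represented by the 77 paired sets out of 99 students (from the text).
paired_share_percent = round(100 * 77 / 99)
print(sorted(paired), paired_share_percent)
```

Only respondents "a" and "c" survive in this toy example: "b" has no postcourse survey, "d" has no precourse survey, and the duplicate precourse entry from "a" is ignored.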
Students responded to the 14 questions on a Likert scale ranging from 1 (not at all confident) to 5 (totally confident) (65). Results of the precourse (week 1) and postcourse (week 8) surveys were paired for each student. The mean result was determined for the questions corresponding to each assessment factor for each set of surveys (65, 71). Student response means for each of the 3 original factors, as well as the question we added, were assessed for statistically significant changes. We performed both paired t tests and Wilcoxon signed rank tests in R statistical software, version 2019 (R Core Team, Vienna, Austria; 72, 73) to determine whether results were significant. The results of the paired t tests for each factor are provided in Table 7, and distributions of the initial and final factor averages are presented in Figure 1. Both tests qualitatively validated that confidence in all factors increased, indicating that student self-efficacy improved over the duration of the course. The results of this survey demonstrated that specifications grading qualitatively improved student perceptions of self-efficacy to succeed in the course and communicate about related topics, especially in areas of particular focus related to the goals of the class. Extensive prior research has focused on the influence of mindset on academic performance. Our results corroborate this relationship and further suggest that academic performance influences students' mindsets (74).
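For readers unfamiliar with the two tests, the following minimal sketch shows how the test statistics are formed from paired factor means. The analysis reported here was run in R; this Python version is only an illustration. The data are made up, the question-to-factor mapping is hypothetical, and the sketch computes the statistics only, without P values.

```python
import math
import statistics


def factor_mean(responses, question_indices):
    """Mean Likert score (1-5) over the questions assigned to one factor."""
    return statistics.mean(responses[i] for i in question_indices)


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for a paired t test on post - pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus), the rank sums of positive and negative differences.

    Zero differences are dropped, and tied absolute differences share the
    average of the ranks they occupy.
    """
    diffs = sorted((b - a for a, b in zip(pre, post) if b != a), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Made-up factor means for 5 hypothetical students (precourse, postcourse).
pre_means = [3.0, 2.5, 4.0, 3.5, 2.0]
post_means = [3.5, 3.5, 4.0, 3.0, 3.5]

t_value, dof = paired_t_statistic(pre_means, post_means)
w_plus, w_minus = wilcoxon_signed_rank(pre_means, post_means)
print(t_value, dof, w_plus, w_minus)
```

The paired t test works on the mean of the within-student differences, whereas the Wilcoxon signed rank test uses only their ranks and signs, which makes it a reasonable companion check for ordinal Likert data.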
Limitations of this study stem mostly from its being the first implementation of specifications grading in this course. For instance, we did not include a control group, in part because only one section of the class was offered during that quarter. In the future, it would be beneficial to perform the survey in the same manner with a version of the class with the same assessments and rubrics but taught using a traditional points-based system. We did not receive responses for both surveys from every student enrolled throughout the course, so it is possible that students who already felt confident were more likely to respond. Further, the questions are a qualitative self-reflection that may be affected by many factors outside of the administration of this course.
E. Student perceptions
University-administered teaching evaluations were completed by 29 of 99 students at the end of the quarter. The free-response questions used the standard wording for teaching evaluations at UCI and did not ask about specifications grading in particular. The questions are the following:
(a) Which aspects of this class did you feel were intellectually or creatively stimulating?
(b) Which aspects of this class did you feel contributed most to your learning?
(c) Which aspects of this class could be improved to enhance your learning?
Here, we summarize the responses to these questions that related to specifications grading aspects of the course. Comments on other course features, such as the specific topics covered, the lectures, or the discussion sections, are not included. Students' comments on specifications grading in this course were mostly positive, and many of the negative comments focused on organizational issues related to this being the first time the grading scheme was implemented in this course.
Students liked that the course was organized around 4 quizzes, rather than a midterm and a final. Some found it easier to stay engaged and monitor progress with more frequent assessments. Reduced anxiety due to the lower stakes of each quiz was also mentioned. Although more frequent, low-stakes assessments are not unique to any one grading method, they are essentially required by specifications grading to adequately allow opportunities for rework. Students appreciated the increased transparency afforded by specifications grading, because they knew from the beginning how the grades would be determined. They also found that specifications grading made it easier to understand what to prioritize, which is important in a class in which a large amount of complex material is covered. Some students appreciated completing revisions, which allowed the opportunity to learn from mistakes, and the token economy, which enabled management of revision attempts. Of the 28 respondents, 72% answered that the instructor provided opportunities to better understand material (36% strongly agree and 36% agree). These results are consistent with expected benefits of specifications grading for the student learning experience (15).
Students also provided suggestions for improvement, many of which focused on the materials being new and not previously tested. As an example of relatively common feedback (21, 75), some students found the rubrics confusing and thought the grading scheme could be explained better. We plan to improve these materials for future use based on the students' comments. Some other requests are more difficult to implement or are inconsistent with course goals. For example, one student mentioned wanting to know which questions are A, B, or C before the assignment is turned in. We made the deliberate choice not to reveal the question classifications until after the assignment is turned in because we wanted students to make a good faith effort on all problems rather than only attempting the C or C and B problems. Some students wanted more time to revise the assignments, and one specifically requested an unlimited window until the end of the quarter. Although we will be more mindful of spreading out the assignments in the future, it is not realistic or desirable to offer unlimited time for revisions, both because of the instructional team's workload and because allowing assignments to pile up until the end of the quarter, rather than revising them in a timely manner, does not provide an optimal learning experience for students. Finally, one student expressed dislike for specifications grading because it is more work for the students, particularly those without substantial writing practice. However, the student also acknowledged understanding our goals in implementing it and reported becoming a better writer. This response is consistent with student perceptions in other writing classes using specifications grading (28, 76) and with the more general observation that students are dissatisfied with methods they view as unconventional, regardless of improved performance (77).
F. Teaching assistant and instructor perceptions
Here, we present qualitative assessments assembled from the TAs and course instructor following completion of the grade submissions. From an instructional standpoint, it was expected that some challenges would arise due to this being the first implementation of specifications grading for the course and this grading scheme being new to many students. After a brief initial period of clarifying the instructions related to grading rubrics and token use, the majority of student interactions at office hours and after class meetings were focused on substantive topics related to learning objectives, such as how to identify the controls in an experiment or how to draw a chemical mechanism correctly. From an instructor perspective, the best feature of specifications grading was the shift in focus from points and grades to problem solving and skills. It was observed that less time was dedicated to discussing grades because the overall course expectations were generally clearer, with a path to achieve a given letter grade, and all assignments either satisfactory or returned as needs revision. This was a welcome contrast to previous versions of the same course, when most discussions were concentrated on negotiating for more partial credit and discussing how many points were lost for particular mistakes without the ability to directly correct them, making feedback frustrating for the students and the instructor. Removing the possibility of partial credit seemed to shift the conversation in a more productive direction, toward mastering the skills needed to succeed at the writing assignments or quizzes. This is not always the case with points-based systems, where partial credit can contribute significantly to accumulating enough points to achieve a desired overall grade (15, 57, 78), or where final grades may ultimately be subject to curves or weighted adjustments to achieve a desired distribution.
As a positive and perhaps nonintuitive outcome for instructors, grading was much more straightforward and faster even when accounting for time spent grading resubmissions. Open-ended questions were still challenging because a key or rubric cannot fully capture every possible variation of a correct answer or a formatting issue, so some discernment is required. However, this would be the case in a points-based system as well, where it may be even more challenging to apply partial credit fairly. In specifications grading, if instructors are in doubt, it is fully appropriate to mark the work as needs revision and allow informed revision. Adoption of this line of thinking can be challenging, even with substantial buy-in, because TAs and instructors have been accustomed almost exclusively to points-based systems. During the course, one TA was concerned that the binary nature of specifications grading as either a pass or needs revision could be detrimental to student grades. Student communication with TAs and the course instructor was observed to improve, generally noted as more positive, less anxious, more eager to improve, and more focused on course concepts.
G. Considerations for future implementation
Buy-in from TAs is critical to realize the benefits to both students and instructors. In this case, even though both TAs understood and supported the goals of specifications grading, they still found it difficult to grade each question in a binary manner after previous experiences with assigning partial credit. This required occasional reminders during our regular instructional team meetings to grade quickly and assign a passing score only when all required elements of the correct answer were present. In between these discussions, it was easy for TAs to slip back into the default mode of thinking about partial credit, which is contrary to the course goals and takes up too much of the TAs' time. The latter point is especially critical when dealing with revisions. Because each assignment may be graded more than once, the workload becomes unmanageable if grades are not assigned quickly and without considering student effort or trying to rationalize partially correct answers. This was mostly a concern at the beginning of the course and became less of a problem with practice. Overall, the TAs, one of whom had taught the same course before the implementation of specifications grading, reported that the average workload for this course was about the same as for similar courses. The issues with implementation could potentially be mitigated by incorporating a brief training for TAs, especially those unfamiliar or less familiar with specifications grading, before the course begins.
Based on some core criteria of the writing assignments being consistently rated as not met for the majority of students on initial submissions, shown in Table 5, it could be beneficial to break these criteria down and incorporate consistent practice into problem sets. Questions based on reading a piece of literature were included in a few problem sets, but it may be beneficial to include them on all problem sets in the future. The questions also could be more clearly related to the core criteria on the writing assignment rubrics, which may then help students make the connection between the problem sets and the writing assignments. One other idea to support improvement in this area was to provide students with examples of acceptable assignments; however, the instructor determined that this was not aligned with the learning objectives. The students are presented with several examples of well-written, brief review papers (e.g., Nature journal feature “News and Views”) throughout the course. However, they are not provided with examples of this particular assignment because the goal is for them to analyze and discuss the assigned papers based on understanding rather than simply following a template. Further clarification to rubric line items based on student questions and feedback is likely to continue to be important in any future implementations of specifications grading due to the all-or-nothing credit system.
In this implementation, answer keys for problem sets and quizzes were posted immediately after initial grades were released to students, and reflections for resubmitted quizzes and problem sets were not required to be in a specific format. In the future, to ensure that the resubmission demonstrates learning and mastery of a learning objective, we plan to require students to answer the following prompts in addition to providing the correct answer for each question to be reassessed: (a) What was incorrect about the first approach or answer? Briefly explain why. (b) What changes did you make to achieve the correct answer? Briefly explain why these changes were necessary. (c) What did you learn that you will apply to problems like this in the future? We hope that these questions will require students to actively reengage with the course material and reassess any misunderstandings, thereby promoting long-term retention of the material.
It is expected that a handful of outliers may not meet all required criteria, as presented in the grade determination matrix. It is not realistic to predict every possible scenario that could lead to this; however, it is beneficial to have a strategy to mitigate this as uniformly as possible. In this course, most of the observed grading challenges arose when students did not meet all of the specifications needed to earn a low pass for the second writing assignment after one round of feedback and revision. Ideally, they would have a second opportunity to revise the work and earn a better grade; however, this was not feasible because it was too close to the end of the course. In all 4 cases when this happened, the students' second drafts showed significant improvement relative to the first, and they were assigned a score of low pass, enabling them to pass the course. One other student turned in a revised second writing assignment without having submitted the first draft; this was graded normally and earned a score of high pass. Although improving the rubrics and instructions will likely reduce the number of exceptions that have to be dealt with, it is probably impossible to eliminate them altogether, and some flexibility is needed to determine grades in these cases.
The only major drawback of this implementation of specifications grading was the accumulation of grading near the end of the quarter. In particular, 2 rounds of revisions were allowed for the first writing assignment to make sure students were provided with enough feedback on the work and opportunities to correct mistakes. However, the initial submission for the first writing assignment was late enough in the quarter that the second round of revisions coincided with the initial submission of the second writing assignment, causing a bottleneck in grading. This led to excessive work for the instructor during this time, as well as a delay in students' receiving feedback. We believe this problem can be resolved with better scheduling, particularly moving the first writing assignment earlier in the quarter, even though students will not have as much background when they begin to work on it.
Due to the rapid pace of change in the field of chemical biology, an upper-division undergraduate course was redesigned using specifications grading to support research literacy as demonstrated through comprehensive writing assignments. Specifications grading offers a tailorable, student-centered assessment approach that can be beneficial for both students and instructors, especially for high-complexity cognitive tasks that can benefit from iterative feedback. The grading system allowed students to resubmit work, qualitatively improving both their conceptual understanding and their written communication skills. Students, overall, were receptive to the changes and showed improvements in both self-efficacy and performance in areas aligned with the course learning objectives. Workload for the instructors was comparable to past versions of the course. Although this system requires some buy-in and additional efforts at clarification, it is likely to be beneficial in other interdisciplinary and dynamic areas of study.
The supplemental material contains the token trade-in document provided to students, grade criteria, grade distributions, self-efficacy survey, and specifics of self-efficacy survey statistical analysis and is available at: https://doi.org/10.35459/tbp.2022.000239.s1.
RWM was instructor of record for the course and graded the writing assignments. RWM and JIK codesigned the specifications grading criteria and rubrics, set up the self-efficacy survey, performed analysis of survey results, token usage, and student feedback, and assembled the supplemental material. JLU and MFR were teaching assistants for the class and were responsible for grading problem sets and quizzes, tracking tokens earned and used, and providing teaching assistant perspectives for the manuscript. GRT contributed to statistical analysis of survey results, and WSG assisted with development of the violin plots presented in the supplemental material. RDL assisted with the statistical analysis and presentation of the survey results presented in the main text. RDL and KJM provided consultation on implementation of specifications grading and frameworks for course administration before and during the course. JIK and RWM wrote the manuscript.
This research was supported by the National Science Foundation (award DMR-2002837) to RWM and D. J. Tobias and the California Education Learning Lab grant project title, the Teaching Experiment Academy Office of Planning Research–issued grant (OPR19178). JIK acknowledges support from the Department of Chemistry and the School of Physical Sciences at the University of California, Irvine. The funders had no role in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript. We thank Kailey Baez and Bo Choi from the University of California, Irvine, Division of Teaching Excellence and Innovation for practical advice about specifications grading implementation. Most importantly, we gratefully acknowledge the hard work and helpful input of the students in Chemistry 128 during winter 2022 at the University of California, Irvine.