Purpose: This retrospective study measured the correlation of student performance between 2 objective structured clinical examinations (OSCEs) and an introductory integrated clinical skills course that preceded the OSCEs. The hypothesis was that there would be a strong, positive correlation between the earlier examinations and the upper level OSCE, high enough that the earlier examinations could be viewed as predictors of upper level OSCE performance. Methods: Using student scores from 5 academic terms of upper level OSCEs for 2008–2009 (n = 208) and the respective earlier scores, correlation coefficients were calculated between the upper level OSCE and the clinical skills course, and between the upper and lower level OSCEs. Multiple linear regression analysis was used to evaluate how well the lower level OSCE and clinical skills scores, both as lone and combined independent variables, predicted the upper level OSCE scores. Results: There was at least a moderate correlation between both sets of scores: r = .51 (p < .001) between the upper level OSCE and the clinical skills course, and r = .54 (p < .001) between the upper and lower level OSCEs. A combination of clinical skills and lower level OSCE scores provided a moderate prediction of upper level OSCE scores (R2 = .38). Conclusions: Correlations were found to be of at least a moderate level. According to linear regression analysis, a combination of the earlier scores was moderately predictive of the upper level OSCE. More research could be done to determine additional components of student performance.
As in many other professional educational programs, students in Life University's doctor of chiropractic program (DCP) must successfully complete objective structured clinical examinations (OSCEs), a testing format that has been used widely in health care education. It has long been recognized that written tests alone are not sufficient for examination of clinical, technical, and practical skills,1 and the OSCE format, first described in 1975,2 is one alternative, designed with the intent of standardizing the testing of clinical competence and minimizing the biases of traditional written evaluation methods.3 The conventional view of an OSCE is of a series of 5- to 10-minute stations where standardized clinical tasks are performed under the observation of 1 or 2 examiners per station, with each examiner grading the performance on a structured scoring form.1 At Life University, students are required to complete 2 OSCEs, the second following the first by 9 months, after a series of classes and examinations that require progressively higher levels of knowledge and skills. The purpose of our study is to examine the consistency of student performances on the 2 OSCEs and an introductory integrated clinical skills course that precedes the OSCEs.
The current structure of the examinations of interest was implemented in 2005, with development of the clinical education track (CLET), a sequence of 8 classes designed to teach clinical reasoning, and the physical examination and case management skills necessary to function effectively as a primary health care clinician. Of primary interest here is the first class in the sequence, the 7th quarter clinical skills course, designed to introduce students to an organized approach to the fundamentals of clinical problem solving, data gathering, case management, and clinical examination skills. Within the 14-quarter standard curriculum, students entering this 7th quarter class already have been trained in patient interview methods; orthopedic, neurologic, and physical examination; and the fundamentals of chiropractic adjustment techniques. Testing for the lecture portion of this course consists of weekly individual Readiness Assessment Tests (iRATs), for which students are required to read material before coming into class, and a written final examination using multiple choice and short answer questions. Testing of the practical portion of the course consists of evaluation of students interviewing a simulated patient, as well as orthopedic, neurologic, and physical examination skills. Grading is accomplished by the use of standardized forms. The clinical skills course must be passed before students are allowed to advance to the third year of the curriculum and Level I clinic internship (beginning practice with other students).
OSCEs were introduced to our program in 2003, and during 2007 and 2008 became established as a part of the curriculum for the 9th and 12th quarters. The 9th quarter OSCE must be passed for students to advance to internship in the outpatient clinic; successful completion of the 12th quarter OSCE allows students to advance to the senior level of clinic internship, for which expectations are higher for their recordkeeping and case management, and they are allowed to participate in off-campus outreach clinics and internships in field offices.
The 9th quarter OSCE tests students' skills and knowledge through the first 8 quarters of the curriculum, and consists of a case management section that includes 2 interview encounters with simulated patients, each followed by written critical thought stations; 4 examination stations (orthopedic, neurologic, physical); and 2 chiropractic technique encounter stations (e.g., motion palpation and static set-ups of adjustment techniques). Students are allowed 5 minutes per station, and each station has a different examiner. There also are 10 radiology examination stations, each with 2 questions regarding identification and interpretation of normal anatomy and congenital anomalies, with 4 minutes allowed per station. The 12th quarter OSCE uses the same format, and includes similar case management and physical examination components, but with a higher level of difficulty and the addition of a multiple-choice case-based critical thought examination, and an expanded radiographic examination to include all aspects of plain film radiography (e.g., a variety of pathologies, fractures, and degenerative processes).
Before our study, to our knowledge no one had previously measured the degree of correlation of student performance at these various levels of our program, although there are reasons for doing so. For one, there is a natural tendency to look at scores earlier in a program of study as a way of predicting future performance; however, it is not clear how dependably that can be done. For another, despite continual efforts to improve and standardize teaching and testing methods at our institution, students occasionally report differences in faculty members' opinions on the correct performance of some examination procedures and that grading on performance-based tests can be inconsistent. It generally might be expected that, if teaching and testing are greatly inconsistent, there might be a low correlation between earlier and later test scores. A higher degree of correlation would correspond with more consistent student performances at different levels; if high enough, scores at earlier levels could be viewed as valid predictors of future performance. An extremely low correlation could indicate problems with the testing program; though the findings of this particular study would not be very specific, closer scrutiny of the program would be justified.
There have been a few other investigations into the relationship of OSCE performance in chiropractic education, though none appeared to have been truly similar to our study. For some examples, Foster and Wise examined undergraduate grade point average, National Board of Chiropractic Examiners Part 1 scores, and academic performance in clinical science courses, and compared them to scores in an OSCE, finding that only grades in clinical science courses demonstrated any predictive value.4 Wells et al. examined students' demographic characteristics and academic history in an attempt to predict performance on an OSCE-style examination.5 Adams et al. reported close correlation of a written examination with an OSCE developed for a chiropractic standardization program in Japan.6 Lawson found OSCE performance to have a moderate correlation with some components of the Canadian Chiropractic Examination Board examinations,7 and Tobias and Goubran reported on an OSCE given at the end of the first year of a chiropractic program, in which most students agreed the test was appropriate for their knowledge and skills at that point in their education.8
For our study, the hypothesis was that there would be a positive correlation between scores at different levels; that is, students who scored well at the earlier levels (7th quarter Clinical Skills class and 9th quarter OSCE) also will have scored well on the 12th quarter OSCE, while students who scored poorly at the earlier levels also will have lower scores on the later OSCE, and the amount of correlation would be high enough that tests given at earlier levels could be viewed as valid predictors of future performance.
The Life University Institutional Review Board approved the use of student grades for the purposes of our study. Using existing data supplied by the administrator for OSCEs at Life University, students were included in our study if they had taken the 12th quarter OSCE (OSCE-12) during the academic quarters summer and fall of 2008, and the winter, spring, and summer of 2009. The capture period began with the summer of 2008 because students who took the OSCE-12 just before that time generally had taken a 2-quarter version of the Clinical Skills class that existed only briefly during a redesign of the DCP curriculum. A single percentage value was used for analysis, calculated as the average of all the individual station scores on the examination. If a student took the OSCE-12 more than once, only the first score (usually a failing grade) was used, because that more likely represented student performance before any remedial tutoring or other post-failure improvement strategies.
Scores from the 9th quarter OSCE (OSCE-9) and Clinical Skills class (CLET-7) then were obtained. For OSCE-9, a single percentage value was calculated by averaging the individual station scores. For CLET-7, the single percentage value used to represent performance in the class was the average of the written final examination and performance-based lab examination. Students were to be excluded if they did not have a single CLET-7 score (i.e., they took the earlier, 2-quarter version). For students who took the OSCE-9 or CLET-7 more than once, only the first score was used, as with the OSCE-12.
Data and Analysis
All scores were entered as percentages on a 100-point scale into Microsoft Excel documents, to organize the different sources into a single file; analysis was done in SPSS version 16.0 (SPSS Inc., Chicago, IL). Group means for CLET-7, OSCE-9, and the OSCE-12 were compared using repeated measures ANOVA. A correlation coefficient (Pearson r) was calculated for comparison of CLET-7 to OSCE-12 and of OSCE-9 to OSCE-12. Multiple linear regression was done to evaluate how well CLET-7 and OSCE-9 scores, as lone and combined independent variables, predicted OSCE-12 scores (dependent variable). Results were to be considered significant if p values were < .05.
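As an illustration of this analysis pipeline, the pairwise Pearson coefficients and the combined regression model could be computed as in the sketch below. This is not the authors' actual SPSS workflow, and the score arrays here are randomly generated stand-ins for the real student records:

```python
import numpy as np

# Hypothetical stand-in scores (percentages) for 208 students;
# the real study used actual examination records.
rng = np.random.default_rng(0)
clet7 = rng.uniform(70, 95, 208)
osce9 = 0.5 * clet7 + rng.uniform(30, 55, 208)
osce12 = 0.4 * clet7 + 0.4 * osce9 + rng.uniform(5, 25, 208)

# Pearson r for each pairwise comparison with the OSCE-12 scores
r_clet = np.corrcoef(clet7, osce12)[0, 1]
r_osce9 = np.corrcoef(osce9, osce12)[0, 1]

# Multiple linear regression: OSCE-12 ~ intercept + CLET-7 + OSCE-9
X = np.column_stack([np.ones_like(clet7), clet7, osce9])
beta, *_ = np.linalg.lstsq(X, osce12, rcond=None)

# R^2 for the combined model (proportion of variance explained)
pred = X @ beta
ss_res = np.sum((osce12 - pred) ** 2)
ss_tot = np.sum((osce12 - osce12.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```

Because the combined model nests each single-predictor model, its R2 can never be lower than the squared Pearson coefficient of either predictor alone, which mirrors the pattern reported in the results below.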
A list of OSCE-12 scores from the 5 quarters spanning the summer of 2008 through the summer of 2009 contained 288 names. Of these, some names were duplicated, as some students had taken the examination twice (n = 25) and some 3 times (n = 4), so only the scores from the original attempts were kept and the later ones (33 in total) were deleted. There were an additional 33 students for whom OSCE-9 scores were missing and apparently predated the records supplied; delays between the OSCE-9 and OSCE-12 are not unusual or unexpected, and might be caused by failing and repeating some required classes, dropping out of school for a time, or attending on a part-time schedule. Of the remaining names, 14 students did not have a usable CLET-7 score, either because they took the earlier, 2-quarter version of the class, or did not take the final examination in their first attempt at the class. The final analysis included 208 students for whom all 3 examination scores were available.
In Table 1, mean scores and standard deviations for each test are shown. Students' group mean scores were highest on the OSCE-9 and lowest on the OSCE-12, and each was significantly different from the others. Table 2 lists the correlations between the tests at various levels: Pearson r = .51 between CLET-7 and the OSCE-12, and r = .54 between the OSCE-9 and OSCE-12. The comparisons reached a level of significance at p < .001, so the results are unlikely to have occurred due to chance alone.
Table 3 displays the results of the linear regression analysis. The R2 value for CLET-7 alone with OSCE-12 indicates that the variation in one group of scores is similar to the other (shared variance) to a degree of approximately 26%; the R2 value for OSCE-9 alone with OSCE-12 reflects shared variance of approximately 29%. The R2 value for a combination of CLET-7 and OSCE-9 scores was higher than for either score alone, such that the combination of the independent (or predictor) variables accounted for approximately 38% of the variance in the OSCE-12 scores. The values used for constructing prediction equations all were significant (p < .001).
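For a single predictor, the shared variance is simply the square of the Pearson coefficient, which is where the approximate percentages above come from; a quick arithmetic check using the coefficients reported in Table 2:

```python
# R^2 for a single predictor equals the square of Pearson r.
r_clet7 = 0.51  # CLET-7 vs OSCE-12 (Table 2)
r_osce9 = 0.54  # OSCE-9 vs OSCE-12 (Table 2)

shared_clet7 = r_clet7 ** 2  # ~0.26, i.e., ~26% shared variance
shared_osce9 = r_osce9 ** 2  # ~0.29, i.e., ~29% shared variance
```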
Faculty members and students tend to operate on the assumption that a successful (or poor) performance on an earlier test likely will predict a successful (or poor) performance on a later test of related subject matter. Despite occasional student fears to the contrary, for the particular tests examined in our study such an assumption seems to have been reasonable, with the awareness that other factors also are involved.
According to the Pearson r values found in our study, there appears to have been at least a moderate correlation between test scores at different levels. In labeling this amount of correlation at least “moderate,” comparisons have been made with Pearson r values of other OSCE studies, in which r = .16–.18 has been labeled “very little” correlation,9 r = .22–.26 “weak,”10 r = .221–.282 “small to moderate,”11 r = .322–.395 “moderate,”12 and r = .4–.6 “fair.”13 Searches of the literature turned up other studies of relationships between OSCEs and scores on other evaluations (Table 4).7,9–24 In several examples the amounts of correlation appeared similar to the Pearson r values found in our study, while in several others the correlation appeared much lower, and only 1 example was found in which the correlation appeared substantially higher than that of our study.20 Some investigators' insights may be instructive. Campos-Outcalt et al. explained a “low” correlation (Pearson r = .305) between a 4th-year medical school OSCE and 1st-year residency ratings as “…a common problem in medical education research because medical students are a highly uniform cohort in academic abilities...”14 Langford et al. observed, when a multiple-choice questionnaire correlated “weakly” with an OSCE (Pearson r = .2), that the OSCE testing revealed “serious practical difficulties” that were not discovered through written examinations alone.19
The linear regression analysis suggests a moderate ability to predict OSCE-12 scores from the combination of CLET-7 and OSCE-9 scores. In the case of the single CLET-7 score, more than 74% of the variance of the OSCE-12 scores must be attributed to other factors (nearly 71% for the OSCE-9, although only about 62% when both scores are used together for prediction). However, what actually was found might be as high as could be expected under these circumstances. A few other examples of studies similar to ours may be seen in Table 4.12,20–22 In each case the investigators evaluated the ability of an OSCE to predict performance on later assessments. In another example from the chiropractic education literature, in which earlier assessments were used to predict OSCE performance, Foster and Wise used regression analysis and found “no real correlation” between National Board of Chiropractic Examiners Part 1 scores and performance on an OSCE, “weak to mild” correlation between undergraduate GPA and OSCE performance, and stated that “Only performance in clinical science courses demonstrated any predictive value in determining clinical abilities as assessed by an OSCE.”4 No numeric coefficients were stated in their conference abstract, and the study appears not to have been published.
While there may be few examples similar to our study, a search of the literature makes it clear that prediction of student performance is an important topic in health care education. A few examples from other areas might provide a useful context for what ranges of scores could be expected. In a study of the role of preadmission factors in predicting first-year podiatric school performance, Smith and Geleta found an R2 of .32, and commented that the predictive accuracy of their model (using MCAT scores, undergraduate Grade Point Average [GPA], age, sex, ethnicity, and first-year podiatric medical school GPA) was similar to those of other studies of preadmission variables.25 Veloski et al. used multiple linear regression to test the predictive value of age, race, sex, undergraduate GPA, and MCAT scores for students' performance on the United States Medical Licensing Examination (USMLE), finding R2 values of .26, .23, and .17 for USMLE steps 1, 2, and 3, respectively.26 In an example from chiropractic education, Cunningham et al. looked for predictors of performance on the U.S. National Board of Chiropractic Examiners' Part 1 examination, and found the best predictors to be students' GPA from within the chiropractic program (R2 = .368) and GPA from pre-chiropractic education (R2 = .252), as compared to strategies, such as short-term increases in study effort (R2 = .010) or commercial coaching courses and other sources of prepared materials (R2 = .001).27
The intra- and inter-examiner reliability of the tests included in our study is unknown, and without such information, the interpretation of the correlations found in our study remains uncertain. In contrast, many of the studies found in the literature search did measure reliability and state correlation coefficients,1,9,11,12,16,17,28 though others did not;22,24 some did not even mention the issue of test reliability.7,18,23 Following each OSCE in this time period, a faculty committee inspected evaluator checklists for inter-examiner discrepancies and made scoring adjustments where obviously needed, but there was no formal analysis of reliability. After the time of our study the OSCE post-examination process was altered, such that the newer, more formal process will allow reliability coefficients to be calculated in the future.
The methods used in our study can measure only a relationship; they cannot provide any information about why one set of correlations is higher or lower than another, nor is it known why students scored significantly higher on the OSCE-9 than the OSCE-12 and CLET-7. It would be helpful to those faculty members who design and administer the examinations to know whether certain specific subject areas raise or lower the scores. If a different question were to be asked, for example “What are the factors that would predict scores most accurately on the upper level OSCE?,” there are a number of possible alternative approaches. A more complete consideration of elements of prediction equations was beyond the scope of this study, but there are many examples in the literature using factors such as grade point averages, scores from basic science classes, or even Graduate Record Examination scores, depending on the academic level of interest. Future researchers perhaps also could consider factors such as students' choices of which subjects to study, outside work schedules, National Board of Chiropractic Examiners' scores, individual attitudes and learning styles, or individual faculty members' grading patterns.
We also are involved in the administration of these tests and teaching of some of the classes that students take. When students enter an OSCE station, it is not uncommon for the student and faculty evaluator to recognize each other; this is unavoidable in such a small institution, but the multiple associations do create possible performance and grading biases.
Finally, the scores used in our study are the products of a unique combination of individuals and subject matter, and may not generalize to educational programs at other chiropractic colleges or even to other time periods for the same institution; however other instructors and administrators might be able to examine these methods and improve upon them.
Correlations between different levels of testing in Life University's Doctor of Chiropractic program were of at least a moderate level. A combination of the earlier test scores used in our study was moderately predictive for the senior level OSCE, but it appeared there also were other, unidentified factors. More research could be done to determine additional components of student performance during progression through the DCP.
Drs Debra Bisiacchi, Richard Franz, Karen Numeroff, Michael Pryor, and Florence Ledwitz-Rigby, all of Life University, assisted with our study.
CONFLICTS OF INTEREST
There was no funding for this study and the authors have no conflicts of interest to declare.
About the Authors
Brent Russell is an associate professor in the Office of Sponsored Research and Scholarly Activity;
Kathryn Hoiriis is a professor in the College of Chiropractic, and Director for the Center for Excellence in Teaching and Learning;
Joseph Guagliardo is an associate professor in the Division of Clinical Sciences, all at Life University.