The purpose of this longitudinal study was to gather extrapolation evidence of validity by assessing whether performance on a national medical licensing exam, in addition to practice and socio-demographic variables, is predictive of future physician performance in practice. The study focused on a cohort of 3,404 physicians who were registered with the College of Physicians and Surgeons of Alberta (CPSA) and who completed the Medical Council of Canada Qualifying Examination (MCCQE) Parts I and II between 1992 and 2017. Separate multivariate quasi-Poisson regression models were run to assess the degree of relationship between first-time pass/fail status on the MCCQE Parts I and II and several CPSA socio-demographic variables, on the one hand, and complaints per physician and various prescribing flags, on the other. Candidates who failed the MCCQE Part I on their first attempt had 27% more complaints lodged against them than those who passed. Physicians who failed the MCCQE Part II on their first attempt prescribed 2+ benzodiazepines and 2+ opioids to 30% more patients than those who passed. Conclusions: Performance on the MCCQE Parts I and II is an important predictor of physician performance. Combined with other critical variables, these measures provide important evidence to aid in risk modeling efforts and to guide educational interventions for physicians at an early stage of their careers.
The primary intent of any medical licensing examination is to assure the public that the licensee has demonstrated the core knowledge and skills (competency) necessary for safe and effective performance in practice. In Canada, one requirement for entry into independent practice is the successful completion of the Medical Council of Canada Qualifying Examination (MCCQE). This bipartite program includes the MCCQE Part I, usually completed prior to entering post-graduate training, and the MCCQE Part II, a requirement for entering independent practice in Canada.
Using Kane’s argument-based approach to validation, several sources of evidence to support the plausibility of key claims within the framework of all licensing examination programs can be collected.1–3
Interest has recently emerged in exploring whether medical licensing exams can be used to inform secondary interpretations, akin to “off-label” uses of medications in clinical medicine; in particular, interest has been expressed in using exams to predict future practice performance.4 Using Kane’s validation framework, the latter falls under the extrapolation argument. To what extent does performance on an examination, in a circumscribed number of domains, extrapolate to the full complement of physician performance? We might posit that an examinee’s communication score generalizes to communication skills in a breadth of settings as a practicing physician, including end-of-life, patient education and other situations. Such an argument’s complexity underscores the importance of clearly delineating the score- or decision-based inference and providing pertinent supporting (empirical) evidence.
Most stakeholders assume that competence, as reflected by a medical licensing exam score, predicts quality of care and performance in actual practice.5 However, in the absence of clear arguments and supporting evidence, the latter reduces to a highly fallible claim or, in the worst case, mere conjecture.
Previous research explored whether performance on medical licensing exams extrapolated to key educational and clinical outcomes. One investigation followed more than 200 practicing physicians and reported that candidates who performed poorly on the MCCQE had a greater than threefold increase in the risk of an unacceptable quality-of-care peer assessment outcome.6 Another study tracked the performance of more than 2000 United States Medical Licensing Examination (USMLE) candidates and found that performance on the Step 2 Clinical Skills data interpretation sub-score predicted postgraduate ratings in history-taking and physical examination.7 Finally, a Canadian investigation reported that MCCQE Part I scores, for a sample of international medical graduates, were a significant predictor of subsequent performance on a family medicine specialty certification examination.8
Another study, focusing on more than 3,400 physicians who completed the MCCQE between 1993 and 1996, reported that candidates who performed poorly on the communication sub-score of the MCCQE Part II had a significantly higher number of patient complaints.9 Similarly, one study reported that a one-standard-deviation increase in Step 2 Clinical Knowledge scores was associated with a 25% decrease in the odds of receiving a patient complaint.10
Another investigation explored the relationship between scores on the College of Family Physicians of Canada certification examination and health-care resource usage by primary care physicians in Québec,11 noting that exam scores were significantly associated with consultations and mammography screening rates, as well as prescribing behaviors. Similarly, an additional study found that higher scores on the MCCQE Part I were linked to better primary care clinical outcomes.12
While helpful in supporting the extrapolation argument to validation, some of these past investigations suffer from a number of methodological limitations. Chief among these are the use of: (1) un-equated scores, namely, scores that were unadjusted to account for varying levels of difficulty of the test forms analyzed; and (2) unreliable sub-scores in the predictive models.6,9,11–12 Furthermore, most of these investigations were carried out on licensing exams that are more than 25 years old and no longer reflect the current state of medicine and medical education. Clearly, there is a need to replicate and extend past studies.
Our longitudinal study expands upon earlier work by investigating the extent to which physician performance on the MCCQE predicts patient complaints, as well as prescribing behavior for a sample of physicians registered with the College of Physicians and Surgeons of Alberta (CPSA). Specifically, our study focused on answering the following questions:
Does first-time pass/fail standing on the MCCQE Part I and/or Part II predict overall patient complaint rates for a sample of CPSA-registered physicians?
Does first-time pass/fail standing on the MCCQE Part I and/or Part II predict opioid and benzodiazepine prescribing behavior for this sample?
Since earlier forms of the MCCQE were not fully equated (prior to 2013), it was not possible to use actual exam scores as predictors in our models. We therefore restricted our analyses to Pass/Fail (P/F) standing on the first attempt for each exam in our various statistical models. This measure does provide some differentiation, as typically 10% to 20% of candidates might not succeed on either exam on their first attempt.
Rates of patient complaints and potentially harmful patterns in opioid and benzodiazepine prescribing were used as proxy measures of physician performance in practice. Alberta has some of the highest rates of prescribed opioids in the world, and, unsurprisingly, high rates of opioid overdoses and deaths.13–14 Alberta also has high rates of benzodiazepine prescriptions, including to individuals over 65 years of age who are vulnerable to potentially harmful consequences of these medications.14–15 The CPSA houses a database of complaint information for all registered physicians in the province. Opioids and benzodiazepines are medications with high potential for misuse and/or diversion; prescriptions have been monitored by the CPSA since 1986 and 2015, respectively.14,16
We also included CPSA practice and socio-demographic variables in our models, such as years since medical school graduation, country of medical training, specialty or type of practice, and gender, as these variables have been identified as potential factors relating to physician performance.17–18 Ethics approval was obtained from the University of Alberta’s Health Research Ethics Board - Health Panel (Study ID: Pro00079279).
Our study sample included a matched group of 3,404 physicians who were registered to practice in Alberta within the past 15 years and who completed the MCCQE Parts I and II between 1992 and 2017. We excluded from the study any data that could not be matched (for any reason). For the purposes of our study, family medicine and general practitioners were grouped together as “family medicine (FM).” “Primary specialty (PS)” included internal medicine, dermatology, pediatrics and obstetrics/gynecology. “Other specialty (OS)” included remaining physicians. Table 1 provides a breakdown of our sample.
As outlined in Table 1, our sample primarily included males (57.1%), Canadian medical graduates (60.8%) and FM physicians (48.8%). The mean age of our sample was 43.3 years (SD=7.6), with a mean number of years in practice of 11.7 (SD=4.0). The mean age of physicians in Alberta is slightly lower than the Canadian average (in 2019, the Canadian average was 49.4 years),19 likely owing to a younger population in that province compared to other Canadian jurisdictions.
The MCCQE Part I and Part II
The MCCQE Part I measures the medical knowledge and clinical decision-making skills at a level expected of a physician entering supervised practice (residency) in Canada. The exam, for the years under study, comprised 196 multiple-choice items and up to 55 clinical decision-making cases. These cases were further subdivided into short-menu (pick-N) items, as well as short-answer constructed-response items. The MCCQE Part I was a paper-and-pencil exam until the year 2000, at which point it began to be computer-delivered.
The MCCQE Part II assesses clinical skills at a level expected of a physician entering independent practice in Canada. Exam forms included in the present study assessed data gathering, physical exam skills, communication skills, and considerations of cultural communication, legal, ethical and organizational aspects of medical practice. Though the number of stations varied in the early offerings of the MCCQE Part II, they remained fixed at 12 stations as of 2003.
Decision accuracy and consistency estimates, appropriate to report with criterion-referenced exams where the reliability of a pass/fail decision is of paramount importance, varied between the high 0.80s and low 0.90s for the test forms under study.
CPSA registration variables
In addition to exam performance, the following 13 CPSA registration variables were also included in our regression models, based on previous related research and evaluation:
Years since registration
Specialty (FM, PS, OS)
Days per week providing medical services
Canadian/International Medical Graduate (CMG/IMG)
Accepting new patients
Performing procedures requiring sedation/anesthesia
Use of an electronic medical record (EMR)
Practicing exclusively as a locum
Teaching in a non-clinical environment
The following four outcome variables were selected for our models:
Total number of patient complaints
Total number of patients for whom 90+ Oral Morphine Equivalents (90 OME) were prescribed
Total number of patients for whom three times the defined daily dose of benzodiazepines (3xDDD) was prescribed
Total number of patients for whom 2+ opioids and 2+ benzodiazepines (2+2) were prescribed
Separate multivariate quasi-Poisson regression models were run for each outcome variable. We were unable to run common Poisson regression models due to over-dispersion of the count data for one of the models (complaints) and under-dispersion for another model (2+2). The Poisson model only has one free parameter and therefore does not allow variance to be adjusted independently of the mean. In quasi-Poisson modeling, the variance is a linear function of the mean.20
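The dispersion issue described above can be illustrated with a minimal sketch. Under a quasi-Poisson specification, the variance is assumed to equal a dispersion parameter multiplied by the mean, and that parameter is commonly estimated as the Pearson chi-square statistic divided by the residual degrees of freedom; estimates above 1 indicate over-dispersion and estimates below 1 indicate under-dispersion. The function name `pearson_dispersion` and the toy counts are hypothetical and are not drawn from the study data or from the authors' analysis code.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Estimate the quasi-Poisson dispersion: Pearson chi-square / residual df."""
    if len(counts) != len(fitted_means):
        raise ValueError("counts and fitted_means must have the same length")
    residual_df = len(counts) - n_params
    if residual_df <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / residual_df


# Hypothetical complaint counts for ten physicians (illustrative only).
toy_counts = [0, 0, 0, 1, 0, 5, 0, 2, 0, 0]

# Intercept-only model: every fitted mean equals the sample mean (one parameter).
sample_mean = sum(toy_counts) / len(toy_counts)
toy_fitted = [sample_mean] * len(toy_counts)

dispersion = pearson_dispersion(toy_counts, toy_fitted, n_params=1)
print(f"estimated dispersion: {dispersion:.2f}")  # above 1 suggests over-dispersion
```

A standard Poisson model fixes this quantity at 1, which is why it can fit poorly when the estimate departs noticeably from 1 in either direction.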
Table 2 provides a breakdown of first-attempt P/F rates on each exam, available for 3,283/3,404 (96.5%) candidates on the MCCQE Part I and 2,909/3,404 (85.5%) on the MCCQE Part II. Respectively, 87.8% and 83% of our sample passed the MCCQE Part I and II on their first attempt.
For two out of the four multivariate quasi-Poisson regression models, MCCQE first-time P/F standing was a significant predictor. Results for these two models are provided in the next section of the paper.
Table 3 provides the results of the multivariate quasi-Poisson regression model predicting the total number of patient complaints. Regression parameter estimates, incidence rate ratios (IRR), and Holm-Bonferroni corrected Type I error rates (nominal Type I error rate = 0.05), used to control for multiplicity effects, are provided. For a binary variable, the IRR is the ratio of the expected number of events in one category to the expected number of events in the other category, holding the other predictors constant. For a categorical variable (with more than two categories), the IRR is the ratio of the expressed category to the base category.
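As a rough illustration of the Holm-Bonferroni correction mentioned above, the sketch below computes step-down adjusted p-values, which is equivalent to comparing ordered raw p-values against progressively less strict thresholds. The function name `holm_adjust` and the example p-values are hypothetical; they are not taken from Table 3, and the authors' software may report the correction in a different form.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four predictors (illustrative only).
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = holm_adjust(raw_p)

nominal_alpha = 0.05
significant = [p <= nominal_alpha for p in adjusted_p]
print(adjusted_p)
print(significant)
```

In this toy example, two of the four predictors would fall below 0.05 after adjustment, whereas all four raw p-values are below 0.05, which shows how the correction guards against spurious findings when many predictors are tested.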
Findings show that the first-attempt P/F standing on the MCCQE Part I was a significant predictor of the total number of patient complaints. Candidates who failed the MCCQE Part I on their first attempt, on average, received 27% more complaints than passing candidates (mean number of complaints [SD] per physician was 1.34 [2.07] and 0.62 [1.37], respectively, for those who failed and passed the MCCQE Part I on their first attempt), controlling for all other variables in the model.
However, passing or failing the MCCQE Part II on the first attempt was not significantly related to patient complaint rates.
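To make the interpretation of the "27% more complaints" figure concrete, the following sketch converts between a log-scale regression coefficient, an IRR and a percent difference, and contrasts the result with the unadjusted ratio of the mean complaint counts quoted above. The IRR of 1.27 is an assumption inferred from the text, since Table 3 is not reproduced here; the function names are hypothetical.

```python
import math


def irr_from_coefficient(beta):
    """Exponentiate a log-link regression coefficient to obtain an IRR."""
    return math.exp(beta)


def percent_change(irr):
    """Express an IRR as a percent difference relative to the reference group."""
    return (irr - 1.0) * 100.0


# Assumption: an IRR of 1.27, inferred from the "27% more complaints" statement.
assumed_irr = 1.27
beta = math.log(assumed_irr)  # corresponding coefficient on the log scale

# Unadjusted ratio of the mean complaints reported in the text (failed vs. passed).
crude_ratio = 1.34 / 0.62

print(f"log-scale coefficient: {beta:.3f}")
print(f"adjusted difference: {percent_change(irr_from_coefficient(beta)):.0f}%")
print(f"crude ratio of means: {crude_ratio:.2f}")
```

The crude ratio of means is considerably larger than the adjusted IRR, presumably because the other variables in the model account for part of the raw difference between the two groups.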
All 13 CPSA registration variables were significantly associated with the frequency of patient complaints. Specifically, a larger number of complaints was associated with:
A higher number of years since initial registration (7% more complaints per year)
FM (over double the number of complaints compared to OS physicians)
More days of providing medical services per week (26% more complaints per day)
IMGs (44% more complaints than CMGs)
Accepting new patients (33% more complaints than physicians not accepting new patients)
Male physicians (33% more complaints than female physicians)
Older physicians (2% more complaints per year of age)
Performing procedures requiring sedation/anesthesia (27% more complaints for those physicians performing procedures requiring sedation/anesthesia)
Using an EMR (23% more complaints for those physicians using an EMR)
Not having hospital privileges (22% more complaints for those physicians not having privileges)
Not practicing exclusively as a locum (66% more complaints for those physicians not practicing exclusively as a locum)
Non-clinical teaching (1% more complaints for those physicians who teach with no provision of medical services)
OS physicians relative to PS physicians (PS physicians had 21% fewer complaints than OS physicians)
Not practicing in a group (15% more complaints for those physicians who did not practice in a group, e.g., solo practitioners)
Table 4 provides the results of the multivariate quasi-Poisson regression model predicting the total number of patients for whom 2+2 were prescribed.
Results show that the first-attempt P/F standing on the MCCQE Part II was a significant predictor of this outcome variable. Candidates who failed the MCCQE Part II on their first attempt, on average, prescribed 2+2 to 30% more patients than passing candidates (mean number of patients for whom 2+2 were prescribed [SD] was 0.27 [1.11] and 0.11 [0.54], respectively, for those who failed and passed the MCCQE Part II on their first attempt). Note that passing or failing the MCCQE Part I on the first attempt was not related to this prescribing outcome.
Ten out of the 14 CPSA registration variables were also significantly associated with this outcome. Prescribing 2+2 to a higher number of patients was related to:
More days of providing medical services per week (48% more patients were prescribed 2+2 for each day of providing services)
FM (over 79 times the number of patients were prescribed 2+2 compared to OS physicians)
Male physicians (33% more patients were prescribed 2+2)
Performing procedures requiring sedation/anesthesia (over twice the number of patients were prescribed 2+2)
Being an IMG (31% more patients were prescribed 2+2 than CMG)
Accepting new patients (26% more patients were prescribed 2+2)
Younger physicians (for each decrease in one year of age, 2% more patients were prescribed 2+2)
Not practicing exclusively as a locum (14 times more patients were prescribed 2+2)
Teaching with provision of medical services (5% more patients were prescribed 2+2)
Using an EMR (37% more patients were prescribed 2+2)
The severity of the opioid crisis that has prevailed over the past two decades cannot be overstated. In 2017, 47,600 U.S. residents, or more than 130 per day, died of overdoses related to both illicit and prescription opioids.21 Though perhaps less visible in the public press, benzodiazepine prescriptions have also skyrocketed. In a study using National Survey on Drug Use and Health data from 2015–16, Maust, Lin and Blow reported that 30.6 million U.S. adults disclosed using benzodiazepines during the study period.22 More alarming was misuse, which accounted for 20% of overall use.
In light of this public health emergency, our investigation was aimed at gathering evidence to assess whether performance on licensing exams, in addition to a number of in-practice registration variables, could be informative in predicting future professional outcomes, especially as they pertain to patient complaints as well as opioid and benzodiazepine prescribing behaviors. Our study supports validity arguments that fall beyond the primary aim of any medical licensing examination program but that nonetheless jibe with public expectations. That is, physicians who are licensed to practice medicine presumably can demonstrate the knowledge and skills necessary for safe and effective care. Furthermore, there is an implicit assumption (albeit often unverified) that there is some relationship between performance on licensing exams and actual practice.5
Consequently, gathering evidence to bolster any such extrapolation argument becomes particularly critical; i.e., does performance in core competencies expected of all physicians, irrespective of specialty, as measured by licensing examinations, extrapolate to in-practice behaviors? Though stakeholders readily recognize that licensing exams are designed and used for a primary purpose (i.e., initial entry into independent practice), the extent to which the latter might inform performance in residency, at different stages of independent practice and throughout a physician’s career, assumes added importance.
Our study showed that performance on the MCCQE Part I was related to the number of patient complaints. Of interest, complaints seem to be more related to cognitive domains (knowledge, application of knowledge, clinical decision-making, etc.), which form the centerpiece of the MCCQE Part I, than to the domains assessed by the MCCQE Part II, which focuses on communication, professionalism and affective skills. A systematic review of 59 studies suggests that complaints are equally divided among clinical (quality of care, safety incidents), management (bureaucratic issues, access and admissions) and relationship (communication, caring, patient rights) issues.23 Most sustained patient complaints in the jurisdiction under study stemmed from patient dissatisfaction with treatment.24 Our findings seem to substantiate this statement. Results noted with CPSA registration variables were largely expected (e.g., higher complaints are associated with older male physicians, trained abroad, etc.).25 Of interest is that those physicians using EMRs had a higher rate of patient complaints. Previous studies have also highlighted this unintended consequence; that is, the doctor-patient relationship can be negatively affected by the use of an EMR, presumably because attention is placed on completing the record to the detriment of communicating with the patient.26–27
It is also interesting to note that performance on the MCCQE Part II was significantly associated with potentially dangerous opioid and benzodiazepine prescribing behavior. Given the greater emphasis placed in the MCCQE Part II on higher-order cognitive skills, as well as core non-medical competencies (communication, professionalism, etc.), it is likely that this risky prescribing behavior is due to a combination of poor clinical reasoning skills and deficits in affective domains.28 Many of the same CPSA registration variables previously noted with the patient complaints model were also significantly associated with this risky prescribing behavior. From a regulatory perspective, physicians who are unsuccessful on their first attempt at the MCCQE Part II could perhaps be directed to continuing professional development opportunities related to the prescribing of these and other potentially harmful medications, as a way to mitigate the risk of future poor prescribing habits and negative patient outcomes.
FM physicians were at particularly high risk of receiving a patient complaint and prescribing dangerous combinations of opioids and benzodiazepines.29–32 This is plausible because family medicine physicians account for almost half of all physicians in Alberta. They also tend to have more points of contact with patients over longer periods of time (compared to, for example, an anesthesiologist or emergency room physician), perhaps allowing more opportunity for the noted risks to arise.
Our study also identified an interesting albeit unanticipated result: Younger physicians were more likely to have at least one patient to whom they prescribed two or more opioids and two or more benzodiazepines. One study in the literature found a positive relationship between physician age and prescribing of non-psychotropic medications, contrary to our results.33 This appears to be a complex issue, warranting further investigation as to why younger physicians in Alberta are prescribing more of these potentially dangerous combinations (2+2) to patients.
Finally, from a “practical significance” point of view, it is important to underscore that the value added by including first-attempt P/F standing on the MCCQE Part I and II is largely comparable to that associated with most CPSA registration variables. Though some of the latter measures were more strongly associated with patient complaints and 2+2 benzodiazepine/opioid prescribing behavior (e.g., the family medicine specialty for both outcomes), the incidence rate ratio values for first-time P/F standing were higher than those of 12 of the remaining 13 CPSA registration variables for the complaints outcome (MCCQE Part I) and six of the remaining 10 CPSA predictors for the 2+2 prescribing outcome (MCCQE Part II).
Our study is not without limitations. First, only those physicians who eventually passed the MCCQE Part I and II were included in our analyses. Restriction of range and other effects may therefore have affected the findings reported in this study. Second, with regard to data from the CPSA, the sample retained for our investigation may not be fully representative of the population of practicing physicians in Alberta. Though we did not note any major deviations, it would nonetheless be important to formally undertake this comparison, as it affects the generalizability of our findings. The CPSA complaints data include all complaints, including those that may have been dismissed soon after receipt. Normally these dismissed grievances account for approximately 25% to 50% of all complaints received by the CPSA, so our results may reflect a modest inflation relative to the complaints that are actually processed. Third, we were unable to model actual MCCQE Part I and II scores as predictors, given the absence of formal score equating prior to 2013, leading to a loss of information associated with using first-time P/F status. It was not possible to identify a large enough sample (post-2013) to complete score-based analyses in the present investigation. However, research currently under way is replicating this work using actual equated scores. Fourth, though not possible to address in our current study, future research should aim to assess whether any relationships between licensing exam performance and patient complaints or prescribing behavior have changed with the evolution of exam content to better reflect current health concerns, as well as a greater recognition of these areas by medical practitioners. Fifth, our study was exploratory and, to a certain extent, “opportunistic.” Though our findings confirmed expected patterns, hypotheses in future research should be stated ex ante, rather than ex post facto. Confirmatory models postulating effects based on theoretical considerations would greatly strengthen this type of research.
Finally, the list of predictors examined in our initial investigation is by no means comprehensive and should be supplemented with additional measures in any future research.
Despite these caveats, we believe that our work meaningfully contributes to previous studies by demonstrating that standing on medical licensing exams can predict aspects of physician performance in independent practice, a form of “off-label” use for such assessments. However, supplementary replication and extension of this research should be undertaken to further develop and support additional validation arguments.
About the Authors
André F. De Champlain, PhD, is Director, Department of Psychometrics and Assessment Services, Medical Council of Canada.
Nigel Ashworth, MB.ChB. MSc, is Senior Medical Advisor, Research and Evaluation Unit, College of Physicians and Surgeons of Alberta.
Nicole Kain, MPA, PhD, is Program Manager, Research and Evaluation Unit, College of Physicians and Surgeons of Alberta.
Sirius Qin, MSc, is Senior Statistical Analyst, Department of Psychometrics and Assessment Services, Medical Council of Canada.
Delaney Wiebe, MPH, is Research Associate, Research and Evaluation Unit, College of Physicians and Surgeons of Alberta.
Fang Tian, PhD, is Senior Psychometrician, Department of Psychometrics and Assessment Services, Medical Council of Canada.