Context.—Creating a tool that assesses professional proficiency in gynecologic cytology is challenging. A valid proficiency test (PT) must reflect practice conditions, evaluate locator and interpretive skills, and discriminate between those practitioners who are competent and those who need more education. The College of American Pathologists Gynecologic Cytology Proficiency Testing Program (PAPPT) was approved to enroll participants in a nationwide PT program in 2006.
Objective.—Report results from the 2006 PAPPT program.
Design.—Summarize PT results by pass/fail rate, participant type, and slide-set modules.
Results.—Nine thousand sixty-nine participants showed initial PT failure rates of 5%, 16%, and 6% for cytotechnologists, primary screening pathologists, and secondary screening pathologists, respectively. The overall initial test failure rate was 6%. After 3 retests, 9029 (99.6%) of the participants were able to achieve compliance with the PT requirement. No participant “tested out”; however, 40 individuals “dropped out” of the testing sequence (8 cytotechnologists, 9 primary screening pathologists, 23 secondary screening pathologists). Initial failure rates by slide-set modules were 6% conventional, 6% ThinPrep, 6% SurePath, and 5% mixture of all 3 slide types.
Conclusions.—A total of 99.6% of individuals enrolled in the 2006 PAPPT program achieved satisfactory results. The data confirm that cytotechnologists have higher initial pass rates than pathologists and pathologists who are secondary screeners perform better than those who are primary screeners. There was no difference identified in overall pass rates between the slide-set modules. Further analysis of data should help define the results and ongoing challenges of providing a nationwide federally mandated proficiency testing program in gynecologic cytology.
In the United States, the Centers for Medicare & Medicaid Services (CMS), through the Clinical Laboratory Improvement Amendments, routinely regulate laboratories that receive health care payments from CMS, including those facilities that perform gynecologic cytology examinations. To that end, the federal statute from the Clinical Laboratory Improvement Amendments passed in 1988 (CLIA '88) requires the “periodic confirmation and evaluation of the proficiency of individuals involved in screening or interpreting cytological preparations, including announced and unannounced on-site proficiency testing (PT) of such individuals, with such testing to take place, to the extent practicable under normal working conditions.”1 These provisions were enacted by the US Congress with the belief that they would ensure basic protection and accuracy of the gynecologic cytology tests on which millions of women depend. The regulations that implement this CLIA '88 statutory provision require cytology laboratories and cytology professionals who examine gynecologic specimens to annually enroll in a CMS-approved PT program and that each individual practitioner achieve a passing score on a PT slide test.
The periodic confirmation and evaluation of an individual's proficiency has been problematic for CMS, the Centers for Disease Control and Prevention, and the cytology profession. Implementation of a valid cytology PT by a nationwide provider has been delayed because of insufficient numbers of stringently referenced cytology testing materials; significant technical, logistical, and administrative difficulties; and concerns about the validity of the testing format (eg, grading scheme, periodicity, and emphasis on individual rather than laboratory performance).2–7
Cytology PT programs in existence prior to 2005 include non-CMS compliant New York8 and Wisconsin9 programs and the CMS-approved Maryland Cytology Proficiency Testing Program (MCPTP).10 These geographically limited programs were required for practitioners examining specimens from patients in those states. Testing programs outside of the United States include those of the Province of Ontario in Canada11 and the United Kingdom.12
In late 2004, CMS announced the approval of a gynecologic PT program submitted by the Midwest Institute for Medical Education, Inc (MIME). The launch of the MIME PT program began a nationwide “certification” test of cytology practitioners that was initially conceived and mandated 17 years previously and based on what is now considered outdated medical practice.
In 1988, the College of American Pathologists launched the Interlaboratory Comparison Program in Cervicovaginal Cytology (PAP) as a quality improvement and educational program for cytology laboratories and professionals, using glass slides mailed to laboratories.13 For more than 10 years, statistical analysis of the PAP data has also allowed for robust grading and “field validation” of referenced glass slides. The Gynecologic Cytology Proficiency Testing Program (PAPPT) represents an extension of PAP, and it was approved by CMS for nationwide PT testing in 2006. The original PAP program, consisting of both educational and graded slide sets, was able to form the basis for the PAPPT because of the large number of slides that had been accumulated and field validated during its administration.14,15
The College of American Pathologists Laboratory Accreditation Program requires all laboratories, subject to CLIA '88, performing gynecologic cytology to enroll in a PT program approved by CMS (a phase II requirement). The PAP Education Program component continues as a separate entity, consisting of only educational slides, and is designed for those laboratories not subject to CLIA '88 or for laboratories where an educational interlaboratory comparison program (phase I requirement) is needed. The PAP educational program is embedded within the PAPPT program as an additional educational challenge for individuals/laboratories subscribing to the PAPPT program. These results are from the first nationwide PAPPT examination, which was offered in 2006 using field-validated glass slides.
Slide Selection and Preparation Types
In 1996, the College of American Pathologists PAP glass slide program developed a measurement of laboratory proficiency through the introduction of a grading (or validation) system for slides. This system used laboratory responses for grading but individual participant responses for achieving field validation. The PAP program initially consisted of 3 simplified diagnostic series, in contrast to the CLIA '88 mandated PT program, which grades individuals (and not laboratories) and uses a 4-category reporting system. The 2 diagnostic classification series are shown in parallel in Table 1. In the 1992 CMS regulations pertaining to cytology PT, the distinction of low-grade squamous intraepithelial lesion (LSIL) and high-grade squamous intraepithelial lesion (HSIL) was considered to be an important one. The College of American Pathologists used a 3-tiered rather than a 4-tiered system because of the recognized lack of reproducibility between these 2 interpretations16,17 and the lack of difference in current clinical management.18 Although specific supplemental criteria have been added during the past 11 years, the basic validation criteria require that a slide have a match rate of at least 90% to the correct “series” and a standard error of .05 or less.
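The basic validation criteria can be illustrated with a minimal sketch. It assumes that the standard error is the binomial standard error of the match rate, sqrt(p(1 − p)/n); the text does not state the formula used by the program, and the response counts below are hypothetical. The supplemental criteria mentioned above are not modeled.

```python
import math

MIN_MATCH_RATE = 0.90  # from the text: match rate of at least 90% to the correct series
MAX_STD_ERROR = 0.05   # from the text: standard error of .05 or less


def match_rate_and_se(matches, responses):
    """Return (match rate, standard error) for one slide.

    Assumption: the standard error is the binomial standard error of a
    proportion, sqrt(p * (1 - p) / n). The source does not give the formula.
    """
    if responses <= 0:
        raise ValueError("responses must be positive")
    p = matches / responses
    se = math.sqrt(p * (1 - p) / responses)
    return p, se


def is_field_validated(matches, responses):
    """Apply the two basic criteria quoted in the text."""
    p, se = match_rate_and_se(matches, responses)
    return p >= MIN_MATCH_RATE and se <= MAX_STD_ERROR


# Hypothetical response counts, for illustration only.
print(is_field_validated(95, 100))  # high agreement over many responses
print(is_field_validated(9, 10))    # 90% agreement, but too few responses
```

Under this assumption, a slide with few responses cannot be validated even at 90% agreement, which is consistent with the need for slides to accumulate responses during circulation in the PAP program.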
These stringently field-validated conventional, ThinPrep, and SurePath slides from the PAP program are currently available for use in graded sets. Different slide preparation types are used to formulate slide-set modules selected by the laboratory at the time of enrollment. The 4 slide-set modules are conventional (PAPC), ThinPrep (PAPM), SurePath (PAPK), or a mixture of all 3 slide types (PAPJ). In 2006, PAPM and PAPK consisted of 7 to 10 liquid-based slides with the remaining 0 to 3 slides being made up of field-validated conventional slides.
The PAPPT program enrollees received 3 slide-set shipments in 2006; 2 were composed of 5 educational cervicovaginal cytology glass slides. The third slide set consisted of 10 graded slides for a CMS-approved proficiency test. The intent of the separate modules in the program was to continue the educational value of the program and to continue to select high-performing field-validated slides to feed the PT portion of the program.
CLIA '88 has 4 response categories and referenced slides are each placed into one of these: “A” category for unsatisfactory slides; “B” category for negative, infectious, and reparative conditions; “C” category for LSIL; and the “D” category for HSIL and carcinoma (HSIL/CA). Each PT test must include at least 1 slide from each of these categories.
Every individual in a laboratory who examines gynecologic cytology material must be tested independently. Each cytotechnologist (CT) receives a test set of referenced slides, examines each slide, identifies the diagnostic areas in the same manner as he or she would patient specimens (by dot or circle), and records the diagnosis on the result form. Physicians who screen gynecologic slides that have not been prescreened by a CT are considered primary screeners (1° MDs) and must be tested in the same manner as a CT. Physicians who examine gynecologic slides only after they are prescreened by a CT are considered secondary screeners (2° MDs) and may choose to test with slides that have been previously screened and dotted by CTs, or they may choose to test with slides that have not been previously screened and dotted. If the physician chooses to examine a prescreened set, the CT's result form accompanies the slide set, as a recapitulation of normal laboratory procedure.
Scoring for the PAPPT differs substantially from the PAP educational assessment. Both pathologists and CTs must score at least 90, and the scoring system rewards or penalizes participants in proportion to the degree of variance from the target interpretation, with the penalty additionally weighted in proportion to the severity of the lesion (Table 2). However, the penalty system is more severe for pathologists than for CTs. To determine the final score of a testing event, each slide is given a numerical value as specified on the scoring grids. Cytotechnologists receive a score based on the “CT grid” and the technical supervisor's (pathologist's) score is based on the “technical supervisor grid.” The CT's and technical supervisor's diagnosis can be found along the x-axis. The PT reference diagnosis, which by regulation is based on 100% consensus of at least 3 physicians board-certified in anatomic pathology and tissue biopsy confirmation of cases in the premalignant and malignant categories, can be found along the y-axis. Both CTs and pathologists begin with a score of zero. Points are accumulated based on the accuracy of their interpretations compared with the reference diagnosis specified by the PT provider.
A maximum of 10 points is awarded for a correct response and a maximum of 5 points (−5) is deducted for an incorrect response on a 10-slide test. The individual's score for the testing event is determined by adding the point value achieved for each slide preparation, dividing by the total points for the testing event, and multiplying by 100. For example, if the correct answer for a test slide is response category D—HSIL/CA and a CT participant selected response category B—normal/benign, then the CT's point value on that slide would be calculated as −5 points. Assuming that the CT answered the other 9 test slides correctly, the CT's PT test score would be 85%, a failing score (9 slides × 10 points = 90 points + [−5] points = 85 points). For both pathologist and CT, one cannot place a reference category D—HSIL/CA slide into category B—normal/benign and pass (automatic failure). In another example, the correct reference diagnosis is response category C—LSIL and a pathologist participant selected response category D—HSIL/CA as the answer. The point value given for this slide is 5 points. Assume the other 9 slides were answered correctly. The pathologist participant's score is 95%, a passing score (9 slides × 10 points = 90 points + 5 points = 95 points).
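The 2 worked examples above can be reproduced with a short sketch. It uses only the point values quoted in the text (10, 5, and −5); the full scoring grids of Table 2 are not reproduced here, and the function and parameter names are illustrative assumptions rather than part of the program.

```python
PASSING_SCORE = 90         # minimum passing score stated in the text
MAX_POINTS_PER_SLIDE = 10  # points awarded for a correct response


def pt_score(slide_points):
    """Sum the per-slide points, divide by the total possible, multiply by 100."""
    total_possible = MAX_POINTS_PER_SLIDE * len(slide_points)
    return 100 * sum(slide_points) / total_possible


def passes(slide_points, hsil_called_benign=False):
    """Return True for a passing test event.

    hsil_called_benign is a hypothetical flag for the automatic failure
    described in the text (a category D reference slide placed in category B).
    """
    if hsil_called_benign:
        return False
    return pt_score(slide_points) >= PASSING_SCORE


# CT example: 9 correct slides plus one category D slide called category B.
ct_points = [10] * 9 + [-5]
print(pt_score(ct_points), passes(ct_points, hsil_called_benign=True))  # 85.0 False

# Pathologist example: 9 correct slides plus one LSIL slide called HSIL/CA.
md_points = [10] * 9 + [5]
print(pt_score(md_points), passes(md_points))  # 95.0 True
```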
Individuals have the opportunity to take multiple (up to 3) retests, because of initial or repeat failure(s) (Figure). For the initial examination, individuals are required to take a 10-slide test within a time period of 2 hours. If an individual passes the first 10-slide test, he or she has successfully participated for the year and need not be tested again until the following year. If the individual fails the first 10-slide test, he or she is required to take a 10-slide retest (R1) within 45 days after notification of test failure. When an individual passes R1, he or she has successfully participated for the year and need not be tested again until the following year. If the individual fails the 10-slide retest (R1), the individual must obtain remedial training in the area of failure, which must be documented on the test evaluation. In addition, all Papanicolaou tests screened by the individual subsequent to the notification of failure must be reexamined and the individual must successfully participate in a 20-slide second retest (R2). If the individual fails the 20-slide test (R2), he or she must cease examining Papanicolaou tests immediately on notification of failure. The individual must obtain at least 35 hours of documented, formally structured, continuing education in diagnostic cytopathology that focuses on the examination of gynecologic cytology, and the individual must successfully participate in another 20-slide third retest (R3).
We analyzed overall pass/fail rates by individual practitioner performance and slide-set module. In addition, a simulation of reversing the scoring grids between CTs and pathologists was performed to determine potential performance differences based on the differences between the scoring grids.
Statistical analyses for contingency tables were performed using chi-square tests and McNemar tests for paired samples scoring. A level of .05 was used for statistical significance. All statistical analyses were performed using SAS v9.1 (SAS Inc, Cary, NC).
A total of 9643 individuals were tested in 2006. Veterans Administration and Department of Defense results were excluded (n = 574), as these facilities are not subject to CLIA '88 regulation, thus leaving a total of 9069 participants. There were a total of 9580 test results for analysis, as some individuals undertook repeated retesting (n = 511).
Individual Participant Results
The initial pass/fail rates by individual participants for validated conventional, ThinPrep, and SurePath slide-set modules for the 2006 PAPPT through May 2007 are shown in Table 3. Cytotechnologists showed an initial failure rate of 5% (227/4679), compared with 16% (50/309) for 1° MDs and 6% (246/4077) for 2° MDs. Overall, 6% (523/9069) of participants failed the initial 10-slide test. The difference noted between participant types was statistically significant (P < .001).
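As an illustration of the contingency-table comparison, the following sketch recomputes a Pearson chi-square statistic from the failure counts quoted above. It is not the SAS analysis used in this study; it is a plain recomputation that relies on the fact that, for 2 degrees of freedom, the upper tail of the chi-square distribution is exactly exp(−x/2). The dictionary keys and function name are illustrative.

```python
import math

# (failures, participants) on the initial 10-slide test, as quoted in the text.
initial = {
    "CT": (227, 4679),
    "MD_primary": (50, 309),
    "MD_secondary": (246, 4077),
}


def chi_square_fail_by_group(groups):
    """Pearson chi-square for a groups x (fail, pass) contingency table."""
    total_n = sum(n for _, n in groups.values())
    total_fail = sum(f for f, _ in groups.values())
    fail_rate = total_fail / total_n
    chi2 = 0.0
    for fail, n in groups.values():
        cells = (
            (fail, n * fail_rate),            # observed and expected failures
            (n - fail, n * (1 - fail_rate)),  # observed and expected passes
        )
        for observed, expected in cells:
            chi2 += (observed - expected) ** 2 / expected
    return chi2


chi2 = chi_square_fail_by_group(initial)
# 3 groups x 2 outcomes gives (3 - 1) * (2 - 1) = 2 degrees of freedom,
# for which the chi-square upper tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi-square = {chi2:.1f}, P < .001: {p_value < 0.001}")
```

The result agrees with the reported significance level; most of the statistic comes from the excess of failures among 1° MDs.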
Thirty-one participants (6 CTs, 8 1° MDs, and 17 2° MDs) dropped out of the testing sequence after the initial slide test. Ninety-four percent (492/523) took the first retest (R1). The results from R1 are shown in Table 4, and the retest failure rates for CTs, 1° MDs, and 2° MDs were 6%, 10%, and 4%, respectively, with 5% (26/492) failing overall. The differences between individual participants were not statistically significant (P = .28).
Eight additional participants (2 CTs, one 1° MD, 5 2° MDs) dropped out of the testing sequence after R1. Sixty-nine percent (18/26) took the second retest (R2). Table 5 shows the results from the 18 participants who took the 20-slide R2. All CT participants (11/11) were successful and the overall failure rate was 11% (2/18).
One participant (a 2° MD) dropped out after R2. Table 6 results show just one 1° MD participant taking a 20-slide third retest (R3), with a passing test score and achieving compliance. Of note, several 2° MDs chose to retest as 1° MDs during the R2 and R3 tests.
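The flow of participants through the retest sequence can be reconciled with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable names are illustrative.

```python
participants = 9069
initial_failures = 523

# Dropouts after each stage as (CTs, 1-degree MDs, 2-degree MDs), from the text.
dropouts = {
    "initial": (6, 8, 17),
    "R1": (2, 1, 5),
    "R2": (0, 0, 1),
}

took_r1 = initial_failures - sum(dropouts["initial"])  # took the first retest
failed_r1 = 26
took_r2 = failed_r1 - sum(dropouts["R1"])               # took the second retest
failed_r2 = 2
took_r3 = failed_r2 - sum(dropouts["R2"])               # took the third retest

total_dropouts = sum(sum(counts) for counts in dropouts.values())
dropouts_by_type = tuple(sum(col) for col in zip(*dropouts.values()))
compliant = participants - total_dropouts
retests = took_r1 + took_r2 + took_r3

print(took_r1, took_r2, took_r3)         # 492 18 1
print(total_dropouts, dropouts_by_type)  # 40 (8, 9, 23)
print(compliant, f"{100 * compliant / participants:.1f}%")  # 9029 99.6%
print(participants + retests)            # 9580 test results in total
```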
Slide-Set Module Results
Results of the initial test and slide-set modules are summarized in Table 7. Initial overall failure rates by slide preparations were PAPC 6%, PAPM 6%, PAPK 6%, and PAPJ 5%. There was no statistically significant difference in overall failure rates by slide-set modules (P = .92). For CT participants, the highest failure rate was seen with PAPM and PAPJ (5%), followed by PAPK (4%) and PAPC (1%) (P = .04). For 1° MD participants, the highest failure rate was on PAPC and PAPK slide sets (21%), followed by PAPM (14%) and PAPJ (12%) (P = .33). For 2° MD participants, PAPM, PAPC, PAPJ, and PAPK sets all showed a failure rate of 6% (P = .97).
To compare the impact of using different CLIA '88 scoring grids for CTs and pathologists, a simulation was conducted using the opposite scoring grid for individual participant types. The data in Table 8 show the results of initial 10-slide test failure rates using the pathologist scoring grid for CTs and the CT scoring grid for pathologists. Overall failure rate increased slightly from 6% to 7%. Failure rates by participant type differed significantly under the reversed scoring grids (P < .001). Cytotechnologist failures doubled, from 227 (5%) to 460 (10%) (P < .001). The failures for 1° MDs fell from 50 (16%) to 30 (10%) (P < .001) and 2° MDs went from 246 (6%) to 162 (4%) (P < .001). Analysis of slide-set modules and participant types with the simulated (reversed) scoring grid shows that for CTs, the largest increase in overall failures occurred with the PAPC set, and the largest decrease in failures for 1° MD and 2° MD participants was with the PAPC set.
In 1988, Congress passed CLIA '88 to regulate laboratories.1 The bill included a provision known as PT to evaluate the performance of individuals for the interpretation of Papanicolaou tests. Federal regulators finalized regulations for this law in 1992, which included requirements for annual testing. However, the implementation of a nationwide program was delayed until 2005. The statute requires that gynecologic cytology PT be provided by a private, nonprofit organization that must undergo annual approval, use uniform criteria for acceptable performance, test individuals, make test results available to the public, and use announced and unannounced on-site testing, all done under normal working conditions.
In the second year of nationwide cytology PT testing, 99.6% of individuals enrolled in a CMS-approved program (PAPPT) demonstrated proficiency, as defined by satisfactory completion of the entire PT scheme (including retesting events). The data confirm that CTs achieve higher initial pass rates than pathologists, pathologists who are secondary screeners perform better than those who are primary screeners, and test performance improved slightly with retesting. Slide-set module type does not appear to make any difference in overall test performance.
The consequences for noncompliance with PT are serious and include financial penalties. The CMS will initiate intermediate sanctions or limit the laboratory's CLIA '88 certificate for cytology, and, if applicable, suspend the laboratory's Medicare and Medicaid payments for gynecologic cytology testing in accordance with subpart R of the CLIA '88 regulations, if the laboratory (1) fails to enroll in a CMS-approved cytology PT program, (2) fails to ensure that all individuals examining gynecologic cytology slides are enrolled in a CMS-approved cytology PT program, and (3) fails to ensure that an individual who fails a cytology PT test is retested, if this individual continues to examine slides for the laboratory or fails to take the required remedial actions specified in the CLIA '88 requirements.
Failures in the PT program can significantly affect the reputation and confidence of cytology professionals. Forty individuals either dropped out or chose not to continue retesting, presumably resulting in a discontinuance of gynecologic cytology practice. Failures in the PT program may potentially affect employment of CTs and contract agreements of pathologists.
Participant data provided from CMS for the 2005 nationwide PT (MIME combined with Maryland Cytology Proficiency Testing Program) show 12 831 enrolled participants (Table 9), and PT initial test failure rates were 7%, 33%, and 10% for CTs, 1° MDs, and 2° MDs, respectively.19 Overall, 1177 (9%) failures occurred with the initial 10-slide test event. This is a higher initial test failure rate than the 2006 PAPPT program (6%). For further comparison, historical failure rates on the Maryland Cytology Proficiency Testing Program were 11% in 1990 and 6% in 1995.20 In New York State,8 historical initial PT test failure rates during many years have ranged from a low of 4% to as high as 47%. However, results from the New York PT testing program are not comparable as this non-CMS approved program has a number of different features, including testing of laboratory (and not individual) response, different scoring system, and the inclusion of nongynecologic specimens. In the pilot PT program in Wisconsin,9 the initial failure rate was 10.1% overall. Data reported from the United Kingdom showed an overall failure rate of 3.6% for all participants.12
The findings in the current study with regard to CT and pathologist performance are similar to previous PT experience. The Wisconsin program failure rate was 1.4% for CTs and 22.5% for pathologists; the UK program has reported failure rates of 3.4% for screeners (CTs) versus 7.7% for pathologists. The pass rates for 1° versus 2° MD screeners in PAPPT are similar to the early observations in MCPTP, in that laboratories with the lowest passing rates were small, operated by a single pathologist, and did not employ CTs.10
The simulation with reversal of the scoring grid produced some interesting results. The much higher failure rate for CTs when using the pathologist scoring grid was the most striking change. The potential reasons are numerous and include the fact that in clinical practice CTs tend to “upgrade” their interpretations during primary screening to maximize their sensitivity. This phenomenon probably accounts for more failures for CTs when using the pathologist scoring grid. This scoring change seems to have benefited the 1° MD screener, whose failure rate fell substantially. Interestingly, 2° MD scores did not change much. This latter result is expected because the decrease in difficulty of the CT scoring grid will not dramatically affect an already high pass rate for this group. These changes in initial failure rates with reversal of scoring grids were also noted in simulated analysis of the 1994 Wisconsin PT experience.9 The lack of effect from the scoring grid simulation for 2° MDs reflects clinical practice, in which the CT and MD work together to arrive at the correct interpretation. The data from the current study indicate that the baseline failure rates between CT and MD cannot be directly compared as an assessment of overall accuracy of one group versus the other.
Further analysis of the 2005 CMS data set shows that 96% of the initial test failures moved on to the first retest (R1). Failure rates were 4% for CTs, 34% for 1° MDs, and 8% for 2° MDs in 2005. Overall, 10% failed R1. Fifty-five percent (60/110) of the R1 failures took R2, and 13% (8/52) failed R2. Four individuals took R3, and there was only 1 true PT test failure (a 1° MD). These retest sequence results are very similar to the 2006 experience in the PAPPT program, except that none of the participants actually “tested out” of the sequence. Although the raw numbers of total participants dropping out or discontinuing the PT cycle were much higher in the 2005 CMS (18 CTs, 55 1° MDs, 32 2° MDs) than in the 2006 PAPPT (8 CTs, 9 1° MDs, 23 2° MDs), the percent difference (0.8% vs 0.4%) does not appear to be significant.
The 2005 MIME-only data showed that there were more initial failures for CTs using ThinPrep (9%) or conventional (9%) slide sets than SurePath (5%) slide sets.20 This trend remained true for 1° MDs (11% ThinPrep, 12% conventional, 6% SurePath) and 2° MDs (6% ThinPrep, 6% conventional, 3% SurePath). This is in contrast to the results of the current study in which no real differences in overall failure rates were seen between the various types of slide preparation offerings. However, higher failure rates were noted for 1° MD screeners on PAPC and PAPK slide-set modules (21%), with best performance on PAPJ slide-set modules. Cytotechnologists did much better on PAPC (1%) and worse on PAPM and PAPJ modules. There was no difference in 2° MD failure rates between slide-set modules. Of the 4 slide-set modules, the best overall performance came on the PAPJ module, whereas performance on PAPC, PAPK, and PAPM was similar (5% vs 6%).
A side-by-side comparison of 2005 CMS and 2006 PAPPT testing results is shown in Table 9. The differences between the 2005 CMS and 2006 PAPPT data may be related to several factors. The major key to a successful PT program is the validation of the referenced slides used in the examination. The slides used in the PAPPT program have been extensively field validated and are very robust in their ability to be regularly and reliably interpreted.14,15 The 2005 MIME program offered no data regarding scientific field validation or reproducibility of comparable slide sets provided to test takers. Reliable and reproducible case material is difficult to find and takes a lengthy process of observer challenges under testing conditions to achieve statistically significant reproducibility results. Previous PAP data shows that 15% to 19% of cases selected by a panel of 3 expert pathologists fail field validation.14 A detailed analysis of PAP slides with a reference diagnosis of LSIL has shown that even validated LSIL cases are frequently classified as HSIL by participants. The discrepancy rate between LSIL and HSIL for referenced slides in the PAP program ranged from 9.8% to 15%. This has profound implications for PT test design and grading schema. Numerous reasons for poor reproducibility of cytologic interpretations have been shown, most notably the highly subjective nature of cytologic interpretation and variability in observation patterns on an individual slide.
Another possibility for the differences between 2005 CMS and 2006 PAPPT data is that there could be fewer professionals practicing cytopathology after the first year of PT. Although one can make the argument for a “culling of the herd” following the first year of nationwide PT, this does not completely explain the differences, as many of the individuals who failed the initial examination in the 2005 data set did eventually go on to pass the annual PT testing process (99.2% in 2005 vs 99.6% in 2006).
This study did not analyze results for PT pass/fail rates for specific reference diagnoses, such as the automatic failure when a participant chooses negative for intraepithelial lesion or malignancy (category B) for a slide with a reference interpretation of HSIL/CA (category D). A recent report of the PAP data showed that this danger more often occurred with slides that are not field validated (4% vs 2.2% for ThinPrep slides).21 Failure can also be attributable to the inability of a participant to distinguish LSIL from HSIL, which can be difficult.
Proficiency testing in gynecologic cytology is only one aspect of a comprehensive cytopathology laboratory quality system, which includes quality control of staining and processing, personnel competency evaluation, cytology-histology correlations, random and targeted rescreening of normal smears, and educational processes. Improved educational efforts and laboratory quality control are important goals that have been substantially achieved in most laboratories in the 19 years since CLIA '88. The utility or cost-effectiveness of the current federally mandated PT program in the overall quality of the cytology laboratory remains uncertain.
Despite problems with sensitivity, the annual Papanicolaou test remains a very effective program for cervical cancer detection. Papanicolaou tests are most effective when performed at regular, repeated intervals and when laboratory practices are optimized. Laboratories and cytology professionals must ensure compliance with a federally mandated PT at the expense of resources previously used for worthwhile quality improvement programs. The PT program is an expensive program, and educational funds are being shifted in many laboratories to pay for PT that leads to little local quality improvement.
This study ensures transparency of PT data to participants, provides statistical review of the test as currently formulated, and provides assessment for a fair, valid, and reliable testing program within the current regulations. Further study and analysis of data from PAPPT should help to further define the continuing challenges of PT in gynecologic cytology.
The authors have no relevant financial interest in the products or companies described in this article.
Reprints: Joel S. Bentz, MD, Department of Pathology, Huntsman Cancer Hospital at the University of Utah Health Sciences Center, 1950 Circle of Hope Dr, Suite 3860, Salt Lake City, UT 84112-5500 (email@example.com)