Context.—In 2006, the first gynecologic cytology proficiency tests were offered by the College of American Pathologists. Four years of data are now available using field-validated slides, including conventional and liquid-based Papanicolaou tests.

Objective.—To characterize the pattern of error types that resulted in initial proficiency-test failure for cytotechnologists, primary screening pathologists, and secondary pathologists (those whose slides are prescreened by cytotechnologists).

Design.—The results of 37 029 initial College of American Pathologists Papanicolaou proficiency tests were reviewed from 4 slide-set modules: conventional, ThinPrep, SurePath, or a module containing all 3 slide types.

Results.—During this 4-year period, cytotechnologists were least likely to fail the initial test (3.4%; 614 of 18 264), followed by secondary pathologists (ie, those reviewing slides already screened by a cytotechnologist) with a failure rate of 4.2% (728 of 17 346), and primary pathologists (ie, those screening their own slides) having the highest level of failure (13.7%; 194 of 1419). Failure rates have fallen for all 3 groups over time. Pathologists are graded more stringently on proficiency tests, and more primary pathologists would have passed if they had been graded as cytotechnologists. There were no significant differences among performances using different types of slide sets. False-positive errors were common for both primary (63.9%; 124 of 194 errors) and secondary (55.6%; 405 of 728 errors) pathologists, whereas automatic failures were most common for cytotechnologists (75.7%; 465 of 614 errors).

Conclusions.—The failure rate is decreasing for all participants. Failures among primary pathologist screeners are most often due to false-positive responses. Primary screening cytotechnologists and secondary pathologists have automatic failures more often than do primary screening pathologists.

The Clinical Laboratory Improvement Amendments of 1988 (CLIA '88) mandates proficiency testing for individuals performing gynecologic cytology.1 In 2005, a nationwide program of proficiency testing was adopted after a 17-year delay, and in 2006, the College of American Pathologists (CAP) initiated its own Gynecologic Cytology Proficiency Testing Program (PAP PT) using field-validated slides. Field-validated slides are used because they are more likely to produce reliable, reproducible results.2–4

Unfortunately, the present testing and scoring paradigm does not reflect the current evidence-based approach to the screening, diagnosis, and follow-up of cervical neoplasia, particularly in the separation of low-grade squamous intraepithelial lesions (LSILs) from high-grade squamous intraepithelial lesions (HSILs).5,6 Thus, the potential for failure is greater for pathologists than it is for cytotechnologists.

We reviewed the data since the start of the CAP PAP PT program, looking for patterns of failure, particularly the frequency of error types. The data were analyzed for primary pathologists (those who screen their own slides), secondary pathologists (those who receive slides prescreened by a cytotechnologist), and cytotechnologists.

Since the inception of PAP PT, data are available for a 4-year period (January 1, 2006, to December 31, 2009), which includes data from the Veterans Administration, the Department of Defense facilities, and locum tenens results. The CAP program uses field-validated slide-set modules tailored to each laboratory's practice pattern: conventional (PAPC), ThinPrep (PAPM), SurePath (PAPK), or a module containing a mixture of all 3 slide types (PAPJ). The responses for each slide fall into 4 CLIA '88 categories: category A for unsatisfactory slides; category B for negative, infectious, and reparative conditions; category C for LSILs; and category D for HSILs and carcinoma. Each test slide set must contain at least one slide from each of these categories.

The responses to each slide are graded using the CLIA '88 scoring grid (Table 1). Although the minimum passing score for both pathologists and cytotechnologists is 90, pathologists are scored more stringently than cytotechnologists. The maximal score is 10 points per slide, and fewer points are awarded when the diagnosis varies from the target interpretation with the penalty weighted in proportion to the severity of the lesion. The reference diagnosis for each slide requires 100% consensus of at least 3 physicians who are board-certified in anatomic pathology. Furthermore, tissue diagnosis is necessary to confirm reference diagnoses in the C and D categories (ie, premalignant and malignant cases).

Table 1.

Clinical Laboratory Improvement Amendments of 1988 Grading System by Participant Type


A maximum of 10 points is awarded for a correct response, and 0 to −5 points are assigned for an incorrect response on a 10-slide test. The individual's score for the testing event is determined by adding the points achieved for each slide preparation, dividing by the maximum possible points for the testing event, and multiplying by 100. The CLIA '88 scoring system grades pathologists more harshly than cytotechnologists.
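The scoring calculation described above can be sketched in a few lines (a minimal illustration only; the function name is ours, and the per-slide point values are examples rather than the full CLIA '88 grid):

```python
def pap_pt_score(slide_points, max_per_slide=10):
    """Sum the points achieved for each slide, divide by the maximum
    possible points for the testing event, and multiply by 100."""
    return 100.0 * sum(slide_points) / (max_per_slide * len(slide_points))

# A perfect 10-slide test scores 100; one slide awarded 0 instead of 10
# yields 90, the minimum passing score.
print(pap_pt_score([10] * 10))       # 100.0
print(pap_pt_score([10] * 9 + [0]))  # 90.0
```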

The failure rates for primary pathologists, secondary pathologists, and cytotechnologists were each reviewed. In addition, we reanalyzed the data, scoring primary pathologists with the cytotechnologist scoring grid. The data were reviewed in aggregate as well as by module type. Finally, we reviewed the data on the frequency of error types: automatic failures (AFs), in which an HSIL was interpreted as negative; false-negatives (FNs), which included a D to C shift, a C to B shift, or a shift to A from any category; and false-positives (FPs), which included an A to B shift, a B to C shift, or a C to D shift (for pathologists). These error types were analyzed for the 3 types of test-takers and for the 4 types of slide-set modules.
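The error taxonomy just described can be expressed as a small classifier (a hypothetical sketch; the function name and category encoding are ours, and the pathologist-only C to D shift is folded into the generic upgrade rule):

```python
# Categories per CLIA '88: A = unsatisfactory, B = negative,
# C = LSIL, D = HSIL/carcinoma.
ORDER = {"A": 0, "B": 1, "C": 2, "D": 3}

def error_type(target, response):
    """Classify one slide response against its reference category."""
    if target == response:
        return "correct"
    if target == "D" and response == "B":
        return "AF"  # HSIL/carcinoma called negative: automatic failure
    if response == "A" or ORDER[response] < ORDER[target]:
        return "FN"  # downgrade (D to C, C to B) or any shift to A
    return "FP"      # upgrade (A to B, B to C, or, for pathologists, C to D)

print(error_type("D", "B"))  # AF
print(error_type("C", "B"))  # FN
print(error_type("B", "C"))  # FP
```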

Overall, 38 482 tests were administered in the 2006–2009 testing cycles. Of those, 37 029 were initial tests. Most participants took the PAPM slide-set module, with a 4-year total of 23 530 initial tests, followed by the PAPJ module with 6588 initial tests, the PAPK with 5166, and the PAPC with 1745. The PAPJ module has grown slightly over the years, the PAPM and PAPK modules have remained relatively constant, and the PAPC module is declining in use.

The overall failure rate for this period was 4.1% (1536 of 37 029); it was highest among primary pathologists (13.7%; 194 of 1419), followed by secondary pathologists (4.2%; 728 of 17 346) and cytotechnologists (3.4%; 614 of 18 264) (Table 2). The overall failure rate dropped by roughly half for all types of participants between the first and fourth years of testing. If primary pathologists had been graded using the cytotechnologist grading scale, there would have been fewer pathologist failures. Secondary pathologists likewise would have had fewer failures if scored as cytotechnologists.

Table 2.

Failure Rates by Type of Participant and If All Participants Were Scored the Same During the 4-Year Period


Although there were no significant differences among performances using the different types of slide sets, there were differences among slide types and patterns of failure (Table 3). Both primary and secondary pathologists performed poorest on the PAPC module in all years except 2009, when secondary pathologists performed poorest on the PAPJ module at 3.7%, with a PAPC failure rate of 3.4%. Conversely, cytotechnologists do not fail as frequently when tested on conventional smears. In 2006, both types of pathologists had their fewest failures when tested using the PAPJ module; in 2007, primary and secondary pathologists had their lowest failure rates using the PAPK module. In contrast, the groups diverged in 2008. Thus, the only consistent pattern to emerge was poor performance by pathologists on conventional Papanicolaou tests.

Table 3.

Failure Rates by Slide-Set Type, by Year, and by Participant Type


The frequency of error patterns during the 4 years was reviewed (Table 4). The AFs, due to calling a category D (ie, HSIL or carcinoma) a category B (a negative finding), with or without other errors, accounted for almost half of the errors among pathologists failing the PAP PT (43.3% [84 of 194] of primary pathologist errors; 46.4% [338 of 728] of secondary pathologist errors). In contrast, AFs, with or without other errors, accounted for three-quarters of all errors (75.7%; 465 of 614) in cytotechnologists failing the PAP PT, and AFs alone accounted for 62.2% (382 of 614) of the errors in this group. In contrast, pathologists were more likely to have a FP result, with or without another error (63.9% [124 of 194] of errors by primary pathologist; 55.6% [405 of 728] of errors by secondary pathologists). Cytotechnologists who failed the PAP PT exam were much less likely to have an FP error (30.6%; 188 of 614).

Table 4.

Frequency of Error Patterns For Participants Who Failed the Gynecologic Cytology Proficiency Testing Program in 2006–2009


Those cases associated with AFs, FP failures, and FN failures in 2006–2009 are shown in Table 5, stratified by year. The AFs were lower in 2009 than they were in 2006 for secondary pathologists and cytotechnologists; primary pathologists demonstrated an increase in AF rates in 2009 compared with 2006. Primary pathologists demonstrated a decrease in FP failures in 2009 compared with 2006, whereas secondary pathologists and cytotechnologists demonstrated an increase in FP results in 2009 compared with 2006. False-negative errors were lower for primary pathologists in 2009 than they were in 2006 but were higher in 2009 than they were in 2006 for both secondary pathologists and cytotechnologists.

Table 5.

Frequency of Error Patterns by Year For Automatic Failures (AF), False-Positive (FP) Failures, and False-Negative (FN) Failures For Participants Who Failed the Gynecologic Cytology Proficiency Testing Program in 2006–2009


The analysis of CAP PAP PT program data shows very clear trends during the 4-year period 2006–2009. The 4-year trends are similar to the data initially reported.7 Cytotechnologists are least likely to fail the test, followed by pathologists who review prescreened slides, followed by pathologists who screen their own slides. However, all 3 groups of participants showed significant improvement during the 4-year period, with the failure rate approximately halved for each group and for overall testing. The decrease in failures may be due to a number of factors, including the attrition of participants who are no longer involved in Papanicolaou test screening, selection of better testing material, and better test-taking strategies.

There were no clear patterns of failure for various slide types, other than pathologists performing the worst with conventional Papanicolaou tests.

The CLIA '88 scoring system penalizes pathologists for errors more harshly than it does cytotechnologists. A major difference between the scoring grids for pathologists and cytotechnologists lies in the C and D categories. If cytotechnologists locate the cells of either LSIL or HSIL/carcinoma and recognize them as abnormal, they do not need to render a precise diagnosis. However, CLIA '88 mandates that pathologists distinguish HSIL from LSIL. This requirement offers another chance for pathologists to be penalized for an FP result. If a negative slide is interpreted as “LSIL or HSIL,” a pathologist receives no points. If a pathologist calls a slide LSIL or HSIL but does not match the exact reference interpretation, he or she receives only 5 points. In contrast, cytotechnologists who overcall either unsatisfactory (category A) or negative (category B) slides as LSIL (category C) or HSIL/carcinoma (category D) are penalized only 5 points.

Furthermore, current management strategies render the distinction between LSIL and HSIL less clinically relevant. The initial step in management for these 2 entities is identical: if any level of squamous intraepithelial lesion is noted on the Papanicolaou test, the next step for women of childbearing years is colposcopy, although subsequent management strategies may differ.6

The difference in scoring may be one reason why pathologists fail more frequently than cytotechnologists, even when they receive prescreened slides. Bentz et al7 demonstrated that failure rates increased for cytotechnologists when scored on the pathologist grid, whereas they decreased for primary and secondary screeners when scored on the cytotechnologist grid. If a pathologist agrees with his or her screener on all 10 slides, the pathologist may fail, whereas the cytotechnologist will pass for the same errors.

Pathologists who screen even one slide per year are tested as primary pathologists, although that may not reflect their daily practice. Thus, one potential reason for the higher failure rate among primary pathologists is that pathologists who screen few slides per year on their own are forced into taking the PAP PT under conditions very different from their routine, daily practice. When pathologists are scored on the less-punitive cytotechnologist scoring grid, even primary pathologists show a marked reduction in failures.

In fact, a major goal of various test-taking strategies is to ensure that all HSILs are identified. Missing HSIL or cancer and calling the slide “normal” results in an AF because the −5 penalty yields a score of 85, below the minimum passing score of 90. Thus, individuals should minimize their chances of misclassifying HSIL. For a cytotechnologist, FP diagnoses are less likely than FN results to cause failure; thus, cytotechnologists should theoretically grade up when uncertain. In contrast, pathologists are required to categorize epithelial cell abnormalities more precisely, so that strategy may fail. For a pathologist, grading up a negative slide to any squamous intraepithelial lesion will result in a score of 90, the minimum passing score.
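The arithmetic behind these strategies can be illustrated with the point values quoted above for a single error on an otherwise perfect 10-slide test (a sketch under stated assumptions; these values come from the text, not the full CLIA '88 grid, and the function name is ours):

```python
def score_with_one_error(error_points, n_slides=10, max_per_slide=10):
    """Score for a test that is perfect except for one slide awarded
    `error_points` instead of the maximum."""
    total = (n_slides - 1) * max_per_slide + error_points
    return 100.0 * total / (n_slides * max_per_slide)

# Missing an HSIL (penalty of -5) fails outright:
print(score_with_one_error(-5))  # 85.0, below the passing score of 90
# A cytotechnologist overcalling a negative slide (5 points) still passes:
print(score_with_one_error(5))   # 95.0
# A pathologist calling a negative slide "LSIL or HSIL" (0 points)
# receives exactly the minimum passing score:
print(score_with_one_error(0))   # 90.0
```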

How does this hypothesis hold up against the real data? The most common reason for failure by a cytotechnologist is the AF, with or without other errors, accounting for three-quarters of all errors in those cytotechnologists failing the PAP PT. In contrast, AFs account for slightly less than half the errors among pathologists. Furthermore, FP diagnoses are far more common for both primary and secondary pathologists than they are for cytotechnologists failing the PAP PT. Secondary pathologists may not find agreeing with their screeners a reasonable practice in the PAP PT environment. Primary pathologists, who show very high levels of FP results (63.9%; 124 of 194), are undoubtedly screening very carefully to avoid an AF result. These data are congruent with previously published data.7–9

Although there is no clear trend in the pattern of participant failure during the years from 2006 to 2009 (Table 5), primary pathologists had more AFs and fewer FP and FN results in 2009 than they did in 2006. Conversely, AFs by secondary pathologists and cytotechnologists decreased in 2009 compared with 2006, and FP and FN results increased. Secondary pathologist and cytotechnologist errors are not independent, as would be expected because secondary pathologists' slides are screened and marked by cytotechnologists before their review in PAP PT.

Hughes et al8 reviewed the differences in performance on the same field-validated slides between the CAP educational program and the PAP PT program. Compared with the educational program, in PAP PT category B slides (negative results) were more likely to be called category C or D, and fewer category C slides (LSIL) were diagnosed as category B (negative). These changes were most pronounced for cytotechnologists and less so for pathologists. Thus, the decreasing failure rate observed over the years is undoubtedly due in part to test takers becoming more comfortable with the test. These results further demonstrate the disparity between routine working conditions and the conditions of testing. During PAP PT, routine communication between the pathologist and screening cytotechnologist, and among laboratory cytotechnologists and pathologists, is lost. The decreasing failure rate is, therefore, most likely due to better test-taking skills and does not measure increased proficiency for most test-takers.

When designing a proficiency assessment tool, clinically relevant challenges should be used that measure the competency of the laboratory as a whole, not individual test-taking ability. A Papanicolaou test is often interpreted in the context of the clinical presentation, the screening history, the results from human papillomavirus testing, and other patient variables that affect management. The PAP PT should measure the laboratory's ability to direct the patient to the appropriate next step in clinical management.

1. Department of Health and Human Services. Clinical laboratory improvement amendments of 1988: final rule. Fed Regist. 1992;57(40):7001–7186. Codified at 493.855.

2. Young NA, Moriarty AT, Walsh MK, Wang E, Wilbur DC. The potential for failure in gynecologic regulatory proficiency testing with current slide validation criteria: results from the College of American Pathologists Interlaboratory Comparison in Gynecologic Cytology Program. Arch Pathol Lab Med. 2006;130(8):1114–1118.

3. Renshaw AA, Wang E, Mody DR, Wilbur DC, Davey DD, Colgan TJ. Measuring the significance of field validation in the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology: how good are the experts? Arch Pathol Lab Med. 2005;129(5):609–613.

4. Ducatman BS, Ducatman AM. How expert are the experts? Implications for proficiency testing in cervicovaginal cytology. Arch Pathol Lab Med. 2005;129(5):604–605.

5. Human papillomavirus testing for triage of women with cytologic evidence of low-grade squamous intraepithelial lesions: baseline data from a randomized trial: the Atypical Squamous Cells of Undetermined Significance/Low-Grade Squamous Intraepithelial Lesions Triage Study (ALTS) Group. J Natl Cancer Inst. 2000;92(5):397–402.

6. Wright TC Jr, Massad LS, Dunton CJ, Spitzer M, Wilkinson EJ, Solomon D; for the 2006 American Society for Colposcopy and Cervical Pathology (ASCCP)–sponsored consensus conference. 2006 consensus guidelines for the management of women with abnormal cervical screening tests. J Low Genit Tract Dis. 2007;11(4):201–222.

7. Bentz JS, Hughes JH, Fatheree LA, Schwartz MR, Souers RJ, Wilbur DC. Summary of the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program. Arch Pathol Lab Med. 2008;132(5):788–794.

8. Hughes JH, Bentz JS, Fatheree L, Souers RJ, Wilbur DC; for the Cytopathology Resource Committee of the College of American Pathologists. Changes in participant performance in the “test-taking” environment: observations from the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program. Arch Pathol Lab Med. 2009;133(2):279–282.

9. Crothers BA, Moriarty AT, Fatheree LA, Booth CN, Tench WD, Wilbur DC. Appeals in gynecologic cytology proficiency testing: review and analysis of data from the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program. Arch Pathol Lab Med. 2009;133(1):44–48.