Context.—

Obtaining diagnostic concordance for squamous intraepithelial lesions in cytology can be challenging.

Objective.—

To determine diagnostic concordance for biopsy-proven low-grade squamous intraepithelial lesion (LSIL) and high-grade squamous intraepithelial lesion (HSIL) Papanicolaou test slides in the College of American Pathologists PAP Education program.

Design.—

We analyzed 121 059 responses to 4251 LSIL and HSIL slides from 2004 to 2013 using a nonlinear mixed model with reference diagnosis, preparation type, and participant type as main factors. The model also included interactions between reference diagnosis and each of the other 2 factors, as well as a repeated-measures component to adjust for slide-specific performance.

Results.—

LSIL was misclassified significantly less often (2.4%; 1384 of 57 664) than HSIL (4.4%; 2762 of 63 395). There was no performance difference between pathologists and cytotechnologists for LSIL, but cytotechnologists had a significantly higher HSIL misclassification rate than pathologists (5.2%; 1437 of 27 534 versus 4.0%; 1032 of 25 630; P = .01), and both were more likely to misinterpret HSIL as LSIL (P < .001) than the reverse. ThinPrep LSIL slides were more likely to be misclassified as HSIL (2.4%; 920 of 38 582) than SurePath LSIL slides (1.5%; 198 of 13 196), but conventional slides were the most likely to be misclassified in both categories (4.5%; 266 of 5886 for LSIL and 6.5%; 573 of 8825 for HSIL).

Conclusions.—

Participants undercalled HSIL as LSIL (false-negative) more often than they overcalled LSIL as HSIL (false-positive) in the PAP Education program, and conventional slides were more likely to be misclassified than ThinPrep or SurePath slides. Pathologists and cytotechnologists classified LSIL equally well, but cytotechnologists were significantly more likely than pathologists to undercall HSIL as LSIL.

The College of American Pathologists (CAP) Cytopathology Committee maintains and administers the PAP Education (PAP Ed) program, which each year provides participants with 2 mailings of 5 glass slide Papanicolaou (Pap) tests for evaluation and interpretation using the Bethesda System Terminology for Reporting Cervical Cytology. Donated slides interpreted as an epithelial abnormality must have biopsy confirmation of the Pap test interpretation before entry into the program. All submitted slides are first prescreened by a cytotechnologist to assess quality, adequacy, and concordance with the contributor's Pap test diagnosis. Each slide is then reviewed by 3 committee pathologists for concordance with the biopsy diagnosis, and all 3 pathologists must agree on the reference diagnosis before the slide is accepted into the PAP Ed program. We have observed that many submitted slides are rejected because of a lack of diagnostic concordance, and that complete concordance can be difficult to achieve in the program, even for accepted slides. This is particularly true in distinguishing low-grade from high-grade squamous intraepithelial lesions, when biopsy-confirmed low-grade squamous intraepithelial lesion (LSIL) slides contain a few cells showing some characteristics of high-grade squamous intraepithelial lesion (HSIL), and in slides with few diagnostic cells. We hypothesized that diagnostic concordance might differ according to who interpreted the slide (a cytotechnologist or a pathologist), the slide preparation (conventional, ThinPrep [Hologic Inc, Marlborough, Massachusetts], or SurePath [Becton, Dickinson and Company, Franklin Lakes, New Jersey]), and the interpretation (LSIL versus HSIL).

This analysis examined the misclassification of LSIL cases as HSIL and of HSIL cases as LSIL. It included 121 059 responses to 4251 slides evaluated in the CAP PAP Ed program during the 10-year period from 2004 through 2013. Data from the 2005-D mailing were excluded from the analysis because this mailing served as a test cycle for the Pap Proficiency Testing program (PAP PT).

The initial analysis included both PAP PT and PAP Ed results. This added complexity to the statistical model because misclassification rates were significantly lower for PAP PT, a participant bias that we have noted in previous studies and that is probably related to the performance scores assigned during proficiency testing and the potential for examination failure with misclassification.1  Modeling both programs would have required accounting for their different slide performance, which was outside the main research objective of identifying slide and performance characteristics associated with LSIL/HSIL misclassification. To avoid this confounding, only the PAP Ed results from validated slides used in the PAP PT program were included in the final analysis described below.

A nonlinear mixed model was fit with 3 main factors: reference diagnosis, preparation type, and participant type. The interactions between reference diagnosis and each of the other 2 factors were also included in the model, in addition to a repeated-measures component to adjust for slide-specific performance. A significance level of .05 was used for this analysis. The analysis was performed using SAS 9.2 (SAS Institute, Cary, North Carolina). Pairwise comparisons between preparation types and between participant types were also performed, and their P values are reported in the results below.
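One plausible way to write this model in symbols is sketched here. It is an assumption, not the authors' SAS specification: the text does not state the link function or the covariance structure, so a logit link and a normally distributed random intercept for each slide (a common way to represent a repeated-measures component when many responses share one slide) are assumed. Let $Y_{ij} = 1$ if response $j$ to slide $i$ is an LSIL/HSIL misclassification, let $d(i)$ and $p(i)$ be the reference diagnosis and preparation type of slide $i$, and let $t(j)$ be the participant type giving response $j$.

```latex
\operatorname{logit} \Pr(Y_{ij} = 1)
  = \mu + \alpha_{d(i)} + \beta_{p(i)} + \gamma_{t(j)}
  + (\alpha\beta)_{d(i)\,p(i)} + (\alpha\gamma)_{d(i)\,t(j)} + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

In this sketch, the two interaction terms correspond to the reference diagnosis by preparation type and reference diagnosis by participant type interactions, and $\sigma_u^2$ captures slide-to-slide variation in difficulty.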

The Table provides a summary of the results. More HSIL cases were misclassified as LSIL (4.4%; 2762 of 63 395) than LSIL cases were misclassified as HSIL (2.4%; 1384 of 57 664), and this difference was statistically significant (P < .001). The interaction between participant type and reference diagnosis was also significant (P = .01). There was no performance difference between pathologists and cytotechnologists for LSIL misclassification (P = .28), but cytotechnologists had a significantly higher rate of misclassifying HSIL as LSIL (1437 of 27 534; 5.2%) than did pathologists (1032 of 25 630; 4.0%; P = .003). Nevertheless, both pathologists and cytotechnologists were more likely to misinterpret HSIL as LSIL (P < .001 for both) than the reverse.
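The crude proportions above can be rechecked directly from the reported counts. The short Python sketch below is illustrative only: it divides misclassified responses by total responses and ignores the slide-level clustering that the mixed model adjusts for, so its ratios are unadjusted and it does not attempt to reproduce the reported P values. The dictionary labels and function names are hypothetical.

```python
# Crude misclassification rates recomputed from counts reported in the Results.
# Illustrative only: these proportions ignore slide-level clustering, so they
# are unadjusted and will not reproduce the mixed-model P values.

# (misclassified responses, total responses); labels are descriptive only
COUNTS = {
    "LSIL read as HSIL, all participants": (1384, 57664),
    "HSIL read as LSIL, all participants": (2762, 63395),
    "HSIL read as LSIL, cytotechnologists": (1437, 27534),
    "HSIL read as LSIL, pathologists": (1032, 25630),
}


def rate_percent(misclassified, total):
    """Misclassification rate as a percentage, rounded to 1 decimal place."""
    return round(100 * misclassified / total, 1)


def crude_ratio(a, b):
    """Unadjusted ratio of two proportions, each given as (events, total)."""
    return (a[0] / a[1]) / (b[0] / b[1])


for label, (k, n) in COUNTS.items():
    print(f"{label}: {k} of {n} = {rate_percent(k, n)}%")

# HSIL undercall vs LSIL overcall (about 1.8x), and cytotechnologist vs
# pathologist HSIL undercall (about 1.3x); both are crude, unadjusted ratios.
hsil_vs_lsil = crude_ratio(COUNTS["HSIL read as LSIL, all participants"],
                           COUNTS["LSIL read as HSIL, all participants"])
ct_vs_path = crude_ratio(COUNTS["HSIL read as LSIL, cytotechnologists"],
                         COUNTS["HSIL read as LSIL, pathologists"])
print(f"HSIL undercall vs LSIL overcall: {hsil_vs_lsil:.2f}")
print(f"Cytotechnologists vs pathologists (HSIL undercall): {ct_vs_path:.2f}")
```

Rounded to 1 decimal place, the cytotechnologist HSIL misclassification rate computed this way is 5.2%.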

Misclassification of Low-Grade Squamous Intraepithelial Lesion (LSIL) as High-Grade Squamous Intraepithelial Lesion (HSIL) and Vice Versa


Pairwise comparisons also revealed differences by preparation type. Preparation type did not have a significant overall effect on concordance with the reference diagnosis (P = .29), but there were differences between preparation types; only conventional slides showed no significant difference between misclassification of LSIL and HSIL (P = .17). ThinPrep LSIL slides were significantly more likely to be misclassified as HSIL (920 of 38 582; 2.4%) than SurePath LSIL slides (198 of 13 196; 1.5%; P < .001) but less likely to be misclassified than conventional LSIL slides (266 of 5886; 4.5%; P < .001). Conventional LSIL slides were also more likely to be misclassified than SurePath LSIL slides (P < .001). Conventional HSIL slides were more likely to be misclassified as LSIL (573 of 8825; 6.5%) than either ThinPrep HSIL slides (1807 of 43 653; 4.1%; P = .04) or SurePath HSIL slides (382 of 10 917; 3.5%; P = .02). There was no significant difference in misclassification between ThinPrep and SurePath HSIL slides (P = .39). Both ThinPrep and SurePath HSIL slides were significantly more likely to be misclassified as LSIL (P < .001 for both) than the respective LSIL slides were to be misclassified as HSIL.
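As a simple consistency check on these figures, the Python sketch below confirms that the per-preparation counts add up to the overall LSIL and HSIL totals and orders the preparation types by crude misclassification rate. The data layout and function names are hypothetical, and the ordering uses unadjusted proportions; it does not reproduce the pairwise P values, which come from the mixed model.

```python
# Responses by preparation type, as reported in the Results:
# (misclassified responses, total responses). Illustrative check only.
BY_PREP = {
    "LSIL": {"SurePath": (198, 13196), "ThinPrep": (920, 38582), "Conventional": (266, 5886)},
    "HSIL": {"SurePath": (382, 10917), "ThinPrep": (1807, 43653), "Conventional": (573, 8825)},
}
# Overall totals reported for each reference diagnosis.
OVERALL = {"LSIL": (1384, 57664), "HSIL": (2762, 63395)}


def totals(preps):
    """Sum misclassified and total responses across preparation types."""
    return (sum(k for k, _ in preps.values()), sum(n for _, n in preps.values()))


def ranked_rates(preps):
    """Preparation types from lowest to highest crude rate (%), rounded for display."""
    rates = [(name, 100 * k / n) for name, (k, n) in preps.items()]
    rates.sort(key=lambda item: item[1])  # sort on unrounded rates
    return [(name, round(rate, 1)) for name, rate in rates]


for dx, preps in BY_PREP.items():
    # The per-preparation counts should add up to the overall counts.
    assert totals(preps) == OVERALL[dx], f"{dx} counts do not sum to overall"
    print(dx, ranked_rates(preps))
```

For both reference diagnoses, the crude ordering is SurePath, then ThinPrep, then conventional, consistent with the direction of the comparisons reported in the Results.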

With changes in cervical cancer screening algorithms,2  the role of the Pap test in detecting precancerous lesions has been changing. The historical screening algorithm of annual Pap tests throughout the sexually active period of a woman's life was highly successful because it provided multiple opportunities to detect an evolving squamous (or glandular) intraepithelial lesion that might otherwise have remained undetected because of its small size or high location in the endocervical canal. Over time, an evolving lesion became large enough or exposed enough that cells could be collected on a Pap test, allowing triage to colposcopy and a confirmatory biopsy. Under the historical algorithm, the most important consideration in Pap test interpretation was detecting abnormal cells on the slide so that the patient could be identified as requiring follow-up with colposcopy; the detection of any atypical squamous cells (ASCs), atypical glandular cells, or SIL was enough to warrant colposcopy. Current guidelines lengthen screening intervals and manage LSIL and HSIL interpretations differently, so the distinction between the two now carries more weight. With longer screening intervals, a failure to recognize HSIL is likely to lead to a longer delay before the next intervention, potentially allowing further progression of disease.

This study demonstrates that, in an educational program, pathologists and cytotechnologists are significantly more likely to misinterpret biopsy-proven HSIL Pap tests as LSIL than the reverse. This implies that some HSIL cases in clinical practice will remain undetected, especially because an acceptable approach to an LSIL Pap test in women aged 21 to 24 years is continued observation without biopsy confirmation.3  In our study, 4.4% (2762 of 63 395) of responses to biopsy-proven HSIL Pap test slides were LSIL interpretations, even though these slides showed well-defined features of HSIL. All of these slides had passed initial screening for preparation and staining quality and had been validated both by experienced cytologists and by participants in an educational program. Most of the HSIL slides accepted into the PAP Education program have well-preserved, well-visualized HSIL cells in sufficient numbers to be diagnostic. Because these variables cannot be controlled in actual practice, the misclassification rate there is likely to be higher. Lower rates of reproducibility of SIL subclassification have been noted in large clinical trials as well as in university practices.4,5  Adams et al5  noted that many cases with poor reproducibility also had biopsies that were difficult to subclassify as LSIL or HSIL. In addition, there may be fewer HSIL cells present on Pap tests in practice. Renshaw et al6  showed that Pap tests in the CAP Interlaboratory Comparison Program in Cervicovaginal Cytology with fewer than 100 HSIL cells performed more poorly than those with more than 500 cells. For patients, downgrading an HSIL Pap test to LSIL because of insufficient HSIL cells could result in a missed opportunity for therapeutic ablation and the persistence of a lesion that could progress to carcinoma. However, cytologists may be unwilling to report a high-grade lesion in Pap tests with numerous LSIL cells and few HSIL cells without more evidence, because of the clinical complications of some excisional procedures, such as cervical incompetence. Conversely, overdiagnosis of LSIL as HSIL, although less common, occurred in 2.4% of responses and could lead to overtreatment in some women if the clinician pursues the more aggressive management option of immediate excision.

Theoretical arguments can be made for the superiority of either conventional or liquid-based testing in distinguishing HSIL from LSIL. Liquid-based preparations may disperse HSIL cells, making them more difficult to find and interpret, whereas conventional preparations (Pap smears) tend to display similar cells in groups or linear strands, where recognition may be less challenging. Alternatively, liquid-based preparations remove obscuring factors, such as blood and inflammation, ensure adequate fixation of cells to prevent air-drying, and disperse cells across the slide to prevent cellular overlapping and thick areas that can hinder accurate interpretation. In this study, liquid-based preparations had significantly lower misclassification rates than conventional smears for both HSIL (3.5% SurePath; 4.1% ThinPrep; 6.5% conventional) and LSIL (1.5% SurePath; 2.4% ThinPrep; 4.5% conventional). However, the poorer overall performance of conventional slides may reflect declining participant familiarity with this preparation rather than fundamental properties of the preparation. Among liquid-based preparations, the difference between SurePath and ThinPrep was statistically significant for LSIL but not for HSIL misclassification.

Of interest, cytotechnologists were significantly more likely than pathologists to misclassify HSIL Pap tests as LSIL. This may be related to their clinical roles. All cytologists recognize the potential clinical impact of their interpretations, but cytotechnologists may be less willing than pathologists to report HSIL. If cytotechnologists have any doubt about the interpretation, they may “downgrade” the lesion to LSIL. This behavior would be expected in clinical practice, where cytotechnologists may recognize and mark potential HSIL cells but defer to the pathologist to render an HSIL interpretation. The PAP Ed program is intended to mirror actual practice, in which cytotechnologists and pathologists can share and discuss slides. In practice, however, cytotechnologists may mark the slides and refer them without this interaction.7 

The Cytopathology Committee has previously demonstrated that participants in both programs react differently to the same slides in PAP Ed than in PAP PT, where the grading stakes are higher. In PAP PT, participants take a “defensive” position and tend to upgrade, or misclassify slides as HSIL, to avoid automatic failure from missing HSIL.1,7  In the study by Hughes et al,1  HSIL was misclassified as LSIL in 0.64% (701 of 109 470) of cytotechnologist responses in PAP Ed but in 1.43% (632 of 44 218) in PAP PT, compared with 0.65% (714 of 109 856) of pathologist responses in PAP Ed and 0.65% (280 of 43 080) in PAP PT. Pathologists were statistically less likely (P < .001) to misclassify HSIL as LSIL in PAP PT than in PAP Ed, but cytotechnologists were more likely (P < .001) to misclassify HSIL as LSIL in PAP PT. In part, these differences may result from PAP PT grading, in which cytotechnologists receive 10 points for an HSIL or LSIL interpretation when the reference interpretation is either, whereas pathologists receive only 5 points for LSIL interpreted as HSIL or vice versa. This places greater pressure on pathologists to discriminate definitively between LSIL and HSIL. Our study eliminates that bias because our results come only from the PAP Ed program; however, pathologists may also screen the PAP Ed slides independently, which might increase the misclassification rate in PAP Ed through screening errors. In addition, some participants may take the view that educational programs do not require as much attention to detail because an error has few consequences.

Despite significant misclassification rates, performance appears to be improving over time. An earlier study of conventional slides from the 1996–1997 CAP educational program found much higher misclassification rates.8  That study noted lower misclassification rates for validated slides than for educational slides, but even for validated slides the rates exceeded 9%. This suggests that participants' interpretive skills have improved over time, probably because of the widespread adoption of The Bethesda System for Reporting Cervical Cytology criteria, as well as participation in interlaboratory glass slide challenge programs.

It is also possible that slide characteristics influence classification, but that investigation was beyond the scope of this study and is currently in progress. For instance, the HSIL slides that were “downgraded” to LSIL may have had few HSIL cells, or HSIL cells with less hyperchromasia, nuclear enlargement, or nuclear irregularity than expected. These characteristics have been described previously as influencing the interpretation of cytologists.9–11  High-grade squamous intraepithelial lesion cells may also have had more cytoplasm, mimicking metaplasia, or the slides may have had abundant LSIL cells. In routine practice, some of these slides might be classified as ASC–cannot exclude HSIL (ASC-H), but in the educational setting, participants might be inclined to label these cases as LSIL because ASC-H and other atypical responses are not options on the answer form. The LSIL slides were less likely to be misclassified, which may be the result of the intense validation process for slide inclusion in the program: LSIL slides that contain any cells suspicious for HSIL are likely to be eliminated during this process. Renshaw et al12  previously demonstrated that Pap tests in PAP Ed with a reference interpretation of LSIL perform exceptionally well, and those that do not usually have fewer than 50 LSIL cells. In the Atypical Squamous Cells of Undetermined Significance–Low-Grade Squamous Intraepithelial Lesion Triage Study (ALTS), Pap tests were more likely than cervical biopsies to identify LSIL accurately because the characteristic features of nuclear enlargement, hyperchromasia, nuclear irregularities, dyskeratosis, and koilocytosis were more obvious on cytology.4 

The major bias in this study is selection bias. All of the slides evaluated had been selected for quality and performance, and poorly performing slides are excluded from the program. As such, these slides do not necessarily reflect routine cytology practice, and caution is needed in extrapolating these findings to clinical practice. There is also selection bias in the participant population. Participants may be individuals who have a heightened interest in performance improvement, who are required to participate for employment, or who participate solely to obtain continuing education credits. The manner in which each approaches the exercise might differ from his or her approach to an actual patient. For example, there are no liability concerns with educational materials, and in laboratories with multiple participants, individuals may have less time to complete the exercise, resulting in less attention to each slide. Clinical practice is fraught with complicated and difficult cases that create challenges in interpretation, so individual performance in this study may not reflect actual practice. In actual practice, cytologists have more information about the clinical status of the patient, such as the referring clinic (colposcopy or well-woman clinics), the human papillomavirus status of the patient, and any history of prior abnormal Pap test results. All of these variables could affect the interpretation of a Pap test. How the error rate in HSIL classification found in our study will affect future patient care in the era of the “diagnostic” Pap test remains to be seen.

This study was conceptualized by members of the CAP Cytopathology Committee.

1. Hughes JH, Bentz JS, Fatheree L, Souers RJ, Wilbur DC; Cytopathology Resource Committee, College of American Pathologists. Changes in participant performance in the “test-taking” environment: observations from the 2006 College of American Pathologists Gynecologic Cytology Proficiency Testing Program. Arch Pathol Lab Med. 2009;133(2):279-282.
2. Saslow D, Solomon D, Lawson HW, et al. American Cancer Society, American Society for Colposcopy and Cervical Pathology, and American Society for Clinical Pathology screening guidelines for the prevention and early detection of cervical cancer. J Low Genit Tract Dis. 2012;16(3):175-204.
3. Massad LS, Einstein MH, Huh WK, et al; 2012 ASCCP Consensus Guidelines Conference. 2012 updated consensus guidelines for the management of abnormal cervical cancer screening tests and cancer precursors. J Low Genit Tract Dis. 2013;17(5 suppl 1):S1-S27.
4. Stoler MH, Schiffman M; Atypical Squamous Cells of Undetermined Significance-Low-grade Squamous Intraepithelial Lesion Triage Study (ALTS) Group. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA. 2001;285(11):1500-1505.
5. Adams KC, Absher KJ, Brill YM, Witzke DB, Davey DD. Reproducibility of subclassification of squamous intraepithelial lesions: conventional versus ThinPrep paps. J Low Genit Tract Dis. 2003;7(3):203-208.
6. Renshaw AA, Schulte MA, Plott E, et al. Cytologic features of high-grade squamous intraepithelial lesion in ThinPrep Papanicolaou test slides: comparison of cases that performed poorly with those that performed well in the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology. Arch Pathol Lab Med. 2004;128(7):746-748.
7. Moriarty AT, Crothers BA, Bentz JS, Souers RJ, Fatheree LA, Wilbur DC. Automatic failure in gynecologic cytology proficiency testing: results from the College of American Pathologists proficiency testing program. Arch Pathol Lab Med. 2009;133(11):1757-1760.
8. Woodhouse SL, Stastny JF, Styer PE, Kennedy M, Praestgaard AH, Davey DD. Interobserver variability in subclassification of squamous intraepithelial lesions: results of the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology. Arch Pathol Lab Med. 1999;123(11):1079-1084.
9. O'Sullivan JP, Chapman PA, Jenkins L, Smith R. Characteristics of high grade dyskaryotic cervical smears likely to be missed on rapid rescreening. Acta Cytol. 2000;44(1):37-40.
10. Bowditch RC, Clarke JM, Baird PJ, Greenberg ML. Morphologic analysis of false negative SurePath slides using Focalpoint GS computer-assisted cervical screening technology: an Australian experience. Diagn Cytopathol. 2015;43(11):870-878.
11. Leung KM, Lam KK, Tse PY, Yeoh GP, Chan KW. Characteristics of false-negative ThinPrep cervical smears in women with high-grade squamous intraepithelial lesions. Hong Kong Med J. 2008;14(4):292-295.
12. Renshaw AA, Dubray-Benstein B, Haja J, Hughes JH; Cytopathology Resource Committee, College of American Pathologists. Cytologic features of low-grade squamous intraepithelial lesion in ThinPrep Papanicolaou test slides and conventional smears: comparison of cases that performed poorly with those that performed well in the College of American Pathologists Interlaboratory Comparison Program in Cervicovaginal Cytology. Arch Pathol Lab Med. 2005;129(1):23-25.

Author notes

The College of American Pathologists provided financial support for the research and preparation of this manuscript. All of the authors are or were members of the College of American Pathologists Cytopathology Committee or College of American Pathologists staff. Committee members receive reimbursement for travel and lodging costs associated with committee work. The authors have no other relevant financial interest in the products or companies described in this article.

Competing Interests

The opinions or assertions contained herein are the private views of the authors and do not reflect the official policy of the Department of the Army, Department of Defense, or US government.