Context.—

Body fluid cytology is an important diagnostic tool used to identify various conditions. However, an accurate diagnosis in this setting can sometimes be challenging.

Objective.—

To identify the performance characteristics of body fluid cytology by analyzing participant responses from the College of American Pathologists Interlaboratory Comparison Program in Nongynecologic Cytopathology.

Design.—

Participant responses from 5102 slides were analyzed for concordance to the general category (GC) and to the reference diagnosis (RD). Nonlinear mixed models were used to analyze concordance.

Results.—

The overall GC concordance was 95.2%. The GC type, participant type, and preparation type were significantly associated with GC concordance (P < .001). Concordance for malignant cases was higher than it was for benign cases. Cytotechnologists had better GC concordance than pathologists. ThinPrep (Hologic, Marlborough, Massachusetts) slides had the highest GC concordance. Participant type, fluid type, preparation type, and participant interpretation were significantly associated with RD concordance (P < .001). Pathologists performed better than cytotechnologists did for RD concordance. Pericardial fluid had the lowest RD concordance, especially for cases with normal or reactive findings. Modified Giemsa–stained slides performed best for lymphoma and hematopoietic malignancy. Small cell carcinoma had the highest GC concordance, and its RD concordance was higher in pleural than in peritoneal fluids. Adenocarcinoma had the second-highest GC concordance and the highest RD concordance.

Conclusions.—

This study illustrates the challenges associated with interpreting body fluid cytology, particularly in pericardial fluid, and the factors that may affect accurate diagnoses. The results also highlight the value of using multiple preparation types in challenging cases.

Body fluid cytology is an important diagnostic test for various malignant and benign conditions. Effusions can result from inflammatory, infectious, or other benign processes and from neoplastic disease, whether primary or metastatic. These conditions often have overlapping features and mimic one another both cytomorphologically and clinically, presenting diagnostic challenges. The aim of this study was to perform a large-scale evaluation of the current performance characteristics of body fluid cytology. We analyzed participant responses in the College of American Pathologists (CAP) Interlaboratory Comparison Program in Nongynecologic Cytopathology (NGC). The program provides participants with 5 cases on a quarterly basis. A brief, pertinent clinical history and ancillary test results, if available, are provided for each case. The slides are assigned a reference diagnosis that is unknown to the participant. The participants or laboratories provide both a general category response (eg, benign or malignant) and a specific reference diagnosis for the slide challenges. Many participants and laboratories are enrolled in the CAP NGC program as part of their continuing education or to measure their performance against other participants.1 In this study, we analyzed the responses from the participants for their concordance to the general category and the reference diagnosis to identify potential problem areas in cytologic diagnosis of body fluids.

The study evaluated participant responses to the CAP Interlaboratory Comparison Program in NGC. The analysis included 344 380 responses to 5102 evaluated body fluid slides. We selected several interaction terms, including fluid type, participant type, and preparation type, to determine what factors affect the accuracy of diagnoses using currently available methods. The slides included 3 fluid types (pleural, peritoneal, and pericardial) and 5 different slide preparation types (conventional, CytoSpin [Thermo Fisher Scientific, Waltham, Massachusetts], ThinPrep [Hologic, Marlborough, Massachusetts], modified Giemsa, and SurePath [Becton Dickinson and Company, Franklin Lakes, New Jersey]). Responses from 3 different types of participants (pathologists, cytotechnologists, and laboratories) were collected; responses submitted between 2002 and 2013 were evaluated. The laboratory participant type usually represented the consensus response of the individual participants within a laboratory group. The analysis examined concordance to the general category (ie, negative, suspicious, positive, or unsatisfactory findings) and the reference diagnosis. Three CAP Cytopathology Committee members evaluated slides and agreed on the reference diagnosis before distributing the slides for evaluation. For slides with positive findings, general category concordance was defined as a positive or suspicious response. For slides with negative findings, concordance was defined as a negative response by a participant. To be included in the analysis, slides had to have at least 5 participant responses.
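The concordance rule and the inclusion threshold described above can be illustrated with a minimal sketch. The function names, the data layout, and the toy responses below are hypothetical and are not taken from the CAP database. The sketch treats any response other than those named in the definition (for example, an unsatisfactory response) as discordant, which is an assumption, and it computes a raw pooled rate rather than the model-based rates reported in the tables.

```python
from collections import defaultdict

MIN_RESPONSES = 5  # inclusion threshold stated in the text


def is_gc_concordant(slide_category, response):
    """Return True if a response matches the slide's general category."""
    if slide_category == "positive":
        # Positive slides: a positive or suspicious response is concordant.
        return response in ("positive", "suspicious")
    if slide_category == "negative":
        # Negative slides: only an exact negative response is concordant.
        return response == "negative"
    raise ValueError(f"unexpected slide category: {slide_category!r}")


def gc_concordance_rate(responses, slide_categories):
    """Pooled concordance over slides with at least MIN_RESPONSES responses.

    responses: iterable of (slide_id, response) pairs.
    slide_categories: dict mapping slide_id to "positive" or "negative".
    Returns a fraction in [0, 1], or None if no slide qualifies.
    """
    by_slide = defaultdict(list)
    for slide_id, response in responses:
        by_slide[slide_id].append(response)

    concordant = total = 0
    for slide_id, slide_responses in by_slide.items():
        if len(slide_responses) < MIN_RESPONSES:
            continue  # slide excluded from the analysis
        category = slide_categories[slide_id]
        for response in slide_responses:
            concordant += is_gc_concordant(category, response)
            total += 1
    return concordant / total if total else None


# Hypothetical toy data, for illustration only.
toy_categories = {"A": "positive", "B": "negative", "C": "negative"}
toy_responses = (
    [("A", r) for r in ["positive", "positive", "suspicious", "negative", "positive"]]
    + [("B", r) for r in ["negative", "negative", "negative", "negative", "suspicious"]]
    + [("C", r) for r in ["negative", "negative"]]  # fewer than 5 responses: excluded
)
print(gc_concordance_rate(toy_responses, toy_categories))  # 0.8
```

In the toy data, a suspicious response counts toward concordance on the positive slide but against it on the negative slide, which is the asymmetry built into the definition.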

A nonlinear mixed model was used to analyze the data. The general category concordance rate was fit with 4 factors: fluid type, general category, participant type, and preparation type. For cases with malignant diagnoses, the model was fit with 3 factors: participant type, preparation type, and reference diagnosis. For cases with benign diagnoses, the model was fit with participant type and preparation type. The model also included interaction terms between the main factors and a repeated-measures component to model the slide factor-correlation structure. A P value ≤ .05 was considered significant for these analyses. All analyses were performed with SAS version 9.2 software (SAS Institute, Cary, North Carolina).
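The text does not state the link function or the exact correlation structure. As a hedged illustration only, one common way to express a model of this kind for the general category is a logistic mixed model with a slide-level random intercept. The notation below is an assumption for illustration and is not the authors' stated specification; interaction terms are omitted for brevity.

```latex
% Y_{ij} = 1 if response j to slide i is concordant with the general category, else 0.
% u_i is a slide-level random effect standing in for the within-slide correlation.
\[
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0
  + \beta^{\text{fluid}}_{f(i)}
  + \beta^{\text{category}}_{g(i)}
  + \beta^{\text{prep}}_{p(i)}
  + \beta^{\text{participant}}_{t(j)}
  + u_i,
\qquad
u_i \sim \mathcal{N}\!\left(0, \sigma_u^2\right)
\]
```

Here $f(i)$, $g(i)$, and $p(i)$ denote the fluid type, general category, and preparation type of slide $i$, and $t(j)$ denotes the participant type of respondent $j$. For the malignant and benign reference diagnosis models, the fixed effects would be replaced by the factors listed in the text.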

General Category

The concordance rates for the nonlinear mixed model in the general category are provided in Table 1. The overall concordance rate for the general category (benign versus malignant) was 95.2%. The general category type, participant type, and preparation type were all significantly associated with concordance to the general category (all P < .001). There were no performance differences based on fluid type. For the general category type, malignant cases had higher concordance rates than benign cases (95.8% and 84.2%, respectively). Among participants, cytotechnologists had a better general category concordance rate compared to pathologists (95.6% and 94.8%, respectively). With respect to preparation type, the concordance rates ranged from 94.5% to 96.4%, with ThinPrep preparations having the greatest concordance (96.4%). When analyzing the general category concordance based on the reference diagnosis, the highest concordance was for small cell carcinoma (98.4%) and the lowest was for findings of normal or reactive cases (83.3%) (Table 2).

The general category concordance rates based on fluid type are provided in Table 3. Among slide responses for pleural fluid, there were significant differences in concordance rates between pathologists and cytotechnologists (94.7% and 95.6%, respectively; P < .001) and between malignant and benign cases (95.7% and 82.7%, respectively; P < .001). The slide responses for peritoneal fluid showed a significant difference in concordance rate between malignant (95.9%) and benign cases (87.9%, P < .001). The concordance rate of slide responses for pericardial fluid was also significantly different between malignant (95.4%) and benign (74.9%) cases (P < .001). In pericardial fluid slides, the concordance rate for modified Giemsa stains differed significantly from that of each of the other preparation types (all P < .001). Additionally, the concordance rate for modified Giemsa slides of pericardial fluid differed significantly from that for modified Giemsa slides of pleural or peritoneal fluid (both P < .001). Notably, SurePath slides of pericardial fluids had 99.7% concordance.

Reference Diagnosis

In addition to the general category, respondents were asked to provide specific diagnoses for the slides provided. The frequencies of diagnoses based on fluid type are provided in Supplemental Tables 1 through 3 (see supplemental digital content containing 13 tables, at www.archivesofpathology.org in the January 2018 table of contents). Fluid type, reference diagnosis, participant type, and preparation type were significantly associated with concordance to the reference diagnosis (Table 4; all P < .001). The concordance based on fluid type ranged from 65.5% for pericardial fluid to 73.8% for peritoneal fluid. Concordance based on the reference diagnosis varied widely from 19.9% for non–small cell carcinoma to 76.9% for adenocarcinoma. With respect to participant type, laboratories had the highest concordance (73.3%), followed by pathologists (71.6%) and cytotechnologists (69.7%). Preparation type had concordance rates ranging from 69.2% for CytoSpin to 76.8% for ThinPrep.

The concordance to the reference diagnosis was assessed based on fluid type (Table 5). There were significant differences in concordance rates observed among fluid types in cases of adenocarcinoma (P < .001 for pleural versus peritoneal and pericardial fluids), lymphoma or hematopoietic malignancies (P < .001 for pericardial versus pleural and peritoneal fluids), small cell carcinomas (P < .001 for pleural versus peritoneal fluid), and benign samples (P < .01 for pleural versus peritoneal fluid). When adenocarcinoma cases were compared by slide preparation and fluid type, the only significant difference in concordance rate was observed for SurePath pericardial fluid (93.3%) compared with conventional preparations (76.3%, P < .001), ThinPrep (79.8%, P < .001), and CytoSpin (83.8%, P = .03) (Supplemental Table 4). Among adenocarcinoma cases, the concordance rate for pathologists was significantly higher than that for cytotechnologists for peritoneal fluid slides (81.7% and 76.2%, respectively; P < .001) (Supplemental Table 5). Concordance rates for lymphoma or hematopoietic malignancy cases were significantly different among preparation types for pleural fluid slides (range, 65%–88.7%; P ≤ .001 for conventional versus SurePath, modified Giemsa, and CytoSpin; P < .001 for CytoSpin versus SurePath and modified Giemsa; P < .001 for ThinPrep versus modified Giemsa); for peritoneal fluid slides (range, 53.8%–77.2%; P ≤ .001 for SurePath versus conventional, CytoSpin, and modified Giemsa; P = .03 for SurePath versus ThinPrep); and for pericardial fluid slides (range, 33.2%–67.4%; P = .02 for conventional versus ThinPrep; P = .01 for conventional versus modified Giemsa; P < .001 for ThinPrep versus modified Giemsa) (Supplemental Table 6). Modified Giemsa preparations had the highest concordance for all fluid types. In lymphoma or hematopoietic malignancy cases, there were also significant differences in concordance rates between pathologists and cytotechnologists for pleural fluid (71.8% and 68.5%, respectively; P < .001), peritoneal fluid (73.4% and 68.0%, respectively; P < .001), and pericardial fluid slides (47.2% and 27.4%, respectively; P = .007) (Supplemental Table 7). Participant interpretation and the frequency of responses for each reference diagnosis are provided in Supplemental Tables 8 through 13.
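A tabulation of participant interpretations against each reference diagnosis, of the kind summarized in the supplemental tables, can be sketched as follows. The function names and the toy records are hypothetical and do not reproduce study data. The sketch reports raw percentages, not model-based estimates.

```python
from collections import Counter, defaultdict


def interpretation_frequencies(records):
    """Tabulate participant interpretations for each reference diagnosis.

    records: iterable of (reference_diagnosis, participant_interpretation).
    Returns {reference_diagnosis: {interpretation: percent of responses}},
    with interpretations ordered from most to least frequent.
    """
    counts = defaultdict(Counter)
    for reference, interpretation in records:
        counts[reference][interpretation] += 1

    table = {}
    for reference, counter in counts.items():
        total = sum(counter.values())
        table[reference] = {
            interpretation: 100.0 * n / total
            for interpretation, n in counter.most_common()
        }
    return table


def rd_concordance(table, reference):
    """Percent of responses that exactly matched the reference diagnosis."""
    return table[reference].get(reference, 0.0)


# Hypothetical toy records, for illustration only.
toy_records = (
    [("mesothelioma", "mesothelioma")] * 3
    + [("mesothelioma", "adenocarcinoma")] * 1
)
toy_table = interpretation_frequencies(toy_records)
print(toy_table["mesothelioma"])
print(rd_concordance(toy_table, "mesothelioma"))
```

Reading across a row shows which alternative interpretations absorb the discordant responses for a given reference diagnosis.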

In this study, we evaluated the performance characteristics of body fluid cytology slides in the CAP NGC program. The results of this analysis serve to highlight the challenges in the diagnosis of effusion cytology specimens.

In the general category, participants had a high overall concordance rate of 95.2%, indicating that they have a high probability of distinguishing between benign and malignant samples. When split into individual categories, the concordance rate was higher for malignant cases (95.8%) than it was for benign cases (84.2%). Similarly, when looking at the general concordance rates based on the reference diagnosis, the malignant diagnoses, such as adenocarcinoma, small cell carcinoma, and mesothelioma, had high concordance rates (94.8%–98.4%) compared with samples with normal or reactive findings (83.3%). For interaction terms based on fluid type, there were significant differences in concordance rates of benign and malignant cases for all 3 fluid types. These values are consistent with those obtained from the evaluation of CAP NGC program responses for urinary tract cytology samples.2 In that study, the concordance rates for malignant and benign cases were 93.3% and 87.9%, respectively. The authors postulated that the testing nature of the assignment may have psychologically biased the participants to select a positive (malignant/suspicious) diagnosis over a negative one.2 In their analysis, similar to ours, a negative/benign-suspicious discrepancy carried the same weight as a negative/benign-positive discrepancy. It is also well known and previously shown that benign cases having many lymphoid cells or reactive mesothelial changes are often incorrectly interpreted as lymphoma or adenocarcinoma, respectively.3 It is also possible that the benign slides selected for the program may have been more challenging than benign cases typically seen in a clinical setting. Furthermore, concordance for a negative/benign diagnosis was achieved only by an exact negative/benign match, whereas concordance for positive/malignant cases could be achieved with either a positive or a suspicious diagnosis. The CAP program does not include an “atypical” category (ie, atypical but favor benign). It is, therefore, possible that concordance for the negative/benign diagnoses may be inherently more difficult to achieve.

The type of participant responding to the challenge also affected the concordance rate in the general category. Because the “laboratory” participant type can have multiple individuals contributing to the response, it is difficult to draw conclusions regarding diagnostic difficulty based on that concordance rate. However, a consensus response from the laboratory as a whole provides the most accurate response when compared with individuals. Responses from pathologists and cytotechnologists as individuals can provide clearer conclusions on diagnoses. In this study, cytotechnologists had a significantly higher general category concordance rate than pathologists. This result is consistent with those of the urinary tract cytology study2 and another CAP NGC study evaluating conventional and liquid-based preparations of pulmonary bronchial brushing specimens.4 This result may be related to the frequency of primary screening performed by cytotechnologists compared with pathologists2 and to the fact that cytotechnologists rely more heavily on morphology alone in their daily practice, whereas pathologists more often use special studies. However, there was no difference between the concordance rates of cytotechnologists and pathologists in the general category in a CAP NGC study evaluating different preparations of gastrointestinal cytology specimens.5 Taken together, the difference between the performance of cytotechnologists and pathologists at the general category level, although statistically significant in our study, is relatively small and may be negligible in a clinical setting.

In contrast, when looking at the concordance rates for participant type for reference diagnoses in this study, pathologists had significantly higher concordance rates than cytotechnologists. The significantly increased concordance rate for pathologists compared with cytotechnologists is seen specifically in adenocarcinoma cases from peritoneal fluid and in all cases of lymphoma or hematopoietic malignancy, regardless of fluid type. The other CAP NGC program studies had varying results. The results of the urinary tract cytology study were similar to those of this study, with pathologists having higher concordance rates for both urothelial carcinoma and polyomavirus cases compared with cytotechnologists.2 In the bronchial brushing study, pathologists had higher concordance rates when diagnosing carcinoid tumors. However, cytotechnologists had higher concordance rates when diagnosing non–small cell carcinoma and adenocarcinoma.4 In the gastrointestinal cytology study, cytotechnologists had higher concordance rates for cases of squamous cell carcinoma, whereas pathologists had higher rates for spindle cell neoplasms.5 These performance differences may be related to the level of exposure the individual has had to a particular tumor type. Pathologists are more likely to be exposed to specialized tumor types compared with cytotechnologists, whereas cytotechnologists screen samples from numerous body sites, which exposes them to more of the pervasive and common malignancies, thereby heightening their sensitivity for detecting those tumors.5

At the reference diagnosis level, the concordance rates based on fluid type were significantly different. In the nonlinear mixed model, peritoneal fluid had the highest concordance rate, followed by pleural fluid and pericardial fluid. However, when looking at concordance rates stratified by reference diagnosis, the trends varied. The concordance rate of pleural fluid was significantly higher than that of peritoneal fluid in cases of small cell carcinoma and in cases with normal or reactive findings, and the concordance rates of peritoneal and pericardial fluids were higher than that of pleural fluid in cases of adenocarcinoma. These data highlight the difficulties in obtaining accurate diagnoses in effusion samples, especially pericardial fluid. This may be due to the rarity of pericardial fluid samples compared with pleural or peritoneal specimens, which might encourage a more cautious interpretation among observers.

The slide preparation type showed effects at both the general category and reference diagnosis level. At the general category level, there was a relatively small, but statistically significant, difference in concordance rates among the various preparations. The highest concordance rate was seen with ThinPrep slides (96.4%) and the lowest was with conventional slides (94.5%). This observation is consistent with the findings of other studies comparing ThinPrep to other methods.1,4–6 When looking at the general category concordance rate by fluid type, the only statistically significant changes in concordance rates were for pericardial fluid, with SurePath performing best (99.7%) and modified Giemsa slides performing the worst (76.7%). However, the relevance of that observation is minimal because nearly all of the pericardial fluid slides were adenocarcinoma cases, and none of the pericardial fluid adenocarcinoma cases were stained with modified Giemsa. At the reference diagnosis level, ThinPrep slides still had the highest concordance rate (76.8%); however, CytoSpin had the lowest rate (69.2%). For adenocarcinoma in pericardial fluid, SurePath preparations had significantly higher concordance rates compared to the other 3 preparation types. For lymphoma or hematopoietic malignancy cases, modified Giemsa slides had the highest concordance rates for all fluids, which further emphasizes the importance of performing a modified Giemsa stain on cases with potential lymphoma or other hematopoietic disease involvement. This result is not surprising because it is well known that modified Giemsa staining highlights cytoplasmic detail and lymphoglandular bodies in lymphomas, which are particularly important in cytologic evaluation of hematolymphoid cells.7 These results could indicate that ThinPrep preparations perform consistently among all diagnoses and fluid types, making ThinPrep a good preparation for primary screening and most diagnoses; however, the use of additional preparation types is encouraged, especially in more challenging cases.

Among the malignant reference diagnoses, small cell carcinoma had the highest general category concordance rate. When looking at the reference diagnosis by fluid type, small cell carcinoma had the highest concordance rate of all reference diagnoses for pleural fluid (83.3%) and the lowest rate for peritoneal fluid (45.1%). Small cell carcinomas are rarely found in fluid samples; however, when they are observed, it is usually in pleural fluid.8  That knowledge may have caused participants to exclude the correct response in other fluid types. Additionally, small cell carcinoma cells in effusions are difficult to distinguish from lymphocytes.9  In effusion samples, small cell carcinoma can present with a single-cell pattern and discohesive distribution, similar to lymphoma,8  and in turn, lymphoma can resemble benign lymphocytes in cases of chronic inflammation or infections such as tuberculosis.10  These strong similarities could easily lead participants away from accurate diagnoses, which was supported by the participant responses to small cell carcinoma cases in pericardial fluid. “Lymphoma/hematopoietic malignancy” and “lymphocytic effusion, indeterminate” were the 2 most common responses after small cell carcinoma and accounted for 34.4% of responses. In such cases, it may be helpful to perform a modified Giemsa stain on slides to more readily identify lymphocytic features.

Adenocarcinoma cases had the second highest concordance rate for the general category and the highest rate for reference diagnoses. This result is to be expected because adenocarcinoma is the most frequent challenge in all 3 fluid types. However, adenocarcinoma can also be quite challenging to diagnose because there is substantial overlap of cytologic features between adenocarcinoma and other poorly differentiated non–small cell carcinomas.11 This difficulty is further highlighted by the participant interpretations of non–small cell carcinoma cases. The concordance rate of non–small cell carcinoma at the reference diagnosis level was only 19.9%, and the most frequent participant interpretation of those cases was adenocarcinoma for both pleural and peritoneal fluids. Mesothelioma is also frequently mistaken for adenocarcinoma. Mesotheliomas can present with increased cellularity and show cellular features such as 3-dimensional cell groups that are commonly mistaken for adenocarcinoma.3 In this study, adenocarcinoma accounted for 20.1% of participant interpretations for mesothelioma cases. In actual daily practice, the distinction between adenocarcinoma and mesothelioma requires knowledge of pleural invasion and support by immunohistochemical studies or tissue biopsy correlation. Although this type of information is usually provided for cases within the CAP NGC program, system-based limitations preclude detailed analysis of the database for the presence, absence, or details of that variable.

Taken together, the results of this study illustrate the challenges in achieving an accurate diagnosis in body fluid cytology, particularly in pericardial fluid. Overall, participants perform well and show a high concordance with the reference diagnosis, which probably reflects the high quality of slides in the program as well as the competence of the participants. The study also highlights the value of performing multiple preparations on certain samples to improve diagnostic accuracy, especially in cases of adenocarcinoma and lymphoma or hematopoietic malignancies.

References

1. Moriarty AT, Schwartz MR, Ducatman BS, et al; College of American Pathologists. A liquid concept—do classic preparations of body cavity fluid perform differently than ThinPrep cases?: observations from the College of American Pathologists Interlaboratory Comparison Program in Nongynecologic Cytology. Arch Pathol Lab Med. 2008;132(11):1716–1718.
2. Barkan GA, Laucirica R, Auger M, et al. Performance characteristics of urinary tract cytology: observations from the College of American Pathologists Interlaboratory Comparison Program in Nongynecologic Cytopathology. Arch Pathol Lab Med. 2015;139(8):1009–1013.
3. Moriarty AT, Stastny J, Volk EE, Hughes JH, Miller TR, Wilbur DC; College of American Pathologists. Fluids—good and bad actors: observations from the College of American Pathologists Interlaboratory Comparison Program in Nongynecologic Cytology. Arch Pathol Lab Med. 2004;128(5):513–518.
4. Tabatabai ZL, Auger M, Kurtycz DF, et al. Do liquid-based preparations of pulmonary bronchial brushing specimens perform differently from classically prepared cases for the diagnosis of malignancies?: observations from the College of American Pathologists Interlaboratory Comparison Program in Nongynecologic Cytology. Arch Pathol Lab Med. 2015;139(2):178–183.
5. Clayton AC, Bentz JS, Wasserman PG, et al; College of American Pathologists Cytopathology Resource Committee. Comparison of ThinPrep preparations to other preparation types in gastrointestinal cytology: observations from the College of American Pathologists Interlaboratory Comparison Program in Nongynecologic Cytology. Arch Pathol Lab Med. 2010;134(8):1116–1120.
6. Gabriel C, Achten R, Drijkoningen M. Use of liquid-based cytology in serous fluids: a comparison with conventional cytopreparatory techniques. Acta Cytol. 2004;48(6):825–835.
7. Gattuso P, Reddy VB, Masood S. Differential Diagnosis in Cytopathology. 2nd ed. Cambridge, United Kingdom: Cambridge University Press; 2015.
8. Khalbuss WE, Yang H, Lian Q, Elhosseiny A, Pantanowitz L, Monaco SE. The cytomorphologic spectrum of small-cell carcinoma and large-cell neuroendocrine carcinoma in body cavity effusions: a study of 68 cases. Cytojournal. 2011;8:18.
9. Chhieng DC, Ko EC, Yee HT, Shultz JJ, Dorvault CC, Eltoum IA. Malignant pleural effusions due to small-cell lung carcinoma: a cytologic and immunocytochemical study. Diagn Cytopathol. 2001;25(6):356–360.
10. Cakir E, Demirag F, Aydin M, Erdogan Y. A review of uncommon cytopathologic diagnoses of pleural effusions from a chest diseases center in Turkey. Cytojournal. 2011;8:13.
11. Idowu MO, Powers CN. Lung cancer cytology: potential pitfalls and mimics—a review. Int J Clin Exp Pathol. 2010;3(4):367–385.

Author notes

Supplemental digital content is available for this article at www.archivesofpathology.org in the January 2018 table of contents.

Competing Interests

The authors have no relevant financial interest in the products or companies described in this article.

All authors are current or past members of the College of American Pathologists Cytopathology Committee.

Presented in part as a poster at the 26th Annual Meeting of the United States and Canadian Academy of Pathology; March 16, 2016; Seattle, Washington.
