Context.—The quality of diagnostic accuracy studies is determined by 2 key factors: risk of bias and comparability. Bias can distort accuracy estimates and poor reporting impairs comparability. While diagnostic accuracy studies for fine-needle aspiration cytology (FNAC) are frequently published, the methodologic issues associated with this body of literature have never been reviewed.

Objective.—To assess the quality of design and reporting of diagnostic test accuracy studies in FNAC.

Data Sources.—Diagnostic accuracy studies were identified by a Medline (US National Library of Medicine) search. Sixty-four FNAC diagnostic test accuracy studies were randomly selected for structured review with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) survey. Studies were divided between 2 time periods: 2000-2001 and 2009-2011.

Conclusions.—Diagnostic test accuracy studies of FNAC suffer from numerous deficiencies in study design, which negatively affect the reliability of accuracy estimates.

Reliable estimates of accuracy are important for any diagnostic test. Such estimates determine the benefit of the test relative to other diagnostic methods and guide the development of diagnostic algorithms. Diagnostic studies are subject to unique sources of bias that can distort estimates of sensitivity and specificity. Thus, risk of bias is an important dimension of study quality.

Studies are useful only if their results can be applied to a particular clinical problem. Clinical problems are typically formulated with the PICO (population, index test, comparator, and outcome) format. Because many factors can affect diagnostic accuracy, one must determine whether the PICO dimensions of a study match those of the clinical problem before applying its results. This comparison can be made only if a study provides a complete description of the PICO parameters. Thus, comparability, or quality of reporting, is a second dimension of study quality. Quality of reporting also affects the ability to assess risk of bias.

In recent years, there has been increasing awareness of deficiencies in study design and reporting in diagnostic test accuracy studies.1–5 The companion article by Schmidt and Factor in this issue explains the theory that underlies the concerns regarding bias and quality of reporting in diagnostic accuracy studies. The article also provides a framework for evaluating the value of a diagnostic accuracy study with respect to a particular clinical question. In this article, we illustrate those principles by evaluating the quality of accuracy studies for a specific diagnostic test: fine-needle aspiration cytology (FNAC). Diagnostic accuracy studies of FNAC are frequently published in the cytology literature, but a systematic review of the methodology and quality of reporting of this body of literature has never been undertaken. Our objective was to conduct a broad survey of FNAC diagnostic test accuracy studies to identify common problems in study design or reporting. To that end, we used the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) survey instrument to evaluate a random selection of studies.

See also p 558.

Study Design and Rationale

Our objective was 2-fold. First, we wanted to capture a broad selection of FNAC diagnostic accuracy studies, not restricted to particular tissue types or to specific journals. Second, we asked whether more recent studies, published with the advent of quality assurance measures such as Standards for Reporting of Diagnostic Accuracy (STARD) and QUADAS, had better methods of reporting than older studies. For this reason, we selected studies from 2 time periods (2000–2001 and 2009–2011). We conducted a literature search to identify all FNAC diagnostic accuracy studies within the designated time periods. A random sample was then selected from each time period to provide a broad spectrum of studies.

Literature Search

Potentially relevant diagnostic accuracy studies were identified by a Medline (US National Library of Medicine) search using the search terms fine-needle or fine-needle biopsy AND sensitivity and specificity. Studies were identified in 2 distinct time periods: 2000–2001 (period 1) and 2009–2011 (period 2). Studies were limited to English language.

Study Eligibility

The titles and abstracts of the studies identified by electronic search were screened for eligibility by one author (R.L.S.). Full reports were obtained for all articles that passed the initial screen. Studies were eligible for inclusion if they contained more than 10 cases with numerical data on diagnostic accuracy (sensitivity and specificity) for FNAC. Studies were not limited by anatomic site, journal, or specialty.

Study Selection

Thirty-two articles were randomly selected from each time period (2000-2001 and 2009-2011).

Survey Development

Studies were evaluated with the QUADAS6,7 instrument. QUADAS is a validated7 survey instrument, developed in 2003,6 that was designed to evaluate the quality (ie, quality of reporting and threats of bias) of diagnostic accuracy studies in systematic reviews. It has gained increasing acceptance in meta-analyses of diagnostic test accuracy studies and is now widely used to assess study quality. For example, the Cochrane Collaboration Diagnostic Test Accuracy Group recommends that researchers use QUADAS for assessment of study quality.8

Each QUADAS survey question is designed to assess a specific methodologic issue by using a generic framework for quality assessment, but the instrument cannot anticipate all of the factors that are required to assess studies in a particular context. It may be necessary for researchers to develop specific criteria to evaluate each QUADAS question in a particular context.8 For example, a QUADAS survey question addressing the adequacy of method reporting will be different for a study using molecular techniques versus one using FNAC. The Cochrane Handbook8 recommends a common subset of survey items but provides a list of potential additional items. Following this guideline, we used the basic set of QUADAS items recommended by the Cochrane Handbook and added questions that we felt were important in the context of FNAC diagnostic accuracy studies. An explanation of each QUADAS survey item and the associated evaluation criteria for our survey of FNAC studies is presented in the Appendix.

The criteria for each QUADAS item were refined by using an iterative process as recommended by the Cochrane Handbook.8  We tested the criteria on a subset of studies to ensure that the criteria were objective and were interpreted the same way by all evaluators. We had several test runs on a small number of studies before finalizing the assessment criteria. We then evaluated the entire set of 64 articles by using the finalized criteria.

Study Evaluation

Each study was independently evaluated by 3 authors (R.E.F., R.L.S., B.L.W.) using a set of assessment criteria (see Appendix) developed for each QUADAS item.

Score interpretation: Although the scales for each item are the same, the criteria used to evaluate each item are different such that the scores of different items are not comparable. Higher scores generally indicate higher quality and we are able to offer a qualitative assessment of quality, based on the distribution of scores within each item.

Statistical Analysis

Survey data were recorded in a database (Microsoft Access 2010, Microsoft Corp, Redmond, Washington) and all statistical analysis was conducted with Stata Statistical Software release 12 (Stata Corp, College Station, Texas). Agreement was measured with percentage agreement and Cohen κ. Differences in QUADAS scores over time were assessed by item with the nonparametric Mann-Whitney rank sum test. A global change in all items over time was assessed by probit analysis. The impact of journal type was assessed by using probit analysis and adjusting for the impact of survey item. Results were considered statistically significant if P < .05. Bonferroni corrections were used for multiple comparisons. A sample size (n = 32) was selected to ensure 80% power for detecting a 20% change in the overall quality score between 2 different time periods.
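The two agreement measures named above can be illustrated with a minimal Python sketch. It is not the authors' Stata analysis: the ratings are hypothetical, the function names are ours, and it compares only 2 raters, whereas the study used 3 (the text does not state how the 3 sets of ratings were combined).

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by
    chance, computed from each rater's marginal category frequencies.
    """
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores for a single QUADAS item across 10 studies.
RATER_1 = ["yes", "yes", "no", "unclear", "no", "yes", "no", "no", "yes", "unclear"]
RATER_2 = ["yes", "no", "no", "unclear", "no", "yes", "no", "yes", "yes", "unclear"]

AGREEMENT = percent_agreement(RATER_1, RATER_2)
KAPPA = cohen_kappa(RATER_1, RATER_2)
```

In this hypothetical example the raters agree on 8 of 10 studies, but κ is lower than the raw agreement because part of that agreement would be expected by chance.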

Search

We identified 278 studies in period 1 (2000–2001) and 115 studies in period 2 (2009–2011). These were screened to obtain 81 and 73 eligible studies for periods 1 and 2, respectively (Figure 1). We randomly selected 32 studies from each time period for evaluation.9–71

Figure 1.

Flow diagram for study design. The figure shows how studies were identified and allocated to evaluators. Each evaluator received 64 studies (32 from each time period). The QUADAS (Quality Assessment of Diagnostic Accuracy Studies) criteria were reviewed and revised twice before the final evaluation.


Characteristics of Included Studies

The most common tissue types were breast, lymph nodes, salivary glands, and thyroid, which accounted for 14%, 17%, 14%, and 24% of the total studies, respectively (Table 1). Diagnostic accuracy studies of FNAC are published in several different categories of journals, including pathology journals, specialized cytopathology journals, surgery journals, and general medicine journals. The included studies were mostly drawn from specialized cytology journals, surgery journals, and general medical journals, which were almost equally represented in our sample (Table 2). Our sample included a relatively small proportion of studies published in pathology journals. The included studies were mainly conducted in the United States and Europe (Table 3). Our sample included 54 retrospective studies, 9 prospective studies, and 1 that could not be determined. Prospective studies were spread evenly over tissue types and publication period, but appeared more frequently in medical journals.

Table 1.

Tissue Types of Included Studies

Table 2.

Journal Types of Included Studies

Table 3.

Distribution of Study Locations for Included Studies


Quality Assessment

The average quality score for each QUADAS item for both time periods is presented in Table 4. The distribution of scores aggregated over both time periods is presented in Figure 2. Our findings are summarized below for each domain.

Table 4.

Quality Assessment of Diagnostic Accuracy Studies (QUADAS) Score Summary

Figure 2.

Summary of QUADAS (Quality Assessment of Diagnostic Accuracy Studies) survey results. Each bar shows the relative proportion of yes, unclear, and no scores for each item. Scores for each item use different criteria and are not directly comparable, but all items use the same ordinal scale in which “yes” indicates higher quality and “no” indicates lower quality.


Patient Selection

Forty-five percent of the studies in our sample had inadequate population descriptions, probably because most studies in our sample were retrospective, and these studies often fail to describe patterns of patient referral. We found no significant difference in reporting over time.

Index Test

Only 30% of the studies in our sample provided any description of the diagnostic criteria used. Nearly all (91%) failed to report whether clinical information was available to the cytologist. Overall, studies specified anywhere from 0 to 14 of 17 parameters that could be used to describe the index test. On average, studies reported on only 4 or 5 test parameters, and only 7 studies described 9 or more of the 17 possible test parameters. In addition, we found that reporting was quite variable: the parameters that were reported were often not consistent even within studies conducted at 1 anatomic location.

Reference Test

All studies used histopathology and clinical follow-up as a reference standard. These are generally regarded as accurate; however, no studies mentioned the interrater reliability of the test. Blinding of the reference test was specifically mentioned in only 3 studies (4.7%). We found that 10 studies explicitly incorporated FNAC results as a condition for the final diagnosis. Very few studies described the time period between the index test (FNAC) and the reference test (histopathology or clinical follow-up). Only 2 of 64 studies provided any description of the methods associated with the reference test. There was a statistically significant increase in the percentage of studies with incorporation bias over time (Appendix, item 13; P = .007). The increase in incorporation was positively correlated to the number of articles on FNAC of mediastinal lymph nodes. Otherwise no change was seen over time.

Patient Flows

Patient flows were often difficult to ascertain from the descriptions provided in the “Methods” section. We found that 60% of the studies in our sample had partial verification (Appendix, item 4); nearly 80% of the studies in our sample had differential verification (Appendix, item 5), and only 30% of the studies provided an adequate description of withdrawals. Flow diagrams were only provided in 8 studies (12.5%) and they were sometimes incomplete. Studies generally reported indeterminate results (78%), but the manner in which these results were used varied widely. For example, inadequate results were sometimes not mentioned at all, or were excluded from analysis without specifying the number of cases. In some studies, atypical and suspicious cases were added to positive cases for analysis, but the number of cases was not specified. There was an increase in the number of studies reporting withdrawals (P = .05) between the 2 time periods; however, the increase was not statistically significant after Bonferroni adjustment.

Factors Associated With Study Quality

Changes in Study Quality Over Time

Overall, there was no indication of a general change in study quality over time (P = .81).

Effect of Journal Type

Scores were significantly lower for studies published in surgery journals than for other journal categories (P = .009).

Effect of Type of Study

Prospective studies were of higher quality than retrospective studies, demonstrating significantly higher QUADAS scores for items 1 (P = .02), 4 (P = .009), 8 (P < .001), and 13 (P = .08) in the Appendix. For no item did prospective studies score lower than retrospective studies.

Interrater Agreement

Interrater agreement was assessed for each QUADAS item by using both percentage agreement and Cohen κ (Table 5). The assessment found 92% agreement.

Table 5.

Interrater Agreement


We applied the QUADAS criteria to a random selection of FNAC diagnostic accuracy studies in order to provide a representative overview of the FNAC diagnostic accuracy literature.

Our sample included a broad spectrum of studies from different organ systems. Studies were performed by a range of author types, and came from a variety of journals from numerous countries. We applied modified QUADAS criteria to the studies by using independent assessments by 3 authors and obtained a high level of agreement. We therefore believe our evaluation of these studies was both reproducible and reliable. The questions in the QUADAS survey are designed to assess 2 broad areas of methodologic quality: risk of bias and study comparability. We discuss our results below with respect to each domain of the PICO format.

Population (Patient Selection and Description)

A study is useful when it is applicable to both particular clinical situations and populations in other studies. Ultimately, studies in diagnostic accuracy should be comparable to one another, and this is enabled by informative reporting of populations and study design. A comparable study is said to have external validity.

Overall, the studies in our survey did a poor job of reporting patient selection. As described in the companion review by Schmidt and Factor in this issue, assessment of spectrum bias depends on a comparison of patient populations. Most of the studies in our sample (54 of 64, 84%) were retrospective and lacked comparable information such as severity of disease, reason for referral, or whether prior testing had been performed. In contrast, prospective studies were generally designed to investigate the accuracy of FNAC in a much more specific population of patients (eg, those with resectable peripheral lung lesions less than 3 cm in diameter with indeterminate imaging). Overall, most studies lacked adequate information to assess external validity.

Index Test

Fine-needle aspiration cytology is a complex, multistep process that can be performed in a variety of ways. The accuracy of the test may depend on methodology. For example, results may differ by needle size, the number of passes, the experience of the person performing the biopsy, the use of guidance, the availability of rapid onsite specimen evaluation, staining methods, the use of ancillary methods, the experience of the pathologist reading the slides, and others. To make valid comparisons between studies, these need to be clearly reported. In our survey of FNAC studies, we found considerable variability in the reporting of methodology.

The availability of clinical information can also influence test interpretation. From a clinical perspective, the ideal way to perform fine-needle aspirations is to have information about the patient's history before the test. Sometimes this information is not available, and sometimes it is, but the cytopathologist is blinded to the information for the purpose of the study. There is no way to know this if studies neglect to report this information. From a research perspective, variable reporting of clinical history causes difficulties in comparing diagnostic accuracy studies. We found that most studies did not report whether cytologic assessment was blinded to clinical information.

Fine-needle aspiration cytology can be subject to significant interrater differences72  due to errors or threshold effects. Errors arise when a cytopathologist fails to appreciate a feature that is present (or sees one that is not present). Threshold effects arise when cytopathologists use different criteria to classify cases. This will affect sensitivity and specificity. Renshaw et al73  demonstrated that threshold effects account for a significant proportion of interobserver differences in surgical pathology specimens. Potential threshold differences can only be identified if the criteria for diagnosis are described or referenced in a study. In our survey of FNAC studies, few reported diagnostic criteria.

Reference Test

Classification bias results from an imperfect reference test. Classification bias can falsely depress estimates of both sensitivity and specificity, depending upon the disease prevalence. The impact of classification bias can be quite substantial and misleading. Although the gold standard of histopathology and/or clinical follow-up is generally regarded as accurate, these tests are imperfect and can vary from site to site (depending on the skill of the pathologist) or between tissue types. Some indication of the reliability of the gold standard should be stipulated in accuracy studies to assess the potential for classification bias. An alternative is to reference the reliability of the gold standard in the literature or the interrater reliability at a particular site. None of the studies in our sample addressed this issue.
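The effect of an imperfect reference test can be illustrated with a short calculation. The following Python sketch is ours, not part of the study: all numerical values are hypothetical, and it assumes that the errors of the index test and the reference test are independent once true disease status is known.

```python
def apparent_accuracy(se_index, sp_index, se_ref, sp_ref, prevalence):
    """Apparent sensitivity and specificity of an index test when it is
    scored against an imperfect reference test.

    Assumption: index and reference errors are conditionally independent
    given the true disease status.
    """
    p = prevalence
    # Joint probabilities of the observed (index, reference) results.
    both_pos = p * se_index * se_ref + (1 - p) * (1 - sp_index) * (1 - sp_ref)
    both_neg = p * (1 - se_index) * (1 - se_ref) + (1 - p) * sp_index * sp_ref
    # Marginal probabilities of the reference result.
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    ref_neg = p * (1 - se_ref) + (1 - p) * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical index test with true sensitivity 0.90 and specificity 0.95,
# scored against a reference test that is itself only 95% sensitive and
# 95% specific, at a disease prevalence of 30%.
APPARENT_SE, APPARENT_SP = apparent_accuracy(0.90, 0.95, 0.95, 0.95, 0.30)
```

Under these assumptions both apparent values fall below the true values, and changing the `prevalence` argument shows how the size of the distortion depends on disease prevalence.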

The time interval between the index test and the reference test can affect accuracy. A long interval between FNAC and surgical follow-up may allow for disease progression. Consequently, the reference test will detect more positive cases if the interval between the index test and reference test is long. On the other hand, a sufficiently long period is required for medical follow-up of negative cases. Thus, it is important for researchers to report the time interval between the index test and reference test. We found that researchers rarely reported the time interval to follow-up.

In clinical practice, the surgical pathologist is often aware of cytology results before verifying a case and uses this information like an ancillary test. Having this information is helpful clinically but can confound the results of the reference standard in diagnostic accuracy studies. This is known as review bias. While we are not aware of any studies that have assessed the influence of FNAC results on the interpretation of histopathology, review bias remains a potential problem, especially for retrospective studies. Prospective studies can be designed to incorporate blinded appraisal, but in retrospective studies the results have already been established. One solution to this limitation is to have the material reassessed by pathologists who do not know whether the cases were classified as malignant or benign. At the very least, studies should mention this as a possible limitation. In our survey, studies rarely reported whether pathologists were blinded to results, or discussed this as a possible limitation.

We also found that few studies provided any description of the reference test. The histologic reference test often appears to be taken for granted as a gold standard, but variability is possible due to differences in fixation, ancillary tests, the experience level of the pathologist interpreting the slides, etc. Some mention of how the reference test was interpreted would improve comparability of studies.

Incorporation bias occurs when the reference test explicitly incorporates the results of the index test as part of the criteria. In our survey, we found several examples of incorporation bias, particularly in endobronchial ultrasound FNAC studies of the lung and mediastinum, in which positive FNAC results were used as the reference standard. Such studies cannot reliably report a sensitivity or positive predictive value. While accepting a positive FNAC result without further verification may be acceptable clinical practice, it is a serious design flaw in the context of a diagnostic accuracy study. Since histologic verification is not always possible, alternatives include either citing evidence that the predictive value of FNAC of mediastinal nodes is almost perfect, or confirming the FNAC results by independent, blinded cytology review.

Patient Flow

Partial verification was a common problem in our study survey. Most studies used a retrospective design. With this method, cases are obtained from surgery records, which excludes patients who are not referred to surgery. This leads to biased estimates of sensitivity and specificity. In the context of FNAC, partial verification will generally cause a positive bias in estimates of sensitivity and a negative bias in specificity. The impact of this type of bias can be significant. Partial verification bias can be corrected by accounting for all patients who received the index test (FNAC). An “ideal” study would use the same reference standard for all patients; however, it is not practical or ethical to refer all patients to surgery. An alternative would be to provide clinical follow-up of patients who had negative FNAC results, and at a minimum, studies should provide a flow diagram to indicate the number of patients who received the index test and the number who were verified. Though methods are available to estimate the magnitude of bias due to partial verification, none of the studies we reviewed included this as a topic for discussion. Our impression is that the impact of partial verification is not appreciated and that this source of bias is common in the cytology literature.
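The direction of partial verification bias, and one way to correct for it, can be sketched in Python. The example below is ours and uses hypothetical counts. The correction shown is the Begg-Greenes method, chosen here only as one familiar example of the available methods; it assumes that the decision to verify depends only on the FNAC result.

```python
def naive_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity computed from verified cases only."""
    return tp / (tp + fn), tn / (tn + fp)


def begg_greenes(tp, fp, fn, tn, n_test_pos, n_test_neg):
    """Begg-Greenes correction for partial verification.

    tp, fp, fn, tn are counts among verified patients; n_test_pos and
    n_test_neg are ALL patients with a positive or negative index test,
    verified or not. Assumes verification depends only on the index
    test result.
    """
    p_dis_given_pos = tp / (tp + fp)  # P(disease | test+), verified cases
    p_dis_given_neg = fn / (fn + tn)  # P(disease | test-), verified cases
    # Project the verified proportions onto everyone who was tested.
    est_tp = n_test_pos * p_dis_given_pos
    est_fp = n_test_pos * (1 - p_dis_given_pos)
    est_fn = n_test_neg * p_dis_given_neg
    est_tn = n_test_neg * (1 - p_dis_given_neg)
    return est_tp / (est_tp + est_fn), est_tn / (est_tn + est_fp)


# Hypothetical cohort of 1000 patients: 220 FNAC positive (all verified)
# and 780 FNAC negative (only 78, or 10%, verified). In the full cohort
# the true sensitivity is 0.90 and the true specificity is 0.95.
VERIFIED = {"tp": 180, "fp": 40, "fn": 2, "tn": 76}

NAIVE_SE, NAIVE_SP = naive_accuracy(**VERIFIED)
CORRECTED_SE, CORRECTED_SP = begg_greenes(
    **VERIFIED, n_test_pos=220, n_test_neg=780
)
```

With these counts the verified-only estimates overstate sensitivity and understate specificity, whereas the corrected estimates recover the values built into the hypothetical cohort. The correction requires the total number of patients with each index test result, which is the information a flow diagram would supply.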

We also found that withdrawals are poorly reported. A withdrawal is defined as a patient who receives the index test and is referred for verification but fails to receive verification. Withdrawals can distort accuracy estimates if the number of withdrawals is high and if the population of withdrawals differs from the population that receives verification. The impact of withdrawals is difficult to predict between studies but would be aided by improved reporting.

Accuracy statistics are influenced by the way in which inadequate and indeterminate results are handled. For example, some studies fail to mention whether there were inadequate results; some exclude inadequate results from accuracy calculations; and others categorize inadequate results as false negatives. Studies also frequently differ in the way that they handle “atypical,” “suspicious,” or “indeterminate” results. This variability in reporting makes comparison of studies very challenging if not impossible. In our survey, there was inconsistency in reporting. For purposes of comparability, it would be preferable for studies to report each diagnostic category obtained for the index test (fine-needle aspiration) rather than report aggregate statistics, which combine categories. This would enable researchers to compare studies, apply their own assumptions, and apply consistent methods when conducting meta-analyses of diagnostic test accuracy studies.
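The following Python sketch shows why per-category reporting matters. The counts, category names, and handling rules are hypothetical and chosen only for illustration; the same data are analyzed under 3 conventions of the kind described above.

```python
# Hypothetical counts: category -> (reference positive, reference negative)
COUNTS = {
    "malignant": (80, 2),
    "suspicious": (10, 5),
    "atypical": (5, 10),
    "benign": (5, 170),
    "inadequate": (5, 8),
}


def accuracy(counts, rule):
    """Return (sensitivity, specificity) under a category-handling rule.

    rule maps a category to "positive" or "negative"; any category not
    listed in rule is excluded from the calculation.
    """
    tp = fp = fn = tn = 0
    for category, (ref_pos, ref_neg) in counts.items():
        call = rule.get(category)
        if call == "positive":
            tp += ref_pos
            fp += ref_neg
        elif call == "negative":
            fn += ref_pos
            tn += ref_neg
    return tp / (tp + fn), tn / (tn + fp)


# Convention 1: drop inadequate, atypical, and suspicious results.
EXCLUDE_UNCERTAIN = {"malignant": "positive", "benign": "negative"}
# Convention 2: add atypical and suspicious cases to the positives.
UNCERTAIN_AS_POSITIVE = {
    "malignant": "positive",
    "suspicious": "positive",
    "atypical": "positive",
    "benign": "negative",
}
# Convention 3: count everything that is not malignant as negative.
NON_MALIGNANT_AS_NEGATIVE = {
    "malignant": "positive",
    "suspicious": "negative",
    "atypical": "negative",
    "benign": "negative",
    "inadequate": "negative",
}

RESULTS = {
    "exclude": accuracy(COUNTS, EXCLUDE_UNCERTAIN),
    "as_positive": accuracy(COUNTS, UNCERTAIN_AS_POSITIVE),
    "as_negative": accuracy(COUNTS, NON_MALIGNANT_AS_NEGATIVE),
}
```

In this hypothetical data set, sensitivity ranges from roughly 0.76 to 0.95 depending solely on the convention. A reader can apply any rule only when the full table of categories is reported.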

Indeterminate results can also be a source of bias. The rate of indeterminate diagnoses can reflect different diagnostic thresholds. For example, one study might classify many difficult borderline cases as indeterminate, whereas another study might classify such cases as malignant or benign. One would expect the diagnostic accuracy to be higher in a study in which difficult cases were generally classified as indeterminate. Thus, a large difference in indeterminate rates can distort accuracy estimates and present a source of bias.

Our results indicate that significant deficiencies are common in the design and reporting of FNAC diagnostic test accuracy studies. Our findings are summarized in Table 6.

Table 6.

Summary of Findings


We found several sources of bias that could distort estimates of sensitivity and specificity. In particular, we found several significant issues related to patient flows that present a common and significant source of bias. Overall, we believe the issues related to patient flow present the highest risk of bias. The quality of reporting with respect to patient selection, FNAC method description, and indeterminates (patient flow) presents the greatest concern for comparability.

The high prevalence of several significant sources of bias in cytopathology is an important finding. It is possible that many estimates of sensitivity and specificity in the current literature are affected. These estimates are used to guide clinical decisions and inform guideline development. Thus, it is important that researchers become aware of these problems to improve the design of future studies. While no study is perfect, researchers should take steps to improve study designs to reduce the risk of bias. At a minimum, researchers should be aware of study limitations and provide estimates of bias in accuracy estimates. Similarly, those who depend on diagnostic accuracy estimates need to be aware of the potential for bias in data derived from FNAC accuracy studies. To our knowledge, the issues surrounding bias in cytopathology studies have not been appreciated. We recently found that only a small fraction of the studies in the existent literature qualified for inclusion in a Cochrane review on diagnostic accuracy owing to issues with bias.

Our study has several strengths and limitations. The strengths include being the first study to systematically examine methodologic issues across a wide range of FNAC diagnostic accuracy studies. We randomly selected studies from 2 time periods and a range of countries, journal types, and tissues. Thus, the sample is likely to be representative. We used QUADAS, which is a widely used and validated tool for quality assessment of diagnostic studies. We took care to develop objective criteria to assess the QUADAS items and obtained a high level of interrater agreement. Thus, we believe our assessments are accurate. Limitations include having a small sample size, which may not have provided sufficient power to detect small differences in item assessments, and for some of the items, we used simple assessment criteria that were unambiguous but sacrificed some information content.

Diagnostic test accuracy studies of FNAC suffer from numerous deficiencies in study design, which negatively affect the reliability of accuracy estimates. This has important implications both in the assessment of individual studies and in the comparison of collected studies.

1
Smidt
N
,
Rutjes
AWS
,
van der Windt
DAWM
,
et al
.
The quality of diagnostic accuracy studies since the STARD statement: has it improved?
Neurology
.
2006;
67
(
5
):
792
797
.
2
Smidt
N
,
Rutjes
AWS
,
van der Windt
DAWM
,
et al
.
Quality of reporting of diagnostic accuracy studies
.
Radiology
.
2005;
235
(
2
):
347
353
.
3
Whiting
P
,
Rutjes
AWS
,
Dinnes
J
,
Reitsma
JB
,
Bossuyt
PMM
,
Kleijnen
J
.
A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools
.
J Clin Epidemiol
.
2005;
58
(
1
):
1
12
.
4
Lijmer
JG
,
Mol
BW
,
Heisterkamp
S
,
et al
.
Empirical evidence of design-related bias in studies of diagnostic tests [erratum in JAMA. 2000;283(15):1963]
.
JAMA
.
1999;
282
(
11
):
1061
1066
.
5
Whiting
P
,
Rutjes
AWS
,
Reitsma
JB
,
Glas
AS
,
Bossuyt
PMM
,
Kleijnen
J
.
Sources of variation and bias in studies of diagnostic accuracy: a systematic review
.
Ann Intern Med
.
2004;
140
(
3
):
189
202
.
6
Whiting
P
,
Rutjes
AWS
,
Reitsma
JB
,
Bossuyt
PMM
,
Kleijnen
J
.
The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews
.
BMC Med Res Methodol
.
2003;
3
:
25
.
7
Whiting
PF
,
Weswood
ME
,
Rutjes
AWS
,
Reitsma
JB
,
Bossuyt
PNM
,
Kleijnen
J
.
Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies
.
BMC Med Res Methodol
.
2006;
6
:
9
.
8
Deeks
JJBP
,
Gatsonis
C
,
eds
.
Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0.0
.
The Cochrane Collaboration;
2009
.
9
Amrikachi
M
,
Ramzy
I
,
Rubenfeld
S
,
Wheeler
TM
.
Accuracy of fine-needle aspiration of thyroid
.
Arch Pathol Lab Med
.
2001;
125
(
4
):
484
488
.
10
Ashraf
MJ
,
Azarpira
N
,
Nowroozizadeh
B
,
et al
.
Fine needle aspiration cytology of palatine tonsils: a study of 112 consecutive adult tonsillectomies
.
Cytopathology
.
2010;
21
(
3
):
170
175
.
11
Asimacopoulos
EP
,
Berry
M
,
Garfield
B
,
et al
.
The diagnostic efficacy of fine-needle aspiration using cytology and culture in tuberculous lymphadenitis
.
Int J Tuberc Lung Dis
.
2010;
14
(
1
):
93
98
.
12
Balaji
J
,
Sundaram
SS
,
Rathinam
SN
,
Rajeswari
PA
,
Kumari
MLV
.
Fine needle aspiration cytology in childhood TB lymphadenitis
.
Indian J Pediatr
.
2009;
76
(
12
):
1241
1246
.
13. Bargren AE, Meyer-Rochow GY, Sywak MS, Delbridge LW, Chen H, Sidhu SB. Diagnostic utility of fine-needle aspiration cytology in pediatric differentiated thyroid cancer. World J Surg. 2010;34(6):1254–1260.
14. Bartels S, Talbot JM, DiTomasso J, et al. The relative value of fine-needle aspiration and imaging in the preoperative evaluation of parotid masses. Head Neck. 2000;22(8):781–786.
15. Berger LPV, Scheffer RCH, Weusten BLAM, et al. The additional value of EUS-guided Tru-cut biopsy to EUS-guided FNA in patients with mediastinal lesions. Gastrointest Endosc. 2009;69(6):1045–1051.
16. Bezabih M. Cytological diagnosis of soft tissue tumours. Cytopathology. 2001;12(3):177–183.
17. Bonzanini M, Gilioli E, Brancato B, et al. The cytopathology of ductal carcinoma in situ of the breast: a detailed analysis of fine needle aspiration cytology of 58 cases compared with 101 invasive ductal carcinomas. Cytopathology. 2001;12(2):107–119.
18. Brennan PA, Davies B, Poller D, et al. Fine needle aspiration cytology (FNAC) of salivary gland tumours: repeat aspiration provides further information in cases with an unclear initial cytological diagnosis. Br J Oral Maxillofac Surg. 2010;48(1):26–29.
19. Brooks AD, Shaha AR, DuMornay W, et al. Role of fine-needle aspiration biopsy and frozen section analysis in the surgical management of thyroid tumors. Ann Surg Oncol. 2001;8(2):92–100.
20. Chao T-Y, Chien M-T, Lie C-H, Chung Y-H, Wang J-L, Lin M-C. Endobronchial ultrasonography-guided transbronchial needle aspiration increases the diagnostic yield of peripheral pulmonary lesions: a randomized trial. Chest. 2009;136(1):229–236.
21. Civardi G, Vallisa D, Berte R, et al. Ultrasound-guided fine needle biopsy of the spleen: high clinical efficacy and low risk in a multicenter Italian study. Am J Hematol. 2001;67(2):93–99.
22. Costas A, Castro P, Martin-Granizo R, Monje F, Marron C, Amigo A. Fine needle aspiration biopsy (FNAB) for lesions of the salivary glands. Br J Oral Maxillofac Surg. 2000;38(5):539–542.
23. Daneshbod Y, Daneshbod K, Khademi B. Diagnostic difficulties in the interpretation of fine needle aspirate samples in salivary lesions: diagnostic pitfalls revisited. Acta Cytol. 2009;53(1):53–70.
24. de Vos tot Nederveen Cappel RJ, Bouvy ND, Bonjer HJ, van Muiswinkel JM, Chadha S. Fine needle aspiration cytology of thyroid nodules: how accurate is it and what are the causes of discrepant cases? Cytopathology. 2001;12(6):399–405.
25. Dedivitis RA, de Carvalho MB, Rapoport A. Transcutaneous fine needle aspiration biopsy of the preepiglottic space. Acta Cytol. 2000;44(2):158–162.
26. DeWitt J, Gress FG, Levy MJ, et al. EUS-guided FNA aspiration of kidney masses: a multicenter U.S. experience. Gastrointest Endosc. 2009;70(3):573–578.
27. Dobrinja C, Trevisan G, Liguori G, Romano A, Zanconati F. Sensitivity evaluation of fine-needle aspiration cytology in thyroid lesions. Diagn Cytopathol. 2009;37(3):230–235.
28. Eloubeidi MA, Black KR, Tamhane A, Eltoum IA, Bryant A, Cerfolio RJ. A large single-center experience of EUS-guided FNA of the left and right adrenal glands: diagnostic utility and impact on patient management. Gastrointest Endosc. 2010;71(4):745–753.
29. Fassina AS, Borsato S, Fedeli U. Fine needle aspiration cytology (FNAC) of adrenal masses. Cytopathology. 2000;11(5):302–311.
30. Fernandez-Villar A, Botana M, Leiro V, Gonzalez A, Represas C, Ruano-Ravina A. Validity and reliability of transbronchial needle aspiration for diagnosing mediastinal adenopathies. BMC Pulm Med. 2010;10:24.
31. Fritscher-Ravens A, Soehendra N, Schirrow L, et al. Role of transesophageal endosonography-guided fine-needle aspiration in the diagnosis of lung cancer. Chest. 2000;117(2):339–345.
32. Greaves TS, Olvera M, Florentine BD, et al. Follicular lesions of thyroid: a 5-year fine-needle aspiration experience. Cancer. 2000;90(6):335–341.
33. Haberal AN, Toru S, Ozen O, Arat Z, Bilezikci B. Diagnostic pitfalls in the evaluation of fine needle aspiration cytology of the thyroid: correlation with histopathology in 260 cases. Cytopathology. 2009;20(2):103–108.
34. Hatada T, Ishii H, Ichii S, Okada K, Fujiwara Y, Yamamura T. Diagnostic value of ultrasound-guided fine-needle aspiration biopsy, core-needle biopsy, and evaluation of combined use in the diagnosis of breast lesions. J Am Coll Surg. 2000;190(3):299–303.
35. Hikichi T, Irisawa A, Bhutani MS, et al. Endoscopic ultrasound-guided fine-needle aspiration of solid pancreatic masses with rapid on-site cytological evaluation by endosonographers without attendance of cytopathologists. J Gastroenterol. 2009;44(4):322–328.
36. Hirdes MMC, Schwartz MP, Tytgat KMAJ, et al. Performance of EUS-FNA for mediastinal lymphadenopathy: impact on patient management and costs in low-volume EUS centers. Surg Endosc. 2010;24(9):2260–2267.
37. Hur J, Lee H-J, Nam JE, et al. Diagnostic accuracy of CT fluoroscopy-guided needle aspiration biopsy of ground-glass opacity pulmonary lesions. AJR Am J Roentgenol. 2009;192(3):629–634.
38. Ibrahim AE, Bateman AC, Theaker JM, et al. The role and histological classification of needle core biopsy in comparison with fine needle aspiration cytology in the preoperative assessment of impalpable breast lesions. J Clin Pathol. 2001;54(2):121–125.
39. Jafari A, Royer B, Lefevre M, Corlieu P, Perie S, St Guily JL. Value of the cytological diagnosis in the treatment of parotid tumors. Otolaryngol Head Neck Surg. 2009;140(3):381–385.
40. Jimenez-Heffernan JA, Vicandi B, Lopez-Ferrer P, Hardisson D, Viguer JM. Value of fine needle aspiration cytology in the initial diagnosis of Hodgkin's disease: analysis of 188 cases with an emphasis on diagnostic pitfalls. Acta Cytol. 2001;45(3):300–306.
41. Jung J, Park H, Park J, Kim H. Accuracy of preoperative ultrasound and ultrasound-guided fine needle aspiration cytology for axillary staging in breast cancer. ANZ J Surg. 2010;80(4):271–275.
42. Kim A, Lee J, Choi JS, Won NH, Koo BH. Fine needle aspiration cytology of the breast. Experience at an outpatient breast clinic. Acta Cytol. 2000;44(3):361–367.
43. Lee YT, Lai LH, Sung JJY, Ko FWS, Hui DSC. Endoscopic ultrasonography-guided fine-needle aspiration in the management of mediastinal diseases: local experience of a novel investigation. Hong Kong Med J. 2010;16(2):121–125.
44. Lin HS, Komisar A, Opher E, Blaugrund SM. Follicular variant of papillary carcinoma: the diagnostic limitations of preoperative fine-needle aspiration and intraoperative frozen section evaluation. Laryngoscope. 2000;110(9):1431–1436.
45. Liu F-H, Liou M-J, Hsueh C, Chao T-C, Lin J-D. Thyroid follicular neoplasm: analysis by fine needle aspiration cytology, frozen section, and histopathology. Diagn Cytopathol. 2010;38(11):801–805.
46. Ljung BM, Drejet A, Chiampi N, et al. Diagnostic accuracy of fine-needle aspiration biopsy is determined by physician training in sampling technique. Cancer. 2001;93(4):263–268.
47. Lumachi F, Borsato S, Brandes AA, et al. Fine-needle aspiration cytology of adrenal masses in noncancer patients: clinicoradiologic and histologic correlations in functioning and nonfunctioning tumors. Cancer. 2001;93(5):323–329.
48. Martinez-Onsurbe P, Ruiz Villaespesa A, Sanz Anquela JM, Valenzuela Ruiz PL. Aspiration cytology of 147 adnexal cysts with histologic correlation. Acta Cytol. 2001;45(6):941–947.
49. Meda BA, Buss DH, Woodruff RD, et al. Diagnosis and subclassification of primary and recurrent lymphoma: the usefulness and limitations of combined fine-needle aspiration cytomorphology and flow cytometry. Am J Clin Pathol. 2000;113(5):688–699.
50. Meng MV, Cha I, Ljung BM, Turek PJ. Testicular fine-needle aspiration in infertile men: correlation of cytologic pattern with biopsy histology. Am J Surg Pathol. 2001;25(1):71–79.
51. Moller K, Papanikolaou IS, Toermer T, et al. EUS-guided FNA of solid pancreatic masses: high yield of 2 passes with combined histologic-cytologic analysis. Gastrointest Endosc. 2009;70(1):60–69.
52. Newkirk KA, Ringel MD, Jelinek J, et al. Ultrasound-guided fine-needle aspiration and thyroid disease. Otolaryngol Head Neck Surg. 2000;123(6):700–705.
53. Ng VY, Thomas K, Crist M, Wakely PE Jr, Mayerson J. Fine needle aspiration for clinical triage of extremity soft tissue masses. Clin Orthop Relat Res. 2010;468(4):1120–1128.
54. Pauzar B, Staklenac B, Loncar B. Fine needle aspiration biopsy of follicular thyroid tumors. Coll Antropol. 2010;34(1):87–91.
55. Phadke DM, Lucas DR, Madan S. Fine-needle aspiration biopsy of vertebral and intervertebral disc lesions: specimen adequacy, diagnostic utility, and pitfalls. Arch Pathol Lab Med. 2001;125(11):1463–1468.
56. Pinchot SN, Al-Wagih H, Schaefer S, Sippel R, Chen H. Accuracy of fine-needle aspiration biopsy for predicting neoplasm or carcinoma in thyroid nodules 4 cm or larger. Arch Surg. 2009;144(7):649–655.
57. Pisano ED, Fajardo LL, Caudry DJ, et al. Fine-needle aspiration biopsy of nonpalpable breast lesions in a multicenter clinical trial: results from the radiologic diagnostic oncology group V. Radiology. 2001;219(3):785–792.
58. Que Hee CG, Perry CF. Fine-needle aspiration cytology of parotid tumours: is it useful? ANZ J Surg. 2001;71(6):345–348.
59. Ravetto C, Colombo L, Dottorini ME. Usefulness of fine-needle aspiration in the diagnosis of thyroid carcinoma: a retrospective study in 37,895 patients. Cancer. 2000;90(6):357–363.
60. Renshaw AA. Accuracy of thyroid fine-needle aspiration using receiver operator characteristic curves. Am J Clin Pathol. 2001;116(4):477–482.
61. Rosen DG, Laucirica R, Verstovsek G. Fine needle aspiration of male breast lesions. Acta Cytol. 2009;53(4):369–374.
62. Schiro AJ, Pinchot SN, Chen H, Sippel RS. Clinical efficacy of fine-needle aspiration biopsy of thyroid nodules in males. J Surg Res. 2010;159(2):645–650.
63. Sneige N, Tulbah A. Accuracy of cytologic diagnoses made from touch imprints of image-guided needle biopsy specimens of nonpalpable breast abnormalities. Diagn Cytopathol. 2000;23(1):29–34.
64. Sood T, Handa U, Mohan H, Goel P. Evaluation of aspiration cytology of ovarian masses with histopathological correlation. Cytopathology. 2010;21(3):176–185.
65. Stoll LM, Yung RCW, Clark DP, Li QK. Cytology of endobronchial ultrasound-guided transbronchial needle aspiration versus conventional transbronchial needle aspiration. Cancer Cytopathol. 2010;118(5):278–286.
66. Stramandinoli R-T, Sassi L-M, Pedruzzi P-AG, et al. Accuracy, sensitivity and specificity of fine needle aspiration biopsy in salivary gland tumours: a retrospective study. Med Oral Patol Oral Cir Bucal. 2010;15(1):e32–e37.
67. Ward WG Sr, Kilpatrick S. Fine needle aspiration biopsy of primary bone tumors. Clin Orthop. 2000;(373):80–87.
68. Westenend PJ, Sever AR, Beekman-De Volder HJ, Liem SJ. A comparison of aspiration cytology and core needle biopsy in the evaluation of breast lesions. Cancer. 2001;93(2):146–150.
69. Yang GC, Liebeskind D, Messina AV. Ultrasound-guided fine-needle aspiration of the thyroid assessed by Ultrafast Papanicolaou stain: data from 1135 biopsies with a two- to six-year follow-up. Thyroid. 2001;11(6):581–589.
70. Zbaren P, Schar C, Hotz MA, Loosli H. Value of fine-needle aspiration cytology of parotid gland masses. Laryngoscope. 2001;111(11, pt 1):1989–1992.
71. Zhang S, Defrias DV, Alasadi R, Nayar R. Endoscopic ultrasound-guided fine needle aspiration (EUS-FNA): experience of an academic centre in the USA. Cytopathology. 2010;21(1):35–43.
72. Gerhard R, da Cunha Santos G. Inter- and intraobserver reproducibility of thyroid fine needle aspiration cytology: an analysis of discrepant cases. Cytopathology. 2007;18(2):105–111.
73. Renshaw AA, Cartagena N, Granter SR, Gould EW. Agreement and error rates using blinded review to evaluate surgical pathology of biopsy material. Am J Clin Pathol. 2003;119(6):797–800.
74. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
75. Schmidt RL, Factor RE, Affolter KE, et al. Methods specification for diagnostic test accuracy studies in fine-needle aspiration cytology: a survey of reporting practice. Am J Clin Pathol. 2012;137(1):132–141.

APPENDIX

Description of Quality Assessment of Diagnostic Accuracy Studies Criteria

    • Was the patient population described in sufficient detail to determine whether the results obtained in this population are applicable to another population of interest?

    • Issue: Spectrum bias, external validity.

    • Yes: The referral criteria are described, inclusion/exclusion criteria are described, and population characteristics are described. The population includes consecutive patients.

    • No: The population is not well described and, in addition, there is some unusual feature that may limit the external validity of the study (nonconsecutive patients, unusual comorbidity, etc).

    • Unclear: The study fails to meet the criteria for “yes” but there is no reason to believe that the study population is unusual. For example, the study was based on a large sample of consecutive patients but the referral criteria and patient characteristics are not described.

    • Is the reference standard likely to correctly classify the target condition?

    • Issue: Classification bias.

    • Yes: All studies were evaluated as yes, as histopathology is generally considered to be a reliable gold standard.

    • No: Not used.

    • Unclear: Not used.

    • Is the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between tests?

    • Issue: Disease progression bias.

    • Yes: Summary statistics for the time period between fine-needle aspiration cytology (FNAC) and histologic verification are given and the maximum time is less than 3 months or only a small proportion of cases exceed 3 months.

    • No: The time period is explicitly mentioned and, for more than 5% of cases, it is greater than 3 months.

    • Unclear: Statistics for the time period are not mentioned, or an insignificant percentage (<5%) of the samples exceeds a time period of 3 months.

    • Did the whole sample or a random selection of the sample receive verification with a reference standard of diagnosis?

    • Issue: Partial verification bias.

    • Yes: The study is designed to follow up all patients who received FNAC with some type (histologic or long-term clinical) of follow-up.

    • No: The study is designed so that a significant portion (>10%) of patients who received FNAC did not receive verification. Retrospective studies based on surgery records will generally be included in this category.

    • Unclear: It is not possible to determine whether the intent of the study design was to follow up all patients.

    • Did patients receive the same reference standard regardless of the index test result?

    • Issue: Differential verification bias.

    • Yes: All patients who received FNAC received verification (the answer to question 4, on partial verification, must be yes in order for this question, question 5, to be yes) and it can be determined that all patients received the same type of verification.

    • No: If there is partial verification (ie, the answer to question 4, on partial verification, is “no”) or if 2 different types of verification were used (histologic verification and clinical follow-up).

    • Unclear: The type of verification cannot be determined.

    • Were the reference standard results interpreted without knowledge of the results of the index test?

    • Issue: Diagnostic review bias.

    • Yes: If the study specifically states that pathologists were blinded to the results of the FNAC.

    • No: If the study specifically states that the pathologists were not blinded to the results of FNAC.

    • Unclear: If the study does not specifically state whether pathologists were blinded to the results of FNAC.

    • Were the same clinical data available when the test results were interpreted as would be available when the test is used in practice?

    • Issue: Clinical review bias.

    • Yes: If the study specifically states that cytologists were not blinded to clinical information.

    • No: If the study states that pathologists were blinded to clinical information.

    • Unclear: If the study does not specifically state whether pathologists were blinded to clinical information.

    • Were uninterpretable/intermediate test results reported?

    • Issue: Bias due to handling of indeterminate results.

    • Yes: If the study specifically mentions inconclusive and inadequate results. Also mark yes if the number of inconclusive or inadequate results is specifically stated as zero.

    • No: If the study does not mention uninterpretable or intermediate results.

    • Unclear: Not used.

    • Did the study provide a clear definition of what was considered to be a positive result?

    • Issue: Threshold effects.

    • Yes: If the study specifically mentions a reference or describes the diagnostic criteria that were used.

    • No: If the study does not cite a reference or describe the diagnostic criteria used.

    • Unclear: Not used.

    • Were withdrawals from the study explained?

    • Issue: Withdrawal bias.

    • Yes: If the differences between those who received FNAC and were referred for follow-up and those who did not actually receive follow-up are specifically discussed.

    • No: Withdrawals are not discussed, or there is an unexplained difference between the number who received FNAC and the number verified.

    • Unclear: If all those who received FNAC received a final diagnosis but the study does not indicate whether there were withdrawals.

    • Was the index test described in sufficient detail to permit its replication?

    • Issue: Quality of reporting: Index test.

    • Yes: At least 9 of the 17 test parameters were specified.

    • No: Fewer than 4 of the 17 test parameters were specified.

    • Unclear: Four to 8 test parameters were specified.

    • Explanation: We based our evaluation criteria on the results of a recent study in which we evaluated the rate at which FNAC diagnostic test accuracy studies specified commonly cited test parameters.75 We identified 17 test parameters that are often specified; however, we found considerable variability in reporting. Studies most often reported 4 parameters, with a range from 0 to 13. While the number of parameters required to adequately describe FNAC is unknown, we based our criteria on current practice as found in our study. Studies that specified 9 or more parameters were in the upper quartile, and we therefore adopted this as a reasonable criterion for a complete description. Similarly, studies that specified fewer than 4 parameters were in the bottom quartile, and we adopted this as the criterion for an inadequate method description.

    • Was the reference test described in sufficient detail to permit its replication?

    • Issue: Quality of reporting: Reference test.

    • Yes: The study provided any description of the reference method.

    • No: The study provided no description of the reference method.

    • Unclear: Not used.

    • Was the reference standard independent of the index test (ie, the index test did not form part of the reference standard)?

    • Issue: Incorporation bias.

    • Yes: If the reference test does not explicitly incorporate the index test. Awareness of FNAC results would not count as incorporation in the reference test unless the FNAC diagnosis was explicitly used to render a final diagnosis.

    • No: If the index test is used as part of the reference test (eg, in computed tomography–guided hilar node biopsies, a positive FNAC result is sufficient to call a sample a true positive; in this case, the index test is also the reference test).

    • Unclear: If multiple tests are used and it cannot be determined whether the index test forms part of the reference test (eg, fine-needle aspiration, acid-fast bacteria smear, and culture for tuberculosis).

    • Were data on the interrater reliability of the reference standard provided?

    • Issue: Classification bias. Although the reference standards (histopathology, clinical follow-up) are generally considered to be accurate, these tests are imperfect and can vary by site (skill of the pathologist) or by tissue. Thus, it is useful for studies to provide some indication of the interrater reliability of the reference standard.

    • Yes: Data showing the interrater reliability for the study site are presented and are acceptable (less than 10% discrepancy).

    • No: Data showing unacceptable interrater reliability at the study site are presented.

    • Unclear: No data showing interrater reliability are presented.
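
Three of the criteria above reduce to simple decision rules, and the following Python sketch illustrates how they might be encoded for a structured review. It is not the authors' implementation. The function names, the string labels, and the representation of the discrepancy rate as a fraction are hypothetical choices made for illustration. The index test cutoffs follow the quartile-based values given in the Explanation (9 or more parameters for a complete description, fewer than 4 for an inadequate one). The handling of an unclear answer to the partial verification question is an assumption, since the criteria do not address that case.

```python
from typing import Optional, Set

YES, NO, UNCLEAR = "yes", "no", "unclear"


def rate_index_test_reporting(n_specified: int, n_total: int = 17) -> str:
    """Index test description: rate by the number of test parameters specified."""
    if not 0 <= n_specified <= n_total:
        raise ValueError("n_specified must be between 0 and n_total")
    if n_specified >= 9:   # upper quartile, per the Explanation
        return YES
    if n_specified < 4:    # bottom quartile, per the Explanation
        return NO
    return UNCLEAR         # 4 to 8 parameters


def rate_differential_verification(
    partial_verification_answer: str,
    verification_types: Optional[Set[str]],
) -> str:
    """Differential verification: depends on the partial verification answer."""
    if partial_verification_answer == NO:
        return NO          # partial verification alone is sufficient for "no"
    if not verification_types:
        return UNCLEAR     # type of verification cannot be determined
    if len(verification_types) > 1:
        return NO          # eg, histology for some patients, follow-up for others
    if partial_verification_answer == YES:
        return YES
    # Assumption: an unclear partial verification answer is carried forward.
    return UNCLEAR


def rate_interrater_reliability(discrepancy_rate: Optional[float]) -> str:
    """Interrater reliability of the reference standard (rate as a fraction)."""
    if discrepancy_rate is None:
        return UNCLEAR     # no interrater data presented
    if not 0.0 <= discrepancy_rate <= 1.0:
        raise ValueError("discrepancy_rate must be a fraction between 0 and 1")
    return YES if discrepancy_rate < 0.10 else NO


if __name__ == "__main__":
    print(rate_index_test_reporting(6))
    print(rate_differential_verification(YES, {"histology", "clinical follow-up"}))
    print(rate_interrater_reliability(0.05))
```

Keeping each criterion in its own function makes the dependency between the partial and differential verification questions explicit, and it leaves the remaining, more judgment-based criteria to the human reviewer.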

Author notes

Dr Layfield is now Professor/Chair of the Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, Missouri.

The authors have no relevant financial interest in the products or companies described in this article.