Context.—The quality of diagnostic accuracy studies is determined by 2 key factors: risk of bias and comparability. Bias can distort accuracy estimates and poor reporting impairs comparability. While diagnostic accuracy studies for fine-needle aspiration cytology (FNAC) are frequently published, the methodologic issues associated with this body of literature have never been reviewed.
Objective.—To assess the quality of design and reporting of diagnostic test accuracy studies in FNAC.
Data Sources.—Diagnostic accuracy studies were identified by a Medline (US National Library of Medicine) search. Sixty-four FNAC diagnostic test accuracy studies were randomly selected for structured review with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) survey. Studies were divided between 2 time periods: 2000-2001 and 2009-2011.
Conclusions.—Diagnostic test accuracy studies of FNAC suffer from numerous deficiencies in study design, which negatively affect the reliability of accuracy estimates.
Reliable estimates of accuracy are important for any diagnostic test. Such estimates determine the benefit of the test relative to other diagnostic methods and guide the development of diagnostic algorithms. Diagnostic studies are subject to unique sources of bias that can distort estimates of sensitivity and specificity. Thus, risk of bias is an important dimension of study quality.
Studies are useful only if their results can be applied to a particular clinical problem. Clinical problems are typically formulated with the PICO (population, index test, comparator, and outcome) format. Because many factors can affect diagnostic accuracy, one must determine whether the PICO dimensions of a study match those of the clinical problem before applying its results. This comparison can be made only if the study provides a complete description of the PICO parameters. Thus, comparability, or quality of reporting, is a second dimension of study quality. Quality of reporting also affects the ability to assess risk of bias.
In recent years, there has been increasing awareness of deficiencies in study design and reporting in diagnostic test accuracy studies.1–5 The companion article by Schmidt and Factor in this issue explains the theory that underlies the concerns regarding bias and quality of reporting in diagnostic accuracy studies. The article also provides a framework for evaluating the value of a diagnostic accuracy study with respect to a particular clinical question. In this article, we illustrate those principles by evaluating the quality of accuracy studies for a specific diagnostic test: fine-needle aspiration cytology (FNAC). Diagnostic accuracy studies of FNAC are frequently published in the cytology literature, but a systematic review of the methodology and quality of reporting of this body of literature has never been undertaken. Our objective was to conduct a broad survey of FNAC diagnostic test accuracy studies to identify common problems in study design or reporting. To that end, we used the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) survey instrument to evaluate a random selection of studies.
Study Design and Rationale
Our objective was 2-fold. First, we wanted to capture a broad selection of FNAC diagnostic accuracy studies, not restricted to particular tissue types or to specific journals. Second, we asked whether more recent studies, published after the advent of quality assurance measures such as Standards for Reporting of Diagnostic Accuracy (STARD) and QUADAS, showed better reporting than older studies. For this reason, we selected studies from 2 time periods (2000–2001 and 2009–2011). We conducted a literature search to identify all FNAC diagnostic accuracy studies within the designated time periods. A random sample was then selected from each time period to provide a broad spectrum of studies.
Potentially relevant diagnostic accuracy studies were identified by a Medline (US National Library of Medicine) search using the search terms fine-needle or fine-needle biopsy AND sensitivity and specificity. Studies were identified in 2 distinct time periods: 2000–2001 (period 1) and 2009–2011 (period 2). Studies were limited to English language.
The titles and abstracts of the studies identified by electronic search were screened for eligibility by one author (R.L.S.). Full reports were obtained for all articles that passed the initial screen. Studies were eligible for inclusion if they contained more than 10 cases with numerical data on diagnostic accuracy (sensitivity and specificity) for FNAC. Studies were not limited by anatomic site, journal, or specialty.
Thirty-two articles were randomly selected from each time period (2000–2001 and 2009–2011).
Studies were evaluated with the QUADAS instrument.6,7 QUADAS is a validated7 survey instrument, developed in 2003,6 that is designed to evaluate the quality (ie, quality of reporting and risk of bias) of diagnostic accuracy studies in systematic reviews. It has gained increasing acceptance in meta-analysis of diagnostic test accuracy studies and is now widely used to assess study quality. For example, the Cochrane Collaboration Diagnostic Test Accuracy Group recommends that researchers use QUADAS for assessment of study quality.8
Each of the QUADAS survey questions is designed to assess a specific methodologic issue. QUADAS provides a generic framework for quality assessment, but it cannot anticipate all of the factors that are required to assess studies in a particular context. It may be necessary for researchers to develop specific criteria to evaluate each QUADAS question in a particular context.8 For example, the criteria for a QUADAS survey question addressing the adequacy of method reporting will be different for a study using molecular techniques than for one using FNAC. The Cochrane Handbook8 recommends a common subset of survey items but provides a list of potential additional items. Following this guideline, we used the basic set of QUADAS items recommended by the Cochrane Handbook and added questions that we felt were important in the context of FNAC diagnostic accuracy studies. An explanation of each QUADAS survey item and the associated evaluation criteria for our survey of FNAC studies is presented in the Appendix.
The criteria for each QUADAS item were refined by using an iterative process as recommended by the Cochrane Handbook.8 We tested the criteria on a subset of studies to ensure that the criteria were objective and were interpreted the same way by all evaluators. We had several test runs on a small number of studies before finalizing the assessment criteria. We then evaluated the entire set of 64 articles by using the finalized criteria.
Each study was independently evaluated by 3 authors (R.E.F., R.L.S., B.L.W.) using a set of assessment criteria (see Appendix) developed for each QUADAS item.
Score interpretation: Although the scales for each item are the same, the criteria used to evaluate each item are different such that the scores of different items are not comparable. Higher scores generally indicate higher quality and we are able to offer a qualitative assessment of quality, based on the distribution of scores within each item.
Survey data were recorded in a database (Microsoft Access 2010, Microsoft Corp, Redmond, Washington) and all statistical analysis was conducted with Stata Statistical Software release 12 (Stata Corp, College Station, Texas). Agreement was measured with percentage agreement and Cohen κ. Differences in QUADAS scores over time were assessed by item with the nonparametric Mann-Whitney rank sum test. A global change in all items over time was assessed by probit analysis. The impact of journal type was assessed by using probit analysis and adjusting for the impact of survey item. Results were considered statistically significant if P < .05. Bonferroni corrections were used for multiple comparisons. A sample size of 32 studies per time period was selected to ensure 80% power for detecting a 20% change in the overall quality score between the 2 time periods.
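To make the agreement statistics concrete, the following minimal Python sketch computes percentage agreement and Cohen κ for one pair of raters. The ratings are hypothetical and are not data from this survey, the function names are our own, and the analyses reported here were run in Stata rather than with this code.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of studies on which two raters gave the same answer."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over all answer categories.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical answers from two raters for one QUADAS item on 8 studies.
rater_1 = ["yes", "yes", "no", "unclear", "no", "yes", "no", "no"]
rater_2 = ["yes", "yes", "no", "no", "no", "yes", "no", "unclear"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this example the raters agree on 6 of 8 studies, yet κ is noticeably lower than the raw agreement because part of that agreement is expected by chance. This is why both statistics are usually reported together.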
We identified 278 studies in period 1 (2000–2001) and 115 studies in period 2 (2009–2011). These were screened to obtain 81 and 73 eligible studies for periods 1 and 2, respectively (Figure 1). We randomly selected 32 studies from each time period for evaluation.9–71
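The selection step can be sketched as follows. This is only an illustration of drawing 32 studies without replacement from each period's eligible pool; the study identifiers and the seed are hypothetical, and the article does not state which tool was actually used for randomization.

```python
import random

# Eligible studies per period, as reported above.
ELIGIBLE_PER_PERIOD = {"2000-2001": 81, "2009-2011": 73}
SAMPLE_SIZE = 32

rng = random.Random(2013)  # hypothetical seed, fixed so the draw is repeatable

selected = {}
for period, n_eligible in ELIGIBLE_PER_PERIOD.items():
    # Hypothetical identifiers 1..n for the eligible studies in this period.
    study_ids = list(range(1, n_eligible + 1))
    # random.sample draws without replacement, so no study is picked twice.
    selected[period] = sorted(rng.sample(study_ids, SAMPLE_SIZE))

for period, ids in selected.items():
    print(period, len(ids), "studies selected")
```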
Characteristics of Included Studies
The most common tissue types were breast, lymph nodes, salivary glands, and thyroid, which accounted for 14%, 17%, 14%, and 24% of the total studies, respectively (Table 1). Diagnostic accuracy studies of FNAC are published in several different categories of journals, including pathology journals, specialized cytopathology journals, surgery journals, and general medicine journals. The included studies were mostly drawn from specialized cytology journals, surgery journals, and general medical journals, which were almost equally represented in our sample (Table 2). Our sample included a relatively small proportion of studies published in pathology journals. The included studies were mainly conducted in the United States and Europe (Table 3). Our sample included 54 retrospective studies, 9 prospective studies, and 1 that could not be determined. Prospective studies were spread evenly over tissue types and publication period, but appeared more frequently in medical journals.
The average quality score for each QUADAS item for both time periods is presented in Table 4. The distribution of scores aggregated over both time periods is presented in Figure 2. Our findings are summarized below for each domain.
Forty-five percent of the studies in our sample had inadequate population descriptions, probably because most studies in our sample were retrospective, and these studies often fail to describe patterns of patient referral. We found no significant difference in reporting over time.
Only 30% of the studies in our sample provided any description of the diagnostic criteria used. Nearly all (91%) failed to report whether clinical information was available to the cytologist. Overall, studies specified anywhere from 0 to 14 of 17 parameters that could be used to describe the index test. On average, studies reported on only 4 or 5 test parameters, and only 7 studies described 9 or more of the 17 possible test parameters. In addition, we found that reporting was quite variable: the parameters that were reported were often not consistent even within studies conducted at 1 anatomic location.
All studies used histopathology and/or clinical follow-up as a reference standard. These are generally regarded as accurate; however, no studies mentioned the interrater reliability of the reference test. Blinding of the reference test was specifically mentioned in only 3 studies (4.7%). We found that 10 studies explicitly incorporated FNAC results as a condition for the final diagnosis. Very few studies described the time period between the index test (FNAC) and the reference test (histopathology or clinical follow-up). Only 2 of 64 studies provided any description of the methods associated with the reference test. There was a statistically significant increase in the percentage of studies with incorporation bias over time (Appendix, item 13; P = .007). The increase in incorporation was positively correlated with the number of articles on FNAC of mediastinal lymph nodes. Otherwise, no change was seen over time.
Patient flows were often difficult to ascertain from the descriptions provided in the “Methods” section. We found that 60% of the studies in our sample had partial verification (Appendix, item 4); nearly 80% of the studies in our sample had differential verification (Appendix, item 5), and only 30% of the studies provided an adequate description of withdrawals. Flow diagrams were only provided in 8 studies (12.5%) and they were sometimes incomplete. Studies generally reported indeterminate results (78%), but the manner in which these results were used varied widely. For example, inadequate results were sometimes not mentioned at all, or were excluded from analysis without specifying the number of cases. In some studies, atypical and suspicious cases were added to positive cases for analysis, but the number of cases was not specified. There was an increase in the number of studies reporting withdrawals (P = .05) between the 2 time periods; however, the increase was not statistically significant after Bonferroni adjustment.
Factors Associated With Study Quality
Changes in Study Quality Over Time
Overall, there was no indication of a general change in study quality over time (P = .81).
Effect of Journal Type
Scores were significantly lower for studies published in surgery journals than for other journal categories (P = .009).
Effect of Type of Study
Prospective studies were of higher quality than retrospective studies, with higher QUADAS scores for items 1 (P = .02), 4 (P = .009), 8 (P < .001), and 13 (P = .08) in the Appendix; the differences for items 1, 4, and 8 met the P < .05 threshold for statistical significance. For no item did prospective studies score lower than retrospective studies.
Interrater agreement was assessed for each QUADAS item by using both percentage agreement and Cohen κ (Table 5). Overall agreement was 92%.
We applied the QUADAS criteria to a random selection of FNAC diagnostic accuracy studies in order to provide a representative overview of the FNAC diagnostic accuracy literature.
Our sample included a broad spectrum of studies from different organ systems. Studies were performed by a range of author types, and came from a variety of journals from numerous countries. We applied modified QUADAS criteria to the studies by using independent assessments by 3 authors and obtained a high level of agreement. We therefore believe our evaluation of these studies was both reproducible and reliable. The questions in the QUADAS survey are designed to assess 2 broad areas of methodologic quality: risk of bias and study comparability. We discuss our results below with respect to each domain of the PICO format.
Population (Patient Selection and Description)
A study is useful when it is applicable to both particular clinical situations and populations in other studies. Ultimately, studies in diagnostic accuracy should be comparable to one another, and this is enabled by informative reporting of populations and study design. A comparable study is said to have external validity.
Overall, the studies in our survey did a poor job of reporting patient selection. As described in the companion review by Schmidt and Factor in this issue, assessment of spectrum bias depends on a comparison of patient populations. Eighty-four percent (54 of 64) of the studies in our sample were retrospective and lacked the information needed for comparison, such as severity of disease, reason for referral, or whether prior testing had been performed. In contrast, prospective studies were generally designed to investigate the accuracy of FNAC in a much more specific population of patients (eg, those with resectable peripheral lung lesions less than 3 cm in diameter with indeterminate imaging). Overall, most studies lacked adequate information to assess external validity.
Fine-needle aspiration cytology is a complex, multistep process that can be performed in a variety of ways. The accuracy of the test may depend on methodology. For example, results may differ by needle size, the number of passes, the experience of the person performing the biopsy, the use of guidance, the availability of rapid onsite specimen evaluation, staining methods, the use of ancillary methods, and the experience of the pathologist reading the slides, among other factors. To make valid comparisons between studies, these parameters need to be clearly reported. In our survey of FNAC studies, we found considerable variability in the reporting of methodology.
The availability of clinical information can also influence test interpretation. From a clinical perspective, the ideal way to perform fine-needle aspirations is to have information about the patient's history before the test. Sometimes this information is not available, and sometimes it is, but the cytopathologist is blinded to the information for the purpose of the study. There is no way to know this if studies neglect to report this information. From a research perspective, variable reporting of clinical history causes difficulties in comparing diagnostic accuracy studies. We found that most studies did not report whether cytologic assessment was blinded to clinical information.
Fine-needle aspiration cytology can be subject to significant interrater differences72 due to errors or threshold effects. Errors arise when a cytopathologist fails to appreciate a feature that is present (or sees one that is not present). Threshold effects arise when cytopathologists use different criteria to classify cases. This will affect sensitivity and specificity. Renshaw et al73 demonstrated that threshold effects account for a significant proportion of interobserver differences in surgical pathology specimens. Potential threshold differences can only be identified if the criteria for diagnosis are described or referenced in a study. In our survey of FNAC studies, few reported diagnostic criteria.
Classification bias results from an imperfect reference test. Classification bias can falsely depress estimates of both sensitivity and specificity, depending upon the disease prevalence. The impact of classification bias can be quite substantial and misleading. Although the gold standard of histopathology and/or clinical follow-up is generally regarded as accurate, these tests are imperfect and can vary from site to site (depending on the skill of the pathologist) or between tissue types. Some indication of the reliability of the gold standard should be stipulated in accuracy studies to assess the potential for classification bias. An alternative is to reference the reliability of the gold standard in the literature or the interrater reliability at a particular site. None of the studies in our sample addressed this issue.
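The dependence on prevalence can be illustrated with a short Python sketch. It assumes that the index test and the reference test make errors independently of each other given true disease status, which is a simplifying assumption of ours and not something established in this survey. All accuracy values and prevalences below are hypothetical.

```python
def apparent_accuracy(sens, spec, ref_sens, ref_spec, prevalence):
    """Sensitivity and specificity of the index test as they would appear
    when measured against an imperfect reference test.

    Assumes index and reference errors are conditionally independent
    given true disease status (an illustrative assumption).
    """
    p = prevalence
    # Patients the reference test calls positive: truly diseased or not.
    ref_pos_diseased = p * ref_sens
    ref_pos_healthy = (1 - p) * (1 - ref_spec)
    apparent_sens = (
        ref_pos_diseased * sens + ref_pos_healthy * (1 - spec)
    ) / (ref_pos_diseased + ref_pos_healthy)

    # Patients the reference test calls negative: truly healthy or not.
    ref_neg_healthy = (1 - p) * ref_spec
    ref_neg_diseased = p * (1 - ref_sens)
    apparent_spec = (
        ref_neg_healthy * spec + ref_neg_diseased * (1 - sens)
    ) / (ref_neg_healthy + ref_neg_diseased)
    return apparent_sens, apparent_spec


# Hypothetical true accuracy of the index test and of the reference test.
TRUE_SENS, TRUE_SPEC = 0.90, 0.95
REF_SENS, REF_SPEC = 0.95, 0.98

low_sens, low_spec = apparent_accuracy(TRUE_SENS, TRUE_SPEC, REF_SENS, REF_SPEC, 0.10)
high_sens, high_spec = apparent_accuracy(TRUE_SENS, TRUE_SPEC, REF_SENS, REF_SPEC, 0.50)
print(f"prevalence 10%: sens {low_sens:.3f}, spec {low_spec:.3f}")
print(f"prevalence 50%: sens {high_sens:.3f}, spec {high_spec:.3f}")
```

Under these assumptions both estimates fall below their true values. Sensitivity is depressed more at low prevalence and specificity is depressed more at high prevalence.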
The time interval between the index test and the reference test can affect accuracy. A long interval between FNAC and surgical follow-up may allow for disease progression. Consequently, the reference test will detect more positive cases if the interval between the index test and reference test is long. On the other hand, a sufficiently long period is required for medical follow-up of negative cases. Thus, it is important for researchers to report the time interval between the index test and reference test. We found that researchers rarely reported the time interval to follow-up.
In clinical practice, the surgical pathologist is often aware of cytology results before verifying a case and uses this information like an ancillary test. Having this information is helpful clinically but can confound the results of the reference standard in diagnostic accuracy studies. This is known as review bias. Although we are not aware of any studies that have assessed the influence of FNAC results on the interpretation of histopathology, review bias remains a potential problem, especially for retrospective studies. Prospective studies can be designed to incorporate blinded appraisal, whereas the results in retrospective studies have already been established. One way to overcome this limitation is to have the material reassessed by pathologists who do not know whether the cases were originally diagnosed as malignant or benign. At the very least, studies should mention this as a possible limitation. In our survey, studies rarely reported whether pathologists were blinded to results, or discussed this as a possible limitation.
We also found that few studies provided any description of the reference test. The histologic reference test often appears to be taken for granted as a gold standard, but variability is possible due to differences in fixation, ancillary tests, the experience level of the pathologist interpreting the slides, etc. Some mention of how the reference test was interpreted would improve comparability of studies.
Incorporation bias occurs when the reference test explicitly incorporates the results of the index test as part of the criteria. In our survey, we found several examples of incorporation bias, particularly in endobronchial ultrasound FNAC studies of the lung and mediastinum, in which positive FNAC results were used as the reference standard. Because a positive FNAC result can never be counted as a false positive under this design, such studies cannot reliably report a specificity or positive predictive value. This is a serious design flaw in these studies. While accepting a positive FNAC result as final may be acceptable clinical practice, it is problematic in the context of a diagnostic accuracy study. Since histologic verification is not always possible, alternatives include either citing evidence that the predictive value of FNAC of mediastinal nodes is almost perfect, or confirming the FNAC results by independent, blinded cytology review.
Partial verification was a common problem in our study survey. Most studies used a retrospective design. With this method, cases are obtained from surgery records, which excludes patients who are not referred to surgery. This leads to biased estimates of sensitivity and specificity. In the context of FNAC, partial verification will generally cause a positive bias in estimates of sensitivity and a negative bias in specificity. The impact of this type of bias can be significant. Partial verification bias can be corrected by accounting for all patients who received the index test (FNAC). An “ideal” study would use the same reference standard for all patients; however, it is not practical or ethical to refer all patients to surgery. An alternative would be to provide clinical follow-up of patients who had negative FNAC results, and at a minimum, studies should provide a flow diagram to indicate the number of patients who received the index test and the number who were verified. Though methods are available to estimate the magnitude of bias due to partial verification, none of the studies we reviewed included this as a topic for discussion. Our impression is that the impact of partial verification is not appreciated and that this source of bias is common in the cytology literature.
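The direction of this bias, and one commonly cited way to estimate its size, can be sketched in Python. The correction below follows the approach usually attributed to Begg and Greenes. It assumes that the decision to verify depends only on the FNAC result, which is an assumption of this illustration. All counts are hypothetical, and none of the surveyed studies are known to have applied this method.

```python
def naive_accuracy(verified_pos, diseased_pos, verified_neg, diseased_neg):
    """Sensitivity and specificity computed from verified patients only."""
    tp = diseased_pos
    fp = verified_pos - diseased_pos
    fn = diseased_neg
    tn = verified_neg - diseased_neg
    return tp / (tp + fn), tn / (tn + fp)


def corrected_accuracy(tested_pos, verified_pos, diseased_pos,
                       tested_neg, verified_neg, diseased_neg):
    """Estimate accuracy for all tested patients by applying the disease
    rates seen in verified patients to everyone with the same FNAC result.

    Assumes verification depends only on the index test result.
    """
    p_disease_given_pos = diseased_pos / verified_pos
    p_disease_given_neg = diseased_neg / verified_neg
    est_tp = tested_pos * p_disease_given_pos
    est_fp = tested_pos - est_tp
    est_fn = tested_neg * p_disease_given_neg
    est_tn = tested_neg - est_fn
    return est_tp / (est_tp + est_fn), est_tn / (est_tn + est_fp)


# Hypothetical cohort: 1000 patients had FNAC; most positives went to
# surgery (and were verified) but only 10% of negatives did.
counts = dict(tested_pos=200, verified_pos=180, diseased_pos=162,
              tested_neg=800, verified_neg=80, diseased_neg=8)

naive_sens, naive_spec = naive_accuracy(
    counts["verified_pos"], counts["diseased_pos"],
    counts["verified_neg"], counts["diseased_neg"])
corr_sens, corr_spec = corrected_accuracy(**counts)
print(f"naive:     sens {naive_sens:.3f}, spec {naive_spec:.3f}")
print(f"corrected: sens {corr_sens:.3f}, spec {corr_spec:.3f}")
```

With these counts the naive sensitivity is well above the corrected value and the naive specificity is well below it, which matches the direction of bias described above. The correction is only possible when a study reports how many patients received FNAC and how many were verified in each result category.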
We also found that withdrawals are poorly reported. A withdrawal is defined as a patient who receives the index test and is referred for verification but fails to receive verification. Withdrawals can distort accuracy estimates if the number of withdrawals is high and if the population of withdrawals differs from the population that receives verification. The impact of withdrawals is difficult to predict between studies but would be aided by improved reporting.
Accuracy statistics are influenced by the way in which inadequate and indeterminate results are handled. For example, some studies fail to mention whether there were inadequate results; some exclude inadequate results from accuracy calculations; and others categorize inadequate results as false negatives. Studies also frequently differ in the way that they handle “atypical,” “suspicious,” or “indeterminate” results. This variability in reporting makes comparison of studies very challenging if not impossible. In our survey, there was inconsistency in reporting. For purposes of comparability, it would be preferable for studies to report each diagnostic category obtained for the index test (fine-needle aspiration) rather than report aggregate statistics, which combine categories. This would enable researchers to compare studies, apply their own assumptions, and apply consistent methods when conducting meta-analyses of diagnostic test accuracy studies.
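A small Python sketch shows how much the handling convention alone can move the reported figures. The counts are hypothetical, and the three rules shown (exclude, count as positive, count as negative) are simplified versions of the conventions described above.

```python
# Hypothetical FNAC results, grouped by final (reference) diagnosis.
diseased = {"positive": 80, "indeterminate": 10, "negative": 10}
not_diseased = {"positive": 5, "indeterminate": 10, "negative": 85}


def accuracy_under(rule):
    """Return (sensitivity, specificity) under one handling rule.

    rule is "exclude", "as_positive", or "as_negative".
    """
    tp, fn = diseased["positive"], diseased["negative"]
    fp, tn = not_diseased["positive"], not_diseased["negative"]
    if rule == "as_positive":
        tp += diseased["indeterminate"]
        fp += not_diseased["indeterminate"]
    elif rule == "as_negative":
        fn += diseased["indeterminate"]
        tn += not_diseased["indeterminate"]
    elif rule != "exclude":
        raise ValueError(f"unknown rule: {rule}")
    return tp / (tp + fn), tn / (tn + fp)


results = {rule: accuracy_under(rule)
           for rule in ("exclude", "as_positive", "as_negative")}
for rule, (sens, spec) in results.items():
    print(f"{rule:12s} sens {sens:.3f}  spec {spec:.3f}")
```

The same underlying data yield sensitivities from 0.80 to 0.90 and specificities from 0.85 to 0.95 depending on the rule. A reader can redo these calculations only if the study reports the count in each diagnostic category.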
Indeterminate results can also be a source of bias. The rate of indeterminate diagnoses can reflect different diagnostic thresholds. For example, one study might classify many difficult borderline cases as indeterminate, whereas another study might classify such cases as malignant or benign. One would expect the diagnostic accuracy to be higher in a study in which difficult cases were generally classified as indeterminate. Thus, a large difference in indeterminate rates can distort accuracy estimates and present a source of bias.
Our results indicate that significant deficiencies are common in the design and reporting of FNAC diagnostic test accuracy studies. Our findings are summarized in Table 6.
We found several sources of bias that could distort estimates of sensitivity and specificity. In particular, we found several issues related to patient flows that present a common and significant source of bias. Overall, we believe the issues related to patient flow present the highest risk of bias. The quality of reporting with respect to patient selection, FNAC method description, and indeterminate results (patient flow) presents the greatest concern for comparability.
The high prevalence of several significant sources of bias in cytopathology is an important finding. It is possible that many estimates of sensitivity and specificity in the current literature are affected. These estimates are used to guide clinical decisions and inform guideline development. Thus, it is important that researchers become aware of these problems to improve the design of future studies. While no study is perfect, researchers should take steps to improve study designs to reduce the risk of bias. At a minimum, researchers should be aware of study limitations and estimate the potential bias in their accuracy estimates. Similarly, those who depend on diagnostic accuracy estimates need to be aware of the potential for bias in data derived from FNAC accuracy studies. To our knowledge, the issues surrounding bias in cytopathology studies have not been widely appreciated. We recently found that only a small fraction of the studies in the existing literature qualified for inclusion in a Cochrane review on diagnostic accuracy owing to issues with bias.
Our study has several strengths and limitations. The strengths include being the first study to systematically examine methodologic issues across a wide range of FNAC diagnostic accuracy studies. We randomly selected studies from 2 time periods and a range of countries, journal types, and tissues. Thus, the sample is likely to be representative. We used QUADAS, which is a widely used and validated tool for quality assessment of diagnostic studies. We took care to develop objective criteria to assess the QUADAS items and obtained a high level of interrater agreement. Thus, we believe our assessments are accurate. Limitations include having a small sample size, which may not have provided sufficient power to detect small differences in item assessments, and for some of the items, we used simple assessment criteria that were unambiguous but sacrificed some information content.
Diagnostic test accuracy studies of FNAC suffer from numerous deficiencies in study design, which negatively affect the reliability of accuracy estimates. This has important implications both in the assessment of individual studies and in the comparison of collected studies.
Description of Quality Assessment of Diagnostic Accuracy Studies Criteria
Was the patient population described in sufficient detail to determine whether the results obtained in this population are applicable to another population of interest?
Issue: Spectrum bias, external validity.
Yes: The referral criteria are described, inclusion/exclusion criteria are described, and population characteristics are described. The population includes consecutive patients.
No: The population is not well described and, in addition, there is some unusual feature that may limit the external validity of the study (nonconsecutive patients, unusual comorbidity, etc).
Unclear: The study fails to meet the criteria for “yes” but there is no reason to believe that the study population is unusual. For example, the study was based on a large sample of consecutive patients but the referral criteria and patient characteristics are not described.
Is the reference standard likely to correctly classify the target condition?
Issue: Classification bias.
Yes: All studies were evaluated as yes, as histopathology is generally considered to be a reliable gold standard.
No: Not used.
Unclear: Not used.
Is the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between tests?
Issue: Disease progression bias.
Yes: Summary statistics for the time period between fine-needle aspiration cytology (FNAC) and histologic verification are given and the maximum time is less than 3 months or only a small proportion of cases exceed 3 months.
No: The time period is explicitly mentioned and the maximum time for more than 5% of cases is greater than 3 months.
Unclear: Statistics for the time period are not mentioned or if an insignificant percentage (<5%) of the samples exceed a time period of 3 months.
Did the whole sample or a random selection of the sample receive verification with a reference standard of diagnosis?
Issue: Partial verification bias.
Yes: The study is designed to follow up all patients who received FNAC with some type (histologic or long-term clinical) of follow-up.
No: The study is designed so that a significant portion (>10%) of patients who received FNAC did not receive verification. Retrospective studies based on surgery records will generally be included in this category.
Unclear: It is not possible to determine whether the intent of the study design was to follow up all patients.
Did patients receive the same reference standard regardless of the index test result?
Issue: Differential verification bias.
Yes: All patients who received FNAC received verification (the answer to question 4 must be yes in order for question 5 to be yes) and it can be determined that all patients received the same type of verification.
No: If there is partial verification (ie, answer to question 4 is “no”) or if 2 different types of verification were used (histologic verification and clinical follow-up).
Unclear: The type of verification cannot be determined.
Were the reference standard results interpreted without knowledge of the results of the index test?
Issue: Diagnostic review bias.
Yes: If the study specifically states that pathologists were blinded to the results of the FNAC.
No: If the study specifically states that the pathologists were not blinded to the results of FNAC.
Unclear: If the study does not specifically state whether pathologists were blinded to the results of FNAC.
Were the same clinical data available when the test results were interpreted as would be available when the test is used in practice?
Issue: Clinical review bias.
Yes: If the study specifically states that cytologists were not blinded to clinical information.
No: If the study states that cytologists were blinded to clinical information.
Unclear: If the study does not specifically state whether cytologists were blinded to clinical information.
Were uninterpretable/intermediate test results reported?
Issue: Bias due to handling of indeterminate results.
Yes: If the study specifically mentions inconclusive and inadequate results. Also mark yes if the number of inconclusive or inadequate results is specifically stated as zero.
No: If the study does not mention uninterpretable or intermediate results.
Unclear: Not used.
Did the study provide a clear definition of what was considered to be a positive result?
Issue: Threshold effects.
Yes: If the study specifically mentions a reference or describes the diagnostic criteria that were used.
No: If the study does not cite a reference or describe the diagnostic criteria used.
Unclear: Not used.
Were withdrawals from the study explained?
Issue: Withdrawal bias.
Yes: If the difference between the number of patients who received FNAC and were referred for follow-up and the number who actually received follow-up is specifically discussed.
No: If withdrawals are not discussed or there is an unexplained difference between the number who received FNAC and the number verified.
Unclear: If all those who received FNAC received a final diagnosis but the study does not indicate whether there were withdrawals.
Was the index test described in sufficient detail to permit its replication?
Issue: Quality of reporting: Index test.
Yes: At least 9 of the 17 test parameters were specified.
No: Fewer than 4 of the test parameters were specified.
Unclear: Four to 8 test parameters were specified.
Explanation: We based our evaluation criteria on the results of a recent study in which we evaluated the rate at which FNAC diagnostic test accuracy studies specified commonly cited test parameters.75 We identified 17 test parameters that are often specified; however, we found considerable variability in reporting. Studies most often reported 4 parameters, with a range from 0 to 13. While the number of parameters required to adequately describe FNAC is unknown, we based our criteria on current practice as found in our study. Studies that specified 9 or more parameters were found to be in the upper quartile and we therefore adopted this as a reasonable criterion for a complete description. Similarly, studies that specified less than 4 parameters were in the bottom quartile and we adopted this as a criterion for inadequate method description.
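The quartile-based cut points given in this explanation (9 or more parameters for a complete description, fewer than 4 for an inadequate one) can be written as a simple rule. The function name is hypothetical, and this sketch only restates the criteria. It is not the instrument the raters used.

```python
def score_index_test_description(n_parameters, total_parameters=17):
    """Map the number of reported FNAC test parameters to a QUADAS answer.

    Cut points follow the quartiles described in the explanation:
    9 or more -> "yes", fewer than 4 -> "no", otherwise "unclear".
    """
    if not 0 <= n_parameters <= total_parameters:
        raise ValueError("n_parameters must be between 0 and total_parameters")
    if n_parameters >= 9:
        return "yes"
    if n_parameters < 4:
        return "no"
    return "unclear"


for n in (3, 4, 8, 9):
    print(n, score_index_test_description(n))
```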
Was the reference test described in sufficient detail to permit its replication?
Issue: Quality of reporting: Reference test.
Yes: The study provided any description of the reference method.
No: The study provided no description of the reference method.
Unclear: Not used.
Was the reference standard independent of the index test (ie, the index test did not form part of the reference standard)?
Issue: Incorporation bias.
Yes: If the reference test does not explicitly incorporate the index test. Awareness of FNAC results would not count as incorporation in the reference test unless the FNAC diagnosis was explicitly used to render a final diagnosis.
No: If the index test is used as part of the reference test (eg, in computed tomography–guided hilar node biopsies, a positive FNAC result is sufficient to call a sample a true positive; in this case, the index test is also the reference test).
Unclear: If multiple tests are used and it cannot be determined whether the index test forms part of the reference test (eg, fine-needle aspiration, acid fast bacteria smear, and culture for tuberculosis).
Were data on the interrater reliability of the reference standard provided?
Issue: Classification bias. Although the reference standards (histopathology, clinical follow-up) are generally considered to be accurate, these tests are imperfect and can vary by site (skill of the pathologist) or by tissue. Thus, it is useful for studies to provide some indication of the interrater reliability of the reference standard.
Yes: Data showing the interrater reliability for the study site are presented and are acceptable (less than 10% discrepancy).
No: Data showing unacceptable interrater reliability at the study site are presented.
Unclear: No data showing interrater reliability are presented.
Dr Layfield is now Professor/Chair of the Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, Missouri.
The authors have no relevant financial interest in the products or companies described in this article.