Context: Approximately 15% to 30% of thyroid nodules that undergo fine-needle aspiration are classified as cytologically indeterminate, presenting management challenges for patients and clinicians alike. During the past several years, several molecular tests have been developed to reduce the diagnostic uncertainty of indeterminate thyroid fine-needle aspirations.
Objective: To review the methodology, clinical validation, and recent peer-reviewed literature for 4 molecular tests that are currently marketed for cytologically indeterminate thyroid fine-needle aspiration specimens: Afirma, ThyroSeq, ThyGenX/ThyraMIR, and RosettaGX Reveal.
Data Sources: Peer-reviewed literature retrieved from PubMed search, data provided by company websites and representatives, and authors' personal experiences.
Conclusions: The 4 commercially available molecular tests for thyroid cytology offer unique approaches to improve the risk stratification of thyroid nodules. Familiarity with data from the validation studies as well as the emerging literature about test performance in the postvalidation setting can help users to select and interpret these tests in a clinically meaningful way.
Fine-needle aspiration (FNA) cytology plays an important role in the risk stratification of thyroid nodules. For patients who meet clinical, laboratory, and/or sonographic criteria for biopsy, FNA cytology can be helpful for guiding subsequent management. For example, cytologically benign nodules may be managed by a nonsurgical watchful-waiting approach in most cases, whereas cytologically malignant nodules are usually referred for surgical resection. Nevertheless, approximately 15% to 30% of thyroid FNAs are classified in one of the cytologically indeterminate categories of the Bethesda System for Reporting Thyroid Cytopathology: atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS; Bethesda III) or follicular neoplasm/suspicious for follicular neoplasm (FN/SFN; Bethesda IV).1,2 Because of the low (5%–15% for AUS/FLUS) to moderate (15%–30% for FN/SFN) cancer risks associated with these indeterminate categories, management recommendations have generally been conservative. Repeating the FNA has been suggested as the appropriate follow-up for AUS/FLUS nodules, with the consideration of diagnostic lobectomy for nodules that remain cytologically indeterminate on repeat FNA; diagnostic lobectomy has been recommended for nodules in the FN/SFN category.
Although lobectomy may be considered necessary and adequate management for premalignant nodules or low- to intermediate-risk cancers, it is not the ideal management for all cytologically indeterminate nodules.3 For the majority of cases in the AUS/FLUS and FN/SFN categories, lobectomy reveals a histologically benign nodule, and surgery may be regarded as overtreatment. On the other hand, lobectomy may be deemed insufficient treatment for cases in which a high-risk cancer is diagnosed on histologic examination. In the latter scenario, patients may need to undergo reoperation to complete the thyroidectomy, especially if they may require radioactive iodine ablation.
Molecular testing has thus emerged as a tool for stratifying cytologically indeterminate thyroid nodules into clinically meaningful risk categories. The goals of ancillary molecular testing for thyroid cytology include (1) avoidance of unnecessary surgery for benign nodules and (2) distinguishing high-risk cancers that merit total thyroidectomy from premalignant or low/intermediate-risk nodules for which lobectomy may be the preferred initial surgical step. Currently, 4 tests are commercially available for thyroid FNAs: Afirma (Veracyte, Inc, South San Francisco, California), ThyroSeq v2 (CBLPath, Inc, Rye Brook, New York, and University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania), ThyGenX/ThyraMIR (Interpace Diagnostics, Inc, Parsippany, New Jersey), and RosettaGX Reveal (Rosetta Genomics, Inc, Philadelphia, Pennsylvania) (Figure 1, A through D, and Table 1). In this review, we discuss differences in test methodology and appraise the recent literature regarding test performance. Readers are referred to recent reviews3–5 for additional discussion about the first 3 tests.
INFERRING TEST PERFORMANCE FROM PUBLISHED VALIDATION STUDIES
In the clinical validation studies for each of these molecular tests for thyroid FNAs, histopathology of the aspirated nodule serves as the gold standard for determining whether a nodule is malignant or benign. Clinical sensitivity and specificity refer to a test's ability to classify nodules correctly as histologically malignant or benign, respectively, and are considered fixed characteristics of a test as determined by a clinical validation study.
Estimates of cancer risk based on test results, on the other hand, are derived from positive and negative predictive values (PPVs and NPVs, respectively). In the context of molecular testing for thyroid FNAs, PPV indicates the cancer risk based on an abnormal (positive) test result, and NPV indicates the probability of benignity based on a negative test result. The complement of the NPV, or 1 − NPV, is equivalent to the cancer risk associated with a negative test result. In contrast to sensitivity and specificity, PPV and NPV (or 1 − NPV) vary with the pretest probability of disease in the tested population. Thus, the PPV and 1 − NPV for each test can be plotted as functions of the pretest probability of cancer to illustrate the malignancy risk associated with positive or negative test results, respectively (Figure 2, A through H). The shapes of these predictive value curves are determined by Bayes' theorem from a test's reported sensitivity and specificity. Tests with high specificity have sharply bent PPV curves (Figure 2, C and E), and tests with high sensitivity have sharply bent 1 − NPV curves (Figure 2, B, D, F, and H). The ability of a negative molecular test result to help rule out malignancy depends on the pretest probability of cancer in the tested population (Figure 2).
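The dependence of predictive values on pretest probability can be illustrated with a minimal Python sketch. It is not taken from any of the validation studies: the function name and the 3 example prevalence values are illustrative assumptions, and the sensitivity and specificity are the approximate Afirma figures quoted later in this review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, 1 - NPV) from Bayes' theorem.

    All arguments are proportions strictly between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    # Cancer risk that remains after a negative result (1 - NPV).
    residual_risk = false_neg / (false_neg + true_neg)
    return ppv, residual_risk


# Approximate Afirma figures from this review: 90% sensitivity, ~50% specificity.
# The prevalence values below are illustrative, not institutional data.
for prevalence in (0.10, 0.25, 0.40):
    ppv, residual_risk = predictive_values(0.90, 0.50, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  1-NPV={residual_risk:.3f}")
```

At a pretest probability of 25%, this sketch returns a PPV of about 38% and a residual cancer risk of about 6%, in line with the posttest risks reported for the Afirma validation cohort. Substituting an institution's own prevalence of malignancy shows how far both values can shift.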
The prevalence of malignancy for the cytologically indeterminate Bethesda categories (ie, AUS/FLUS and FN/SFN) is often used as an approximation for pretest probability of cancer. Cancer prevalence for cytologically indeterminate nodules may vary by institution because of dissimilarities in patient populations or the heterogeneous nature of the AUS/FLUS category, as well as differences among cytopathologists in their thresholds for using the cytologically indeterminate Bethesda categories.6–8 Therefore, for optimal interpretation of these tests, each institution should determine the prevalence of malignancy for nodules classified in the cytologically indeterminate categories.9 The incorporation of clinical and sonographic data may also provide a more accurate estimate of pretest disease probability.
Several caveats must be considered when comparing the performance characteristics of these tests. First, these molecular tests define a positive test result in different ways, limiting the utility of comparing PPVs as a measure of test performance. For tests with binary outcomes such as Afirma and RosettaGX Reveal, “suspicious” (or “suspicious by microRNA [miRNA] profiling” for the latter test) results are used in the calculation of true-positive and false-positive outcomes (Figure 1, A and D). In contrast, neither ThyroSeq nor ThyGenX/ThyraMIR reports binary outcomes in practice. Instead, although a negative test result confers a defined low residual cancer risk, the finding of a specific genetic alteration allows estimation of a more granular cancer risk and phenotype for thyroid nodules based on the mutation/gene fusion present (Figure 1, B and C). Molecular assessment of cancer risk may be further adjusted based on the allelic frequency of the genetic change and/or the results of miRNA profiling (for samples tested by both ThyGenX and ThyraMIR). Although the detection of any oncogenic mutation/fusion or high-risk miRNA profile by these 2 tests has been considered a positive result for the purposes of statistical analysis, it should be noted that these tests provide a gradation of cancer risk that is not entirely captured by the PPV alone.10
Second, the gold standard histopathology in these validation studies generally consists of only 2 categories (benign or malignant) to facilitate statistical analysis. However, this practice runs counter to the recognition that thyroid neoplasms encompass a broad continuum of biological behavior rather than a dichotomous one.3,11 Users of these tests should recognize that the PPVs publicized by each of these commercially available tests are therefore limited in their ability to convey a nuanced and clinically meaningful risk of disease.
AFIRMA GENE EXPRESSION CLASSIFIER
The Afirma Gene Expression Classifier (GEC) is a microarray-based test that uses a proprietary algorithm to risk-stratify cytologically indeterminate nodules as having either benign (GEC-B) or suspicious (GEC-S) messenger RNA (mRNA) expression profiles (Table 1 and Figure 1, A). A prospective, blinded, multicenter clinical validation study characterized the performance of the test among 210 AUS/FLUS and FN/SFN nodules with a pretest cancer rate of 24% to 25%. The study had a 15% postunblinding sample exclusion rate. Based on the results of this study, Afirma has high sensitivity (90%) and modest specificity (approximately 50%) for malignancy, corresponding to posttest malignancy risks of 5% to 6% for GEC-B and 37% to 38% for GEC-S results among cytologically indeterminate nodules (Table 1 and Figure 2, A and B).12 As a test optimized for high sensitivity and NPV, Afirma's clinical utility is based on its ability to help exclude malignancy in the setting of indeterminate cytology. The specificity of the Afirma test may be augmented by 2 “malignancy classifiers”: the Afirma MTC test, which assays for gene expression profiles specific for medullary thyroid carcinoma,13,14 and the Afirma BRAF test, which assays for the expression profile associated with the BRAF V600E mutation. Regarding the latter test, the low prevalence of BRAF V600E mutations among AUS/FLUS and FN/SFN aspirates, particularly in Western populations, may limit the utility of Afirma BRAF as a reflex test for GEC-S samples.15,16
The cellular composition of samples submitted for Afirma testing is determined in part by gene expression cassettes that screen for mRNA profiles associated with medullary thyroid carcinoma (the Afirma MTC test, described above), parathyroid tissue, and metastatic tumors.12 Only samples that do not trigger these screening cassettes are analyzed by the main 142-gene classifier; samples that are flagged by these screening cassettes are reported as suspicious without subsequent analysis by the main GEC.
Can GEC-B Nodules Be Managed Like Cytologically Benign Nodules?
Based on the results of its clinical validation study, Afirma has been marketed for identifying benign nodules when the cytology is indeterminate. This statement implies that cytologically indeterminate nodules with GEC-B results can be managed with clinical observation rather than diagnostic lobectomy in most cases, similar to cytologically benign nodules. A study by Angell et al17 endorses this approach using retrospective ultrasonographic nodule measurements as a proxy for biologic behavior; the authors found that the proportion of cytologically indeterminate GEC-B nodules exhibiting growth (greater than 20% increase in 2 or more dimensions, or greater than 50% increase in volume) was comparable to that of cytologically benign nodules during a median follow-up period of 13 to 14 months. Based on this finding, the authors suggest that GEC-B nodules may be clinically observed similarly to cytologically benign nodules. Assessment of additional sonographic parameters apart from size may be helpful in the broader confirmation of these findings.
The Afirma validation study12 remains the benchmark for assessing its test performance. Nevertheless, the growing list of publications summarizing institutional experiences with Afirma may also shed some light on how Afirma has been functioning in the real-world clinical setting. Interpretation of these postvalidation analyses requires caution because of (1) the retrospective design of these studies and (2) the fact that in clinical practice, decisions about nodule management are not blinded to Afirma results. In particular, most GEC-B nodules are not resected, and the small fraction selected for surgery is unlikely to be representative of GEC-B nodules as a whole.
With these caveats in mind, combined analysis of published postvalidation studies reveals 101 GEC-B nodules with histologic follow-up, 12 of which (12%) were classified by the authors as malignant (Table 2).17–28 The reasons for these false-negative results are not clear, although they likely represent a combination of sampling error, the intrinsic false-negative rate of the test, and variations in thresholds for diagnosing nodules as malignant on histopathology.7 Taking into account the selection bias inherent in examining only GEC-B nodules that were selected for resection, the actual proportion of GEC-B nodules that are malignant is very likely well below 12%, in keeping with the high sensitivity of the test. Furthermore, among the cases considered to be false-negative in these studies is one nodule that could presently be reclassified as noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP; see discussion below).25 Taken together, the independent reports summarizing the everyday clinical experience with Afirma appear to support its role as an effective test for identifying a subset of cytologically indeterminate nodules that may warrant a watchful waiting approach because of their low malignancy risk.
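To give a rough sense of the sampling uncertainty around the pooled figure of 12 malignant nodules among 101 resected GEC-B nodules, the following Python sketch computes a Wilson score interval. The choice of the Wilson method and the function name are our own assumptions for illustration. The interval reflects statistical imprecision only and does not correct for the selection bias described above.

```python
import math


def wilson_interval(events, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = events / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total))
    ) / denom
    return center - half_width, center + half_width


# Pooled postvalidation counts quoted in this review: 12 of 101 resected
# GEC-B nodules were classified as malignant.
low, high = wilson_interval(12, 101)
print(f"observed={12 / 101:.3f}  95% interval=({low:.3f}, {high:.3f})")
```

The interval spans roughly 7% to 20%, which underscores how imprecise the pooled estimate is even before selection bias is considered.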
Has Afirma Reduced Unnecessary Surgeries for Cytologically Indeterminate Nodules?
Afirma has been marketed as a test that can reduce the number of unnecessary thyroid surgeries. In support of this claim, authors have pointed to the low surgical rates for GEC-B nodules compared with either (1) historical surgical rates for cytologically indeterminate nodules prior to the introduction of Afirma testing (“pre-Afirma”) or (2) surgical rates for GEC-S nodules (Table 3).18,29 Likewise, others have reported the similarity between surgical rates for GEC-B nodules and cytologically benign nodules, as well as the durability of the low surgical rates for GEC-B nodules during a 3-year follow-up period.30,31 These comparisons demonstrate clinician and patient compliance with Afirma's recommendation for nonsurgical management for GEC-B nodules; however, these studies do not specifically address reduction in overall surgical rates due to Afirma, nor do they consider whether the surgeries were unnecessary. Indeed, for studies that provide data regarding institutional surgical rates and histologic outcomes for cytologically indeterminate nodules in the pre-Afirma versus Afirma-tested cohorts, there is no significant difference in overall surgical rates between these groups (Table 3).20,24,27,32
For the purposes of this analysis, if we define unnecessary thyroid surgery as resection of cytologically indeterminate, histologically benign nodules, the impact of Afirma appears to vary among institutions (Table 4). Data from the University of California at Los Angeles experience demonstrate a significantly lower rate of unnecessary surgery in the Afirma-tested cohort (51%) compared with pre-Afirma controls (66%) (P = .004).27,32,33 In contrast, rates of unnecessary surgery were not significantly altered by Afirma testing in the series by Chaudhary et al20 (68% pre-Afirma, 66% post-Afirma; P = .89) or Sacks et al24 (75% pre-Afirma, 70% post-Afirma; P = .57).
The explanation for these institutional differences is uncertain. On one hand, Sacks et al24 speculate that in their Afirma-tested cohort, a GEC-S result may have had an unduly high effect on prompting surgery in spite of a moderate PPV of ∼40%, thus offsetting the reduction in surgery afforded by GEC-B results. Different institutional criteria for selecting nodules for ancillary testing (ie, reflex testing on nodules with a single indeterminate FNA versus testing on nodules with persistently indeterminate cytology on repeat FNA) may have also contributed to the apparent differences in clinical utility of the test among various institutions. The retrospective nature of these postvalidation studies precludes definitive conclusions about the impact that Afirma has made in clinical practice. Nevertheless, the collective data suggest that the clinical utility of Afirma can be institution specific on a population-wide level; detailed examination of the differences in ordering and interpreting the test among these institutions may be helpful for refining the guidelines for optimal test use.
THYROSEQ
ThyroSeq v2 is a multigene test that is based on the targeted DNA and RNA next-generation sequencing analysis of 56 genes (ie, point mutations and small insertions/deletions in 14 genes and 42 types of gene fusions) and expression levels of 16 genes (Figure 1, B, and Table 1). Next-generation sequencing offers high sensitivity of detection and ability to quantify the proportion of cells carrying a given mutation. Primary diagnostic information provided by the test is based on the analysis of mutation hotspots and gene fusions that have been found in ∼90% of papillary thyroid carcinomas (PTCs) and other thyroid cancers.34 In addition, the expression levels of several gene mRNAs are used to assess sample adequacy and define a proportion of thyroid follicular cells within the sample. The proportion of thyroid follicular cells is defined based on the expression of thyroglobulin, thyroid transcription factor 1, cytokeratin-7, and sodium/iodide symporter (SLC5A5) mRNAs. In addition, the expression levels of the SLC5A5 mRNAs provide information on the functional status of cells within the nodule and allow detection of hyperfunctional thyroid nodules, which are typically benign. The presence of C-cell/medullary thyroid carcinoma and parathyroid cells is defined by the expression of calcitonin mRNA and parathyroid hormone mRNA, respectively, and reported as suspicious for medullary carcinoma or parathyroid nodule. In samples with high expression of calcitonin mRNA, the findings of RET or RAS mutations allow provision of additional information related to the germline or somatic status of the disease.
Clinical validation of ThyroSeq v2 was reported in 2 single-institution studies of thyroid nodules with indeterminate cytology and known surgical outcome.35,36 A study of 143 nodules with FN/SFN (Bethesda IV) cytology (91 retrospectively and 52 prospectively collected samples)35 and a prospective study of 96 patients with AUS/FLUS (Bethesda III) cytology36 reported a high sensitivity (90%–91%) and specificity (92%–93%) of ThyroSeq v2 (Table 1). In the study cohorts with a pretest cancer prevalence of 23% to 27%, ThyroSeq v2 demonstrated an NPV of 96% to 97% and a PPV of 77% to 83% (Figure 2, C and D). This general PPV of the test is further refined in each individual sample based on the type of detected genetic alteration.35,36 Indeed, many mutations, such as BRAF V600E and RET/PTC, confer a close to 100% probability of cancer, whereas other mutations, such as RAS, are diagnostic of a clonal tumor with ∼80% probability of typically low-risk cancer or NIFTP, as discussed later in this review.11 Other mutations, such as PTEN and EIF1AX, serve as a marker of clonal neoplasm but alone are not sufficient for full cancer development.37 Furthermore, evaluation of the expression of cell lineage genes established a diagnosis of parathyroid nodule in 0.6% of “thyroid” nodules with AUS/FLUS cytology, which was confirmed by subsequent parathyroid surgery or clinical and laboratory diagnosis of hyperparathyroidism.36
The performance and clinical utility of ThyroSeq v2 have been evaluated in recent independent studies. Valderrabano et al38 assessed the test performance and clinical utility in 190 nodules (from 182 patients) with AUS/FLUS and FN/SFN cytology. ThyroSeq v2 performance was calculated using 102 nodules that were surgically excised. Considering NIFTP in the malignant category, the prevalence of malignancy in this cohort was 20%. ThyroSeq v2 showed 70% sensitivity, 77% specificity, 42% PPV, and 91% NPV. The performance of ThyroSeq v2 in their study was significantly better in FN/SFN than in AUS/FLUS nodules (area under the curve 0.84 versus 0.57, respectively; P = .03). The authors reported 5 specimens to be false-negative, with an additional case showing a low level of KRAS mutation. Among the 5 mutation-negative nodules, 2 were NIFTP, 2 were papillary carcinomas, and 1 was a minimally invasive follicular carcinoma with vascular invasion. One of the papillary carcinomas was a 0.6-cm microcarcinoma whose size did not match that of the 1.1-cm biopsied nodule as measured by ultrasound, making correlation between molecular testing and final pathology questionable. Of note, a previous study from the same institution reported the PPV of the Afirma test at 16%,23 highlighting the variability in performance of molecular tests in different patient populations. The study of ThyroSeq also provided information on clinical utility of the test.38 The authors found that in their group of AUS/FLUS nodules, a negative or positive test result did not significantly affect the rate of malignancy in the nodules that underwent surgical excision. However, in the nodules with FN/SFN cytology, a positive test result increased the risk of malignancy 2.5-fold, with PPV of 53% to 65%.38 Overall, ThyroSeq-positive nodules in this study were more frequently resected than ThyroSeq-negative nodules (73% versus 48%, respectively; P < .001).
Another independent study by Toraldo et al39 evaluated ThyroSeq performance in 148 nodules with indeterminate cytology, of which 45 underwent surgery, and found 95% sensitivity, 60% specificity, 66% PPV, and 94% NPV in their patient population.
In addition to diagnostic utility, ThyroSeq and other tests based on the detection of mutations may provide information on cancer aggressiveness, which could be taken into account when considering the extent of surgery. Although prognostic association of BRAF V600E remains controversial, TERT promoter mutations have been established as independent predictors of disease recurrence and cancer-related mortality in well-differentiated thyroid cancer.40–43 Furthermore, the co-occurrence of BRAF or RAS mutations with TERT or TP53 mutations may identify a small subset of thyroid cancers with the most unfavorable outcome.44–46 Single studies have demonstrated that testing for mutational markers was helpful to define the optimal extent of initial surgery and reduce the proportion of 2-step surgery, that is, initial lobectomy followed by completion thyroidectomy, for many patients.47,48 This is particularly applicable to thyroid nodules found positive for BRAF V600E, TERT mutations, and RET, ALK, or NTRK1/3 fusions, which confer a very high (close to 100%) probability of intermediate-risk or high-risk thyroid cancer. Upfront total thyroidectomy is currently a recommended surgical approach to these nodules.3 For nodules with RAS and other RAS-like mutations, which confer a high probability of either low-risk cancer or precancer NIFTP, diagnostic lobectomy may represent the optimal surgical approach. However, clinical utility of ThyroSeq in informing the extent of surgery is not fully defined yet and should be examined in prospective studies.
miRNA-BASED TESTS
MicroRNAs are small noncoding RNAs that regulate gene expression by influencing the stability and translation of mRNA. Compared with mRNA, miRNAs are relatively stable and can be isolated from routinely prepared formalin-fixed histopathology or alcohol-fixed cytology samples, making them a practical analyte for molecular diagnostics.49–52 Several research studies, which used mostly surgically excised tumor tissues, reported that a number of miRNAs are differentially expressed among various benign and malignant thyroid nodules.52–57 This provided a rationale for 2 recent commercially available tests that use miRNA expression for diagnostic use in cytologically indeterminate thyroid FNA specimens: ThyraMIR (a complementary test to ThyGenX) and RosettaGX Reveal.
ThyGenX/ThyraMIR
ThyGenX is a targeted next-generation sequencing test that assays for a small number of mutations in 5 genes (BRAF, KRAS, HRAS, NRAS, and PIK3CA) and 3 gene fusions (RET-PTC1, RET-PTC3, and PAX8-PPARG) associated with thyroid neoplasia. As described above, the detection of one of these genetic changes can help guide surgical management, depending on the mutation or gene fusion that is identified. Nodules harboring BRAF V600E mutations or clonal RET-PTC1/3 gene fusions have a virtually 100% risk of PTC and may warrant total thyroidectomy upfront.58–63 Apart from BRAF V600E mutations and RET-PTC1/3 gene fusions, the remainder of the genetic changes in the ThyGenX panel is limited by low specificity for malignancy. RAS mutations, BRAF K601E mutations, and PAX8-PPARG gene fusions have been identified in association with both premalignant/benign and malignant neoplasms.11,58,59,62–65 Accordingly, a diagnostic lobectomy may be the preferred initial surgical approach for nodules with these genetic changes. Similarly, a lobectomy is also recommended when ThyGenX is negative for this panel of mutations and gene fusions, because of the relatively low sensitivity and NPV of ThyGenX alone among cytologically indeterminate nodules.
The ThyraMIR miRNA expression classifier was introduced to address the challenges with the sensitivity and specificity of the ThyGenX panel. ThyraMIR measures the expression levels of 10 miRNAs (Table 5) by quantitative real-time polymerase chain reaction and uses a proprietary algorithm to classify each nodule as having either a high-risk or low-risk miRNA profile. A multi-institutional cross-sectional cohort study of ThyGenX and ThyraMIR using 109 thyroid FNA samples found that combining both tests achieved an optimal sensitivity (89%) and specificity (85%) profile among cytologically indeterminate nodules.66,67 Based on these findings, ThyraMIR is currently offered as a reflex test when the ThyGenX test is either (1) negative or (2) positive for RAS mutations, BRAF K601E mutation, PIK3CA mutation, or PAX8-PPARG fusion (ie, any mutation/fusion other than BRAF V600E or RET-PTC1/3) (Figure 1, C).
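The reflex rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in this review, not the laboratory's actual implementation; the function name, the use of `None` for a negative ThyGenX result, and the alteration labels are hypothetical conventions chosen for the example.

```python
# Alterations for which, per this review, ThyraMIR is NOT offered as a reflex
# test. The string labels are hypothetical naming conventions for this sketch.
NO_REFLEX_ALTERATIONS = {"BRAF V600E", "RET-PTC1", "RET-PTC3"}


def thyramir_reflex_indicated(thygenx_alteration):
    """Return True if ThyraMIR would be offered as a reflex test.

    thygenx_alteration: None for a negative ThyGenX result, otherwise the
    name of the detected mutation or gene fusion.
    """
    if thygenx_alteration is None:
        # Case (1): ThyGenX negative.
        return True
    # Case (2): any mutation/fusion other than BRAF V600E or RET-PTC1/3.
    return thygenx_alteration not in NO_REFLEX_ALTERATIONS


for result in (None, "NRAS", "PAX8-PPARG", "BRAF V600E"):
    print(result, "->", thyramir_reflex_indicated(result))
```

In this sketch, a negative result, an NRAS mutation, and a PAX8-PPARG fusion all lead to reflex miRNA testing, whereas a BRAF V600E mutation does not.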
For samples assayed by both ThyGenX and ThyraMIR, Interpace Diagnostics estimates cancer risk based on results of both tests. A negative ThyGenX result coupled with a low-risk ThyraMIR result is associated with approximately 6% risk of malignancy among AUS/FLUS and FN/SFN nodules based on the 109 cases in the validation cohort with 32% cancer prevalence (Table 1).66 For other permutations of ThyGenX and ThyraMIR results (eg, RAS mutation coupled with low-risk ThyraMIR profile), estimates of cancer risk are given based on the laboratory data. Specimen adequacy for molecular testing is based on spectrophotometric assessment of nucleic acid quantity and quality.
RosettaGX Reveal
Similar to ThyraMIR, RosettaGX Reveal analyzes miRNA expression patterns to classify cytologically indeterminate thyroid aspirates as having either a “benign” or “suspicious by miRNA profiling” result. RosettaGX Reveal's test panel measures 24 miRNAs, 6 sequences of which closely overlap with ThyraMIR's panel of 10 miRNAs (Table 5). Markers associated with thyroid epithelial cells are included to help ensure that miRNA analysis is being performed on thyroid epithelial cells rather than blood components.68 In contrast to the other molecular tests that are currently commercially available for thyroid aspirates, RosettaGX Reveal's miRNA classifier was analytically validated on cellular material recovered from direct cytology smears stained by Papanicolaou and Romanowsky-type stains (Figure 1, D).68 RosettaGX Reveal is thus marketed as a test that requires only a single adequately cellular cytology slide, with both direct smears and liquid-based cytology (eg, ThinPrep) slides accepted as substrates for molecular testing.
RosettaGX Reveal was clinically validated in a retrospective multicenter study of 189 cytologically indeterminate samples with histologic follow-up.69 Of note, this validation cohort included suspicious for malignancy (Bethesda V) cases in addition to AUS/FLUS and FN/SFN cases. In this overall validation cohort with 32% prevalence of cancer, RosettaGX Reveal was found to have 85% sensitivity, 72% specificity, 91% NPV, and 59% PPV. The test reached superior sensitivity (98%) and NPV (99%) and modest increases in specificity (78%) and PPV (62%) when analysis was limited only to cases in which all 3 pathologists involved in the validation study (2 study pathologists and 1 referring pathologist) concurred on the reference histopathology diagnosis (n = 150; 27% prevalence of cancer). The improvement in RosettaGX Reveal's sensitivity and NPV in this agreement set was likely driven in part by the exclusion of 14 encapsulated follicular variant of PTCs from the validation set, 5 of which were misclassified as benign by RosettaGX Reveal. Importantly, the validation cohort did not include any oncocytic (Hürthle cell) carcinomas, and therefore the performance of RosettaGX Reveal in detecting these tumors in the clinical setting remains unknown. This represents a key limitation of the test, which should be taken into account particularly if the clinical intention is to avoid surgery for nodules with a negative test result.
Although the low concordance among cytopathologists for using the suspicious for malignancy Bethesda category7 may justify its inclusion in RosettaGX Reveal's validation cohort as an indeterminate category, the validation studies for Afirma, ThyroSeq, and ThyGenX/ThyraMIR have largely limited their analyses to aspirates in the AUS/FLUS and FN/SFN categories (Table 1).12,35,36,66,69 Such differences in the composition of the validation cohort among these studies should be considered when comparing their results; for comparison purposes, Table 1 and the solid PPV and 1 − NPV curves in Figure 2 include only data from the AUS/FLUS and FN/SFN cases from each of the validation studies. The predictive value curves for the somewhat idealized cohort (agreement set with inclusion of suspicious for malignancy cases) in the RosettaGX Reveal validation study are shown with dotted curves for comparison (Figure 2, G and H).
The ability to use cytology slides as the starting material for molecular analysis circumvents the need to collect additional FNA material into a nucleic acid preservative solution specifically for molecular testing, theoretically reducing the number of FNA passes that are required from each patient. Furthermore, the ability to analyze miRNAs from the same population of cells that have undergone visual cytologic examination may help reduce the potential for sampling error, as can be seen when separate FNA passes are obtained for cytology and molecular testing. One drawback to this approach is the need to sacrifice a diagnostic cytology slide for nucleic acid extraction. Rosetta Genomics offers slide scanning services prior to processing the slide for molecular testing.
MOLECULAR TESTING FOR THYROID FNA IN THE NIFTP ERA
The noninvasive subset of encapsulated/well-circumscribed follicular variant of PTC was recently proposed to be renamed as NIFTP, to reflect the indolent clinical behavior of these tumors in the absence of capsular or vascular invasion.11 Retrospective analysis of archival cases has shown that the preoperative FNA cytology of NIFTPs falls predominantly into AUS/FLUS, FN/SFN, or suspicious for malignancy categories.70–74 Cytologic and molecular features may help distinguish NIFTP from conventional-type PTCs and infiltrative (ie, not encapsulated or well-circumscribed) follicular variant of PTC, with NIFTP showing lesser degrees of nuclear atypia, increased association with RAS mutations rather than BRAF V600E mutations, and a distinct miRNA profile.75–80 However, neither cytologic nor mutational analysis appears to reliably distinguish NIFTP from encapsulated/well-circumscribed follicular variant of PTC, and the primary distinction between these 2 entities remains the histologic identification of capsular or vascular invasion in the latter.70,72,80
All 4 molecular tests described in this review were developed prior to the NIFTP nomenclature change. Therefore, noninvasive, encapsulated/well-circumscribed follicular variant of PTC (tumors that would now be classified as NIFTP) in the training and validation sets for these tests were likely classified as having malignant reference histology. In keeping with this idea, several institutions have found that NIFTPs can be recognized as suspicious by Afirma25,81,82 or as harboring a mutation or gene fusion by genotyping tests such as ThyroSeq.38,81 Although the reclassification of NIFTP out of the malignant category is expected to affect the predictive values of these tests, the overall management implications may not be significantly affected. For tests with modest PPV (Afirma GEC, RosettaGX Reveal), an abnormal result (suspicious GEC or miRNA profile) typically prompts diagnostic lobectomy, which is currently considered appropriate treatment for NIFTP. Similarly, for tests such as ThyroSeq and ThyGenX/ThyraMIR, the granularity provided by specific genotypes (eg, isolated RAS mutations, PAX8-PPARG, or THADA fusions) makes it possible to suggest a diagnostic lobectomy for nodules with a high likelihood of being NIFTP. Importantly, NIFTP is considered a premalignant tumor with a potential to invade and spread hematogenously, and as such it requires surgical excision to examine the tumor capsule to rule out invasion. Therefore, although not malignant, NIFTP is an indication for surgery, and its detection preoperatively by molecular tests may not necessarily represent a false-positive result.
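The effect of the nomenclature change on PPV can be illustrated with simple arithmetic. The sketch below assumes a hypothetical set of test-positive nodules divided by final histology; the counts are invented for illustration and do not come from any study. It compares PPV when NIFTP is counted with the malignant tumors, when NIFTP is counted as nonmalignant, and when the outcome of interest is any tumor for which surgery is indicated (cancer or NIFTP).

```python
def ppv_under_niftp_conventions(malignant_pos, niftp_pos, benign_pos):
    """Return PPV among test-positive nodules under three conventions.

    malignant_pos: test-positive nodules with malignant histology (excluding NIFTP)
    niftp_pos:     test-positive nodules with NIFTP histology
    benign_pos:    test-positive nodules with benign histology
    """
    total_pos = malignant_pos + niftp_pos + benign_pos
    return {
        # Original convention: NIFTP counted as malignant reference histology.
        "niftp_as_malignant": (malignant_pos + niftp_pos) / total_pos,
        # After reclassification: NIFTP counted as a false-positive result.
        "niftp_as_nonmalignant": malignant_pos / total_pos,
        # Outcome defined as "surgery indicated" (cancer or NIFTP).
        "surgical_indication": (malignant_pos + niftp_pos) / total_pos,
    }


if __name__ == "__main__":
    # Hypothetical counts (illustrative assumption only).
    result = ppv_under_niftp_conventions(malignant_pos=30, niftp_pos=10, benign_pos=60)
    for convention, ppv in result.items():
        print(f"{convention}: {ppv:.2f}")
```

Under these assumed counts, the PPV for malignancy falls when NIFTP is moved out of the malignant category, whereas the PPV for a surgically relevant lesion is unchanged, which mirrors the argument that a positive result in a NIFTP is not necessarily a false-positive result from a management perspective.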
CONCLUSIONS
All 4 tests described above highlight the clinical relevance of molecular diagnostics applied to thyroid cytopathology. Nevertheless, each of these tests shows differences in methodology and performance characteristics in their effort to improve risk stratification among cytologically indeterminate thyroid nodules. The emerging literature about the performance of these tests in day-to-day clinical practice can offer insights into factors that may influence the optimal use of molecular testing for thyroid cytopathology.6,83 Prospective, blinded, multi-institutional studies that use a large set of FNA samples representing all thyroid cancer types and having low postunblinding sample exclusion rates are required for further understanding of the performance and clinical utility of these diagnostic tests. In the 2015 update to the management guidelines for thyroid nodules, the American Thyroid Association includes general recommendations about the use and interpretation of ancillary molecular testing in the context of clinical, ultrasonographic, and cytologic findings. In recognition of the relatively early point in our experience with these tests, the American Thyroid Association guidelines do not advocate a particular test or a specific algorithm for selecting nodules for molecular testing, with the exception of nodules that are cytologically suspicious for papillary carcinoma (Bethesda V), where testing for BRAF and other genetic markers was listed in the recommendation.3 Studies that underpin evidence-based guidelines for test selection and interpretation will be needed to develop a standardized approach to molecular testing for thyroid cytopathology, particularly as we move towards a more personalized, risk-based approach to the diagnosis and management of thyroid nodules.
Author notes
Dr Nikiforova is a consultant for Quest Diagnostics, and her employer, University of Pittsburgh Medical Center, has a commercial contract with CBLPath, Inc (Rye Brook, New York) for ThyroSeq test distribution. Dr Nishino has no relevant financial interest in the products or companies described in this article.
Competing Interests
Presented in part at the V Molecular Cytopathology: Focus on Next Generation Sequencing in Cytopathology meeting; October 18, 2016; Napoli, Italy.