Context. Mutational signatures have been described in the literature, and a few centers have implemented pipelines for clinical reporting.
Objective. To describe the performance of a mutational signature caller with clinical samples sequenced on a targeted next-generation sequencing panel with a small genomic footprint.
Design. One thousand six hundred eighty-two clinical samples were analyzed for the presence of mutational signatures using deconstructSigs on variant calls with at least 20 variant reads.
Results. Signature 10 (associated with POLe mutation) achieved separation of cases and controls in hypermutated samples. Signatures 4 (associated with tobacco smoking) and 7 (associated with ultraviolet radiation), as indicators of pulmonary and cutaneous primary sites, respectively, showed moderate sensitivity and high specificity at optimal cutpoints. Mutational signatures in malignancies with unknown primaries were somewhat consistent with the clinically suspected primary site. There was an apparent dose-response relationship between the number of variants analyzed and the ability of mutational signature analysis to correctly suggest a primary site.
Conclusions. Mutational signatures represent an opportunity for orthogonal testing of primary site, which may be particularly useful in supporting cutaneous or pulmonary sites in poorly differentiated neoplasms. Tobacco smoking, ultraviolet radiation, and POLe mutational signatures are the most appropriate signatures for implementation. Even relatively small numbers of variants appear capable of supporting a clinically suspected primary.
Mutational signatures have been described in the literature1 and reflect the mutagenic exposures present during a tumor's evolution. Several classes of mutational signatures have been described, including patterns of single-base substitution (eg, C>A), double-base substitution (eg, CC>AA), and small insertions and deletions (indels). Typically, each substitution is categorized by the trinucleotide context in which the altered base occurs in the reference genome, that is, by the bases immediately 5′ and 3′ of it (eg, T[C>A]T versus T[C>A]A). This subcategorization by trinucleotide context permits resolution of patterns that would otherwise be indistinguishable. Certain mutational signatures are strongly associated with environmental exposures (eg, signature 4 with tobacco smoking and signature 7 with ultraviolet radiation1). By extension, these signatures could be considered indicators of a tumor's primary site. For example, signature 4 could be considered an indicator of a mutagenic process characteristically present in pulmonary primaries, and signature 7 likewise for cutaneous primaries.
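The following Python sketch illustrates the trinucleotide convention described above. It maps a single-base substitution to one of the 96 standard channels: 6 substitution classes, each with 16 combinations of 5′ and 3′ flanking bases. By convention, the mutated reference base is expressed as a pyrimidine (C or T). A G>T change in a TGC context is therefore counted in the reverse-complement channel G[C>A]A. The function names are hypothetical, and this is not code from deconstructSigs or from our pipeline.

```python
# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sbs96_channel(ref_context: str, alt: str) -> str:
    """Map a single-base substitution to its 96-channel label, e.g. 'T[C>A]T'.

    ref_context: the 3 reference bases (5' flank, mutated base, 3' flank) on the + strand.
    alt: the alternate base at the central position, on the + strand.
    """
    ref_context, alt = ref_context.upper(), alt.upper()
    if len(ref_context) != 3 or set(ref_context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base ACGT context and an ACGT alternate base")
    ref = ref_context[1]
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Purine reference bases (A, G) are re-expressed on the opposite strand so
    # that every channel is anchored on a pyrimidine (C or T).
    if ref in "AG":
        ref_context = revcomp(ref_context)
        alt = alt.translate(_COMPLEMENT)
        ref = ref_context[1]
    return f"{ref_context[0]}[{ref}>{alt}]{ref_context[2]}"


def all_sbs96_channels() -> list:
    """Enumerate the 96 channels: 6 substitution classes x 16 flanking-base pairs."""
    subs = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
    return [f"{five}[{s}]{three}" for s in subs for five in "ACGT" for three in "ACGT"]
```

For example, `sbs96_channel("TCT", "A")` and `sbs96_channel("AGA", "T")` both return `"T[C>A]T"`, because the second call describes the same event read on the opposite strand.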
A few centers have implemented pipelines for mutational signature detection in clinical samples,2 although the technique is still used predominantly in research settings. Recently described algorithms permit signature resolution on a per-case (rather than per-cohort) basis.3 This capability is important if the process is to be translated to clinical samples.
Clinical implementation of mutational signature analysis faces 2 particular challenges. First, most clinical assays perform targeted sequencing. Targeted sequencing offers higher sensitivity but a more limited scope than the exome-wide and genome-wide sequencing in which these signatures were initially described. At our institution, the first version of our targeted sequencing panel covered 198 genes, whereas version 2 covers 130 genes. Second, most clinical assays, including ours, perform tumor-only sequencing of formalin-fixed, paraffin-embedded tissue. Mutational signature analysis is known to be complicated by formalin-induced and sequencing artifacts, which are present in addition to mutations caused by bona fide mutagenic exposures.4,5 In addition, tumor-only sequencing cannot reliably distinguish germline from somatic variants among those available for analysis. This limitation complicates detection of signatures unique to the tumor's evolution.6
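The sketch below makes the panel-size constraint concrete by estimating how many somatic variants a panel footprint can be expected to capture. It assumes, as a simplification not made in this study, that mutations are spread uniformly and follow a Poisson distribution. The footprints (301 559 and 231 955 base pairs) are those given under Sequencing. The tumor mutation burden values are illustrative inputs only, and the function names are hypothetical.

```python
import math


def expected_variants(tmb_per_mb: float, footprint_bp: int) -> float:
    """Expected number of mutations captured by a panel of the given size."""
    # Tumor mutation burden is expressed per megabase (10**6 base pairs).
    return tmb_per_mb * footprint_bp / 1_000_000


def prob_at_least(k_min: int, lam: float) -> float:
    """P(X >= k_min) for X ~ Poisson(lam), computed as 1 - P(X < k_min)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for k in range(k_min):
        cdf += term            # accumulate P(X = k)
        term *= lam / (k + 1)  # recurrence: P(X = k + 1) = P(X = k) * lam / (k + 1)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Footprints of version 1 and version 2 of the panel, in base pairs.
    footprints = {"version 1": 301_559, "version 2": 231_955}
    for tmb in (5, 10, 20):  # illustrative burdens in mutations per Mb (assumed values)
        for name, bp in footprints.items():
            lam = expected_variants(tmb, bp)
            print(f"{name}, TMB {tmb}/Mb: expected {lam:.1f} variants, "
                  f"P(>= 9 variants) = {prob_at_least(9, lam):.3f}")
```

At an assumed burden of 10 mutations per megabase, the expectation is roughly 3 variants on version 1 and roughly 2.3 on version 2.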
Mutational signatures nevertheless offer an opportunity to provide orthogonal evidence of primary site. They deserve attention especially in cases of unknown primary in which other means of classification have been exhausted. Here, we describe our experience evaluating the performance of a mutational signature caller applied to targeted sequencing data from clinical tumor samples.
MATERIALS AND METHODS
Patients and Samples
This analysis was approved by our institutional review board. A total of 1682 clinical samples were analyzed as described below. These samples were sequenced on our institutional targeted panel from the first clinical run of version 1 through run 338 of version 2. They comprised 840 samples run on version 1 and 842 run on version 2 of our panel (see below for further details regarding the panel). The primary site was recorded from the requisition sheet and checked against clinical documentation where necessary. For version 2 of the targeted panel, POLe status was abstracted from the genotype information in the sequencing results. For version 1, which did not cover POLe, it was abstracted from the medical record. For subjects with at least 9 variants suitable for analysis (n = 195; see below), we additionally extracted pack-year and smoking history from clinical documentation for further assessment of signature 4.
Sequencing
Our institutional targeted panel is offered clinically for tumor-only sequencing; version 1 covered 198 genes, whereas version 2 covers 130 genes. Selected exons of POLe are covered on version 2 but were not covered on version 1. The assay uses hybridization capture-based targeted sequencing on an Illumina MiSeq system, with a minimum analytic detection limit of 5%. Briefly, this is a clinically available laboratory-developed test focused on identifying actionable mutations in genes associated with malignant behavior, such as EGFR, TP53, BRCA1/2, and many others. Some genes have full exonic coverage, whereas others have targeted hotspot coverage. Most variants analyzed are point mutations; however, our pipeline also identifies indels and select rearrangements. Version 1 of our panel covered 301 559 base pairs, whereas version 2 covers 231 955 base pairs. Initial variant calling requires a total read depth of at least 100 reads and a variant allele fraction of at least 1%. Variants with a variant allele fraction greater than 5% are automatically reported provided they meet other quality criteria. Variants with a variant allele fraction between 1% and 5% are further considered for reporting based on a number of interpretive factors. For the purposes of this analysis, variants initially called by the pipeline were further restricted to those with at least 20 variant reads (further described below in Mutational Signature Analysis). Our pipeline prefilters variants to exclude those with a population frequency greater than 1% in ExAC, 1000 Genomes, or ESP (Exome Sequencing Project). This prefiltering greatly reduces the number of germline polymorphisms in the remaining data. For greater detail regarding sequencing methodology, the reader is referred to previously published literature.7
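The variant-level criteria in this section can be summarized as a filter. The Python sketch below is a hypothetical restatement, not our pipeline's code. The `VariantCall` fields are assumed names, and `max_pop_af` stands for the highest population frequency across ExAC, 1000 Genomes, and ESP. The restriction to single-base substitutions is also an assumption, reflecting the fact that the COSMIC version 1 reference signatures describe single-base substitutions only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCall:
    chrom: str
    pos: int
    ref: str
    alt: str
    total_depth: int                    # total reads covering the position
    alt_reads: int                      # reads supporting the variant allele
    max_pop_af: Optional[float] = None  # highest population AF (ExAC/1000 Genomes/ESP); None if unlisted

    @property
    def vaf(self) -> float:
        """Variant allele fraction."""
        return self.alt_reads / self.total_depth if self.total_depth > 0 else 0.0


def passes_initial_calling(v: VariantCall, min_depth: int = 100,
                           min_vaf: float = 0.01, max_pop_af: float = 0.01) -> bool:
    """Depth >= 100 reads, VAF >= 1%, and population frequency not above 1%."""
    common_polymorphism = v.max_pop_af is not None and v.max_pop_af > max_pop_af
    return v.total_depth >= min_depth and v.vaf >= min_vaf and not common_polymorphism


def eligible_for_signatures(v: VariantCall, min_alt_reads: int = 20) -> bool:
    """Initial calling criteria plus >= 20 variant reads, single-base substitutions only."""
    is_sbs = len(v.ref) == 1 and len(v.alt) == 1 and v.ref != v.alt
    return is_sbs and passes_initial_calling(v) and v.alt_reads >= min_alt_reads
```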
Mutational Signature Analysis
The R package deconstructSigs3 was run on all clinical cases available for analysis (n = 1682). The reference was COSMIC mutational signatures version 1, which is provided with the package and based on sequencing of more than 10 000 exomes and more than 1000 whole genomes across 40 types of malignancies. These signatures have diverse etiologies that are currently only partly understood. Etiologies and characteristic descriptions of the mutational signatures used as controls in this analysis are included in Table 1. The reader is referred to the COSMIC database8 for a continually updated accounting of common mutational signatures. Only variants with 20 or more variant reads were submitted for analysis. In preliminary analyses, this threshold performed better than filtering by variant allele fraction or no filtering (data not shown). Trinucleotide context counts were normalized to the trinucleotide contexts present on the particular version of our targeted sequencing panel (ie, version 1 versus version 2). The resulting proportions of mutational signatures were analyzed on a per-signature basis to classify cases by primary site or by POLe genotype status. Signature 4 was assessed for pulmonary versus nonpulmonary primary, signature 7 for cutaneous versus noncutaneous primary, and signature 10 for POLe wild type versus mutant. Optimal cutpoints were calculated using the Youden index (sensitivity + specificity − 1), which maximizes the sum of sensitivity and specificity across all sites, with the OptimalCutpoints package in R.9 Scores above the optimal cutpoint were considered “positive” for the signature in question. Performance characteristics were then analyzed with respect to sensitivity and specificity, with specific attention paid to the number of variants analyzed. Additional visualization of results was performed using easyROC.10
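The per-sample decomposition step can be sketched as fitting a sample's 96-channel profile as a nonnegative mixture of reference signatures. The Python sketch below uses nonnegative least squares (`scipy.optimize.nnls`). This is a swapped-in technique, related to but not the same as the iterative fitting used by deconstructSigs. Several details are assumptions for illustration: the context normalization, the 6% floor on signature weights, and the function names. The normalization rescales each channel by the ratio of its trinucleotide frequency in the genome to its frequency on the panel.

```python
import numpy as np
from scipy.optimize import nnls


def normalize_counts(counts, panel_context_freq, genome_context_freq):
    """Rescale 96-channel counts by the genome/panel frequency ratio of each channel's trinucleotide.

    All arguments are length-96 arrays in the same channel order (assumed inputs).
    """
    counts = np.asarray(counts, dtype=float)
    ratio = np.asarray(genome_context_freq, dtype=float) / np.asarray(panel_context_freq, dtype=float)
    return counts * ratio


def decompose(counts, signatures, min_weight=0.06):
    """Fit counts (96,) as a nonnegative mix of signatures (96, K); return K weights."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("no mutations to decompose")
    profile = counts / total  # convert counts to proportions
    weights, _residual_norm = nnls(np.asarray(signatures, dtype=float), profile)
    weights[weights < min_weight] = 0.0  # discard minor contributions (assumed 6% floor)
    kept = weights.sum()
    return weights / kept if kept > 0 else weights  # rescale retained weights to sum to 1
```

With synthetic inputs, such as a profile built from 70% of one signature plus 30% of another, the fit recovers weights close to 0.7 and 0.3.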
Analysis of Cases Designated With Unknown Primary Site
Some specimens were submitted for sequencing with “unknown” under primary site on the requisition sheet. These specimens were annotated with the clinically suspected primary site and with the primary sites suggested by each of the 30 mutational signatures in COSMIC version 1. A primary site was considered suggested by a particular signature if that signature had been observed in that organ in COSMIC. These cases were then analyzed for concordance between the clinically suspected primary site and the sites suggested by mutational signature analysis. The number of variants analyzed (variants with ≥20 variant reads, as described above) was recorded for each case. This number was compared between cases with concordant versus discordant suspected primary sites, using the Mann-Whitney U test in R to determine statistical significance.
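The concordance rule and the statistical comparison described above can be sketched in Python as follows. The signature-to-site mapping is a hypothetical, deliberately incomplete excerpt used only to illustrate the rule; it is not a reproduction of COSMIC annotations. `scipy.stats.mannwhitneyu` is called with a two-sided alternative, which is an assumption because the sidedness of the test is not stated.

```python
from scipy.stats import mannwhitneyu

# HYPOTHETICAL, incomplete map from signature to tissues in which it has been
# observed; for illustrating the concordance rule only, not COSMIC data.
SIGNATURE_SITES = {
    "Signature 4": {"lung", "head and neck"},
    "Signature 7": {"skin"},
}


def suggested_sites(detected_signatures, site_map=SIGNATURE_SITES):
    """Union of the sites in which any detected signature has been observed."""
    sites = set()
    for sig in detected_signatures:
        sites |= site_map.get(sig, set())
    return sites


def is_concordant(clinical_site, detected_signatures, site_map=SIGNATURE_SITES):
    """A case is concordant if the clinically suspected site is among the suggested sites."""
    return clinical_site in suggested_sites(detected_signatures, site_map)


def compare_variant_counts(concordant_counts, discordant_counts):
    """Two-sided Mann-Whitney U test on the number of variants analyzed per case."""
    result = mannwhitneyu(concordant_counts, discordant_counts, alternative="two-sided")
    return float(result.statistic), float(result.pvalue)
```

The inputs to `compare_variant_counts` would be the per-case numbers of variants with ≥20 variant reads in each group.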
RESULTS
Mean on-target read depth was 1613 reads (see table 1 in the supplemental digital content containing 3 tables at https://meridian.allenpress.com/aplm in the November 2021 table of contents). One hundred eighty-seven samples (187 of 1682; 11%) could not be evaluated, mostly because of a paucity of variants for analysis. These included 1 sample with an unknown primary, which was excluded from the downstream concordance analysis. The samples that failed mutational signature analysis included 5 of 187 (2.7%) cutaneous primaries and 67 of 187 (36%) pulmonary primaries. The cohort that ran successfully included 73 of 1495 (4.9%) cutaneous primaries and 780 of 1495 (52%) pulmonary primaries. No minimum number of variants was required for analysis; however, in cases with very few variants, none of the 30 COSMIC version 1 reference signatures was occasionally deemed an appropriate fit. Of the 1495 samples that ran successfully, 1300 (87%) had fewer than 9 variants and 195 (13%) had 9 or more variants available for analysis (see Figure 1).
Performance characteristics for the 1495 successfully analyzed samples are summarized in Table 2 and described below. Samples were predominantly solid organ tumors (93%; 1388 of 1495). The remaining 7% (107 of 1495) consisted of poorly differentiated, hematolymphoid, and melanocytic neoplasms (see Figure 2); these samples were sequenced before the introduction of a separate panel targeted to hematolymphoid neoplasms.
The optimal cutpoint for detection of a pulmonary primary by signature 4 in all samples was 0.089. At that cutpoint, sensitivity was 14.9%, specificity was 95%, and the positive likelihood ratio was 2.95; the area under the curve (AUC) was 0.55. In samples with at least 9 variants for analysis, the optimal cutpoint was 0.086, at which sensitivity rose to 39.0% with a specificity of 91.1% (see Figure 3, A through D). The AUC was 0.66, with a positive likelihood ratio of 4.39 at that cutpoint. Smoking status and pack-year history were available for this subset of subjects with at least 9 variants for analysis (n = 195). Of these, 122 were ever-smokers, with a mean of 32.5 pack-years (median, 30 pack-years). Within the subset of ever-smokers, the optimal cutpoint of signature 4 for detection of a pulmonary primary was 0.20, with a sensitivity of 41% and a specificity of 97% at that cutpoint. The positive likelihood ratio was 15.4, with an AUC of 0.69 (see Figure 4, A through D).
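The cutpoints and positive likelihood ratios reported above follow directly from sensitivity and specificity. The Python sketch below scans candidate cutpoints and keeps the one with the largest Youden index. It treats scores above the cutpoint as positive, matching the convention in Mutational Signature Analysis. It is an illustrative reimplementation, not the OptimalCutpoints package, and the function names are hypothetical. The positive likelihood ratio is sensitivity / (1 − specificity).

```python
import numpy as np


def youden_cutpoint(scores, labels):
    """Return (cutpoint, sensitivity, specificity) maximizing Youden's J.

    scores: signature proportion per sample; labels: True for the target class
    (e.g., pulmonary primary). A sample is called positive when its score is
    strictly above the cutpoint.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    best = None
    for cut in np.unique(scores):  # every observed score is a candidate cutpoint
        called = scores > cut
        sens = (called & labels).sum() / n_pos
        spec = (~called & ~labels).sum() / n_neg
        j = sens + spec - 1.0  # Youden index
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    _, cut, sens, spec = best
    return float(cut), float(sens), float(spec)


def positive_likelihood_ratio(sens, spec):
    """LR+ = sensitivity / (1 - specificity)."""
    return sens / (1.0 - spec) if spec < 1.0 else float("inf")
```

Consider the rounded values reported for signature 4 in samples with at least 9 variants. `positive_likelihood_ratio(0.39, 0.911)` gives about 4.38, in line with the reported 4.39 given rounding of the published sensitivity and specificity.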
The optimal cutpoint for detection of a cutaneous primary by signature 7 in all cases was 0.195. At that cutpoint, sensitivity was 49.3%, specificity was 91.5%, and the positive likelihood ratio was 5.61; the AUC was 0.70. In cases with at least 9 variants for analysis, sensitivity at the same cutpoint of 0.195 rose to 88.6%, with a specificity of 83.1% (see Figure 5, A through D). The AUC was 0.90, and the positive likelihood ratio was 5.25 at that cutpoint.
Signature 10 separated cases (n = 2) from controls (n = 14) among samples with at least 30 variants for analysis (total n = 16); any signature 10 proportion greater than 0.7 was considered positive (see Figure 6). These 2 POLe mutant cases were sequenced on both versions of our panel. Figure 6 displays the mutational signature results from the first version; the results from the second version were similarly positive (data not shown).
Cases with unknown primary site (n = 24) included a variety of tumors with various possible primaries, including pulmonary (n = 6), salivary gland (n = 3), and several others (see Supplemental Table 1). The mutational signatures detected were likewise varied. Several cases showed signature 1 (associated with aging), and the other signatures detected are fully delineated in Table 1. In concordant cases (n = 10), the primary site suggested by mutational signature analysis matched the most likely site by clinical suspicion; the median number of variants analyzed in these cases was 7. In discordant cases (n = 14), the median was 4 variants (see Figure 7; P = .02, Mann-Whitney U test). This result is consistent with the earlier finding that classification performance improves as the number of variants increases.
DISCUSSION
Mutational signature analysis is a promising analytical technique developed in research settings, with limited clinical implementation thus far. Its clinical utility has been uncertain because it was developed in exome- and genome-sequenced samples from fresh tissue. In contrast, most clinical solid tumor sequencing is performed on formalin-fixed tissue with targeted panels. Our results support the notion that mutational signature analysis can be useful in clinical tumor-only targeted sequencing of formalin-fixed, paraffin-embedded tissue. The findings presented here suggest that panel size and the number of mutations limit general applicability across a broad range of primary sites. Nevertheless, high specificity can be achieved for a subset of signatures: signature 7 as an indicator of a cutaneous primary and signature 10 as a reflection of POLe status. The conspicuous caveat for signature 10 is that we had only 2 true positives to evaluate. To increase the signal-to-noise ratio of the variant data, it is important to prefilter variants, in our case by requiring a minimum of 20 variant-supporting reads. Further filtration strategies, such as limiting reportable signatures to cases with a minimum number of variants suitable for analysis, improve sensitivity and specificity. However, they reduce the number of applicable cases (eg, 12% of cases in our cohort had ≥9 variants). Likewise, there is nontrivial attrition (11%) at the outset of the analysis among cases with very few variants. This attrition affects cases whose variants have no suitable match to any of the COSMIC version 1 signatures. It is therefore likely to bias the results somewhat in favor of specimens that better match the control signatures, possibly overstating analytical performance.
We note that both cutaneous and pulmonary primaries appear to be underrepresented among specimens that failed to run through deconstructSigs. Among other possibilities, this may reflect a higher likelihood of detectable mutations on our panel and/or higher tumor mutation burdens in general. Based on our clinical sign-out experience with our panel, we estimate that some subjects may have 1 or more germline variants among the variants available for analysis. Such variants may further suppress the ability to detect mutagenic exposures unique to the tumor's evolution and, by extension, further limit sensitivity.
The best-performing signature in our data is signature 10 (associated with POLe dysfunction), which separated the 2 positive cases from controls. This separation held even for sequencing with version 1 of our panel, which did not cover POLe at all. The 2 positive cases showed similar levels of positivity with sequencing results from both versions of our panel, demonstrating some replicability across slightly different panel compositions.
POLe mutations are associated with improved outcomes in patients treated with immune checkpoint inhibitors,11 with clinical trials underway to assess prospective performance of immunotherapies in these patients. The cited study found similar outcomes for patients with exonuclease and nonexonuclease domain POLe mutations, and made the case for extended exonic coverage for POLe in targeted panel sequencing. Our results suggest that even in panels with limited or no POLe coverage, mutational signature analysis could reliably detect POLe mutant cases, as long as adequate coverage is present across the rest of the panel to detect a sufficient number of mutations for input to the analysis.
Signature 4 (associated with tobacco smoking) shows a relatively low AUC and low sensitivity for the detection of a pulmonary primary site. High specificity may be achieved at very low cutpoints, but overall, the positive likelihood ratio remains modest: approximately 3 for all cases and approximately 4 for cases with 9 or more variants. One circumscribed set of specimens had a positive smoking history and 9 or more variants for analysis (n = 122). In this set, the positive likelihood ratio is quite high (15) at an optimal cutpoint of 0.20; however, the generalizability of this analysis is limited by the small number of qualifying specimens. Theoretically, the signature may be of use in cases with an ambiguous primary site and no immunohistochemical support for a specific lineage. However, given the limited positive likelihood ratio in all-comers (regardless of input variants and smoking history), it is difficult to envision significant clinical utility for this type of analysis. The limited ability of mutational signature analysis to support assignment of a pulmonary primary in our cohort likely reflects a variety of factors. First, our panel is limited in size and may lack the genomic breadth to capture the number of mutations needed to correctly evaluate the presence of signature 4. This may be a less critical issue at centers with larger panels. Second, signature 4 may also be present in upper aerodigestive tract malignancies with a history of tobacco smoke exposure. Mutational signature classification may therefore detect the true presence of the signature in a nonpulmonary primary. For example, 1 case classified as “positive” for signature 4 was an esophageal adenocarcinoma from a former pipe smoker; under our predefined rules, this case was therefore classified as a “false positive.”
Along these same lines, tobacco smoking is relatively uncommon in our catchment area.12 Pulmonary primaries at our center are consequently likely to reflect other mutagenic processes, such as aging. The low pretest probability of this signature in pulmonary primaries may be less of an obstacle at centers treating populations with heavy tobacco exposure. Third, some inaccuracy is expected from misclassification between similar signatures. For example, C>A transversions arising by chance can be expected to cause occasional misclassification as signature 4. Finally, the gold standard for primary site in this study was clinical assessment. Any cases misclassified on clinical evaluation would therefore also affect the sensitivity and specificity estimated in this analysis.
The marginal diagnostic utility of detecting signature 4 will be highest in scenarios such as metastases with unknown or ambiguous primaries, where a pulmonary site may be favored. One case in our cohort that was positive for signature 4 was signed out as “favor breast primary” on the basis of weak GATA3 immunohistochemical positivity in a fine-needle aspiration of a lung mass; the patient was nevertheless treated as though she had a pulmonary primary because she had no breast lesions. The limited positive likelihood ratio in our cohort (in the 2–5 range) suggests signature 4 may have little to add beyond the clinical assessment; however, in rare circumstances mutational signature analysis may have a role to play in corroborating clinical or pathologic impressions of primary site.
Signature 7, associated with ultraviolet radiation, shows moderate sensitivity and high specificity for determination of a cutaneous primary in our cohort at low optimal cutpoints, with moderate positive likelihood ratios of approximately 5. Signature 7 has been identified in some noncutaneous primaries, just as signature 4 is sometimes detected in nonpulmonary primaries. It is nevertheless relatively specific, because its presence is greatly enriched in malignancies arising at sun-exposed sites. Within our cohort, several cases demonstrated the clinical relevance of detecting signature 7. These included neuroendocrine carcinomas of unknown primary that were treated as metastatic Merkel cell carcinoma, and cases in which the differential diagnosis included metastatic melanoma. The sensitivity and specificity of this signature suggest that it may be most useful in cases where the determination of a cutaneous primary would alter clinical management.
Our very limited analysis of mutational signatures in malignancies with unknown primary site demonstrates a statistically significant difference in the number of variants analyzed. The comparison was between specimens whose clinically suspected primary site was concordant with a possible primary suggested by mutational signatures and those whose sites were discordant. This finding suggests there is some detectable signal with respect to mutational signatures associated with various primary sites. However, the analysis is limited by a very small sample size, low overall mutational burden, and the fact that the underlying signatures have not themselves been validated in our setting. We consider this type of analysis to be primarily hypothesis-generating and suggestive of future avenues for applying mutational signature analysis in clinical practice.
We have not individually validated all 30 single-base substitution mutational signatures in COSMIC version 1, and have focused most of our efforts on the 3 signatures described above. Even so, our analysis of all 30 signatures in cases of unknown primary within our cohort indicates that mutational signatures more often point to a clinically likely primary site when adequate input variants are available. Some cases exhaust immunohistochemical or other readily available means of determining primary site. In these challenging cases, mutational signature analysis may represent a complementary tool that can provide important additional evidence. In our experience, signatures 10 and 7 perform well enough to be considered for clinical implementation.
We thank Norm Cyr for assistance in formatting figures.
References
Author notes
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the November 2021 table of contents.
The authors have no relevant financial interest in the products or companies described in this article.