Next-generation sequencing–based assays are being increasingly used in the clinical setting for the detection of somatic variants in solid tumors, but limited data are available regarding the interlaboratory performance of these assays.
To examine proficiency testing data from the initial College of American Pathologists (CAP) Next-Generation Sequencing Solid Tumor survey and report on laboratory performance.
CAP proficiency testing results from 111 laboratories were analyzed for accuracy and associated assay performance characteristics.
The overall accuracy observed for all variants was 98.3%. Rare false-negative results could not be attributed to sequencing platform, selection method, or other assay characteristics. The median and average of the variant allele fractions reported by the laboratories were within 10% of those orthogonally determined by digital polymerase chain reaction for each variant. The median coverage reported at the variant sites ranged from 1922 to 3297.
Laboratories demonstrated an overall accuracy of greater than 98% with high specificity when examining 10 clinically relevant somatic single-nucleotide variants with a variant allele fraction of 15% or greater. These initial data suggest excellent performance, but further studies are needed to evaluate performance at lower variant allele fractions and for additional variant types.
The past several years have seen the development or expansion of several national and international efforts that aim to accelerate the implementation of precision medicine.1–4 One area of focus is oncology, and molecular alterations are currently being used in clinical practice as biomarkers to assist in diagnostic categorization and prognostication, as well as the selection and monitoring of therapies.5 Increasingly, massively parallel sequencing methods, or next-generation sequencing (NGS)–based methods, are used to identify DNA and RNA biomarkers for clinical management. A central issue for test performance is to ensure robust molecular results to guide clinical care. In this manuscript, we describe the analytic accuracy of clinical NGS-based molecular oncology testing for the detection of somatic variants in solid tumors based on a large interlaboratory comparison using blinded, engineered specimens.
A previous publication from our group surveyed clinical laboratories regarding specimen requirements, assay characteristics, and other trends in NGS-based oncology testing.6 Two key findings from that survey guided the development of this proficiency testing program: (1) most laboratories perform targeted sequencing of tumor-only specimens to detect single-nucleotide variants (SNVs) and small insertions or deletions with a reported 5% to 10% variant allele fraction as the lower limit of detection, and (2) testing is primarily performed using amplicon-based, predesigned commercial kits using benchtop sequencers. With this guidance, we developed a proficiency-testing program for NGS-based oncology tests designed to detect recurring somatic variants in solid tumor specimens. This and other proficiency testing programs are designed to ensure the accuracy and reliability of patient test results provided by clinical laboratories.7–9 Here, we report the performance of laboratories for this initial Next-Generation Sequencing Solid Tumor Proficiency Testing Survey.
MATERIALS AND METHODS
Data were derived from the initial College of American Pathologists (CAP) Next-Generation Sequencing Solid Tumor Survey (NGSST-A 2016). The specimens were sent to laboratories on May 9, 2016, and the laboratories returned the survey results by June 18, 2016. Laboratories were concurrently sent 3 independent specimens containing linearized plasmids with engineered somatic variants mixed with genomic DNA derived from the GM24385 cell line to achieve variant allele fractions (VAFs) ranging from 15% to 50% (Table 1).
Specimens were generated by a commercial reference material vendor under good laboratory practice. Specimen 1 contained variants in AKT1, BRAF, FBXW7, IDH1, and KRAS; specimen 2 contained variants in EGFR and NRAS; specimen 3 contained variants in ALK, KIT, and PIK3CA. The synthetic DNA inserts contain a somatic variant with approximately 500 bp of flanking genomic sequence on each side of the variant. The flanking genomic DNA sequence was matched to the diluent genomic DNA (ie, GM24385). The VAFs were orthogonally confirmed by digital polymerase chain reaction (PCR) by measuring absolute mutant copies and wild-type copies. The VAFs were calculated by putting the absolute copies in the following formula for each variant in specimens 1, 2, and 3:
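The formula itself is not reproduced in this text. Given the description above (absolute mutant and wild-type copy counts from digital PCR), it is presumably the standard allele-fraction ratio, written here with the assumed symbols N_mutant and N_wild-type for the measured copy numbers:

```latex
\mathrm{VAF}\ (\%) = \frac{N_{\text{mutant}}}{N_{\text{mutant}} + N_{\text{wild-type}}} \times 100
```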
Laboratories were instructed to perform NGS using the methodology routinely performed on clinical samples in their laboratory for the detection of somatic SNVs, insertions, and deletions in solid tumors. This could include targeted gene panels, whole-exome, or whole-genome sequencing. As is standard for CAP surveys, laboratories were instructed that variant confirmation by a secondary orthogonal methodology must follow the laboratory's standard procedure for testing clinical specimens, but it cannot be referred to another laboratory. Following testing, laboratories reported the variants detected in each specimen by selecting from a master variant list containing 90 variants in 15 genes (AKT1, ALK, BRAF, EGFR, ERBB2, FBXW7, FGFR2, GNAS, IDH1, KIT, KRAS, MET, NRAS, PIK3CA, and STK11). In addition, laboratories provided read depth and VAF for each reported variant as well as the laboratory's assay performance characteristics.
RESULTS
Of the 111 laboratories that reported results, 108 performed targeted sequencing of mutation hotspots or cancer genes, 2 performed whole-exome sequencing, and 1 performed a combination of whole-genome, whole-exome, and targeted sequencing. The number of laboratories whose assay enabled the detection of a particular variant ranged from 85 (77%) for the FBXW7 p.R465H variant to 111 (100%) for the KRAS p.G13D variant (Table 1). The percentage of laboratories that correctly identified each variant ranged from 96.7% (87 of 90) for the ALK p.R1275Q variant to 100% for the BRAF p.V600E (110 of 110) and KRAS p.G13D (111 of 111) variants. The overall accuracy observed for all variants was 98.3% (993 of 1010).
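As a minimal sketch of the arithmetic behind these percentages, the snippet below recomputes accuracy as correct calls divided by expected calls among laboratories whose assay could detect the variant. The function name `accuracy_pct` is illustrative only and is not part of the survey's grading procedure; the counts are those quoted above.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of expected variant calls that were reported correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


# Counts quoted in the text (correct calls, laboratories able to detect the variant).
overall = accuracy_pct(993, 1010)      # all variants combined
alk_r1275q = accuracy_pct(87, 90)      # lowest per-variant accuracy
braf_v600e = accuracy_pct(110, 110)    # one of two variants detected by every capable laboratory

print(overall, alk_r1275q, braf_v600e)
```

The denominator differs by variant because not every assay covers every variant position, so accuracy is assessed only among laboratories that could have detected the variant.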
There were 17 false-negative results reported by 12 laboratories (Figure 1): 8 laboratories reported a single false-negative result, 3 laboratories reported 2 false-negative results, and 1 laboratory reported 3 false-negative results. There were 3 false-positive results in which the laboratory correctly detected the presence of a variant in a gene but reported it as a different variant from expected (eg, the laboratory did not report detection of the EGFR p.G719S variant, but instead reported detection of the EGFR p.G719C variant).
Among the 12 laboratories with false-negative results, there was a diversity of sequencing platforms and selection methods (Table 2). Likewise, there was no significant association between laboratories with false-negative results and the reported lower limit of detection based on allele fraction for SNVs, average coverage, or minimum coverage (Table 3).
The median and average of the VAFs reported by the laboratories were within 10% of those determined by digital PCR for each variant (Figure 2; Table 1). The maximum observed SD was 8.5 (range, 2.9–8.5; Table 1), and at least 75% of laboratories reported VAFs within 20% of the engineered value for each variant.
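The summary statistics described here can be sketched as follows. The reported VAFs in the example are hypothetical values chosen for illustration, not survey data, and "within 20%" is interpreted as a relative deviation from the engineered value, which is an assumption; the text does not state whether the tolerance is relative or in absolute percentage points.

```python
from statistics import mean, median, stdev


def fraction_within(reported, engineered, rel_tol=0.20):
    """Share of reported VAFs within rel_tol (relative) of the engineered VAF."""
    hits = [v for v in reported if abs(v - engineered) <= rel_tol * engineered]
    return len(hits) / len(reported)


# Hypothetical VAFs (%) from 8 laboratories for a variant engineered at 40%.
reported_vafs = [38.0, 41.5, 44.0, 36.5, 40.2, 47.9, 2.0, 39.1]
engineered_vaf = 40.0

summary = {
    "median": median(reported_vafs),
    "mean": mean(reported_vafs),
    "sd": stdev(reported_vafs),  # sample standard deviation
    "within_20pct": fraction_within(reported_vafs, engineered_vaf),
}
print(summary)
```

In this made-up example a single extreme outlier (2.0%) pulls the mean and inflates the SD while leaving the median nearly unchanged, which is one reason to report both the median and the average.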
The median coverage reported at the variant sites ranged from 1922 to 3297, although a wide range of coverage was reported (Figure 3; Table 1). The coverage data were further evaluated to determine whether there were differences between amplicon- and hybridization-based capture approaches. Of the 108 laboratories that performed only targeted sequencing of mutation hotspots or cancer genes, 107 laboratories provided sufficient information to determine whether they were using amplicon- or hybridization-based selection methods. Of these laboratories, 83.2% (89 of 107) used amplification-based selection methods, and 16.8% (18 of 107) used hybridization-based selection methods. The median coverage depth reported across all engineered variant positions was 3445 reads (range, 100–99 999) for amplification-based methods and 959.5 reads (range, 47–4244) for hybridization-based methods.
DISCUSSION
Surveys about NGS-based oncology testing practices from our group6 and others10 reveal 2 trends. First, most laboratories are performing targeted sequencing of tumor-only specimens to detect SNVs and small insertions and deletions with a reported lower limit of detection of 5% to 10% VAF. Second, the testing is primarily performed using amplicon-based, predesigned commercial kits with benchtop sequencers. These trends were still observed in the current study; however, as was noted previously, there is considerable diversity of practice with respect to both the wet laboratory and bioinformatics processes. These test results from clinical molecular laboratories are used to guide patient care, including assistance with diagnosis, determination of prognosis, and the selection and monitoring of therapy. Consequently, testing a set of common, well-characterized specimens is critical for evaluating analytical performance across platforms and approaches.
A total of 111 laboratories reported results from their NGS-based assay for the identification of somatic variants in solid tumor specimens. The laboratories were provided with 3 engineered specimens containing a total of 10 recurring somatic variants with a VAF between 15% and 50%. The overall accuracy observed for all variants was very high at 98.3%, and this high degree of accuracy was observed for each individual variant ranging from 96.7% to 100%. It is difficult to calculate analytic specificity because the exact target region for each assay is not known. However, only 3 potential false-positive results were reported, suggesting a high degree of analytic specificity, and as discussed further below, these could be transcription errors.
A total of 12 of 111 laboratories (10.8%) accounted for the 17 false-negative and 3 concurrent false-positive results. All 3 false-positive results were associated with a concurrent false-negative result involving the same codon of the same gene, indicating mischaracterization of a single mutation. There is no expected clinical consequence of these errors (eg, EGFR p.G719C versus p.G719S), and these may be attributed in part to the nature of the survey result form, which is not typical of clinical laboratory reports and is more prone to subtle transcription errors. Consequently, the combined accuracy of the assays for the variants tested may be as high as 98.6% (996 of 1010).
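The reclassification described above can be sketched as simple arithmetic on the quoted counts. The constant and function names are illustrative assumptions, not terms used by the survey; the calculation treats the 3 same-codon miscalls as detected variants to obtain the upper-bound accuracy.

```python
TOTAL_EXPECTED = 1010       # expected variant calls across capable laboratories
FALSE_NEGATIVES = 17        # expected variants that were not reported
SAME_CODON_MISCALLS = 3     # false negatives paired with a false positive at the same codon
LABS_TOTAL = 111
LABS_WITH_ERRORS = 12

# Laboratories grouped by number of false-negative results: {results per lab: labs}.
FN_DISTRIBUTION = {1: 8, 2: 3, 3: 1}


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


strict_correct = TOTAL_EXPECTED - FALSE_NEGATIVES        # miscalls counted as errors
lenient_correct = strict_correct + SAME_CODON_MISCALLS   # miscalls counted as detections

print(pct(strict_correct, TOTAL_EXPECTED))    # strict accuracy
print(pct(lenient_correct, TOTAL_EXPECTED))   # accuracy if miscalls are transcription errors
print(pct(LABS_WITH_ERRORS, LABS_TOTAL))      # share of laboratories with any error
```

The distribution of false negatives per laboratory is included only as a consistency check: it should account for all 12 laboratories and all 17 results.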
Prior to this and related studies, discussion about the accuracy and reliability of NGS-based oncology testing was often based on concordance data in which multiple alignment and variant-calling pipelines were applied to exome or genome sequencing data for germ line sequencing applications. These studies often demonstrated low to moderate concordance when multiple analysis pipelines were applied to the same exome or genome data.11,12 It was unclear at the time whether these studies could be extrapolated to clinical NGS-based oncology testing because of differences in application (germ line versus somatic), target region (exome or genome versus targeted panel), and setting (research versus clinical laboratory). The data presented in this manuscript both directly and rigorously address the accuracy and reliability of clinical NGS-based oncology testing, indicating very high interlaboratory agreement for the detection of somatic SNVs. The robust detection of clinically relevant somatic SNVs in this study is consistent with 2 prior studies focused on bioinformatics analysis of NGS-based oncology testing within clinical laboratories as well as 1 analytical validation study performed for the National Cancer Institute (NCI)-MATCH trial. A multi-institutional exchange of FASTQ files between 6 clinical laboratories showed a high rate of concordance for SNV detection and complete concordance for the detection of clinically significant SNVs.13 Likewise, a pilot CAP proficiency testing program in which somatic variants were introduced via an in silico approach demonstrated 97% accuracy for detection of somatic SNVs and indels with VAFs greater than 15%.14 Finally, a validation study of an NGS-based oncology assay performed by 4 clinical laboratories as part of the NCI-MATCH trial demonstrated a high level of reproducibility in the detection of reportable somatic variants.15
In this study, variants in BRAF, KRAS, and EGFR were detected with accuracy rates of 100%, 100%, and 97.2%, respectively. These are consistent with the acceptable proficiency testing results observed for US Food and Drug Administration companion diagnostics (FDA-CD) and diverse laboratory-developed tests (LDTs) for BRAF (93.0% FDA-CD, 96.6% LDTs), KRAS (98.8% FDA-CD, 97.4% LDTs), and EGFR (99.1% FDA-CD, 97.6% LDTs).16 Overall, the above studies indicate that clinical laboratories are able to detect clinically significant somatic SNVs from solid tumor specimens with high accuracy and reliability.
The 17 false-negative results were observed for variants with engineered allele fractions ranging from 20% to 50% (Table 1). Of the 12 laboratories with an error, 11 reported a lower limit of detection for SNVs of 15% or less VAF, and 1 laboratory did not specify its lower limit of detection. Because the variants that were not detected had VAFs greater than 18% when measured by digital PCR, at least 11 of the laboratories should have been able to detect these variants based on their reported assay characteristics. Furthermore, there was no clear association between laboratories with an error and sequencing platform, selection method, reported lower limit of detection, or average and minimum number of reads covering the targeted bases. Because laboratories were only asked to report coverage results for positions at which they identified a variant, we are unable to determine whether low coverage contributed to these false-negative results.
Recent joint consensus recommendations from the Association for Molecular Pathology (AMP), American Society of Clinical Oncology (ASCO), and College of American Pathologists (CAP) indicate that VAF “should be evaluated and included in the report when appropriate.”10 The average and median VAFs observed for each variant closely approximate the engineered VAF and that measured by an orthogonal method, digital PCR (Figure 2; Table 1). The observed SD ranged from 2.9 to 8.5 for each variant, and at least 75% of laboratories reported VAFs within 20% of the engineered value for each variant. Collectively, these data indicate that VAF measured by most laboratories is a reasonable approximation of the actual VAF. However, a minority of laboratories in this study reported VAFs that deviated substantially from the expected value. This suggests that laboratories that report VAFs from their NGS-based molecular oncology assay should verify the accuracy of their reported VAFs as part of assay validation. For extreme outliers (eg, reported VAF = 2% for IDH1 p.R132H, when engineered VAF = 40% and observed VAF by digital PCR = 41.5%), it is possible that these represent a transcription error introduced during the reporting process. We also cannot exclude that artificial constructs may have resulted in artifactually low VAFs for some selection methods, although we did not detect a pattern suggesting this interpretation.
For each position at which a laboratory reported a variant, the total coverage depth at that position was also reported. The median coverage depth for each variant position varied from 1922 to 3297, with a range across all variant sites from 32 to 99 999 (with 99 999 being the highest value that could be entered on the result form). Laboratories using hybridization-based selection methods reported several-fold lower median coverage depth across all engineered variant positions (3445 reads for amplification-based methods versus 959.5 reads for hybridization-based methods). Recent joint consensus recommendations from AMP and CAP suggest a minimum depth of coverage of more than 250 reads.17 More than 98% (974 of 992) of the total variants reported by all laboratories had coverage exceeding this recommended level.
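A minimal sketch of the threshold comparison follows. The list of depths is hypothetical and only illustrates the calculation; the 974 of 992 figure is taken from the text. The helper name `share_above` is an assumption, and "more than 250 reads" is implemented as a strict inequality.

```python
def share_above(depths, minimum=250):
    """Fraction of reported depths strictly greater than the minimum."""
    if not depths:
        raise ValueError("no depths reported")
    return sum(d > minimum for d in depths) / len(depths)


# Hypothetical reported depths at variant sites (reads).
example_depths = [1922, 3297, 32, 959, 4244, 250, 99999, 180]
print(share_above(example_depths))

# Figure quoted in the text: 974 of 992 reported variants exceeded 250 reads.
quoted_pct = round(100 * 974 / 992, 1)
print(quoted_pct)
```

Note that a site reported at exactly 250 reads does not count as exceeding the recommendation under this reading.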
There are 3 important caveats regarding coverage depth. First, laboratories can variably define coverage. As an example, coverage depth can refer to the total number of reads covering a site or the total number of unique reads covering a site (if hybrid capture or molecular bar coding is used) with or without the application of other metrics related to base quality, mapping quality, or other quality filters. The proficiency testing survey did not define coverage or inquire as to laboratories' definition of coverage, so the provided coverage depths may not be equivalent. A second and related consideration is that the proficiency testing survey did not inquire whether laboratories incorporate library complexity, a measure of the number of unique fragments present in a library, when reporting coverage depth.17,18 Given that most of the laboratories in this survey used amplification-based methods without molecular bar codes, most laboratories were unable to directly assess library complexity. The minority of laboratories that used hybridization-based methods or molecular bar coding could have incorporated this quality metric, enabling the reporting of independent coverage depth, which is a more robust measure of assay performance. The third consideration is that laboratories were only asked to report coverage depth for sites at which they identified variants. Consequently, we do not know whether low coverage depth could have contributed to the false-negative results.
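The difference between total and unique (independent) coverage can be illustrated with a deliberately simplified sketch. It assumes each read is represented as a hypothetical `(start, end, umi)` tuple and that reads sharing the same coordinates and molecular bar code derive from one original fragment; production pipelines work on aligned reads and typically tolerate sequencing errors in the bar code, which this example does not.

```python
def coverage_at(reads, position):
    """Return (total_depth, unique_depth) at a position.

    Each read is (start, end, umi) with a half-open interval [start, end).
    Reads with identical start, end, and UMI are counted as one fragment.
    """
    total = 0
    fragments = set()
    for start, end, umi in reads:
        if start <= position < end:
            total += 1
            fragments.add((start, end, umi))
    return total, len(fragments)


reads = [
    (100, 250, "ACGT"),
    (100, 250, "ACGT"),  # PCR duplicate of the read above
    (100, 250, "TTAG"),  # same coordinates, different original molecule
    (120, 270, "ACGT"),  # same UMI, different fragment coordinates
    (300, 450, "GGCA"),  # does not cover the queried position
]

total_depth, unique_depth = coverage_at(reads, 200)
print(total_depth, unique_depth)
```

Two laboratories sequencing the same library could therefore report different depths for the same site depending on which of these two counts they call coverage.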
When considering the implications and generalizability of these data, there are several limitations. First, the laboratories were provided with engineered nucleic acid specimens (ie, linearized plasmids diluted into genomic DNA). These specimens do not control for many important preanalytical factors relevant to standard clinical samples, such as tissue processing and fixation, selection of a tissue section with optimal neoplastic cellularity, tumor enrichment through microdissection, or nucleic acid extraction methodology.19 Recent work by Sims and colleagues20 suggests that linearized plasmids diluted in genomic DNA isolated from formalin-fixed, paraffin-embedded cell lines performed similarly to endogenous variants contained within genomic DNA derived from formalin-fixed, paraffin-embedded cell lines. One difference is that the diluent genomic DNA used in this study was derived from cells that had not been formalin fixed and paraffin embedded. Consequently, these specimens do not exactly recapitulate the tissue specimens routinely processed by clinical laboratories. However, the need to provide homogeneous, well-characterized materials to more than 100 laboratories necessitates some compromise to facilitate these large interlaboratory comparisons of analytic performance; previous experience with bona fide clinical samples showed variation, due to intratumor heterogeneity, between the sections sent to different participating laboratories. Second, laboratories are asked to return results using a standardized reporting form with structured data that enables high-throughput grading but is a deviation from their normal reporting procedure. This may introduce transcription and other manual errors that would be less likely with the use of a laboratory information system and other quality control systems during actual clinical testing. Third, the lowest VAF tested in this survey was approximately 15%. 
The vast majority of laboratories report lower limits of detection between 5% and 10% VAF, and the current study does not evaluate the performance of NGS-based molecular oncology assays at VAFs below 15%. This is important because it becomes more challenging to detect somatic variants as allele fraction decreases.18,21,22 Fourth, the current survey only included somatic SNVs. Insertions, deletions, homopolymer length changes, copy number changes, and structural variants are also commonly detected by NGS-based molecular oncology assays, and these variant types are generally more challenging to detect than SNVs.
In the current manuscript, we present the results of a large interlaboratory study of the performance of clinical NGS-based oncology assays used for the detection of somatic variants in solid tumors in 111 participating laboratories. Laboratories demonstrated an overall accuracy of greater than 98% with high specificity when examining 10 clinically relevant somatic SNVs with a VAF of 15% or greater. These initial data suggest excellent performance, but further work is needed to evaluate performance at lower VAFs and for additional variant types, such as insertions, deletions, homopolymer tract length changes, copy number variants, and structural variants. Insertions, deletions, delins, and more complex variants are being incorporated into future CAP proficiency testing surveys and other materials. Likewise, similar efforts using standardized reference samples will be needed to evaluate the performance of other NGS-based oncology assays for new and emerging applications, such as cell-free, circulating tumor DNA assays. Recently introduced proficiency testing surveys that began in 2018 will allow the assessment of laboratory performance for detection of cell-free, circulating tumor DNA and for detection of recurring somatic RNA fusions using sequencing-based technologies. These and related efforts should allow for a broader and deeper understanding of the performance of clinical NGS-based oncology assays.
Dr Iafrate has equity in ArcherDx. The other authors have no relevant financial interest in the products or companies described in this article.
Mr Smail was supported by a National Institutes of Health BD2K Training Grant (Biomedical Data Science Graduate Training at Stanford), PI: Russ Altman; grant No. 1 T32 LM012409-01.
The identification of specific products or scientific instrumentation is considered an integral part of the scientific endeavor and does not constitute endorsement or implied endorsement on the part of the author, Department of Defense, or any component agency. The views expressed in this article are those of the authors and do not reflect the official policy of the Department of Army/Navy/Air Force, Department of Defense, or US government.