The Ventana programmed death ligand-1 (PD-L1) SP142 immunohistochemical assay (IHC) is approved by the US Food and Drug Administration as the companion diagnostic assay to identify patients with locally advanced or metastatic triple-negative breast cancer for immunotherapy with atezolizumab, a monoclonal antibody targeting PD-L1.
To determine interobserver variability in PD-L1 SP142 IHC interpretation in invasive breast carcinoma.
The pathology database was interrogated for all patients diagnosed with primary invasive, locally recurrent, or metastatic breast carcinoma on which PD-L1 SP142 IHC was performed from November 2018 to June 2019 at our institution. A subset of cases was selected using a computerized random-number generator. PD-L1 IHC was evaluated in stromal tumor-infiltrating immune cells using the IMpassion130 trial criteria, with positive cases defined as immunoreactivity in immune cells in 1% or more of the tumor area. IHC was interpreted on whole slide images by staff pathologists with breast pathology expertise. Interobserver variability was calculated using unweighted κ.
A total of 79 cases were assessed by 8 pathologists. Interobserver agreement was substantial (κ = 0.727). There was complete agreement among all 8 pathologists in 62% (49 of 79) of cases, 7 pathologists or more in 84% (66 of 79) of cases, and 6 pathologists or more in 92% (73 of 79) of cases. In 4% (3 of 79) of cases, all of which were small biopsies, pathologists' interpretations were evenly split between scores of positive and negative.
The findings show substantial agreement in PD-L1 SP142 IHC assessment of breast carcinoma cases among 8 pathologists at a single institution. Further study is warranted to define the basis for discrepant results.
Triple-negative breast carcinomas (TNBCs) lack expression of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (HER2) and are associated with aggressive clinical behavior. Median overall survival for patients with advanced TNBC on current first-line therapy remains 18 months or less.1 The IMpassion130 trial included patients with metastatic TNBC and assessed programmed death ligand-1 (PD-L1) expression in stromal tumor-infiltrating immune cells using the Ventana SP142 assay (Ventana Medical Systems, Tucson, Arizona).2 The study found that in patients whose tumors showed PD-L1–positive stromal tumor-infiltrating immune cells occupying 1% or more of the tumor area, a combination of atezolizumab, a PD-L1 inhibitor, and chemotherapy prolonged progression-free survival when compared with chemotherapy alone.2 The study findings resulted in US Food and Drug Administration approval of atezolizumab for patients with PD-L1–positive locally advanced or metastatic triple-negative breast cancer and approval of the Ventana SP142 assay as the companion diagnostic test for therapy selection. At our institution, patients with hormone receptor–positive, HER2-negative breast carcinomas may also be tested with PD-L1 immunohistochemistry (IHC), at the treating physician's request, to evaluate for either trial eligibility or compassionate use of the therapy. Given the therapeutic implications of IHC interpretation, the purpose of this study was to assess the interobserver variability of PD-L1 SP142 IHC interpretation by pathologists with breast pathology expertise at a single, large academic tertiary-care cancer center.
MATERIALS AND METHODS
Following approval from the institutional review board, all patients whose primary, locally recurrent, or metastatic breast carcinoma was tested with PD-L1 SP142 IHC at the request of the treating physician were identified through a retrospective search of our pathology database from November 1, 2018, to June 30, 2019. Clinical and pathologic data in all study cases were retrieved and reviewed. Cases were randomly selected using a computerized random number generator. Specimen type (biopsy or resection) was recorded. Biopsy specimens included core-needle biopsies, punch biopsies, and shave biopsies. Resection specimens encompassed wide local excisions, mastectomies, lung wedge resections, and metastatic tumor specimens from craniotomies. The reported estrogen receptor, progesterone receptor, and HER2 results of each case were recorded. Stromal tumor-infiltrating immune cells were quantified based on the recommendations of the International Tumor-Infiltrating Lymphocytes Working Group3 by a single pathologist, blinded to the PD-L1 status. Cases were stratified by percentage of stromal tumor-infiltrating immune cells, as follows: less than or equal to 10% as low, 11% to 59% as intermediate, and greater than or equal to 60% as high.4
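For illustration only, the three-tier stratification of stromal tumor-infiltrating immune cells described above can be written as a short Python sketch. The function name is hypothetical, and the handling of non-integer values between 10% and 11% (treated here as intermediate) is an assumption, because the cutoffs in the text are stated for whole percentages.

```python
def stratify_stromal_tils(percent: float) -> str:
    """Map a stromal TIL percentage (0-100) to 'low', 'intermediate', or 'high'.

    Cutoffs follow the text: <=10% low, 11%-59% intermediate, >=60% high.
    Assumption: fractional values above 10 and below 60 count as intermediate.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent <= 10:
        return "low"
    if percent < 60:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Hypothetical example values, not study data.
    for value in (0, 10, 11, 59, 60, 80):
        print(value, stratify_stromal_tils(value))
```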
PD-L1 SP142 Immunohistochemistry
PD-L1 SP142 IHC was performed on 4-μm-thick full sections from formalin-fixed, paraffin-embedded tissue blocks. IHC staining was performed on a Benchmark Ultra System (Ventana Medical Systems) with antibody detection using the OptiView DAB IHC Detection Kit (Ventana Medical Systems), according to the manufacturer's manual.5 PD-L1 IHC interpretation at our institution was rendered and reported prospectively by consensus agreement of surgical pathologists with expertise in breast pathology. The reported PD-L1 status of study cases was recorded.
Digital Slide Review and Scoring
Whole slide scanning of PD-L1 IHC slides and corresponding hematoxylin-eosin–stained slides was performed using Leica Aperio AT2 scanners (Leica Biosystems, Buffalo Grove, Illinois) at ×20 equivalent magnification (0.5 μm/pixel). De-identified whole slide images (WSI) were uploaded and distributed using a web-based digital pathology platform, PathPresenter (https://pathpresenter.net/), to 8 staff surgical pathologists with expertise in breast pathology. The study pathologists were blinded to the clinical history, pathologic data, and consensus PD-L1 IHC interpretation and reviewed the images following a washout period of at least 6 weeks from clinical IHC interpretation. All study pathologists had participated in the manufacturer's web-based training module for PD-L1 IHC interpretation. PD-L1 IHC was assessed independently by pathologists using the IMpassion130 trial criteria,2 with PD-L1–positive cases defined as those showing PD-L1 expression on stromal tumor-infiltrating immune cells occupying greater than or equal to 1% of the tumor area.
Statistical Analysis
Interobserver variability was calculated using unweighted Cohen's κ, whereby a value of 0.01 to 0.20 indicated slight agreement, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 0.99 near-perfect. Differences between groups in categorical variables were calculated with the χ2 test and in continuous variables by using the Student t test. Statistical significance was established at P < .05. Sensitivity was calculated as the number of true-positive interpretations divided by the sum of the true-positive and false-negative interpretations, using the consensus score as the standard for interpretation. Specificity was calculated as the number of true-negative interpretations divided by the sum of the true-negative and false-positive interpretations, using the consensus score as the standard for interpretation.
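As a hedged illustration of a chance-corrected agreement statistic for more than 2 readers, the Python sketch below computes Fleiss' κ and labels it with the agreement bands listed above. This is an assumption rather than the study's procedure: the text names unweighted Cohen's κ, which in its standard form compares 2 raters, and does not state how it was extended to 8 pathologists. Fleiss' κ is shown here as one common multi-rater generalization, and the function names, the toy data, and the labels for κ ≤ 0 and κ = 1 are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of one label per rater.

    Assumption: every case is scored by the same number of raters.
    """
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]
    # Observed agreement: proportion of agreeing rater pairs, averaged over cases.
    per_case = [
        (sum(n * n for n in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_case) / n_cases
    # Chance agreement from the overall proportion of each category.
    totals = Counter()
    for c in counts:
        totals.update(c)
    expected = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Label kappa using the bands in the text; gaps (eg, 0.605) go to the higher band."""
    if kappa <= 0:
        return "no agreement beyond chance"  # label not defined in the text
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    if kappa < 1:
        return "near-perfect"
    return "perfect"  # label not defined in the text


if __name__ == "__main__":
    # Hypothetical toy data: 4 cases read by 8 raters (not study data).
    toy = [["pos"] * 8, ["neg"] * 8, ["pos"] * 4 + ["neg"] * 4, ["neg"] * 8]
    k = fleiss_kappa(toy)
    print(round(k, 3), agreement_band(k))
```

Because the multi-rater method is assumed, this sketch would not necessarily reproduce the κ values reported in this study.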
RESULTS
During the study period, 127 breast carcinoma cases were tested with PD-L1 IHC. Following random selection, the study cohort included 79 samples from 76 patients. Of 76 study patients, 75 (98.7%) were women. Details of site of sample tested and PD-L1 status are shown in Table 1. Thirty-nine percent (31 of 79) of study samples were positive for PD-L1 IHC by consensus interpretation, and 61% (48 of 79) were PD-L1 negative. Seventy-seven percent (61 of 79) of the study samples were biopsy specimens, and 23% (18 of 79) were resection specimens. Forty-six percent (36 of 79) of the study samples were classified as low stromal tumor-infiltrating immune cells, 42% (33 of 79) were intermediate, and 13% (10 of 79) were high. Median tumor area percentage of stromal tumor-infiltrating immune cells for TNBC samples was 15% (mean, 22%; range, 0%–80%), and median percentage of stromal tumor-infiltrating immune cells for hormone receptor–positive, HER2-negative cases was 15% (mean, 20%; range, 0%–60%); there was no statistically significant difference in percentage of tumor-infiltrating immune cells between the 2 groups (P = .70). Study samples included 29.1% (23 of 79 cases) primary tumor samples, 17.7% (14 of 79 cases) locally recurrent samples, and 53.2% (42 of 79 cases) metastatic samples. Two study patients had PD-L1 IHC performed on multiple sites of metastases, and 1 patient had PD-L1 testing on 2 primary biopsies of the same lesion. Of 23 primary tumor samples, 13 (56.5%) were scored as positive for PD-L1 IHC by consensus assessment, and 10 (43.5%) were negative for PD-L1. Of the 14 locally recurrent samples, 10 (71.4%) were PD-L1 positive, and 4 (28.6%) were PD-L1 negative. Of the 42 metastatic samples, 8 (19.0%) were PD-L1 positive and 34 (81.0%) were PD-L1 negative.
A summary of study sample tumor histology, specimen type, site type, and receptor status by PD-L1 status is displayed in Table 2. The majority of cases showed invasive carcinoma of no special type on histology (n = 59 of 79 cases; 74.7%). Sixty-eight percent (54 of 79) of cases were TNBCs, and 32% (25 of 79) of cases were hormone receptor–positive, HER2-negative. PD-L1 IHC was positive by consensus grading in 42.6% (23 of 54 samples) of TNBC samples and in 32.0% (8 of 25 samples) of hormone receptor–positive, HER2-negative cases; the difference between the 2 groups was not statistically significant (P = .37). Among the TNBC samples, 79.6% (43 of 54 specimens) were biopsies and 20.4% (11 of 54 specimens) were resections. Among the hormone receptor–positive, HER2-negative samples, 72.0% (18 of 25 specimens) were biopsies and 28.0% (7 of 25 specimens) were resections. There was no significant difference in specimen type between TNBC cases and hormone receptor–positive, HER2-negative cases (P = .45). Including both TNBC cases and hormone receptor–positive, HER2-negative cases, PD-L1 IHC was positive by consensus grading in 31.1% (19 of 61 samples) of biopsies and in 66.7% (12 of 18 samples) of resections; the difference in PD-L1 positivity between biopsy and resection specimens was statistically significant (P = .007). There were no paired biopsy and resection samples in our study cohort.
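The categorical comparisons above can be checked with a short Python sketch of the Pearson χ2 test for a 2 × 2 table, using only the counts given in the text. The function name is hypothetical, and the omission of a continuity correction is an assumption, because the text does not state whether one was applied. For 1 degree of freedom, the upper-tail P value equals erfc(√(χ2/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one case")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Counts taken from the text: (positive, negative) or (biopsy, resection) per group.
    tables = {
        "PD-L1 by specimen type (biopsy vs resection)": (19, 42, 12, 6),
        "PD-L1 by receptor group (TNBC vs HR+/HER2-)": (23, 31, 8, 17),
        "Specimen type by receptor group": (43, 11, 18, 7),
    }
    for label, cells in tables.items():
        stat, p = chi_square_2x2(*cells)
        print(f"{label}: chi2 = {stat:.3f}, P = {p:.3f}")
```

Under these assumptions, the three comparisons give P values of approximately .007, .37, and .45, in line with the values reported above.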
The interobserver concordance among study pathologists for PD-L1 interpretation on WSI is illustrated in Figure 1, A and B. Overall, there was substantial agreement among the 8 study pathologists (κ = 0.727). There was complete agreement among all 8 pathologists in 49 (62.0%) of 79 cases, among at least 7 pathologists in 66 (83.5%) of 79 cases, and among at least 6 pathologists in 73 (92.4%) of 79 cases. Interobserver agreement remained substantial across both biopsy and resection specimens (κ = 0.707 and 0.720, respectively). Likewise, concordance was substantial in assessing tumors in the primary, locally recurrent, and metastatic setting (κ = 0.691, 0.675, and 0.659, respectively). Interobserver agreement for PD-L1 interpretation was substantial in TNBCs (κ = 0.663) and near-perfect in hormone receptor–positive, HER2-negative breast carcinomas (κ = 0.868). Interobserver agreement was substantial among cases with low and intermediate stromal tumor-infiltrating immune cells (κ = 0.702 and 0.704, respectively) and moderate in cases with high stromal tumor-infiltrating immune cells (κ = 0.597). Comparing the study pathologists' interpretations of PD-L1 IHC overall with the consensus score, the sensitivity and specificity were 90.2% and 90.7%, respectively.
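The agreement tallies and the comparison with the consensus score can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and toy data are hypothetical, and pooling all pathologists' reads into a single set of true-positive, false-negative, true-negative, and false-positive counts is an assumption, because the text does not specify whether reads were pooled or averaged per pathologist.

```python
def majority_size(calls):
    """Number of raters on the larger side of a binary call (True = PD-L1 positive)."""
    positives = sum(1 for call in calls if call)
    return max(positives, len(calls) - positives)


def cases_with_majority_at_least(calls_per_case, k):
    """Count cases in which at least k raters gave the same call."""
    return sum(1 for calls in calls_per_case if majority_size(calls) >= k)


def pooled_sensitivity_specificity(calls_per_case, consensus):
    """Pool every rater's call against the consensus score (the reference standard)."""
    tp = fn = tn = fp = 0
    for calls, reference in zip(calls_per_case, consensus):
        for call in calls:
            if reference and call:
                tp += 1
            elif reference:
                fn += 1
            elif call:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("consensus must include both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy data: 3 cases read by 8 raters (not study data).
    calls = [
        [True] * 8,                # unanimous positive
        [False] * 7 + [True],      # 7 of 8 negative
        [True] * 4 + [False] * 4,  # evenly split
    ]
    consensus = [True, False, False]
    print(cases_with_majority_at_least(calls, 8))  # unanimous cases
    print(cases_with_majority_at_least(calls, 7))  # at least 7 of 8 agree
    print(pooled_sensitivity_specificity(calls, consensus))
```

An evenly split case is one in which `majority_size` equals half the number of raters (4 of 8 in this setting).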
Three cases in the cohort showed split interpretation (ie, 50% positive for PD-L1, 50% negative) by study pathologists (Figure 2, A through F). All 3 cases were biopsies, including 2 core-needle biopsies of primary breast carcinoma and 1 colonic biopsy of metastatic breast carcinoma. The primary breast carcinoma samples displayed low stromal tumor-infiltrating immune cells, while the colonic biopsy of metastatic carcinoma showed high stromal tumor-infiltrating immune cells. The consensus score in all 3 cases was negative for PD-L1. In each of these 3 cases, the extent of PD-L1 immunoreactivity was low.
DISCUSSION
In this single-institution study of interobserver variation in PD-L1 SP142 immunohistochemistry interpretation using WSI, the findings show substantial agreement among the 8 study pathologists. This degree of agreement was similar across biopsy and resection specimens. In our cohort, PD-L1 expression differed significantly between biopsies and resections, with a higher proportion of resection specimens displaying PD-L1 IHC expression by consensus grading. Of note, our study cohort showed a prevalence of PD-L1 positivity similar to that of the IMpassion130 trial cohort,2 although our cases included those with hormone-receptor expression. Concordance among study pathologists also remained substantial in scoring across all tumor sites, in the primary, recurrent, and metastatic setting. We found that concordance among study pathologists was higher in interpreting PD-L1 results in hormone receptor–positive cases than in TNBC cases. This difference is likely attributable to the smaller number of cases in the hormone receptor–positive, HER2-negative subset compared with the TNBC cohort. Other contributing factors may be a lower mean percentage of stromal tumor-infiltrating immune cells and a higher proportion of PD-L1–negative cases, leading to less variation in interpretation. Conversely, the cases classified as high stromal tumor-infiltrating immune cells were associated with lower concordance, compared with those with low or intermediate amounts of infiltrate; however, the number of cases categorized as high was notably smaller than in the latter 2 groups. Cases in which study pathologists disagreed with the consensus score or showed split interpretations were limited to small biopsy samples with minimal immunoreactive stromal tumor-infiltrating immune cells. Differences in individual thresholds for judging whether immunoreactive immune cells occupied at least 1% of a small tumor area likely contributed to these discordant results.
In these cases, the consensus opinion was that the IHC staining was too focal to constitute 1% of the tumor area.
Few studies have examined interobserver variability in PD-L1 SP142 immunohistochemistry assessment in breast carcinoma. A recently published multi-institutional study of PD-L1 IHC interpretation by Reisenbichler et al6 examined interobserver variability using SP142 and SP263 assays on primary TNBC cases using WSI. Participants in the study included pathologists both with and without expertise in breast pathology. The study findings showed lower interobserver agreement among pathologists in interpretation of PD-L1 SP142. The authors attributed this variability to difficulty distinguishing between tumor-cell and immune-cell staining and to tissue artifacts, including background staining and tissue folding. Downes et al7 reported a single-institution, tissue-microarray analysis of inter- and intraobserver PD-L1 IHC interpretation using SP142, SP263, 22C3, and E1L3N assays on head and neck, urothelial, and breast carcinomas. The group found near-perfect concordance among 3 pathologists in scoring PD-L1 SP142 staining of immune cells in 30 breast carcinoma cases (κ = 0.956). Detailed information regarding tumor histology, hormone-receptor status, HER2 status, and sample site is not, however, described therein. Furthermore, tissue-microarray samples by their very design are quite small, which limits translation of such studies to real-world applications, particularly for an immunohistochemical assay whose accurate interpretation requires assessment over a large tumor area. While the authors did attempt to cover a greater tumor area by using triplicate cores, the smaller tissue area may have contributed to the near-perfect interobserver concordance using tissue-microarray samples.
The limitations of our study include, first, a relatively small sample size (n = 79 cases) that was largely composed of biopsy specimens and, second, the use of WSI. Regarding the latter, prior studies have demonstrated a high degree of diagnostic equivalency between traditional glass slides and digital WSI.8–12 Hanna et al8 found 99.3% diagnostic concordance among 8 pathologists in their study of 199 paired glass and digital cases. Data regarding interpretation of immunohistochemical stains using WSI in breast pathology are limited. Prior publications on the use of WSI for immunohistochemistry interpretation remain largely focused on the evaluation of estrogen receptor, progesterone receptor, and HER2,13–15 as well as the proliferation marker Ki-67.16 Tsao et al17 reported high concordance for PD-L1 SP142 staining interpretation on tumor cells in lung carcinoma samples between traditional glass slides and digital WSI. It should be noted, however, that the study found that interobserver agreement on assessment of PD-L1 immune-cell staining was poor on both digital WSI and glass slides. The group noted that cases used for assay interpretation training consisted of only resection samples, while cases used in the study included biopsy samples. The authors surmised that this difference could have contributed to the poor concordance, as study pathologists had not had prior exposure to assessing the spatial distribution of immune cells on such small samples. While our data do not demonstrate an appreciable difference in interobserver concordance in IHC interpretation between biopsy and resection specimens, the cases with split interpretation were small biopsy samples. Further studies are warranted to determine if particular sites are associated with discordant results.
In addition, the pathologists in this study were not exposed to a training set and relied on their prior experience, including the manufacturer's web-based training module and participation in IHC interpretation at consensus conference. Therefore, in clinical practice, we find that PD-L1 IHC scoring on small samples should be performed in a group setting by pathologists who have undergone training in such interpretation.
CONCLUSIONS
Herein, we have shown substantial concordance of PD-L1 SP142 immunohistochemistry interpretation by 8 pathologists at a single institution using WSI. Similar studies have demonstrated a higher degree of interobserver concordance in the single-institution setting when compared with multi-institutional analyses. Potential contributing factors to these differences may be routine assessment of the IHC by consensus, leading to standardization of individual thresholds, and scoring of IHC by pathologists specializing in breast pathology. Discrepancies in interpretation arise in core-needle biopsy samples, where quantification of the immune-cell staining percentage in a small tumor area can be challenging. Interobserver concordance was lower in cases with high stromal tumor-infiltrating immune cells, compared with cases with low or intermediate immune cells. In such cases, consensus opinion for interpretation should be used.
This work was supported in part by a National Institutes of Health/National Cancer Institute Cancer Center Support Grant (P30CA008748).
Hanna is an advisor (non-compensated) with PathPresenter. Reis-Filho is an ad hoc member of the scientific advisory board of Roche Tissue Diagnostics/Ventana Medical Systems. The other authors have no relevant financial interest in the products or companies described in this article.
The abstract for this paper was presented at the 109th annual meeting of the United States and Canadian Academy of Pathology; March 3, 2020; Los Angeles, California.