The use of immunohistochemistry for the determination of pulmonary carcinoma biomarkers is a well-established and powerful technique. Immunohistochemistry is readily available in pathology laboratories, is relatively easy to perform and assess, can provide clinically meaningful results very quickly, and is relatively inexpensive. Pulmonary predictive biomarkers provide results essential for timely and accurate therapeutic decision making; for patients with metastatic non–small cell lung cancer, predictive immunohistochemistry includes ALK and programmed death ligand-1 (PD-L1) (ROS1, EGFR in Europe) testing. Proper methodologic handling is needed to ensure that patients receive the most accurate and representative test outcomes.
Pulmonary predictive biomarkers are defined as markers for which the results are essential for therapeutic decision making. In the setting of metastatic non–small cell lung cancer (NSCLC), predictive immunohistochemistry (IHC) includes ALK and programmed death ligand-1 (PD-L1) (ROS1, EGFR in Europe). In the cases of ALK and ROS1 IHC, positive staining serves either as a screening test or as a surrogate test for fusion in the indicated gene, resulting in aberrant expression of the gene products; and in the case of PD-L1, positive staining indicates the presence of an immunomodulatory molecule that is affected by anti–PD-L1 or anti–programmed death receptor-1 (PD-1, CD279) therapy. The resulting treatment consequences for patients are as follows: for ALK+ NSCLC, an ALK inhibitor; for ROS1+ NSCLC, a ROS1 inhibitor; and for PD-L1+ NSCLC, an inhibitor of PD-1 or PD-L1. Pulmonary predictive biomarker IHC is not the sole therapeutic decision maker; an individual patient's treatment also depends on other factors, such as tumor stage, performance score, access to targeted agents, and patient and clinician preference and experience. For a patient eligible for tyrosine kinase inhibitor and immunotherapy, the first treatment of choice may be the tyrosine kinase inhibitor. It is important to differentiate the therapeutic decision-making role of pulmonary predictive biomarker IHC from IHC performed for diagnostic purposes, which, in contrast, serves a diagnostically supportive or decisive role. The latter can be critical in distinguishing NSCLC subtypes, which itself has therapeutic implications.1,2
IMMUNOHISTOCHEMISTRY
Many IHC antibodies are dependent upon the fixation procedure for success: some work on frozen sections, but not on formalin-fixed paraffin-embedded (FFPE) samples, and vice versa. This article restricts discussion of IHC pulmonary biomarkers to FFPE material and the IHC applications of ALK rearrangement, ROS1 rearrangement, and BRAF V600E mutation, as well as PD-L1 expression in tumor cells and/or adjacent inflammatory cells. Automated IHC protocols for ALK and PD-L1 detection were recently approved by the US Food and Drug Administration (FDA) as companion diagnostics for targeted therapies. Issues related to these companion diagnostics have been recently reviewed.3,4 Immunohistochemistry is a rapid and relatively inexpensive method, preferred by most pathologists, primarily because it allows the simultaneous evaluation of a staining pattern and tissue architecture and tumor cells. Potentially, IHC can be interpreted with fewer malignant cells than are needed for successful interpretation by fluorescence in situ hybridization (FISH) or other molecular technologies. Immunohistochemistry can be performed efficaciously on a variety of different tumor specimens. FFPE tissue blocks, fluid, and fine-needle aspiration cytology cell blocks or smears can be tested successfully as long as there are at least a few clusters of viable tumor cells present in the specimen. However, there are limitations; currently, scarcity of predictive trial data regarding the application of IHC to cytology prohibits any definitive determination regarding the value of cytology specimens for pulmonary predictive biomarker IHC. 
This may be due to trial designers' lack of familiarity with the advantages and limitations of cytology, and their preference for biopsy material for additional translational research in the trial.5 Nevertheless, excellent studies exist on the application of ALK IHC to cytology specimens.5,6 Also, molecular genetic aberrations with low prevalence, such as ALK- and ROS1-rearranged NSCLC, call for economical screening methods, and a validated and robust IHC assay may provide a cost-effective platform.
PREANALYTIC VARIABLES
Preanalytic variables start at the moment the biopsy or resection specimen is removed from the patient. These variables include delay in fixation, inappropriate fixation time, and issues regarding paraffin embedding.7 Cold ischemia time, the time between removal of the specimen from the patient and initiation of fixation, should be less than 1 hour. Specimens should immediately be fixed in an adequate amount (10 times the volume of the specimen) of neutral buffered 10% formalin and embedded in melted paraffin.7,8 In cytology, alcohol fixation is sometimes used.5 However, when IHC protocols optimized for formalin-fixed material are applied to alcohol-fixed specimens, the accuracy of IHC may be reduced because of loss or decrease of immunogenicity. Immunohistochemistry protocol adjustments restore immunogenicity for most but not all of the antigens.9 Formalin fixation times of less than 6 hours are not recommended because conventional staining as well as IHC (for example, for ALK) can be adversely affected.10 For practical purposes, a fixation interval of 6 to 24 hours for biopsy specimens, and 24 to 72 hours for resection specimens, is recommended.11,12
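The preanalytic recommendations above can be expressed as a simple checklist. The following is a minimal sketch only, assuming a hypothetical `Specimen` record and a hypothetical `preanalytic_flags` helper; the thresholds are taken from the text, but the data structure and wording of the flags are illustrative and not part of any guideline or laboratory system.

```python
from dataclasses import dataclass
from typing import List

# Recommended fixation windows in hours, as stated in the text.
FIXATION_WINDOW_H = {"biopsy": (6, 24), "resection": (24, 72)}


@dataclass
class Specimen:
    """Hypothetical record of preanalytic handling for one specimen."""
    kind: str                        # "biopsy" or "resection"
    cold_ischemia_h: float           # hours from removal to start of fixation
    fixation_h: float                # hours in neutral buffered 10% formalin
    formalin_to_tissue_ratio: float  # fixative volume / specimen volume


def preanalytic_flags(s: Specimen) -> List[str]:
    """Return the recommendations from the text that the specimen misses."""
    flags = []
    if s.cold_ischemia_h >= 1:
        flags.append("cold ischemia time not under 1 hour")
    if s.formalin_to_tissue_ratio < 10:
        flags.append("formalin volume under 10 times specimen volume")
    low, high = FIXATION_WINDOW_H[s.kind]
    if s.fixation_h < low:
        flags.append(f"fixation shorter than {low} hours")
    elif s.fixation_h > high:
        flags.append(f"fixation longer than {high} hours")
    return flags


# A biopsy handled within all recommended limits yields no flags.
print(preanalytic_flags(Specimen("biopsy", 0.5, 12, 10)))
```

An empty list means only that none of the listed recommendations was missed; it does not replace validation of the staining protocol itself.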
Because the formalin diffusion rate is approximately 1 to 3 mm/h, at gross dissection tumor should be cut in slices of 3- to 4-mm thickness. FFPE samples are considered stable, preserved, and protected from oxidative influences. FFPE tissue specimens should be cut into 3- to 5-μm sections for placement on glass slides. If unstained sections are not used within a few days, they should be stored in a closed container at 2°C to 8°C to protect antigenicity from photooxidation and drying, which could cause false-negative results.13 Decalcification can negatively affect antigenicity, especially when highly acidic agents such as hydrochloric acid and nitric acid are used. Therefore, weaker acids such as ethylenediaminetetraacetic acid (EDTA), an effective decalcifying agent, have become more popular, as they do not appear to interfere with IHC14; in addition, in many small sample bone biopsies, decalcification can be completely avoided.
ANALYTIC VARIABLES
For the analytic procedure, several issues need to be controlled and optimized, including the development of adequate antibodies, antigen retrieval, type and concentration of the antibody, incubation time, incubation temperature, and signal enhancement (eg, with a tyramide cascade and intercalation of an antibody-enhanced polymer). Antigen preservation for IHC is epitope dependent, and some epitopes may not be hampered by fixation times as long as 120 hours. The cross-links due to formalin fixation may hamper the epitope recognition by the primary antibody. To counter this, different epitope retrieval buffers are tested and the optimal buffer is chosen for the standard staining procedure.
POSTANALYTIC VARIABLES
The postanalytic phase starts with the glass slide's microscopic evaluation. Staining intensity is dependent on the enhancement system used. The relationship between the enhancement system and signal intensity was shown in 2003 by Prinsen et al15 and is graphically illustrated in Figure 1. The subjectivity of staining intensity assessment can be reduced with the use of successive microscope objective lenses with inherent related spatial resolution as a physical aid in establishing the intensity level. This approach, first applied to HER2 testing, may lead to more uniform intensity scoring.16 Strong staining (+++) is clearly visible with use of an ×2 or ×4 microscope objective lens; moderate staining (++) requires an ×10 or ×20 objective lens to be clearly seen; and weak staining (+) can be seen only with an ×40 objective lens.17 The histo-score (H-score) is derived by multiplying the percentage of tumor cells that stain positively by different intensities (0, 1 = +, 2 = ++, or 3 = +++), giving a range of 0 to 300. This approach takes greater account of the heterogeneity of the staining. Interestingly, with additional tyramide enhancement, the difference in epitope concentration between a negative and a strong positive staining intensity is reduced to the extent that scoring is restricted to either negative or positive (Figure 1). Strong signal enhancement may have important consequences. Of course, proper titration of primary antibody and enhancement components is a prerequisite. A strong enhancement system may in some cases reveal a positive test result, while an approach using a less strong signal enhancement may be negative. This implies that once a test is clinically validated, only tests with equal test performance can be used. Otherwise, the generalization of a phase III trial toward a presumed predicted chance of a response to a given drug will not hold.
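The H-score arithmetic described above can be illustrated with a short worked example. This is a minimal sketch, assuming that the percentages of tumor cells at each intensity level (0 to 3) are already known and sum to 100; the function name `h_score` is hypothetical.

```python
def h_score(percent_by_intensity):
    """Compute the H-score from a mapping of intensity (0-3) to the
    percentage of tumor cells staining at that intensity."""
    for intensity, pct in percent_by_intensity.items():
        if intensity not in (0, 1, 2, 3):
            raise ValueError(f"unknown intensity level: {intensity}")
        if pct < 0:
            raise ValueError("percentages must not be negative")
    if abs(sum(percent_by_intensity.values()) - 100) > 1e-6:
        raise ValueError("percentages must sum to 100")
    # Each percentage is weighted by its intensity; the result lies in 0..300.
    return sum(intensity * pct for intensity, pct in percent_by_intensity.items())


# 40% negative, 30% weak (+), 20% moderate (++), 10% strong (+++):
# 0*40 + 1*30 + 2*20 + 3*10 = 100
print(h_score({0: 40, 1: 30, 2: 20, 3: 10}))
```

Because the weights are the intensity levels themselves, a tumor in which all cells stain strongly scores 300 and a completely negative tumor scores 0, which matches the stated range.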
Standardization of both positive and negative controls is needed in predictive IHC. Since the introduction of polymer enhancement, negative controls are no longer needed for diagnostic IHC in routine practice. In general, the use of IHC-negative tissue controls, irrespective of type, although well established, is not globally standardized. As such, the relevance and applicability of negative tissue controls continue to challenge both pathologists and laboratory budgets. Negative reagent controls (based on a separate slide, where the primary antibody is replaced with another “irrelevant” antibody of same type and concentration, while all other protocol parameters are identical) are virtually useless, particularly after implementation of biotin-free visualization systems. Despite the clear theoretical notion that appropriate tissue controls serve to demonstrate the sensitivity and specificity of the IHC test, it remains unclear which types of positive and negative controls are applicable or useful in day-to-day clinical practice.18,19 Recently, the term immunohistochemistry critical assay performance controls (iCAPCs) was introduced.19 iCAPCs are external positive controls. iCAPCs monitor the overall system performance, but like any other external positive control, they do not fully inform the patient's results because final results also significantly depend on preanalytic variables unique to the patient's specimen. The optimal IHC positive control has an intensity performance at or above the lower limit of detection and is defined by an observed positive reaction (staining) in a tissue/cellular element that is known to express low levels of the evaluated marker.19
The use of an internal positive control tissue is generally perceived as adequate and an external control might not be needed. However, it is useful to maintain an external positive control for most stains because not every section will contain an internal positive control, and internal positive controls, when present, typically display a relatively high epitope concentration. As such, they rarely provide information as to whether the appropriate sensitivity of the protocol was achieved. This gap is filled by the external positive control with low epitope concentration. External positive controls are increasingly being mounted on the same slide as the test section as a means of improving control of test IHC staining.
As with all IHC stains, artifacts may occur, including nonspecific background changes (in relation to improper drying of the slides, improper deparaffinization, or incomplete rinsing), edge artifacts (due to drying of the tissue before fixation or during the staining procedure), and crush artifacts (due to necrosis or poor fixation).
Validation
For diagnostic testing according to the College of American Pathologists guidelines, a minimum of 10 samples is required for technical validation.20 In contrast to ALK and ROS1, where the underlying oncogenetic mechanism is gene rearrangement, PD-L1 is a protein with no oncogenetic change. Instead PD-L1 is part of the physiologic immune defense system. The validation of ALK and ROS1 is performed by comparison with tests demonstrating the associated rearrangement (such as FISH or next-generation sequencing–based testing). A practical difficulty is that it takes a long time to acquire 10 positive samples in low prevalent disease (eg, ROS1) for initial setup of the IHC. Thus, careful testing with fewer samples may be a way forward. For PD-L1 the approach is different and is discussed below.
ALK
Genomic rearrangements involving ALK occur in 1% to 5% of NSCLCs. (In 2013 crizotinib became second-line treatment for ALK+ NSCLC21 and since 2014 it has generally been used as the first-line treatment option.22) This therapeutic option was based on ALK FISH assay subsequently approved by the FDA. More recently, IHC has been accepted as a screening method for ALK+ NSCLC.
ALK Preanalytic Variables
If preanalytic conditions are appropriately controlled, most tumor cells stain with IHC, paralleling the homogeneous distribution of the ALK gene rearrangement in FISH analysis (Table 1).
Initially, several different techniques were used for detecting ALK IHC.3 In these studies, the type or source of antibodies, the process of antigen retrieval and antibody detection, and the amplification techniques varied substantially. Eventually, head-to-head comparison of different antibodies showed that D5F3 (Cell Signaling Technology, Danvers, Massachusetts) and 5A4 (Novocastra, Leica Biosystems, Buffalo Grove, Illinois) with the ADVANCE system (Dako, Santa Clara, California) were equally sensitive.23 D5F3 is part of a commercial companion diagnostic ALK IHC assay kit, whereas 5A4 is commonly used as a laboratory-developed test (LDT).24
The ALK1 antibody (Dako) is less accurate and should not be used. A novel monoclonal anti-ALK antibody, 1A4 (Origene, Rockville, Maryland), was compared with D5F3 and described as a promising candidate for screening lung tumors for the presence of ALK rearrangements25; however, the 1A4 antibody examined in an independent cohort showed comparable sensitivity to D5F3, but lower specificity (70%).26 Therefore, tumors that are positive for ALK on testing with 1A4 IHC will require an additional predictive technique before treatment advice can be rendered.
The D5F3-based immunoassay (Ventana ALK [D5F3] CDx Assay, Tucson, Arizona) was developed and standardized on the automated immunostaining platform BenchMark XT combined with the OptiView Amplification Kit. Interestingly, with tyramide enhancement the difference in epitope concentration between a negative and a strong positive staining intensity is reduced to the extent that scoring is either negative or positive. This tyramide enhancement works similarly for both of the currently used antibodies, D5F3 and 5A4.27,28 The reproducibility of ALK IHC results among different laboratories and pathologists is high for the 2 validated protocols.24,28
ALK Postanalytic Variables
Because ALK protein is not expressed in normal, mature lung tissue, strong IHC amplification systems can generally be used as a marker for tumor ALK positivity; however, pathologists should be familiar with various artifacts that may lead to false-positive staining. Positive ALK IHC typically shows strong granular cytoplasmic staining. Cytoplasmic staining may occur in alveolar macrophages, cells of neural origin (nerve and ganglion cells including within tumors), glandular epithelium, extracellular mucin, and areas of necrotic tumor.29 Background staining is rarely observed within normal lung parenchyma. False-positive cytoplasmic staining in NSCLC has been noted with the tyramide amplification system using D5F3.28 This stain may show weaker than usual positivity in lung cancers that are ALK+ on IHC.
Histologically, mucin-containing cells such as signet ring cells require careful interpretation of ALK immunoreactivity, which is particularly relevant given frequent signet ring cell morphology of ALK-rearranged adenocarcinomas. A thin membranous positive pattern on ALK IHC may be masked by an intracellular mucin vacuole, and the positive pattern may then be difficult to detect in the signet ring cells.30–32 This finding is not specific to cancer cells and has also been seen in some nontumor cells, such as reactive type II pneumocytes. Thus, it is important to recognize background staining in the stained specimens. In addition, some neuroendocrine carcinomas have also been associated with positive reactions in the absence of ALK rearrangements.33,34 Merkel cell tumors of the skin may also be ALK+ on IHC but have no ALK rearrangement detected by FISH or next-generation sequencing. The paranuclear dotlike pattern reported as typical of the KIF5B-ALK rearrangement may require further confirmation.35
The staining may appear heterogeneous in some tumors, particularly in surgical specimens. This heterogeneity is likely related to delay in the fixation of more deeply situated tissues, due to slow fixative diffusion and is not indicative of a different histologic pattern. The sensitivity of ALK protein to fixation delay is typically not an issue for biopsy specimens, but may be an issue when using tissue microarrays for ALK screening of archived specimens.
ALK Validation
Initially in the United States, when the FDA approved the ALK FISH assay, patients with positive results on ALK FISH were eligible for treatment with an ALK inhibitor. The European Medicines Agency (EMA) approved treatment with an ALK inhibitor for patients with ALK+ metastatic NSCLC, without specifying FISH or IHC as the test method. More recently, the FDA has approved the IHC CDx Assay, and patients with positive results on ALK D5F3 IHC are now also eligible for treatment with an ALK inhibitor.
In several studies, D5F3 IHC was compared with ALK FISH and showed high concordance rates.36–46 The sensitivity was shown to range from 81% to 100%, and specificity from 82% to 100%. The interobserver reproducibility for D5F3 in a selected series of adenocarcinomas is high.47
Several studies48–56 compared 5A4 IHC with ALK FISH. The sensitivity ranged from 93% to 100% and specificity from 96% to 100%.
In a recent analysis of pooled data, the diagnostic operating characteristics in 12 studies (3754 NSCLC specimens) were analyzed, taking the different scoring systems into account.57 The IHC 3+ binary ALK+ category matched for both antibody procedures with ALK FISH+ cases, and the IHC− cases matched FISH− cases. The nearly 100% concordance in these IHC categories favors the use of IHC as a screening method to identify ALK+ NSCLC. However, for the lower-intensity staining in the 4-tiered IHC approach, tumors with 1+ and 2+ intensity need additional validation with ALK FISH testing.
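The screening logic suggested by the pooled data for the 4-tiered scoring approach can be sketched as a simple triage rule. The function name `alk_ihc_triage` and the returned labels are hypothetical and serve only to illustrate the reasoning in the text; they are not taken from any assay manual.

```python
def alk_ihc_triage(intensity):
    """Map a 4-tiered ALK IHC intensity score (0, 1, 2, or 3) to the next
    step suggested by the pooled concordance data described in the text."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    if intensity == 3:
        # IHC 3+ cases matched ALK FISH+ cases.
        return "ALK positive"
    if intensity == 0:
        # IHC-negative cases matched FISH-negative cases.
        return "ALK negative"
    # 1+ and 2+ staining needs additional validation with ALK FISH.
    return "confirm with ALK FISH"


for score in (0, 1, 2, 3):
    print(score, alk_ihc_triage(score))
```

With tyramide-enhanced assays that are scored as either negative or positive, the intermediate branch does not arise.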
Discordant NSCLC cases, with ALK protein expression (for D5F3/5A4) but no detectable ALK rearrangement on FISH, have been reported.* Explanations for this discordance include (1) false-negative interpretation of FISH results, especially for results that are close to the threshold of 15% and below 20%46,61,65; (2) amplification of the ALK gene (which has been associated with ALK protein expression in some but not all cases), possibly leading to 1+ or 2+ staining67,68; (3) false-positive interpretation of ALK IHC results; (4) complex rearrangement involving the ALK gene, reducing the visible distance of the 2 FISH probes, leading to false-negative FISH findings; (5) de novo alternative transcription initiation69; and (6) additional yet undetermined mechanism(s). Some of the ALK IHC+, FISH− cases respond to crizotinib.59,70
ALK IHC assays, already recommended by organizations in Europe, Japan, Asia, and the United States, are validated and standardized and are clinical tools for cost-effective screening for the presence of ALK rearrangement in NSCLC. To maintain the reliability of assays for detecting ALK positivity, laboratories should participate regularly in external quality assessment programs.
ROS1
The pathologic significance of ROS1 expression has recently been reviewed.34 Genomic rearrangements involving ROS1 occur in 1% to 2% of unselected NSCLCs.71–73 ROS1 fusion partners in lung cancer include FIG, CD74, SLC34A2, and SDC4.74–77 Crizotinib was recently approved by the FDA for patients with advanced ROS1+ NSCLC.78 ROS1 may be detected by a variety of techniques, including FISH, reverse transcription–polymerase chain reaction (RT-PCR), next-generation sequencing, and IHC. This discussion focuses on IHC. Predictive testing for ROS1 rearrangement started recently, and as such, less information is currently available for ROS1 than for ALK testing. ROS1 IHC may be an effective screening tool to detect ROS1+ NSCLC.
ROS1 Preanalytic Variables
It is unclear as to whether or not the protein stability is affected by preanalytic variables.4 Most studies on ROS1 IHC use the D4D6 rabbit monoclonal antibody (Cell Signaling Technology) applied at dilutions ranging from 1:50 to 1:1000 with various antigen retrieval methods, with use of different amplification and detection systems, in automated instruments or with manual testing.79 In contrast to ALK, where the ganglion cells of the appendix serve as an adequate external control, there is currently no good external benign tissue control for ROS1.3,4 Tumor specimens with known ROS1 rearrangement, or a cell block of the HCC78 cell line harboring the SLC34A2-ROS1 fusion gene, can serve as external positive controls.4,74
ROS1 Postanalytic Variables
Positive ROS1 IHC typically shows finely granular cytoplasmic staining; however, the staining pattern may depend on the function and subcellular location of the gene fusion partner.80 Globular ROS1 immunoreactivity has been described in tumor specimens with the CD74-ROS1 fusion, and membranous staining has been observed in tumors with the EZR-ROS1 fusion.79,80 Interestingly, ROS1 expression levels in ROS1-rearranged lung cancers and cell lines can vary from cell to cell, suggesting dynamic ROS1 protein expression despite homogeneous presence of ROS1 gene rearrangement. Detection of ROS1 protein expression in ROS1-rearranged adenocarcinomas with signet ring cells is challenging, since the cytoplasm may contain nonreactive mucin,80 similar to the situation with ALK IHC.4
Studies reporting on ROS1 immunohistochemistry are shown in Table 2. With D4D6, staining intensity ranging from weak (1+ to 2+) to strong (3+) has frequently been reported. In 88% (49 of 56) of the cases with strong intensity, rearrangement was confirmed with ROS1 FISH, whereas in the 1+ to 2+ positive cases, rearrangement could be demonstrated in only 14% (31 of 228). Therefore, current understanding suggests screening with IHC and subsequent confirmation of the positive IHC cases with FISH. ROS1 inhibitors should only be given in the cases that are both IHC and FISH positive. Little is understood regarding ROS1 FISH+ cases with negative ROS1 IHC.
Weak ROS1 expression is occasionally detectable in nonneoplastic hyperplastic type II pneumocytes and in alveolar macrophages.81 In bone metastases, there is strong granular cytoplasmic staining of osteoclast-type giant cells.81 In most cases, the expression in these cells is weak to moderate (1+/2+ in intensity).3,4
ROS1 Validation
As mentioned above, in cases with strong ROS1 staining with clone D4D6 IHC using a highly sensitive amplification kit, there is a high correlation with ROS1 FISH+. Although some discrepant cases have been reported,82 ROS1 testing by IHC is highly sensitive, but less specific, when compared with ALK IHC for detection of the corresponding gene rearrangement. Some authors3,4 suggest that IHC testing of specimens containing at least 20 tumor cells and application of an H-score cutoff of greater than 100 is highly concordant with ROS1 rearrangement by FISH or RT-PCR.
PD-L1
PD-L1 is a biomarker with some predictive value for immunotherapy, which is determined with different but not necessarily equal technologies.83 The pharmaceutical interaction is via PD-1 or PD-L1 inhibition, with 5 different companies having separate but associated proprietary anti–PD-L1 antibodies for use in IHC assays: nivolumab with 28-8 rabbit,84 pembrolizumab with 22C3 mouse,85 atezolizumab with SP142 rabbit, durvalumab with SP263 rabbit,86 and avelumab with rabbit 73-10.87 The commercial companion PD-L1 test for pembrolizumab (Dako 22C3 pharmDx) and complementary diagnostic PD-L1 tests for nivolumab (Dako 28-8 pharmDx) and atezolizumab (Ventana PD-L1 SP142) are now FDA approved for use in NSCLC.
Pathology laboratories need at least 1 affordable, validated test, probably based on relatively high volume. It is not practical for pathology laboratories to perform 5 or more specific assays for 1 protein with the test used being dependent upon the drug that might possibly be prescribed in 1 facility.
In one study,88 comparison of the primary antibodies with immunohistochemical assessment did not reveal major differences, except for E1L3N, which is not included in any of the abovementioned tests. However, SP142 showed a lower proportion of positivity in another study (see below). Preferably, 1 PD-L1 IHC assay should be suitable for all pharmaceutical drugs that potentially could be prescribed, as that test essentially determines 1 PD-L1 protein within 1 NSCLC case. In addition, the use of multiple IHC cutoffs and proprietary PD-L1 IHC detection systems represents an immense barrier to interpretation of clinical trial biomarker data across trials. Laboratory feasibility issues are further complicated by the application of PD-L1 IHC staining approaches in other tumor types.
The staining procedures of the different commercially available assays are currently being compared to each other. One comparison based on technical validation with E1L3N and SP142 demonstrated prominent interassay variability or discordance.89 The methodology for clinical validation of laboratory-developed assays is discussed below.
Analytic Issues
For PD-L1, it is unknown whether fixation delay affects test results. Initially, it was advised that the tissue-handling procedure for PD-L1 be the same as for other diagnostic or predictive markers such as ALK.90 According to the interpretation manuals for the Dako 22C3 pharmDx and Dako 28-8 pharmDx assays, the critical issue for PD-L1 IHC is a fixation time that is too short (less than 3 hours); other parameters such as prolonged fixation, paraffin embedding, dehydration, and use of unstained slides are not critical issues, as long as unstained slides are less than 6 months old. Specimens for PD-L1 testing should be less than 3 years old: in one PD-L1 study91 for the ATLANTIC phase II trial, the proportion of specimens with a tumor proportion score greater than 25% was similar (around 30%) for tissue blocks up to 3 years old, but dropped to 14% for specimens older than 3 years.
The use of PD-L1 IHC on decalcified tissues should be interpreted with caution until further validation studies on PD-L1 IHC have become available. For LDTs, besides essential analytic validation, adequate clinical validation is of critical importance.
PD-L1 Interpretation
PD-L1 expression may be present on dendritic cells, macrophages, mast cells, T and B lymphocytes, endothelial cells, human placenta, and tumor cells.92,93 PD-L1 has 2 small hydrophilic regions for binding of IHC detection antibodies.94 The biological consequences of PD-L1 expression depend on cell membrane localization because it is presumed that PD-L1 is functional only when it ligates a counterreceptor.95
Before determining a specimen's PD-L1 staining result, it is important to examine the hematoxylin-eosin–stained section to assess preservation and staining quality. If any staining of the PD-L1 IHC external control slide is unsatisfactory, all related patient specimen results should be considered invalid. For PD-L1, an external positive control FFPE cell block of human tonsils, placenta, and/or cell lines with variable PD-L1 expression (www.histocyte.com) may be used to develop the staining conditions. Care should be taken not to rely only on cell lines with high epitope concentration: the optimal staining control has weak epitope concentration, which allows detection of minor deviations in the staining protocol, whereas controls with high epitope concentration will not reveal a difference and remain positive.
Examination of PD-L1− NSCLC tissue helps to evaluate potential background staining. Any background staining should be of less than 1+ staining intensity. If plasma membrane staining of malignant cells occurs in the negative control tissue, all patient specimen results should be considered invalid. In any case, an NSCLC without PD-L1 staining in tumor cells may still be adequately stained and represent a valid result.
When examining a patient's slide, necrotic or degenerate malignant cells should be excluded from evaluation. A minimum of 100 viable tumor cells should be present in the PD-L1–stained patient specimen slide for determination of the percentage of stained cells. Immune cells (eg, infiltrating lymphocytes or macrophages) may also stain with PD-L1 and serve as positive internal controls. Any background staining should be of less than 1+ staining intensity.
Definition of Positive PD-L1 Staining in NSCLC
Unfortunately, the definition of “PD-L1 positivity” is not the same for the 5 potential PD-L1 assays. In 4 commercial assays, PD-L1 staining is defined as complete circumferential or partial linear plasma membrane staining of tumor cells at any intensity. Cytoplasmic staining only in tumor cells is not considered positive for scoring purposes. In the assay using SP142, the presence of PD-L1+ immune cells is also considered in determining the PD-L1 staining positivity. This seems to hamper the attempt to establish a single PD-L1 IHC test that will be considered equivalent to all clinically validated tests. However, other antibodies also stain immune cells. Whether or not this will ultimately lead to the use of 1 or 2 tests for broader application is unclear at present.
PD-L1 Scoring
It is important to score viable tumor cells exhibiting complete circumferential or partial linear plasma membrane staining at any intensity, and to determine the percentage of stained viable tumor cells in the entire specimen. PD-L1 protein expression is defined as the percentage of tumor cells exhibiting positive membranous staining at any intensity.
As a general scoring procedure, scan for an overview at ×4 objective magnification of the entire specimen. Subsequently, evaluate all areas regardless of whether well or poorly stained. Then, examine and score, at ×10 to ×40 objective magnification, viable tumor cells exhibiting complete circumferential or partial linear plasma membrane staining at any intensity. Exclude immune cells (except for SP142), normal cells, and necrotic cells from scoring. The absence of weak staining in tumor cells should be confirmed with ×20 objective examination.
Because PD-L1 heterogeneity may occur, scoring may be facilitated by dividing the slide into areas with equal amounts of “neoplastic cells” at low magnification, with evaluation of each area for percentage of PD-L1 positivity. The percentage of positivity from each area is then added, and divided by the total number of areas.
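The area-based averaging described above amounts to a simple mean. The following is a minimal sketch, assuming that the slide has been divided into areas with roughly equal numbers of neoplastic cells and that a percentage of PD-L1 positive tumor cells has been estimated for each area; the function name `mean_pdl1_percentage` is hypothetical.

```python
def mean_pdl1_percentage(area_percentages):
    """Average the per-area percentages of PD-L1 positive tumor cells.

    The unweighted mean is only appropriate when each area holds a
    comparable amount of neoplastic cells, as the text requires.
    """
    if not area_percentages:
        raise ValueError("at least one scored area is required")
    for pct in area_percentages:
        if not 0 <= pct <= 100:
            raise ValueError("each percentage must lie between 0 and 100")
    return sum(area_percentages) / len(area_percentages)


# Four areas scored at 10%, 50%, 80%, and 20%: (10 + 50 + 80 + 20) / 4 = 40.0
print(mean_pdl1_percentage([10, 50, 80, 20]))
```

If the areas differed substantially in tumor cell content, a mean weighted by tumor cell number would be needed instead; that variant is not described in the text and is mentioned here only as a consequence of the equal-area assumption.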
The immune activation response in the tumor cells at the tumor-stroma interface may lead to preferential expression of PD-L1 in this region. Small tumor biopsies may, however, miss the pertinent tumor-immune interface. In one study 92% concordance between biopsy and resection was reported,96 while in another the concordance rate was much lower (52%) with underestimations found with the biopsies.97 However, the latter study was unclear with regard to reporting the preanalytic variation. The authors used SP142, a slightly less sensitive antibody, and evaluated immune cells. These aspects clearly need more study.
Because in practice more biopsy specimens than resection specimens are stained for PD-L1, an inaccuracy in PD-L1 testing due to sampling of heterogeneous tumors is unavoidable. To put the situation into context, the proportion of patients who respond to PD-1/PD-L1 treatment, although their tumors have negative PD-L1 IHC staining, is approximately 9%.98,99 If this is due to sampling error (PD-L1− tumor in the biopsy, while another part of the tumor is PD-L1+) and if the variation in scoring around the predictive threshold is incorporated, then the effect of biological heterogeneity and variation in scoring seems lower than 10%. This notion is of great importance, as it sets the limit on the accuracy attainable for PD-L1 IHC in daily practice.
PD-L1 Interpretation Cytology/Cell Blocks
Although 30% to 40% of all advanced NSCLCs are diagnosed by cytology alone, the use of cytology samples for determination of PD-L1 is currently not advocated, as none of the assays has been validated for this purpose. However, since immunostaining of tumor cells is standard diagnostic practice in many institutions,100 PD-L1 staining and quantitation should be technically feasible in principle, provided that appropriate protocols and quality control measures are in place. In contrast, quantitation of PD-L1+ immune cells with the SP142 assay approach will likely be more challenging. The lack of tissue architecture precludes distinction of the relevant immune cells between the tumor cells and at the epithelial-stromal interface from immune cells outside of the tumor boundaries that are considered as being irrelevant for PD-L1 scoring. Emerging data from cell blocks and matched histologic specimens suggest that cytologic material is as good as histologic material for PD-L1 IHC tumor cell analysis.101 However, preexisting lymphocytes in a fine-needle aspirate of a lymph node will remain a major confounding factor and this aspect of PD-L1 testing, especially regarding SP142, will not be solved.
For PD-L1 reporting, the predictive IHC test and threshold used should be reported as well as the controls in case of absence of PD-L1 staining in the NSCLC.
Pitfalls With PD-L1
PD-L1 IHC false positivity is generally due to 2 causes. First, PD-L1+ lymphocytes and histiocytes may lie between PD-L1− tumor cells and be misinterpreted as positive. Careful attention to the nuclear to cytoplasmic ratio and thin nuclear membrane of the macrophages is necessary. Second, cytoplasmic staining of tumor cells may be granular, but not truly membranous, and misinterpreted as positive.
PD-L1 Clinical Validation
As well as technical validation, clinical validation of selected thresholds is necessary to prove the predictive power and validity of the assay and the selected thresholds. The selected thresholds, validated in clinical trials, were at least in part derived from the most effective selection of patients who would, or would not, benefit from therapy, in a given clinical scenario, when compared to standard-of-care treatment (Table 3). This threshold forms the basis of the predictive IHC characteristic and is associated with a difference in response rates to the intended treatment. Establishing the threshold can only be achieved by comparison of a study group, in which (1) treatment with the intended drug is performed, (2) outcome data are known, and (3) (preferably graded) IHC biomarker data are known.
The predictive procedure, including the preanalytic, analytic, and interpretation phases, needs to be robust over time to maintain the predictive value. In establishing the clinical validity of an LDT, IHC performance should be the same as that of the commercial clinically validated test. This implies that (1) the antibody should be titrated and/or incubation time varied to obtain the same signal; (2) the effect of the signal enhancement system should be equal; (3) in serial sections the PD-L1+ areas should be the same; (4) in approximately 10 PD-L1−, 10 PD-L1+ samples and 20 critical predictive samples covering the linear dynamic range, the outcome should be the same as that of the clinically validated PD-L1 IHC test; (5) because in general there is no calibration standard for IHC, the starting point for selection of critical predictive samples is the intensity threshold of the initially validated test. Conceptually, those samples that are positive in a clinically validated test and become negative after dilution of the primary antibody to 50% fall within the linear dynamic range.15 Samples that remain positive are uninformative for comparison of different clinically validated assays on the same protein (ie, PD-L1). Samples that are negative in a clinically validated test and turn positive when the concentration of the primary antibody is increased 2-fold also fall within the linear dynamic range (Figure 2). However, if such a sample remains negative, it is not informative for comparison of different clinically validated assays on the same protein (ie, PD-L1). Only informative samples with an epitope/antigen concentration close to both sides of the clinically validated threshold are suitable for the critical predictive sample set.
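The selection rule for critical predictive samples can be written as a small decision function. This is a sketch of the concept only, assuming each sample has been scored as positive or negative at the standard, halved, and doubled primary antibody concentration; the record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TitrationResult:
    sample_id: str
    positive_at_standard: bool  # outcome with the clinically validated protocol
    positive_at_half: bool      # outcome with primary antibody diluted to 50%
    positive_at_double: bool    # outcome with primary antibody at 2-fold concentration


def is_critical(result: TitrationResult) -> bool:
    """True if the sample falls within the linear dynamic range.

    A positive sample is critical if it turns negative on dilution to 50%;
    a negative sample is critical if it turns positive at 2-fold concentration.
    Samples whose outcome does not change are uninformative.
    """
    if result.positive_at_standard:
        return not result.positive_at_half
    return result.positive_at_double


samples = [
    TitrationResult("A", True, True, True),     # stays positive: uninformative
    TitrationResult("B", True, False, True),    # positive -> negative: critical
    TitrationResult("C", False, False, True),   # negative -> positive: critical
    TitrationResult("D", False, False, False),  # stays negative: uninformative
]
critical_ids = [s.sample_id for s in samples if is_critical(s)]
```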
For comparison of the LDT with the clinically validated test, 90% of the samples should have an equal outcome, that is, similar to the External Quality Assessment standard in molecular pathology.102 Importantly, after selection of critical samples, consecutive sections should be stained with the clinically validated assay and the test to be compared (commercial or LDT), and the pattern of staining (including proportion of PD-L1+ neoplastic cells) should be more or less equal. Lastly, the laboratory should be certified and have standard operating procedures (including validation reports) for the commercial test/LDT in place to maintain robust testing over time; and (6) the laboratory should regularly and successfully participate in an external quality assessment program. If any of these requirements are not fulfilled, generalization of the predictive biomarker cannot be guaranteed and clinical testing should be stopped immediately.
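The 90% criterion can be checked with a straightforward overall percentage of agreement. A minimal sketch, assuming dichotomized outcomes (positive/negative at the clinically validated threshold) for the same samples in both tests, and reading the criterion as "at least 90%"; the function names are hypothetical.

```python
def overall_agreement(reference, candidate):
    """Percentage of samples with an equal outcome in both tests."""
    if len(reference) != len(candidate):
        raise ValueError("both tests must be scored on the same samples")
    if not reference:
        raise ValueError("at least one sample is required")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return 100.0 * matches / len(reference)


def meets_concordance(reference, candidate, minimum_percent=90.0):
    """Assumption: the 90% criterion is applied as a lower bound."""
    return overall_agreement(reference, candidate) >= minimum_percent


# Example: 40 samples (20 positive, 20 negative in the validated test),
# with 3 outcomes that differ in the LDT
validated = [True] * 20 + [False] * 20
ldt = [False] * 3 + [True] * 17 + [False] * 20
agreement = overall_agreement(validated, ldt)
```

Such a figure is only meaningful when the sample set contains enough critical predictive samples; agreement computed mainly on clearly negative or strongly positive cases will overstate concordance.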
The recent studies on comparison for different commercial PD-L1 IHC assays will be discussed along the methodologic notions discussed above.
In the “Blueprint 1” study, cases were selected to “represent the respective dynamic range of each of the 4 PD-L1 assays,” but the authors103 do not mention how the cases were actually selected. Staining was performed centrally and interpreted separately by 3 companies' pathologists using assay-specific cutoff values. In fact, 14 cases show discordances with assay-specific cutoffs. In comparing 28-8 with 22C3, 2 of 14 (14%) are discordant; 28-8 with SP263, 5 of 14 (36%) are discordant; and 22C3 with SP263, 6 of 14 (43%) are discordant. This comparison does not attempt to adjust one assay to achieve validation from one assay to the other. For the remaining 24 cases (19 positive by all 4 assays and 5 negative by all 4 assays), it is unclear whether these are really critical samples or not.
The German “Harmonization study”129 examined 15 cases, most with high concordance but 4 with systematic differences. To quantify potential differences in the proportions of stained carcinoma cells, the 6 pairwise comparisons of the 4 assays were calculated. For 28-8 versus 22C3, 72% of the pairs were concordant; 13% showed higher proportions for 28-8 and 16% higher proportions for 22C3. Comparisons of SP263 with the other assays indicated higher proportions for SP263 in 46% (28-8), 44% (22C3), and 59% (SP142) of the pairs. Comparisons of SP142 indicated lower proportions in 36% (28-8), 39% (22C3), and 59% (SP263) of the pairs. In this study the cases were selected from a comprehensively annotated patient cohort to include both adenocarcinomas and squamous cell carcinomas, mimicking the composition of clinical trials. A relation to the dynamic range of the assays was not provided.
To establish a representative value for interobserver variation, studies examining consecutive series of NSCLCs should be performed. Currently, in a German study with highly selected cases the interobserver variation at the 1% threshold varied for all 4 assays from 2.8% to 9.6%, and at the 50% threshold ranged from 5.2% to 8.5%.104 At the 16th World Conference on Lung Cancer in Vienna, an Australian study, also in a highly selected set of cases, showed interobserver variation of 15.8% at the 1% threshold and of 18% at the 50% threshold.105 Rehman and colleagues106 examined, with the SP142 protocol, the reproducibility of 5 pathologists' scores on a selected set of cases and showed an intraclass correlation coefficient of 94% agreement among the pathologists for the assessment of PD-L1 in tumor cells, but only 27% agreement in stromal/immune cell PD-L1 expression. The possible lack of consensus on the exact method for reading stromal cells was one of the explanations.
The recently published study of Ratcliffe and colleagues107 examined 493 selected cases stained with 3 protocols (22C3, 28-8, and SP263) of which 200 were read by a second pathologist. The intraobserver overall percentage of agreement between the 3 protocols varied for the 1% threshold between 88.7% and 91.7% and for the 50% threshold between 91.5% and 95.6%. Remarkably, the interobserver overall percentage of agreement between the 3 protocols varied for the 1% threshold between 75.9% and 77% and for the 50% threshold between 94.5% and 97.5%. It is likely that a relevant part of the examined cases will consist of critical samples. If we assume that approximately 150 of the more than 200 negative cases and more than 50 of the highly positive cases may be looked upon, from a methodologic point of view, as not informative, then the denominator will be smaller and the proportion of discordant results among the samples that really matter will increase: for example, from 10% to 16%. It would be useful to know the actual critical samples of this study and establish the clinical validity. Interestingly, the size of this study sheds light on the interobserver variation, being well within the 9% range at the 50% threshold. However, at the 1% threshold there is room for improvement. To what extent training may improve this is currently unclear.
A French PD-L1 comparison study, presented at the 16th World Conference on Lung Cancer in Vienna, was performed by 7 centers, each contributing 6 NSCLC cases distributed over adenocarcinoma and squamous cell carcinoma as well as over 3 categories of PD-L1 positivity (<5%, 5%–25%, and ≥50%).108 Each center used its own staining platform (3 with Dako AS Linker 48, 2 with Ventana Benchmark Ultra, and 2 with Leica Bond) and all 7 used 5 staining protocols: 28-8, 22C3, SP142, SP263, and E1L3N. Thus, at least 3 of the 5 sections per case per laboratory were stained by an LDT and minimally 35 sections were required per case, leaving 41 tested cases. The LDT was locally established, aiming for concordance with 28-8 or 22C3 (reference images and tonsil tissue). The slides were centrally relabeled after staining and read once (each pathologist read 6 cases, 35 PD-L1–stained sections) after a preliminary discussion of the PD-L1 scoring method. Thus, interobserver variability was not assessed. Weighted κ was used to compare the tests, based on the 1% and 50% thresholds.
Based on an arbitrary threshold of κ = 0.75, the number of the 7 laboratories with acceptable concordance, according to the authors, was 4, 5, 7, 2, and 4 for 28-8, 22C3, SP263, SP142, and E1L3N, respectively.108 For the LDTs performed on an instrument with different antibodies from those of the commercial kit, 13 of 27 comparisons were below the κ threshold. The agreement/discordant rates and details of signal enhancement systems used are not available yet. It is not clear whether the samples used were critical samples or not. Thus, care has to be taken before implementing LDT for PD-L1 testing.
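For readers unfamiliar with the statistic, a weighted κ on categories derived from the 1% and 50% thresholds can be sketched as follows. The weighting scheme used in the French study is not stated here, so linear weights are an assumption, as are the function names and the three-category coding (<1%, 1% to <50%, ≥50%).

```python
def categorize(percent_positive):
    """Map a PD-L1 percentage to an ordinal category using the 1% and 50% thresholds."""
    if percent_positive < 1:
        return 0
    if percent_positive < 50:
        return 1
    return 2


def weighted_kappa(ratings_a, ratings_b, n_categories=3):
    """Cohen's kappa with linear disagreement weights (assumed weighting)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both tests must rate the same, non-empty set of samples")
    k, n = n_categories, len(ratings_a)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[a][b] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]
    # Linear weight: 0 on the diagonal, 1 for the most distant categories
    weight = lambda i, j: abs(i - j) / (k - 1)
    observed = sum(weight(i, j) * counts[i][j] / n for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed / expected


# Example: two assays scored on the same 6 samples (percent positive tumor cells)
assay_1 = [categorize(p) for p in (0, 0.5, 10, 30, 60, 90)]
assay_2 = [categorize(p) for p in (0, 5, 20, 40, 80, 45)]
kappa = weighted_kappa(assay_1, assay_2)
acceptable = kappa >= 0.75  # the arbitrary threshold cited in the text
```

Note that κ depends on the case mix: a series dominated by clearly negative or strongly positive cases yields a different value than one enriched for critical samples, which is one reason the composition of the sample set matters when such figures are compared.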
Overall, the comparisons of different PD-L1 assays yield encouraging data, but more research is needed. According to the FDA, a large number of samples needs to be examined for noninferiority testing.109 For optimal determination of the interobserver variability in daily practice, a consecutive series of cases is mandatory.
OTHER IHC BIOMARKERS
EGFR mutation–specific antibodies have been developed, which recognize the protein conformation change due to the mutation, but do not bind to the wild-type EGFR protein.
Different antibodies identify a 15-bp deletion in exon 19 and an L858R point mutation in exon 21. Currently, these are not recommended for predictive testing, as the sensitivity, in particular for other exon 19 deletions, is moderate: 15-bp deletion is better recognized than 18- or 12-bp deletion.110–115 However, when limited tissue is available, such as in cytologic samples, application of IHC with these mutation-specific antibodies may be useful.
In Europe, the EMA approved the monoclonal anti-EGFR drug necitumumab for patients with advanced-stage squamous NSCLC that expresses the wild-type EGFR protein by IHC.
CONCLUSIONS
The use of IHC for the determination of pulmonary carcinoma biomarkers is a well-established and powerful technique (Table 4). Immunohistochemistry is readily available in pathology laboratories, is relatively easy to perform and assess, can provide clinically meaningful results very quickly, and is relatively inexpensive. The lung oncology community may be somewhat suspicious of IHC biomarkers, since there have been a number of failed trials that have occasionally faltered owing to the quality and nature of the IHC biomarker (ERCC1), or the biomarker was perhaps wrongly blamed (METMAb, EGFR/cetuximab). This should not be used as evidence against the use of IHC biomarkers. The IHC biomarkers discussed in this article, however, highlight the importance of understanding the practice of IHC, and how the particular chemistry used in any assay may influence the test outcome. This can be used to the patient's advantage; with appropriate usage, for example, ALK IHC protocols may or may not require FISH confirmation, depending on the way in which the ALK IHC is conducted. The complex story surrounding PD-L1 IHC also highlights the need for such understanding. The currently described literature on PD-L1 IHC has not yet shown, from a methodologic point of view, that different assays are comparable. The route to LDT and commercial test validation is challenging and complex. Handling along proper methodologic lines is needed to ensure patients receive the most accurate and representative test outcomes.
References
References 24, 27, 38, 39, 41–47, 51, 53–55, 58–66.
Author notes
Dr Cooper receives honoraria from Pfizer Oncology for lectures and advisory board membership. The other authors have no relevant financial interest in the products or companies described in this article.