Recently, a new type of antibody-drug conjugate, trastuzumab-deruxtecan (T-DXd), has been approved for the treatment of metastatic breast cancer with low levels of human epidermal growth factor receptor 2 (HER2) gene expression. Eligibility therefore relies on an accurate diagnosis of HER2-low status, defined as an immunohistochemistry (IHC) score of 1+ or 2+ with no gene amplification.
To assess pathologists’ accuracy and training efficacy in the diagnosis of HER2-low.
Agreement rates of HER2-low scoring in breast cancer tissue were assessed between expert consensus and real-world pathologists (n = 77 from 14 countries) before and after a specific 4-hour training program for HER2-low detection. Two assays were evaluated, the Ventana Pathway 4B5 CDx and the Dako HercepTest (polyclonal). Concordance of the pathologists with consensus score and efficacy of training were measured by Cohen κ, overall rater agreement, and receiver operating characteristic (ROC) curve statistics.
In the Ventana 4B5 HER2-low category, baseline agreement rates were >80% but <90%. Negative percentage agreement was improved from 80.6% to 91.1% by training. In the HER2-0 category, positive percentage agreement (74.6%) was the only parameter below the 80% benchmark but was significantly improved to 89.2% after training. Training efficacy was confirmed by ROC curve analysis, which shows improvement for the identification of HER2-0 and HER2-low cases. Finally, in-depth examination of cases with discordant HER2 status disclosed specific issues of HER2-low underscoring and overscoring.
The ability of pathologists to achieve acceptable diagnostic accuracy in identifying patients with HER2-low breast cancer could be enhanced by short-term training. Potential routes to improve the quality of HER2-low scoring in clinical practice have been identified.
During the past 20 years, diagnosis of human epidermal growth factor receptor 2 (HER2) status has been essential in determining eligibility of breast cancer patients for HER2-targeted therapies.1,2 Although HercepTest (HcT) SK001 has been frequently used so far, currently HcT GE001 and Ventana PATHWAY 4B5 are the most widely applied immunohistochemistry (IHC) assays in clinical practice.3 In this study the polyclonal HcT SK001 and monoclonal Ventana PATHWAY 4B5 antibody were investigated.
Previous assessments of HER2 expression focused on the identification of HER2-positive tumors defined as either exhibiting IHC protein overexpression (score 3+) or IHC 2+ with HER2 gene amplification by in situ hybridization (ISH+) based on American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) criteria.4 Recently, it has been found that HER2-low advanced breast carcinomas, defined as IHC 1+/2+ with no gene amplification, benefit from trastuzumab deruxtecan (T-DXd), a novel type of antibody-drug conjugate targeting HER2 with a potent topoisomerase-I inhibitor as the cytotoxic drug. In a head-to-head study, T-DXd was found to be more efficacious than ado-trastuzumab emtansine (T-DM1) in patients with HER2-positive metastatic breast cancer who were previously treated with trastuzumab and a taxane.5 Lately, in a phase 3 study of patients with HER2-low metastatic breast cancer, T-DXd met all study endpoints regarding progression-free and overall survival, regardless of hormone receptor (HR) status.6 T-DXd is now approved by the US Food and Drug Administration (FDA), with 4B5 (Ventana PATHWAY) as its companion diagnostic, for patients with unresectable or metastatic HER2-low breast cancer who have received prior chemotherapy in the metastatic setting or developed disease recurrence during or within 6 months of completing adjuvant chemotherapy. It is also approved by the European Medicines Agency, with HER2-low status determined by a conformité européenne (European conformity)–marked in vitro diagnostic device or an alternate validated test.7,8
Approximately 50% of all breast cancer patients are classified as HER2-low6,9,10 and consequently may benefit from this new type of HER2-targeted therapy.11
Pathologists have demonstrated high concordance in diagnosis of unequivocal HER2-positive or HER2-negative breast carcinomas but lower agreement and accuracy when examining low levels of HER2 expression in tumors of IHC 1+ or IHC 2+ types.12–14
Based on 2019 CAP survey data, disagreement among pathologists was observed in distinguishing HER2 IHC 0 from 1+.15 Concordance among 1391 labs in 2019 and 1452 labs in 2020 was equal to or below 70% in 19% of cases for HER2 IHC score 0 versus 1+.16 In another study, 18 pathologists agreed only 26% of the time when scoring IHC 0 versus 1+.16 This suggests that current pathology interpretation of HER2-low expression (IHC 1+ and IHC 2+/ISH−) may result in suboptimal patient identification.16 However, in these studies the pathologists were not informed of any clinical actionability for HER2-low expression, and historically a score of 1+ or 2+/ISH− has not affected treatment management. Additionally, there was no formal training for the participating pathologists to ensure accuracy in the identification of HER2-low tumors.
Once HER2-low became an actionable diagnostic target, accurate diagnosis of low levels of HER2 in breast cancer tissue became clinically relevant. It is crucial to evaluate the accuracy of available diagnostics, given that (1) novel HER2-targeted therapy potentially benefits a large population of patients, (2) the therapy starts with the diagnosis of HER2-low based on pathologists’ manual evaluation of HER2 protein expression, and (3) guidelines for HER2-low evaluation have only recently been published.17,18 Identifying factors that could potentially affect diagnostic accuracy to ensure accurate patient identification is equally important.
This study evaluates worldwide accuracy of HER2 tissue diagnostics, focusing on HER2-low expression in breast cancer tissue, with a large number of pathologists before (first assessment) and after training (second assessment) using digitized images. Based on this study design, specific scoring challenges could be identified to guide future implementation of HER2-low diagnostics.
MATERIALS AND METHODS
This study is a nonclinical assessment of the proficiency of real-world pathologists in diagnosing low levels of HER2 expression in breast cancer tissue. It was conducted in 2022. The primary endpoint was the distinction between HER2-low (IHC 1+ and 2+/ISH−) and HER2 IHC 0 in breast cancer samples; the secondary endpoint was the influence of a training program on the correct identification of HER2-low cases.
Case Selection and Study Design
Study cases were obtained from a HER2 expression study by IHC in breast cancer (AstraZeneca R&D, Cambridge, United Kingdom) with 500 commercial samples encompassing a comprehensive range of IHC scores (0, 1+, 2+, 3+) stained with both Ventana PATHWAY 4B5 (monoclonal antibody) and Dako HcT (polyclonal antibody) at an independent central laboratory.19
The presented study focused on pathologists’ proficiency in HER2 scoring according to ASCO/CAP guidelines (2018),4 which apply to any breast cancer sample irrespective of type and site. Accordingly, no samples obtained from metastases had been specifically included. However, almost one third of study cases represented small biopsies, which in terms of size compare to those usually taken from metastatic sites.
Digitized and anonymized images from this study (not related to other AstraZeneca or Daiichi-Sankyo clinical studies), with the case distribution focused on HER2-low expression as per ASCO/CAP 2018 guidelines,4 were made available to Discovery Life Sciences Biomarker Services GmbH (DLS, Kassel, Germany). Two sample sets comprising 50 cases each were compiled for HcT- or 4B5-stained tumors to be scored by the participating pathologists through Pathotrainer virtual microscope software (Pathomation BV, Antwerp, Belgium, a CellCarta service). Another sample set (n = 25) was developed for training.
The study of HER2-low scoring proficiency comprised 3 steps: baseline pathologist scoring of cases (first assessment), virtual pathologist-to-pathologist training, and rescoring after a 2-week washout period (second assessment) (Figure 1). The entire project was overseen by a steering committee of 8 independent experts in HER2 breast pathology (I.O.E., M.E.H.H., A.L., R.Y.O., F.P.-L., F.R., J.R., G.V.), who advised on each study step, including the number of test and training samples, content of training, data interpretation, and publication. Complex/borderline cases were adjudicated by this expert panel to reach a consensus score.
Sample selection and study design. Abbreviations: DISH, dual ISH; HcT, HercepTest; HER2, human epidermal growth factor receptor 2; ISH, in situ hybridization; 4B5, Ventana PATHWAY 4B5 test.
aAstraZeneca R&D, Cambridge, United Kingdom.16
Pathologists
Eighty pathologists were contacted from different laboratories, globally, by the sponsor (Diaceutics plc, Belfast, United Kingdom). They were informed of the study concept, design (Figure 1), and requirement to use a digital pathology platform to interpret HER2 images using ASCO/CAP 2018 scoring criteria.4
HER2-Low Case Classification and Selection
Study and training cases were selected according to the scoring scheme initially introduced by the FDA in 1998,2 then adopted and further developed by ASCO/CAP (version 2018)4 (Figure 2). Analogous to the HER2-low term, the IHC 0 category was named HER2-0.18 Within this IHC 0 group, pathologists were asked to also record any membrane staining in ≤10% invasive tumor cells (>no staining<1+). Intensity scoring was based on application of the magnification rule as introduced by the DLS group.20,21
Classification scheme of HER2 staining according to FDA,2 ASCO/CAP 20184 in comparison to new scoring of HER2-low (versus HER2-0 and HER2-positive). Immunohistochemically stained tissue sections (see upper row) are evaluated stepwise first by determining intensity of membrane staining using magnification rule,20,21 followed by the assessment of circularity and finally the percentage of stained tumor cells. Classification of diagnostic groups is based on these 3 criteria in combination with HER2 in situ hybridization data in IHC 2+ cases. Note, the dashed line between HER2 IHC 1+ and IHC 0 is representative of cases at the HER2 IHC 0 border where staining intensity is similar to HER2 IHC 1+ with incomplete circularity and/or ≤10% of stained tumor cells.
Abbreviations: ASCO, American Society of Clinical Oncology; CAP, College of American Pathologists; FDA, US Food and Drug Administration; HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry.
aThe interpretation of HER2 IHC 2+ has changed from “weak positive” to “equivocal” after ISH had been introduced. The “historical FDA classification” is HER2-positive (IHC 3+ or IHC 2+) versus HER2-negative (IHC 1+/IHC 0).
For most study cases (61 of 65) HER2 amplification status was determined by using silver Ventana HER2 Dual ISH (DISH) DNA Probe Cocktail assay (conformité européenne–marked in vitro diagnostic) on a BenchMark ULTRA IHC/ISH system according to manufacturer’s protocol. Tumors with HER2/chromosome enumeration probe 17 (CEP17) ratio ≥2.0 and an average HER2 copy number of ≥4.0 signals/cell were considered amplified.
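The amplification rule stated above can be written as a short check. This is a minimal sketch in Python that encodes only the two thresholds quoted in the text (ratio ≥2.0 and average copy number ≥4.0 signals/cell); the function and parameter names are illustrative assumptions, and the additional ISH result groups of the ASCO/CAP guideline are deliberately not modeled.

```python
def is_amplified(her2_cep17_ratio: float, mean_her2_copies: float) -> bool:
    """Return True when both thresholds quoted in the text are met.

    Assumption: only the rule stated in this section is encoded
    (HER2/CEP17 ratio >= 2.0 AND average HER2 copy number >= 4.0
    signals/cell). Other ISH result groups are not handled here.
    """
    return her2_cep17_ratio >= 2.0 and mean_her2_copies >= 4.0
```

Under this sketch, a case with a ratio of 2.5 and 6.0 signals/cell is called amplified, whereas a ratio of 2.5 with 3.0 signals/cell is not.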
Study and training cases initially interpreted by 1 AstraZeneca pathologist were reevaluated and where appropriate, rescored by 2 DLS pathologists with extensive experience in HER2 interpretation (J.R., H.-U.S.).
In the absence of a validated reference standard distinguishing HER2-0 from HER2-low, the steering committee adjudicated cases considered challenging (14 of 50 cases) to determine a consensus score.
Taking DISH data into consideration, only those cases that could clearly be assigned to 1 of the 3 different HER2-classes (ie, HER2-0, HER2-low, and HER2-positive) were included in the final data evaluation.
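The assignment of a case to 1 of the 3 HER2 classes follows directly from the definitions used in this study (HER2-0: IHC 0; HER2-low: IHC 1+ or IHC 2+/ISH−; HER2-positive: IHC 3+ or IHC 2+/ISH+). The following Python sketch illustrates that mapping; the function name and the string labels are assumptions made for illustration and are not part of the study software.

```python
from typing import Optional

# IHC 0 includes the subcategory recorded as ">no staining<1+" in this study.
IHC0_LABELS = {"0", ">no staining<1+"}


def her2_category(ihc_score: str, ish_amplified: Optional[bool] = None) -> str:
    """Map an IHC score (plus ISH status for IHC 2+) to a 3-tier class."""
    if ihc_score in IHC0_LABELS:
        return "HER2-0"
    if ihc_score == "1+":
        return "HER2-low"
    if ihc_score == "2+":
        # IHC 2+ cannot be classified without an ISH result.
        if ish_amplified is None:
            raise ValueError("IHC 2+ requires an ISH result")
        return "HER2-positive" if ish_amplified else "HER2-low"
    if ihc_score == "3+":
        return "HER2-positive"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")
```

Raising an error for IHC 2+ without ISH data mirrors the exclusion of cases that could not clearly be assigned to a class.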
Virtual Pathologist Diagnosis and Virtual Live Training
On the Pathotrainer pathology platform, hematoxylin-eosin–stained, isotype negative control, and HER2-stained slides were provided for each case along with the 2018 ASCO/CAP guideline.4 A screen suitability check was implemented to guarantee a standardized view of digitized images. Cases were classified by both IHC scoring category (IHC 0, >no staining<1+, 1+, 2+, 3+) and percentage of tumor cell positivity at each intensity level.
According to the study design (Figure 1), participants were provided a proficiency assessment round of 50 stained cases (either 4B5 or HcT) to be completed during the course of 2 weeks. A week after finalization of the first proficiency assessment round, all participants were trained in groups of 10–20 within 4 weeks by 2 trainers (5 sessions by J.R., 1 by C.A.). Each live virtual training, including slides from both 4B5 and HcT, was done on 1 day for 4 hours using an established format (see Supplemental Table 1 in the supplemental digital content, containing 4 tables and 2 figures at https://meridian.allenpress.com/aplm in the May 2025 table of contents).22
Finally, cases that had been either underscored (classifying HER2-low as HER2-0) or overscored (classifying HER2-0 as HER2-low) by more than 10% of pathologists were reviewed for specific tissue- or staining-related issues that could underlie respective misscoring.
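The selection of cases for this review can be illustrated with a small Python sketch. It flags a case when more than 10% of readers underscored or overscored it relative to the consensus class. The data layout (dictionaries keyed by a case identifier) and the function name are hypothetical and were chosen only for illustration.

```python
from typing import Dict, List


def flag_misscored_cases(
    consensus: Dict[str, str],
    reader_calls: Dict[str, List[str]],
    threshold: float = 0.10,
) -> Dict[str, str]:
    """Return {case_id: 'underscored' | 'overscored'} for flagged cases.

    A case is flagged when the share of readers making the respective
    error is strictly greater than `threshold` (10% by default).
    """
    flagged = {}
    for case_id, truth in consensus.items():
        calls = reader_calls.get(case_id, [])
        if not calls:
            continue  # no evaluable reads for this case
        if truth == "HER2-low":
            wrong, kind = sum(c == "HER2-0" for c in calls), "underscored"
        elif truth == "HER2-0":
            wrong, kind = sum(c == "HER2-low" for c in calls), "overscored"
        else:
            continue  # HER2-positive cases are outside this review
        if wrong / len(calls) > threshold:
            flagged[case_id] = kind
    return flagged
```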
Statistical Analysis
The extent to which raters assigned the same score as the consensus HER2 category (proficiency level) was measured by Cohen κ and rater agreement statistics. Cohen κ measures interrater reliability (>0.8: almost perfect/excellent; 0.61–0.8: substantial/good; 0.41–0.6: moderate; <0.41: fair, slight, and poor). Confidence intervals were determined at the 95% level accordingly.23–25 Degree of interobserver variability was assessed by overall rater agreement (ORA) and related metrics (ie, positive percentage agreement [PPA], measuring how often the presence of the corresponding HER2 category was correctly identified, and negative percentage agreement [NPA], assessing how often the absence of the corresponding HER2 category was correctly identified).26 To demonstrate the effect of training, receiver operating characteristic (ROC) curve analysis was performed using R (R Core Team 2022).27 Results are considered excellent for area under the curve (AUC) values of 0.9–1, good for 0.8–0.9, and fair for 0.7–0.8. Lower values indicate a poor (0.6–0.7) or failed test (0.5–0.6). Significance level was set to P < .05 and calculated by χ² test. Samples considered not evaluable by a participant were excluded from calculations.
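For readers who wish to reproduce these agreement metrics on their own data, the following Python sketch computes unweighted Cohen κ, PPA, NPA, and ORA for one reader against the consensus. It is an illustration under stated assumptions: the study itself used R, the function names are hypothetical, ORA is computed here on the category dichotomized as present versus absent, and the κ bands are those listed above.

```python
from collections import Counter
from typing import Dict, Sequence


def cohen_kappa(reference: Sequence[str], reader: Sequence[str]) -> float:
    """Unweighted Cohen kappa between consensus and one reader."""
    n = len(reference)
    if n == 0 or n != len(reader):
        raise ValueError("inputs must be non-empty and of equal length")
    p_observed = sum(r == o for r, o in zip(reference, reader)) / n
    ref_counts, obs_counts = Counter(reference), Counter(reader)
    # Chance agreement from the marginal category frequencies.
    p_expected = sum(ref_counts[k] * obs_counts[k] for k in ref_counts) / n**2
    if p_expected == 1.0:
        return 1.0  # degenerate case: a single category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def agreement_rates(
    reference: Sequence[str], reader: Sequence[str], category: str
) -> Dict[str, float]:
    """PPA, NPA, and ORA for one category treated as present/absent."""
    pairs = list(zip(reference, reader))
    tp = sum(r == category and o == category for r, o in pairs)
    fn = sum(r == category and o != category for r, o in pairs)
    tn = sum(r != category and o != category for r, o in pairs)
    fp = sum(r != category and o == category for r, o in pairs)
    return {
        "PPA": tp / (tp + fn) if tp + fn else float("nan"),
        "NPA": tn / (tn + fp) if tn + fp else float("nan"),
        "ORA": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


def kappa_label(kappa: float) -> str:
    """Interpretation bands as listed in the Statistical Analysis text."""
    if kappa > 0.8:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "fair, slight, or poor"
```

In this sketch PPA corresponds to sensitivity and NPA to specificity for the chosen category, with the consensus score taking the place of a reference standard.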
RESULTS
Participating Pathologists
A total of 77 pathologists agreed to participate; 49 evaluated Ventana 4B5 and 28 Dako HcT–stained cases for the first assessment. Two pathologists could not participate in the second assessment and the data plausibility check disclosed another reader with a systematic recording error, resulting in 74 pathologists with evaluable results (Figure 1). Participating pathologists represented 14 countries with highest representation from Europe (n = 41) and North America (n = 23) followed by Asia/Pacific (n = 8) and South America (n = 2) (Supplemental Table 2).
Characteristics of Study Cases
The 2 study sets eligible for proficiency analysis (n = 46 by 4B5 and n = 45 by HcT) were comparable with respect to specimen type (resections versus biopsies, 72% versus 28%), histological tumor type (83% versus 87% not otherwise specified, 17% versus 13% lobular), and tumor grade (1 versus 2 versus 3, 15% versus 70% versus 15%, respectively). Distribution between the 3 categories of HER2-positive, HER2-low, and HER2-0 was in the expected range for the 4B5 assay: 13% versus 52% versus 35%.11 Because DISH data were not available at the time of case enrollment, only IHC data were used for case selection. Therefore, the corresponding score distribution for HcT differs, with twice as many HER2-positive cases and fewer HER2-low cases: 27% versus 33% versus 40%.
Accuracy of HER2-low and HER2-0 Diagnostics and Efficacy of Training
In the HER2-low category, baseline pretraining agreement rates between expert consensus and 77 pathologists from 14 countries were >80%. After training, NPA improved from 80.6% to 91.1%. In the HER2-0 category, PPA (74.6%) was the only baseline parameter below the 80% benchmark but also improved to 89.2% by training (P < .001; Table 1). To assess the effect of training more specifically on the interpretation of HER2-low expression, ROC curve analysis and assessment of agreement rates were done. These disclosed a positive effect of training, especially for participants using 4B5. The AUC showed excellent performance values for HER2-positive diagnostics with both tests (≥0.9), and was highest with 4B5. Identification of HER2-0 and HER2-low was good (AUC 0.8–0.9) with both tests and was improved by specific instruction, especially in 4B5 users (Figure 3, A through F). Furthermore, training showed a positive effect within all 3 new HER2 categories for both assays according to the Cohen weighted κ (Supplemental Tables 3 and 4).
Diagnostic ability of pathologists in classification of HER2 expression in breast cancer. Receiver operating characteristic curves for 4B5 (A through C) and HercepTest (D through F) users in the 3 HER2 categories: HER2-0 (A, D), HER2-low (B, E), and HER2-positive (C, F). Area under the curve before (pre-AUC) and following short-term training. Values ≥0.9 are considered excellent, between 0.8 and <0.9 are considered good. Abbreviations: AUC, area under the curve; HcT, HercepTest; HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry; ISH, in situ hybridization; 4B5, Ventana PATHWAY 4B5 test.
a(IHC 0)=0; (IHC 1+, IHC 2+ [ISH+/−], IHC 3+) = 1.
b(IHC 0; IHC 2+ [ISH+], IHC 3+) = 0; (IHC 1+, IHC 2+ [ISH−]) = 1.
c(IHC 0; IHC 1+, IHC 2+ [ISH−]) = 0; (IHC 2+ [ISH+], IHC 3+) = 1.
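The binary coding given in footnotes a through c can be sketched in Python as follows. The sketch maps the 3-tier classes to 0/1 for each ROC panel and applies the AUC bands from the Statistical Analysis section. The helper `single_point_auc` is an assumption added for illustration: it treats one reader's yes/no calls as a single operating point, for which the trapezoidal AUC equals the mean of sensitivity and specificity. How the published ROC curves were constructed in R is not described in this section, so this is not a reproduction of the authors' analysis.

```python
from typing import List, Sequence

# Classes coded as 1 in each panel, following footnotes a, b, and c.
CODED_AS_ONE = {
    "HER2-0": {"HER2-low", "HER2-positive"},  # footnote a: IHC 0 is coded 0
    "HER2-low": {"HER2-low"},                 # footnote b
    "HER2-positive": {"HER2-positive"},       # footnote c
}


def binarize(categories: Sequence[str], panel: str) -> List[int]:
    """Code each 3-tier class as 0 or 1 for the given ROC panel."""
    ones = CODED_AS_ONE[panel]
    return [1 if c in ones else 0 for c in categories]


def single_point_auc(truth: Sequence[int], calls: Sequence[int]) -> float:
    """Trapezoidal AUC for one binary operating point: (sens + spec) / 2."""
    pairs = list(zip(truth, calls))
    positives = sum(t == 1 for t, _ in pairs)
    negatives = sum(t == 0 for t, _ in pairs)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    sensitivity = sum(t == 1 and c == 1 for t, c in pairs) / positives
    specificity = sum(t == 0 and c == 0 for t, c in pairs) / negatives
    return (sensitivity + specificity) / 2


def auc_band(auc: float) -> str:
    """Interpretation bands from the Statistical Analysis section."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "poor"
    return "failed"
```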
Diagnostic Accuracy in the Whole HER2 Spectrum With Different Classifications
With respect to the historical FDA classification scheme comparing IHC 3+/2+ and IHC 0/1+ cases, overall posttraining rater agreement of 92.2% and 89.8% was achieved for 4B5 and HcT, respectively (Supplemental Figures 1 and 2). Consensus was highest for ASCO/CAP classification into HER2-negative and HER2-positive (IHC 3+ and IHC 2+/amplified), both by κ statistics (almost perfect, >0.80) and ORA (92%–99%) (Supplemental Figure 2). Introduction of the new scoring category, HER2-low, caused a decrease of overall concordance to an ORA of ∼82%, whereas κ values still indicated substantial interrater agreement (κ >0.74) (Supplemental Figures 1 and 2). No significant differences in HER2 scoring were observed regarding the 14 countries and regions (data not shown).
Challenges of HER2-Low Scoring
The 4B5-stained collective turned out to be most informative as it represented the expected range of about 50%–60% HER2-low cases. Underscoring (classifying HER2-low as HER2-0) was observed more frequently than overscoring (classifying HER2-0 as HER2-low), with 12 versus 9 cases (first assessment), respectively. The latter could be significantly reduced to 5 cases by the current training method. Review of the respective cases disclosed staining and tissue issues that were almost specific for either underscored or overscored tumor samples (Table 2). Most instances of underscoring (6 of 13 cases) occurred in weakly stained cases showing single-cell spread, such as in tumors of lobular type, which leads to overestimation of the cell number (as only half of the tumor area is covered by tumor cells). In addition, weakly stained single tumor cells can easily be missed, as high magnification is needed for them to be detected (Figure 4, A). Another 3 cases showed patchy membrane staining, which was erroneously excluded from scoring (Figure 4, B). All instances of overscoring occurred in cases bordering the 10% cutoff, such as by counting in nonspecific basal membrane–like or cytoplasmic staining (5 of 9 cases) (Figure 4, C). Another 2 cases showed distinct cell borders that could be mistaken for faint membrane staining (Figure 4, D). In 1 overscored and 1 underscored tumor, pronounced shrinkage artifacts were observed; in such cases steering committee members recommended considering the slide not evaluable and restaining or selecting a different tumor block if available.
Example figures documenting main scoring issues of underscoring (A and B) and overscoring (C and D). Tumor with lobular growth pattern (A) and only barely visible HER2-stained cell membranes at low magnification. High-power magnification (A, inset) discloses specific HER2 tumor cell membrane staining. Such cases with single/few tumor cell spread were frequently underscored as IHC 0 instead of IHC 1+. Solid-growing tumor (B) with incomplete patchy membrane staining of moderate intensity, IHC 1+. High-power field (B, inset) shows dotted and incomplete stain precisely tracing the tumor cell membranes. This staining pattern should not be excluded from scoring (in contrast to C). Carcinoma with partly tubular growth (C) exhibiting irregular staining at the outer boundary of the tumor cell nests. Higher-power magnification (C, inset) discloses irregular stains spilling over into stroma, some associated with shrinkage clefts or being nonspecific submembranous cytoplasmic stains. This type of basal membrane–like staining should be excluded from scoring thus leading to IHC 0 instead of IHC 1+. Solid growing carcinoma (D) with weak to barely visible HER2 membrane staining and distinct cell borders, the latter mistakenly considered specific though IHC 0. High-power field (D, inset) shows weak but specific HER2 tumor cell membrane staining that has to be distinguished from preexisting unstained cell contours, for instance, by comparison with isotype IgG negative controls (4B5, original magnifications ×10 [A], ×20 [B, C, D], ×40 [A through D insets]). Abbreviations: HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry.
DISCUSSION
This study is the first of its kind to systematically analyze the accuracy of HER2 diagnosis in breast cancer tissue by a large number of pathologists (n = 74) from 14 countries covering academic institutions (n = 35), hospital pathology labs (n = 27), and private practices (n = 12). Agreement rates, posttraining, for the HER2-low category between expert consensus and the real-world pathologists were 82.9% (4B5) and 84.8% (HcT), which are between the acceptable ORA percentage of ≥80% for test evaluation and an ideal ORA percentage of ≥90%.28–29 The agreement rates of HER2-low are lower than the currently recommended 90% for HER2-positive17 but higher than those when HER2-positive was introduced decades ago and reported in the observations of the German Breast Group studies (GeparTrio to GeparSepto), where discordance rate between local and central testing decreased from 52% to 8.4% during a period of 12 years (2005–2017).30 Although there is currently no global consensus available regarding the optimal agreement rate for HER2-low, it is the steering committee’s consensus that the 80% agreement should be practically acceptable at this early stage of clinical actionability for 1+ and 2+/ISH−. For example, Clinical Laboratory Improvement Amendments of 1988 passing requirements in proficiency testing are set at 80% in general31 and CAP also uses this criterion, for instance, by selecting HER2 test cases only if reference laboratories achieve at least 80% consensus for individual tissue cores.29 As noted earlier, NPA was improved from 80.6% to 91.1% by a 4-hour training program. PPA (74.6%) in IHC-0 category was the only parameter below the 80% benchmark at baseline but was also improved to almost 90% (89.2%) after 4 hours of training. Thus, training significantly improved pathologists’ proficiency in identifying breast carcinomas that do not belong to HER2-low category (true negatives) and are instead HER2-0.
We also analyzed the agreement rates in current ASCO/CAP and historical FDA classifications (see Figure 2). The pathologists showed a high level of accuracy in the diagnosis of the ASCO/CAP categories of HER2-negative versus HER2-positive of up to 98% by 4B5 and 92% by HcT, irrespective of training. Similarly, Cohen κ concordance values were above 0.8.
Within the ASCO/CAP and 3-tier HER2 classifications, overall agreement rates show marginal nonsignificant numerical increases after training. This may reflect the fact that the accuracy of HER2-positive scoring (2+/ISH+ and 3+) is at an almost perfect level and may have masked the improvement of HER2-low and HER2-0. The missing training effect on HcT is most likely related to the 2-fold higher prevalence of HER2-positives in this sample set.
These agreement rates are overall higher than those described previously,16,32 where participants were not informed about the intention of the study, did not receive specific training modalities, and were told no details about the assays used. Another recent study33 on 105 HER2-negative breast cancer biopsies (stained with 4B5) demonstrated that consensus between 16 pathologists was higher if only HER2-null cases (without any staining) were grouped against cases with any staining (>no staining<1+, IHC 1+, and IHC 2+/nonamplified). The most relevant factor underlying interobserver disagreement in that study was poor reproducibility for cases bordering the 10% cutoff. Therefore, the authors recommended using the “all or nothing” principle originally mentioned in the ASCO/CAP 2007 guideline.28 In our series we also found that borderline status was the most significant cause of reader disagreement, accounting for ∼60% of all discordances. Regrouping alone did not improve agreement rates (data not shown), though we did identify specific staining issues that contributed to overscoring or underscoring (Figure 4). Some of these, such as heterogeneity and cytoplasmic staining, have also been described as important reasons for discordance among 16 expert pathologists from the United Kingdom and Ireland.34
In contrast to other recent studies,33–36 this is the first to provide globally representative data about the level of ambiguity in HER2-low testing in the real world. It also shows that systematic training, making use of the magnification rule20,21 for reproducible intensity scoring and including demonstration of specific challenging staining patterns, is a valuable tool to improve HER2-low scoring. We do expect an even higher effect if future training is focused on addressing the main sources of discordance (Table 2).
Future training should also take into consideration published HER2 scoring guidelines, of which the first have recently been issued in France37 followed by Germany38 and the United Kingdom.39 Besides scoring issues, the control of preanalytical factors such as quality of fixation and the staining procedure itself is critical for accurate patient identification, and the implementation of on-slide controls that reflect the various staining intensities can further aid in staining quality control.40–41
Due to the approval of T-DXd in HER2-low breast cancer, international guidelines have recently been updated (June 2023)17,18 to reflect this new treatment category. The ASCO-CAP guideline update does not endorse creating a new HER2-low result category for pathology reporting but acknowledges the need for pathologists to include a comment describing patient eligibility for therapy in HER2-low cases and the adoption of best laboratory practices for patient identification. Some of these best practices include the use of high magnification (×40) to interpret slides, secondary pathology review for borderline cases, and controls with a range of protein expression to distinguish between the 0 and 1+ cutoff.17 A European Society for Medical Oncology consensus statement addressing HER2-low breast cancer provided a recommendation for pathology participation in training and education programs to better report HER2-low.18 It also emphasized the need for pathologists to use a properly validated assay for the detection of HER2-low.18 Both guidelines also highlighted the increased importance of preanalytical variables in this new category.
To achieve more reproducible HER2-low scoring, one may consider more refined technologies such as digital image analysis,42,43 for which CAP guidelines are already available.44 However, before recommending new assays and technologies for HER2-low diagnostics,16 one should note that current clinical trial data about efficacy of antibody-drug conjugates in HER2-low breast cancer have been obtained by using a validated and widely used assay (4B5) with well-established ASCO/CAP scoring guidelines.4,6
The limitations of this study include the following: (1) digital images were used for scoring, which is not standard clinical practice; (2) no clinical trial cases were used; and (3) consensus scoring was determined by expert pathologists, but not by the pathologists of the central testing laboratory that read cases for the DESTINY-Breast04 trial.
In conclusion, the global level of accuracy in diagnosing breast cancer with low levels of HER2 expression is within an acceptable range but is still not optimal. Short-term training could significantly improve the accuracy of HER2 testing, and sources of misscoring specific to either underscoring or overscoring could be delineated.
Special thanks to Judy Yu, PhD, a former AstraZeneca employee, who provided expertise and technical insights to support the study. Amy Hanlon Newell, PhD, Andrew Livingston, BS, and Victoria de Giorgio-Miller, MD, provided insightful comments. Sharon Allen oversaw the project management. Ouzna Morsli, MD, provided guidance for the study. Under the guidance of the authors, editorial support was provided by ApotheCom and was funded by Daiichi Sankyo.
Author notes
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the May 2025 table of contents. See Supplemental Table 2 for the list of HER2-low study group members.
Rüschoff and Penner contributed equally.
In March 2019, AstraZeneca entered into a global development and commercialization collaboration agreement with Daiichi Sankyo for trastuzumab deruxtecan (T-DXd; DS-8201). This study was sponsored by Daiichi Sankyo, in collaboration with AstraZeneca.
Competing Interests
Desai and Moh are full-time employees at Daiichi Sankyo Inc; Desai and Moh confirm stock ownership in Daiichi Sankyo Inc. Penault-Llorca has received personal funds for consultation and an advisor role from Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Eli Lilly, Novartis, Seagen, and Pfizer. Lebeau has received speaker honoraria and/or personal funds for an advisory role from AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Myriad Genetics, Novartis, Roche, Menarini Stemline, and Veracyte Inc; writer engagement from Qualitätssicherungs-Initiative Pathologie (QuIP); and is a steering committee member of Diaceutics and Daiichi Sankyo Inc. D’Arrigo is the founder of Poundbury Cancer Institute and has received personal funds for consultation and an advisor role from Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, and Pfizer. Viale has received personal funds for consultation and an advisor role from Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Eli Lilly, Agilent, and Pfizer. Rüschoff is cofounder of Targos Molecular Pathology GmbH, now part of Discovery Life Sciences, to which speaker honoraria and personal funds for an advisor role from Astellas, AstraZeneca, Bristol Myers Squibb, Daiichi Sankyo Inc, GSK plc, Merck Sharp & Dohme, Merck KGaA, and Qualitätssicherungs-Initiative Pathologie are reimbursed. Rojo has received personal funds for an advisor role from Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Bristol Myers Squibb, Pfizer, Novartis, Amgen, Merck KGaA, and Sophia Genetics; and received travel funds from Roche. The other authors have no relevant financial interest in the products or companies described in this article.
Parts of the study were presented as a poster (HER2-13) at the San Antonio Breast Cancer Symposium (SABCS) 2022; December 6–10, 2022; San Antonio, Texas.