Context.—

Recently, a new type of antibody-drug conjugate, trastuzumab-deruxtecan (T-DXd), has been approved for the treatment of metastatic breast cancer with a low level of human epidermal growth factor receptor 2 (HER2) gene expression. Eligibility therefore relies on an accurate diagnosis of HER2-low status, defined as an immunohistochemistry (IHC) score of 1+ or 2+ with no gene amplification.

Objective.—

To assess pathologists’ accuracy and training efficacy in the diagnosis of HER2-low.

Design.—

Agreement rates of HER2-low scoring in breast cancer tissue were assessed between expert consensus and real-world pathologists (n = 77 from 14 countries) before and after a specific 4-hour training program for HER2-low detection. Two assays were evaluated, the Ventana Pathway 4B5 CDx and the Dako HercepTest (polyclonal). Concordance of the pathologists with consensus score and efficacy of training were measured by Cohen κ, overall rater agreement, and receiver operating characteristic (ROC) curve statistics.

Results.—

In the Ventana 4B5 HER2-low category, baseline agreement rates were >80% but <90%. Negative percentage agreement improved from 80.6% to 91.1% with training. In the HER2-0 category, positive percentage agreement (74.6%) was the only parameter below the 80% benchmark but improved significantly to 89.2% after training. Training efficacy was confirmed by ROC curve analysis, which showed improvement in the identification of HER2-0 and HER2-low cases. Finally, in-depth examination of cases with discordant HER2 status disclosed specific issues of HER2-low underscoring and overscoring.

Conclusions.—

The ability of pathologists to achieve acceptable diagnostic accuracy in identifying patients with HER2-low breast cancer could be enhanced by short-term training. Potential routes to improve the quality of HER2-low scoring in clinical practice have been identified.

During the past 20 years, diagnosis of human epidermal growth factor receptor 2 (HER2) status has been essential in determining eligibility of breast cancer patients for HER2-targeted therapies.1,2 Although HercepTest (HcT) SK001 has been used frequently to date, HcT GE001 and Ventana PATHWAY 4B5 are currently the most widely applied immunohistochemistry (IHC) assays in clinical practice.3 In this study, the polyclonal HcT SK001 and the monoclonal Ventana PATHWAY 4B5 antibodies were investigated.

Previous assessments of HER2 expression focused on the identification of HER2-positive tumors, defined as exhibiting either IHC protein overexpression (score 3+) or IHC 2+ with HER2 gene amplification by in situ hybridization (ISH+) based on American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) criteria.4 Recently, it has been found that HER2-low advanced breast carcinomas, defined as IHC 1+/2+ with no gene amplification, benefit from trastuzumab deruxtecan (T-DXd), a novel type of antibody-drug conjugate targeting HER2 with a potent topoisomerase-I inhibitor as the cytotoxic drug. In a head-to-head study, T-DXd was found to be more efficacious than ado-trastuzumab emtansine (T-DM1) in patients with HER2-positive metastatic breast cancer who were previously treated with trastuzumab and a taxane.5 More recently, in a phase 3 study of patients with HER2-low metastatic breast cancer, T-DXd met all study endpoints regarding progression-free and overall survival, regardless of hormone receptor (HR) status.6 T-DXd is now approved by the US Food and Drug Administration (FDA) for patients with unresectable or metastatic HER2-low breast cancer who have received prior chemotherapy in the metastatic setting or developed disease recurrence during or within 6 months of completing adjuvant chemotherapy, with Ventana PATHWAY 4B5 as the companion diagnostic; the European Medicines Agency has approved it for use with a conformité européenne (European conformity)–marked in vitro diagnostic device or an alternate validated test.7,8

Approximately 50% of all breast cancer patients are classified as HER2-low6,9,10  and consequently may benefit from this new type of HER2-targeted therapy.11 

Pathologists have demonstrated high concordance in diagnosis of unequivocal HER2-positive or HER2-negative breast carcinomas but lower agreement and accuracy when examining low levels of HER2 expression in tumors of IHC 1+ or IHC 2+ types.12–14 

Data from the 2019 CAP survey showed disagreement among pathologists in distinguishing HER2 IHC 0 from 1+.15 Concordance among 1391 labs in 2019 and 1452 labs in 2020 was equal to or below 70% in 19% of cases for HER2 IHC score 0 versus 1+.16 In another study, 18 pathologists agreed only 26% of the time on IHC 0 versus 1+ scores.16 This suggests that current pathology interpretation of HER2-low expression (IHC 1+ and IHC 2+/ISH−) may result in suboptimal patient identification.16 However, in these studies the pathologists were not informed of any clinical actionability for HER2-low expression, and historically a 1+ or 2+/ISH− result has not affected treatment management. Additionally, there was no formal training for the participating pathologists to ensure accuracy in the identification of HER2-low tumors.

Once HER2-low became an actionable diagnostic target, accurate diagnosis of low levels of HER2 in breast cancer tissue became clinically relevant. It is crucial to evaluate the accuracy of available diagnostics, given that (1) novel HER2-targeted therapy potentially benefits a large population of patients, (2) the therapy starts with the diagnosis of HER2-low based on pathologists’ manual evaluation of HER2 protein expression, and (3) guidelines for HER2-low evaluation have only recently been published.17,18 Identifying factors that could potentially affect diagnostic accuracy to ensure accurate patient identification is equally important.

This study evaluates worldwide accuracy of HER2 tissue diagnostics, focusing on HER2-low expression in breast cancer tissue, with a large number of pathologists before (first assessment) and after training (second assessment) using digitized images. Based on this study design, specific scoring challenges could be identified to guide future implementation of HER2-low diagnostics.

This study is a nonclinical assessment of the proficiency of real-world pathologists in diagnosing low levels of HER2 expression in breast cancer tissue. It was conducted in 2022. The primary endpoint was the distinction between HER2-low (IHC 1+ and 2+/ISH−) and HER2 IHC 0 in breast cancer samples; the secondary endpoint was the influence of a training program on correct identification of HER2-low cases.

Case Selection and Study Design

Study cases were obtained from a HER2 expression study by IHC in breast cancer (AstraZeneca R&D, Cambridge, United Kingdom) with 500 commercial samples encompassing a comprehensive range of IHC scores (0, 1+, 2+, 3+), stained with both Ventana PATHWAY 4B5 (monoclonal antibody) and Dako HcT (polyclonal antibody) at an independent central laboratory.19

The present study addressed pathologists’ proficiency in HER2 scoring according to ASCO/CAP guidelines (2018),4 which apply to any breast cancer sample irrespective of type and site. Accordingly, no samples obtained from metastases were specifically included. However, almost one third of study cases represented small biopsies, which are comparable in size to those usually taken from metastatic sites.

Digitized and anonymized images from this study (not related to other AstraZeneca or Daiichi-Sankyo clinical studies), with the case distribution focused on HER2-low expression as per ASCO/CAP 2018 guidelines,4 were made available to Discovery Life Sciences Biomarker Services GmbH (DLS, Kassel, Germany). Two sample sets comprising 50 cases each were compiled, one of HcT-stained and one of 4B5-stained tumors, to be scored by the participating pathologists through Pathotrainer virtual microscope software (Pathomation BV, Antwerp, Belgium, a CellCarta service). Another sample set (n = 25) was developed for training.

The study of HER2-low scoring proficiency comprised 3 steps: baseline pathologist scoring of cases (first assessment), virtual pathologist-to-pathologist training, and rescoring after a 2-week washout period (second assessment) (Figure 1). The entire project was overseen by a steering committee of 8 independent experts in HER2 breast pathology (I.O.E., M.E.H.H., A.L., R.Y.O., F.P.-L., F.R., J.R., G.V.), who advised on each study step, including the number of test and training samples, content of training, data interpretation, and publication. Complex/borderline cases were adjudicated by this expert panel to reach a consensus score.

Figure 1.

Sample selection and study design. Abbreviations: DISH, dual ISH; HcT, HercepTest; HER2, human epidermal growth factor receptor 2; ISH, in situ hybridization; 4B5, Ventana PATHWAY 4B5 test.

a AstraZeneca R&D, Cambridge, United Kingdom.16

Pathologists

Eighty pathologists were contacted from different laboratories, globally, by the sponsor (Diaceutics plc, Belfast, United Kingdom). They were informed of the study concept, design (Figure 1), and requirement to use a digital pathology platform to interpret HER2 images using ASCO/CAP 2018 scoring criteria.4 

HER2-Low Case Classification and Selection

Study and training cases were selected according to the scoring scheme initially introduced by the FDA in 1998,2  then adopted and further developed by ASCO/CAP (version 2018)4  (Figure 2). Analogous to the HER2-low term, the IHC 0 category was named HER2-0.18  Within this IHC 0 group, pathologists were asked to also record any membrane staining in ≤10% invasive tumor cells (>no staining<1+). Intensity scoring was based on application of the magnification rule as introduced by the DLS group.20,21 

Figure 2.

Classification scheme of HER2 staining according to FDA,2  ASCO/CAP 20184  in comparison to new scoring of HER2-low (versus HER2-0 and HER2-positive). Immunohistochemically stained tissue sections (see upper row) are evaluated stepwise first by determining intensity of membrane staining using magnification rule,20,21  followed by the assessment of circularity and finally the percentage of stained tumor cells. Classification of diagnostic groups is based on these 3 criteria in combination with HER2 in situ hybridization data in IHC 2+ cases. Note, the dashed line between HER2 IHC 1+ and IHC 0 is representative of cases at the HER2 IHC 0 border where staining intensity is similar to HER2 IHC 1+ with incomplete circularity and/or ≤10% of stained tumor cells.

Abbreviations: ASCO, American Society of Clinical Oncology; CAP, College of American Pathologists; FDA, US Food and Drug Administration; HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry.

aThe interpretation of HER2 IHC 2+ has changed from “weak positive” to “equivocal” after ISH had been introduced. The “historical FDA classification” is HER2-positive (IHC 3+ or IHC 2+) versus HER2-negative (IHC 1+/IHC 0).

For most study cases (61 of 65), HER2 amplification status was determined using the silver Ventana HER2 Dual ISH (DISH) DNA Probe Cocktail assay (conformité européenne–marked in vitro diagnostic) on a BenchMark ULTRA IHC/ISH system according to the manufacturer’s protocol. Tumors with a HER2/chromosome enumeration probe 17 (CEP17) ratio ≥2.0 and an average HER2 copy number of ≥4.0 signals/cell were considered amplified.

Study and training cases initially interpreted by 1 AstraZeneca pathologist were reevaluated and, where appropriate, rescored by 2 DLS pathologists with extensive experience in HER2 interpretation (J.R., H.-U.S.).

In the absence of a validated reference standard distinguishing HER2-0 from HER2-low, the steering committee adjudicated cases considered challenging (14 of 50 cases) to determine a consensus score.

Taking DISH data into consideration, only those cases that could clearly be assigned to 1 of the 3 different HER2-classes (ie, HER2-0, HER2-low, and HER2-positive) were included in the final data evaluation.
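The classification logic described in this section can be summarized in a minimal sketch. The function names (`is_amplified`, `her2_category`) and the string encoding of IHC scores are hypothetical and were not part of the study; the amplification thresholds are those stated above (HER2/CEP17 ratio ≥2.0 and average HER2 copy number ≥4.0 signals/cell), and the mapping follows the 3 classes HER2-0, HER2-low, and HER2-positive.

```python
from typing import Optional


def is_amplified(her2_cep17_ratio: float, mean_her2_copies: float) -> bool:
    """Amplification rule as stated in the text: ratio >= 2.0 AND >= 4.0 signals/cell."""
    return her2_cep17_ratio >= 2.0 and mean_her2_copies >= 4.0


def her2_category(ihc_score: str, amplified: Optional[bool] = None) -> str:
    """Map an IHC score (plus the ISH result for IHC 2+) to the 3-class scheme.

    ihc_score: one of "0", "1+", "2+", "3+" (hypothetical string encoding).
    amplified: ISH result; only consulted for IHC 2+.
    """
    if ihc_score == "0":
        return "HER2-0"
    if ihc_score == "1+":
        return "HER2-low"
    if ihc_score == "2+":
        if amplified is None:
            raise ValueError("IHC 2+ requires an ISH result")
        return "HER2-positive" if amplified else "HER2-low"
    if ihc_score == "3+":
        return "HER2-positive"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")
```

In this sketch, cases recorded as >no staining<1+ would be passed in as "0", because that subgroup belongs to the IHC 0 category.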

Virtual Pathologist Diagnosis and Virtual Live Training

On the Pathotrainer pathology platform, hematoxylin-eosin–stained, isotype negative control, and HER2-stained slides were provided for each case along with the 2018 ASCO/CAP guideline.4  A screen suitability check was implemented to guarantee a standardized view of digitized images. Cases were classified by both IHC scoring category (IHC 0, >no staining<1+, 1+, 2+, 3+) and percentage of tumor cell positivity at each intensity level.

According to the study design (Figure 1), participants were provided a proficiency assessment round of 50 stained cases (either 4B5 or HcT), which were completed over the course of 2 weeks. One week after completion of the first proficiency assessment round, all participants were trained in groups of 10–20 within 4 weeks by 2 trainers (5 sessions by J.R., 1 by C.A.). Each live virtual training, including slides from both 4B5 and HcT, was held on 1 day for 4 hours using an established format (see Supplemental Table 1 in the supplemental digital content, containing 4 tables and 2 figures, at https://meridian.allenpress.com/aplm in the May 2025 table of contents).22

Finally, cases that had been either underscored (classifying HER2-low as HER2-0) or overscored (classifying HER2-0 as HER2-low) by more than 10% of pathologists were reviewed for specific tissue- or staining-related issues that could underlie respective misscoring.
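The selection of cases for this review can be illustrated with a short sketch. It assumes a hypothetical data layout (a mapping of case ID to consensus category and a list of per-pathologist ratings) that is not taken from the study; only the definitions of underscoring and overscoring and the 10% threshold come from the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flag_misscored_cases(
    consensus: Dict[str, str],
    ratings: List[Tuple[str, str]],
    threshold: float = 0.10,
) -> Dict[str, List[str]]:
    """Return case IDs underscored or overscored by more than `threshold` of readers.

    consensus: case ID -> consensus category ("HER2-0", "HER2-low", "HER2-positive").
    ratings:   (case ID, category assigned by one pathologist) pairs.
    """
    totals = defaultdict(int)
    under = defaultdict(int)  # consensus HER2-low read as HER2-0
    over = defaultdict(int)   # consensus HER2-0 read as HER2-low
    for case_id, rated in ratings:
        truth = consensus[case_id]
        totals[case_id] += 1
        if truth == "HER2-low" and rated == "HER2-0":
            under[case_id] += 1
        elif truth == "HER2-0" and rated == "HER2-low":
            over[case_id] += 1
    return {
        "underscored": sorted(c for c in totals if under[c] / totals[c] > threshold),
        "overscored": sorted(c for c in totals if over[c] / totals[c] > threshold),
    }
```

For example, a consensus HER2-low case read as HER2-0 by 1 of 3 readers would be flagged as underscored, whereas a consensus HER2-0 case read as HER2-0 by every reader would not appear in either list.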

Statistical Analysis

The extent to which raters assigned the same score as the consensus HER2 category (proficiency level) was measured by Cohen κ and rater agreement statistics. Cohen κ measures interrater reliability (>0.8: almost perfect/excellent; 0.61–0.8: substantial/good; 0.41–0.6: moderate; <0.41: fair, slight, and poor). Confidence intervals were determined at the 95% level accordingly.23–25 Degree of interobserver variability was assessed by overall rater agreement (ORA) and related metrics (ie, positive percentage agreement [PPA], measuring how often the presence of the corresponding HER2 category was correctly identified, and negative percentage agreement [NPA], assessing how often the absence of the corresponding HER2 category was correctly identified).26 To demonstrate the effect of training, receiver operating characteristic (ROC) curve analysis was performed using R (R Core Team 2022).27 Results were considered excellent for area under the curve (AUC) values of 0.9–1, good for 0.8–0.9, and fair for 0.7–0.8. Lower values indicate a poor (0.6–0.7) or failed test (0.5–0.6). Significance level was set to P < .05 and calculated by χ2 test. Samples considered not evaluable by a participant were excluded from calculations.
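As an illustration of the agreement metrics named above, the following minimal sketch computes PPA, NPA, and ORA for one HER2 category against the consensus score, together with an unweighted Cohen κ. It is not the study's analysis code (ROC analysis was performed in R); the function names are hypothetical, ORA is assumed here to be the one-vs-rest overall agreement, and the κ shown is the unweighted form.

```python
from typing import Dict, Sequence


def agreement_rates(consensus: Sequence[str], rated: Sequence[str], category: str) -> Dict[str, float]:
    """One-vs-rest PPA, NPA, and ORA for `category`, with consensus as the reference.

    Assumes the category is both present and absent at least once in `consensus`.
    """
    pairs = list(zip(consensus, rated))
    tp = sum(c == category and r == category for c, r in pairs)
    fn = sum(c == category and r != category for c, r in pairs)
    tn = sum(c != category and r != category for c, r in pairs)
    fp = sum(c != category and r == category for c, r in pairs)
    return {
        "PPA": tp / (tp + fn),          # presence of the category correctly identified
        "NPA": tn / (tn + fp),          # absence of the category correctly identified
        "ORA": (tp + tn) / len(pairs),  # overall agreement (one-vs-rest assumption)
    }


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen kappa: (p_o - p_e) / (1 - p_e)."""
    a, b = list(a), list(b)
    n = len(a)
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)


def kappa_label(kappa: float) -> str:
    """Interpretation bands quoted in the text."""
    if kappa > 0.8:
        return "almost perfect/excellent"
    if kappa >= 0.61:
        return "substantial/good"
    if kappa >= 0.41:
        return "moderate"
    return "fair, slight, or poor"
```

With a toy consensus of 2 HER2-0, 2 HER2-low, and 1 HER2-positive case, and a reader who overscores one HER2-0 case as HER2-low, the HER2-low category would have a PPA of 100%, an NPA of about 67%, and an ORA of 80%.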

Participating Pathologists

A total of 77 pathologists agreed to participate; 49 evaluated Ventana 4B5 and 28 Dako HcT–stained cases for the first assessment. Two pathologists could not participate in the second assessment and the data plausibility check disclosed another reader with a systematic recording error, resulting in 74 pathologists with evaluable results (Figure 1). Participating pathologists represented 14 countries with highest representation from Europe (n = 41) and North America (n = 23) followed by Asia/Pacific (n = 8) and South America (n = 2) (Supplemental Table 2).

Characteristics of Study Cases

The 2 study sets eligible for proficiency analysis (n = 46 by 4B5 and n = 45 by HcT) were comparable with respect to specimen type (resections versus biopsies, 72% versus 28%), histological tumor type (83% versus 87% not otherwise specified, 17% versus 13% lobular), and tumor grade (1 versus 2 versus 3, 15% versus 70% versus 15%, respectively). Distribution between the 3 categories of HER2-positive, HER2-low, and HER2-0 was in the expected range for the 4B5 assay: 13% versus 52% versus 35%.11 Since DISH data were not available at the time of case enrollment, only IHC data were used for case selection. Therefore, the corresponding score distribution for HcT differs, with twice as many HER2-positive cases and fewer HER2-low cases: 27% versus 33% versus 40%.

Accuracy of HER2-low and HER2-0 Diagnostics and Efficacy of Training

In the HER2-low category, baseline pretraining agreement rates between expert consensus and 77 pathologists from 14 countries were >80%. After training, NPA improved from 80.6% to 91.1%. In the HER2-0 category, PPA (74.6%) was the only baseline parameter below the 80% benchmark but also improved to 89.2% by training (P < .001; Table 1). To assess the effect of training more specifically on the interpretation of HER2-low expression, ROC curve analysis and assessment of agreement rates were done. These disclosed a positive effect of training, especially for participants using 4B5. The AUC showed excellent performance values for HER2-positive diagnostics with both tests (≥0.9), and was highest with 4B5. Identification of HER2-0 and HER2-low was good (AUC 0.8–0.9) with both tests and was improved by specific instruction, especially in 4B5 users (Figure 3, A through F). Furthermore, training showed a positive effect within all 3 new HER2 categories for both assays according to the Cohen weighted κ (Supplemental Tables 3 and 4).

Figure 3.

Diagnostic ability of pathologists in classification of HER2 expression in breast cancer. Receiver operating characteristic curves for 4B5 (A through C) and HercepTest (D through F) users in the 3 HER2 categories: HER2-0 (A, D), HER2-low (B, E), and HER2-positive (C, F). Area under the curve before (pre-AUC) and following short-term training. Values ≥0.9 are considered excellent, between 0.8 and <0.9 are considered good. Abbreviations: AUC, area under the curve; HcT, HercepTest; HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry; ISH, in situ hybridization; 4B5, Ventana PATHWAY 4B5 test.

a(IHC 0)=0; (IHC 1+, IHC 2+ [ISH+/−], IHC 3+) = 1.

b(IHC 0; IHC 2+ [ISH+], IHC 3+) = 0; (IHC 1+, IHC 2+ [ISH−]) = 1.

c(IHC 0; IHC 1+, IHC 2+ [ISH−]) = 0; (IHC 2+ [ISH+], IHC 3+) = 1.

Diagnostic Accuracy in the Whole HER2 Spectrum With Different Classifications

With respect to the historical FDA classification scheme comparing IHC 3+/2+ and IHC 0/1+ cases, overall posttraining rater agreement of 92.2% and 89.8% was achieved for 4B5 and HcT, respectively (Supplemental Figures 1 and 2). Consensus was highest for ASCO/CAP classification into HER2-negative and HER2-positive (IHC 3+ and IHC 2+/amplified), both by κ statistics (almost perfect, >0.80) and ORA (92%–99%) (Supplemental Figure 2). Introduction of the new scoring category, HER2-low, caused a decrease of overall concordance to an ORA of ∼82%, whereas κ values still indicated substantial interrater agreement (κ >0.74) (Supplemental Figures 1 and 2). No significant differences in HER2 scoring were observed among the 14 countries and regions (data not shown).

Challenges of HER2-Low Scoring

The 4B5-stained collective turned out to be most informative as it represented the expected range of about 50%–60% HER2-low cases. Underscoring (classifying HER2-low as HER2-0) was observed more frequently than overscoring (classifying HER2-0 as HER2-low), with 12 versus 9 cases (first assessment), respectively. The latter could be significantly reduced to 5 cases by the current training method. Review of the respective cases disclosed staining and tissue issues that were almost specific for either underscored or overscored tumor samples (Table 2). Most instances of underscoring (6 of 13 cases) occurred in weakly stained cases showing single-cell spread, such as tumors of lobular type, in which the number of tumor cells is easily overestimated (as only half of the tumor area is covered by tumor cells). In addition, weakly stained single tumor cells can easily be missed, as high magnification is needed to detect them (Figure 4, A). Another 3 cases showed patchy membrane staining that was erroneously excluded from scoring (Figure 4, B). All instances of overscoring occurred in cases bordering the 10% cutoff, for example when nonspecific basal membrane–like or cytoplasmic staining was counted (5 of 9 cases) (Figure 4, C). Another 2 cases showed distinct cell borders that could be mistaken for faint membrane staining (Figure 4, D). In 1 overscored and 1 underscored tumor, pronounced shrinkage artefacts were observed; in such cases steering committee members recommended considering the slide not evaluable and restaining or selecting a different tumor block if available.

Figure 4.

Example figures documenting main scoring issues of underscoring (A and B) and overscoring (C and D). Tumor with lobular growth pattern (A) and only barely visible HER2-stained cell membranes at low magnification. High-power magnification (A, inset) discloses specific HER2 tumor cell membrane staining. Such cases with single/few tumor cell spread were frequently underscored as IHC 0 instead of IHC 1+. Solid-growing tumor (B) with incomplete patchy membrane staining of moderate intensity, IHC 1+. High-power field (B, inset) shows dotted and incomplete stain precisely tracing the tumor cell membranes. This staining pattern should not be excluded from scoring (in contrast to C). Carcinoma with partly tubular growth (C) exhibiting irregular staining at the outer boundary of the tumor cell nests. Higher-power magnification (C, inset) discloses irregular stains spilling over into stroma, some associated with shrinkage clefts or being nonspecific submembranous cytoplasmic stains. This type of basal membrane–like staining should be excluded from scoring thus leading to IHC 0 instead of IHC 1+. Solid growing carcinoma (D) with weak to barely visible HER2 membrane staining and distinct cell borders, the latter mistakenly considered specific though IHC 0. High-power field (D, inset) shows weak but specific HER2 tumor cell membrane staining that has to be distinguished from preexisting unstained cell contours, for instance, by comparison with isotype IgG negative controls (4B5, original magnifications ×10 [A], ×20 [B, C, D], ×40 [A through D insets]). Abbreviations: HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry.

This study is the first of its kind to systematically analyze the accuracy of HER2 diagnosis in breast cancer tissue by a large number of pathologists (n = 74) from 14 countries covering academic institutions (n = 35), hospital pathology labs (n = 27), and private practices (n = 12). Agreement rates, posttraining, for the HER2-low category between expert consensus and the real-world pathologists were 82.9% (4B5) and 84.8% (HcT), which are between the acceptable ORA percentage of ≥80% for test evaluation and an ideal ORA percentage of ≥90%.28–29  The agreement rates of HER2-low are lower than the currently recommended 90% for HER2-positive17  but higher than those when HER2-positive was introduced decades ago and reported in the observations of the German Breast Group studies (GeparTrio to GeparSepto), where discordance rate between local and central testing decreased from 52% to 8.4% during a period of 12 years (2005–2017).30  Although there is currently no global consensus available regarding the optimal agreement rate for HER2-low, it is the steering committee’s consensus that the 80% agreement should be practically acceptable at this early stage of clinical actionability for 1+ and 2+/ISH−. For example, Clinical Laboratory Improvement Amendments of 1988 passing requirements in proficiency testing are set at 80% in general31  and CAP also uses this criterion, for instance, by selecting HER2 test cases only if reference laboratories achieve at least 80% consensus for individual tissue cores.29  As noted earlier, NPA was improved from 80.6% to 91.1% by a 4-hour training program. PPA (74.6%) in IHC-0 category was the only parameter below the 80% benchmark at baseline but was also improved to almost 90% (89.2%) after 4 hours of training. Thus, training significantly improved pathologists’ proficiency in identifying breast carcinomas that do not belong to HER2-low category (true negatives) and are instead HER2-0.

We also analyzed the agreement rates in current ASCO/CAP and historical FDA classifications (see Figure 2). The pathologists showed a high level of accuracy in the diagnosis of the ASCO/CAP categories of HER2-negative versus HER2-positive of up to 98% by 4B5 and 92% by HcT, irrespective of training. Similarly, Cohen κ concordance values were above 0.8.

Within the ASCO/CAP and 3-tier HER2 classifications, overall agreement rates showed marginal, nonsignificant numerical increases after training. This may reflect the fact that the accuracy for HER2-positive (2+/ISH+ and 3+) is at an almost perfect level and may have masked the improvement for HER2-low and HER2-0. The missing training effect for HcT is most likely related to the 2-fold higher prevalence of HER2-positives in this sample set.

These agreement rates are overall higher than those described previously,16,32 where participants were not informed about the intention of the study, did not receive specific training modalities, and were told no details about the assays used. Another recent study33 on 105 HER2-negative breast cancer biopsies (stained with 4B5) demonstrated that consensus between 16 pathologists was higher if only HER2-null cases (without any staining) were grouped against cases with any staining (>no staining<1+, IHC 1+, and IHC 2+/nonamplified). The most relevant factor underlying interobserver disagreement in that study was poor reproducibility for cases bordering the 10% cutoff. Therefore, the authors recommended using the “all or nothing” principle originally mentioned in the ASCO/CAP 2007 guideline.28 In our series we also found that borderline status was the most significant cause of reader disagreement, accounting for ∼60% of all discordances. Regrouping alone did not improve agreement rates (data not shown), though we did identify specific staining issues that contributed to overscoring or underscoring (Figure 4). Some of these, such as heterogeneity and cytoplasmic staining, have also been described as important reasons for discordance among 16 expert pathologists from the United Kingdom and Ireland.34

In contrast to other recent studies,33–36  this is the first to provide globally representative data about the level of ambiguity in HER2-low testing in the real world. It also shows that systematic training, making use of the magnification rule20,21  for reproducible intensity scoring and including demonstration of specific challenging staining patterns, is a valuable tool to improve HER2-low scoring. We do expect an even higher effect if future training is focused on addressing the main sources of discordance (Table 2).

Future training should also take into consideration published HER2 scoring guidelines, of which the first have recently been edited in France37  followed by Germany38  and the United Kingdom.39  Besides scoring issues, the control of preanalytical factors such as quality of fixation and the staining procedure itself is critical for accurate patient identification, and the implementation of on-slide controls that reflect the various staining intensities can further aid in staining quality control.40–41 

Due to the approval of T-DXd in HER2-low breast cancer, international guidelines have recently been updated (June 2023)17,18  to reflect this new treatment category. The ASCO-CAP guideline update does not endorse creating a new HER2-low result category for pathology reporting but acknowledges the need for pathologists to include a comment describing patient eligibility for therapy in HER2-low cases and the adoption of best laboratory practices for patient identification. Some of these best practices include the use of high magnification (×40) to interpret slides, secondary pathology review for borderline cases, and controls with a range of protein expression to distinguish between the 0 and 1+ cutoff.17  A European Society for Medical Oncology consensus statement addressing HER2-low breast cancer provided a recommendation for pathology participation in training and education programs to better report HER2-low.18  It also emphasized the need for pathologists to use a properly validated assay for the detection of HER2-low.18  Both guidelines also highlighted the increased importance of preanalytical variables in this new category.

To achieve more reproducible HER2-low scoring, one may consider more refined technologies such as digital image analysis,42,43 for which CAP guidelines are already available.44 However, before recommending new assays and technologies for HER2-low diagnostics,16 one should note that current clinical trial data on the efficacy of antibody-drug conjugates in HER2-low breast cancer were obtained with a validated and widely used assay (4B5) and well-established ASCO/CAP scoring guidelines.4,6

The limitations of this study include the following: (1) digital images were used for scoring, which is not standard clinical practice; (2) no clinical trial cases were used; and (3) consensus scoring was determined by expert pathologists, not by the pathologists of the central testing laboratory that read cases for the DESTINY-Breast04 trial.

In conclusion, the global level of accuracy in diagnosing breast cancer with low levels of HER2 expression is within an acceptable range but is still not optimal. Short-term training could significantly improve the accuracy of HER2 testing, and sources of misscoring specific to either underscoring or overscoring could be delineated.

Special thanks to Judy Yu, PhD, a former AstraZeneca employee, who provided expertise and technical insights to support the study. Amy Hanlon Newell, PhD, Andrew Livingston, BS, and Victoria de Giorgio-Miller, MD, provided insightful comments. Sharon Allen oversaw the project management. Ouzna Morsli, MD, provided guidance for the study. Under the guidance of the authors, editorial support was provided by ApotheCom and was funded by Daiichi Sankyo.

1. Martínez-Sáez O, Prat A. Current and future management of HER2-positive metastatic breast cancer. JCO Oncol Pract. 2021;17(10):594–604.
2. Jørgensen JT, Winther H, Askaa J, Andresen L, Olsen D, Mollerup J. A companion diagnostic with significant clinical impact in treatment of breast and gastric cancer. Front Oncol. 2021;11:676939.
3. NordiQC. HER2 IHC: assessment run B36 2023. https://www.nordiqc.org/downloads/assessments/175_11.pdf. Accessed April 3, 2024.
4. Wolff AC, Hammond MEH, Allison KH, et al. Human epidermal growth factor receptor 2 testing in breast cancer: ASCO/CAP clinical practice guideline focused update. Arch Pathol Lab Med. 2018;142(11):1364–1382.
5. Modi S, Jacot W, Yamashita T, et al. Trastuzumab deruxtecan in previously treated HER2-low advanced breast cancer. N Engl J Med. 2022;387(1):9–20.
6. Cortes J, Kim SB, Chung WP, et al. Trastuzumab deruxtecan versus trastuzumab emtansine for breast cancer. N Engl J Med. 2022;386(12):1143–1154.
7. US Food and Drug Administration. FDA approves fam-trastuzumab deruxtecan-nxki for HER2-low breast cancer. https://www.fda.gov/drugs/resources-information-approved-drugs/fda-approves-fam-trastuzumab-deruxtecan-nxki-her2-low-breast-cancer. Published August 5, 2022. Accessed March 8, 2023.
8. European Medicines Agency. Enhertu trastuzumab deruxtecan. https://www.ema.europa.eu/en/medicines/human/EPAR/enhertu. Published December 15, 2022. Accessed June 5, 2024.
9. Lal P, Salazar PA, Hudis CA, Ladanyi M, Chen B. HER-2 testing in breast cancer using immunohistochemical analysis and fluorescence in situ hybridization: a single-institution experience of 2,279 cases and comparison of dual-color and single-color scoring. Am J Clin Pathol. 2004;121:631–636.
10. Giuliani S, Ciniselli CM, Leonardi E, et al. In a cohort of breast cancer screened patients the proportion of HER2 positive cases is lower than that earlier reported and pathological characteristics differ between HER2 3+ and HER2 2+/Her2 amplified cases. Virchows Archiv. 2016;469:45–50.
11. Tarantino P, Hamilton E, Tolaney S, et al. HER2-low breast cancer: pathological and clinical landscape. J Clin Oncol. 2020;38(17):1951–1962.
12. Terrenato I, Pennacchia I, Buglioni S, Mottolese M, Arena V. HER2 status determination: analyzing the problems to find the solutions. Medicine (Baltimore). 2015;94(15):e645.
13. Casterá C, Bernet L. HER2 immunohistochemistry inter-observer reproducibility in 205 cases of invasive breast carcinoma additionally tested by ISH. Ann Diagn Pathol. 2020;45:151451.
14. Lacroix-Triki M, Mathoulin-Pelissier S, Ghnassia JP, et al. High inter-observer agreement in immunohistochemical evaluation of HER-2/neu expression in breast cancer: a multicentre GEFPICS study. Eur J Cancer. 2006;42(17):2946–2953.
15. College of American Pathologists. 2019 CAP survey of HER2 testing. In: Surveys and Anatomic Pathology Education Programs – Immunohistochemistry Tissue Microarray HER2-B 2019 Participant Survey. Northfield, IL: College of American Pathologists; 2019.
16. Fernandez AI, Liu M, Bellizzi A, et al. Examination of low HER2 protein expression in breast cancer tissue. JAMA Oncol. 2022;8(4):1–4.
17. Wolff AC, Somerfield MR, Dowsett M, et al. Human epidermal growth factor receptor 2 testing in breast cancer: ASCO-College of American Pathologists Guideline Update. Arch Pathol Lab Med. 2023;147(9):993–1000.
18. Tarantino P, Viale G, Press MF, et al. ESMO expert consensus statements (ECS) on the definition, diagnosis, and management of HER2-low breast cancer. Ann Oncol. 2023;34(8):645–659.
19. Scott M, Vandenberghe ME, Scorer P, Boothman AM, Barker C. Prevalence of HER2 low in breast cancer subtypes using the VENTANA anti-HER2/neu (4B5) assay. J Clin Oncol. 2021;39(15_suppl):abstract 1021.
20. Rüschoff J, Hanna W, Bilous M, et al. HER2 testing in gastric cancer: a practical approach. Mod Pathol. 2012;25(5):637–650.
21. Scheel AH, Penault-Llorca F, Hanna W, et al. Physical basis of the ‘magnification rule’ for standardized immunohistochemical scoring of HER2 in breast and gastric cancer. Diagn Pathol. 2018;13(1):19.
22. Jasani B, Bänfer G, Fish R, et al. Evaluation of an online training tool for scoring programmed cell death ligand-1 (PD-L1) diagnostic tests for lung cancer. Diagn Pathol. 2020;15(1):37.
23. Freelon D. ReCal OIR: ordinal, interval, and ratio intercoder reliability as a web service. Int J Internet Sci. 2013;8(1):10–16.
24. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
25. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–282.
26. Marchevsky AM, Walts AE, Lissenberg-Witte BI, Thunnissen E. Pathologists should probably forget about kappa. Percent agreement, diagnostic specificity and related metrics provide more clinically applicable measures of interobserver variability. Ann Diagn Pathol. 2020;47:151561.
27. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Version 4.2.1. https://www.r-project.org/ and https://cran.r-project.org/mirrors.html. Accessed July 17, 2024.
28. Wolff AC, Hammond ME, Schwartz JN, et al. American Society of Clinical Oncology/College of American Pathologists. American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. Arch Pathol Lab Med. 2007;131(1):18–43.
29. College of American Pathologists. 2021 CAP Survey of HER2 testing. In: Surveys and Anatomic Pathology Education Programs – Immunohistochemistry Tissue Microarray HER2-B 2021 Participant Survey. Northfield, IL: College of American Pathologists; 2021:1.
30. Pfitzner BM, Lederer B, Lindner J, et al. Clinical relevance and concordance of HER2 status in local and central testing-an analysis of 1581 HER2-positive breast carcinomas over 12 years. Mod Pathol. 2018;31(4):607–615.
31. Howerton D, Krolak JM, Manasterski A, Handsfield JH. Proficiency testing performance in US laboratories: results reported to the Centers for Medicare & Medicaid Services, 1994 through 2006. Arch Pathol Lab Med. 2010;134(5):751–758.
32. Robbins CJ, Fernandez AI, Han G, et al. Multi-institutional assessment of pathologist scoring HER2 immunohistochemistry. Mod Pathol. 2023;36(1):100032.
33. Baez-Navarro X, van Bockstal MR, Nawawi D, et al. Interobserver variation in the assessment of immunohistochemistry expression levels in HER2-negative breast cancer: can we improve the identification of low levels of HER2 expression by adjusting the criteria? An international interobserver study. Mod Pathol. 2023;36(1):100009.
34. Zaakouk M, Quinn C, Provenzano E, et al. Concordance of HER2-low scoring in breast carcinoma among expert pathologists in the United Kingdom and the Republic of Ireland – on behalf of the UK national coordinating committee for breast pathology. Breast. 2023;70:82–91.
35. Turashvili G, Gao Y, Ai DA, et al. Low interobserver agreement among subspecialised breast pathologists in evaluating HER2-low breast cancer. J Clin Pathol. Published online September 15, 2023.
36. Karakas C, Tyburski H, Turner BM, et al. Interobserver and interantibody reproducibility of HER2 immunohistochemical scoring in an enriched HER2-low-expressing breast cancer cohort. Am J Clin Pathol. 2023;159(5):484–491.
37. Franchet C, Djerroudi L, Maran-Gonzalez A, et al. Pour le GEFPICS. Mise à jour 2021 des recommandations du GEFPICS pour l’évaluation du statut HER2 dans les cancers infiltrants du sein en France [2021 update of the GEFPICS’ recommendations for HER2 status assessment in invasive breast cancer in France]. Ann Pathol. 2021;41(6):507–520.
38. Denkert C, Lebeau A, Schildhaus HU, Jackisch C, Rüschoff J. New treatment options for metastatic HER2-low breast cancer: consequences for histopathological diagnosis. Pathologie (Heidelb). 2023;44(Suppl 2):53–60.
39. Rakha EA, Tan PH, Quinn C, et al. UK recommendations for HER2 assessment in breast cancer: an update. J Clin Pathol. 2023;76(4):217–227.
40. Bauer DR, Otter M, Chafin DR. A new paradigm for tissue diagnostics: tools and techniques to standardize tissue collection, transport, and fixation. Curr Pathobiol Rep. 2018;6(2):135–143.
41. Vani K, Sompuram SR, Fitzgibbons P, Bogen SA. National HER2 proficiency test results using standardized quantitative controls: characterization of laboratory failures. Arch Pathol Lab Med. 2008;132(2):211–216.
42. Qaiser T, Mukherjee A, Reddy Pb C, et al. HER2 challenge contest: a detailed assessment of automated HER2 scoring algorithms in whole slide images of breast cancer tissues. Histopathology. 2018;72(2):227–238.
43. Lara H, Li Z, Abels E, et al. Quantitative image analysis for tissue biomarker use: a white paper from the Digital Pathology Association. Appl Immunohistochem Mol Morphol. 2021;29(7):479–493.
44. Bui MM, Riben MW, Allison KH, et al. Quantitative image analysis of human epidermal growth factor receptor 2 immunohistochemistry for breast cancer: guideline from the College of American Pathologists. Arch Pathol Lab Med. 2019;143(10):1180–1195.

Author notes

Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the May 2025 table of contents. See Supplemental Table 2 for the list of HER2-low study group members.

Rüschoff and Penner contributed equally.

In March 2019, AstraZeneca entered into a global development and commercialization collaboration agreement with Daiichi Sankyo for trastuzumab deruxtecan (T-DXd; DS-8201). This study was sponsored by Daiichi Sankyo, in collaboration with AstraZeneca.

Competing Interests

Desai and Moh are full-time employees at Daiichi Sankyo Inc; Desai and Moh confirm stock ownership in Daiichi Sankyo Inc. Penault-Llorca has received personal funds for consultation and advisor role by Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Eli Lilly, Novartis, Seagen, and Pfizer. Lebeau has received speaker honoraria and/or personal funds for an advisory role from AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Myriad Genetics, Novartis, Roche, Menarini Stemline, and Veracyte Inc.; writer engagement from Qualitätssicherungs-Initiative Pathologie (QuIP); and is a steering committee member of Diaceutics and Daiichi Sankyo Inc. D’Arrigo is the founder of Poundbury Cancer Institute and has received personal funds for consultation and advisor role by Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, and Pfizer. Viale has received personal funds for consultation and an advisor role by Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Eli Lilly, Agilent, and Pfizer. Rüschoff is cofounder of Targos Molecular Pathology GmbH, now part of Discovery Life Sciences, to which speaker honoraria and personal funds for an advisor role from Astellas, AstraZeneca, Bristol Myers Squibb, Daiichi Sankyo Inc, GSK plc, Merck Sharp & Dohme, Merck KGaA, and Qualitätssicherungs-Initiative Pathologie are reimbursed. Rojo has received personal funds for an advisor role by Roche, AstraZeneca, Daiichi Sankyo Inc, Merck Sharp & Dohme, Bristol Myers Squibb, Pfizer, Novartis, Amgen, Merck KGaA, and Sophia Genetics; and received travel funds from Roche. The other authors have no relevant financial interest in the products or companies described in this article.

Parts of the study were presented as a poster at the San Antonio Breast Cancer Symposium (SABCS) 2022 (HER2-13) from December 6th to December 10th, 2022; San Antonio, Texas.
