In 2014, the College of American Pathologists developed an evidence-based guideline to address analytic validation of immunohistochemical assays. Fourteen recommendations were offered. Per the National Academy of Medicine standards for developing trustworthy guidelines, guidelines should be updated when new evidence suggests modifications.
To assess evidence published since the release of the original guideline and develop updated evidence-based recommendations.
The College of American Pathologists convened an expert panel to perform a systematic review of the literature and update the original guideline recommendations using the Grading of Recommendations Assessment, Development and Evaluation approach.
Two strong recommendations, 1 conditional recommendation, and 12 good practice statements are offered in this updated guideline. They address analytic validation or verification of predictive and nonpredictive assays, and recommended revalidation procedures following changes in assay conditions.
While many of the original guideline statements remain similar, new recommendations address analytic validation of assays with distinct scoring systems, such as programmed death-ligand 1 (PD-L1), and analytic verification of US Food and Drug Administration approved/cleared assays; more specific guidance is offered for validating immunohistochemistry performed on cytology specimens.
In 2014, the Pathology and Laboratory Quality Center for Evidence-Based Guidelines (“The Center”) of the College of American Pathologists (CAP) published a clinical practice guideline that created recommendations for analytic validation of clinical immunohistochemistry (IHC) assays.1 Data gathered after publication of this guideline showed that a significant number of laboratories incorporated recommendations from this guideline into their laboratory practice.2
The Center updates guidelines at least every 5 years if substantive additional evidence merits revision. As such, The Center convened a guideline update committee in late 2018 whose charge was to perform a systematic review of the medical literature and update the 2014 guideline using the rigorous standards created by the National Academy of Medicine.3
The landscape of clinical IHC has changed substantially since the original guideline was published in 2014. The most notable changes include the profusion of new predictive markers and the advent of companion and complementary diagnostics. These tests add substantial complexity to the work of IHC laboratories, their medical directors, and the interpreting pathologist. Analytic validation of predictive marker assays, best exemplified by programmed death-ligand 1 (PD-L1), is an extremely difficult task due to multiple clones, scoring systems that apply to particular tumor types, and the lack of a standardized comparator for validation. In addition, the advent of digital pathology with associated computerized quantitative techniques is poised to assist in the standardization of IHC interpretation. The expert panel believes that these digital techniques will revolutionize the field of clinical IHC; however, they have yet to enter routine clinical use. As such, validation of these new digital readout techniques is not covered in this document.
Analytic validation of clinical IHC assays is required by the Clinical Laboratory Improvement Amendments of 1988 (CLIA).4 These rules require that the performance characteristics of any assay be verified and documented before the assay is placed into clinical service. Many terms used in this guideline have variable definitions; we have therefore created a glossary that unifies their meanings as they apply to this document (see Supplemental Table 1 in the supplemental digital content at https://meridian.allenpress.com/aplm in the June 2024 table of contents, or www.cap.org).
Additional goals of these revised recommendations are to harmonize previously variable recommendations for analytic validation or verification of predictive markers, including human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) IHC performed on breast carcinoma; to create validation recommendations for companion and complementary IHC assays with distinct scoring systems based on tumor type (eg, PD-L1); and to reevaluate the validation requirements for non–formalin-fixed tissues, including cytology specimens. These modifications are based on the systematic review of the medical literature.
METHODS
This evidence-based guideline was developed following the standards endorsed by the National Academy of Medicine.3 A detailed description of the panel composition, conflicts of interest policy, and systematic review methods used to create this guideline can be found in the online Evidence-based Guidelines Development Methodology Manual (Methodology Manual) available on www.cap.org. Detailed information about data and specific outcomes of interest can be found in the supplemental digital content. A detailed protocol for this project was developed and registered on PROSPERO, an international prospective register of systematic reviews (protocol number CRD42020153268).5
Guideline Panel
The CAP convened a multidisciplinary expert and advisory panel to develop the guideline and approved the appointment of the members. The roles of each panel are described in the Methodology Manual. Detailed information about the panel composition can be found in the supplemental digital content.
Conflicts of Interest
In accordance with the CAP conflicts of interest policy, members of the expert panel disclosed all financial interests from 24 months prior to appointment through the development of the guideline, as well as any future relationships planned in the 12 months postpublication. Complete disclosures of the expert panel members are listed in the Appendix. A detailed description of the policy is included in the Methodology Manual.
Most of the expert panel (9 of 12 members) was assessed as having no relevant conflicts of interest. Disclosures of interest judged by the oversight group to be manageable conflicts are as follows: R.F., consultancies with Leica Biosystems (Wetzlar, Germany), Sakura Finetek USA, Inc. (Torrance, California), ownership/partnership of Array Science (Sausalito, California); P.L., consultancy with Sakura Finetek USA, Inc. (Torrance, California); and T.H., consultancies with Leica Biosystems (Wetzlar, Germany), Biocare Medical (Pacheco, California), Agilent Dako (Santa Clara, California).
The CAP provided funding for the administration of the project; no industry funds were used in the development of the guideline. All panel members volunteered their time and were not compensated for their involvement, except for the contracted methodologist.
Guideline Objectives
The purpose of this update was to assess evidence published since the release of the original guideline to provide recommendations on how to analytically validate or verify immunohistochemical assays used for diagnostic and predictive purposes. The panel addressed the overarching question: What is needed for initial analytic assay validation or verification before placing any IHC test into clinical service? To answer this, 5 more pointed key questions were developed and outcomes of interest were identified (see Supplemental Table 2).
For the initial validation of an assay used clinically, what is the minimum overall analytic accuracy?
What is the minimum number of positive and negative cases that need to be tested to analytically validate immunohistochemical nonpredictive marker assays, US Food and Drug Administration (FDA) approved or cleared predictive marker assays (including companion and complementary diagnostics), and laboratory-developed predictive marker assays, for their intended use?
What parameters should be specified for the tissues used in the validation set?
How do decalcification and nonformalin fixation methods (including those utilized on cytology specimens) influence analytic validation?
What conditions require assay revalidation?
The intended audience includes laboratory directors, pathologists, histology and cytology technicians and technologists, and medical professionals involved in laboratory quality. As noted above, these recommendations apply only to pathologist-interpreted IHC assays. Validation of image analysis–assisted IHC interpretation is not within the scope of this document.
Literature Search and Collection
A comprehensive literature search for relevant evidence was completed by a medical librarian. The search strategy was first constructed in Ovid MEDLINE (Wolters Kluwer Health, Philadelphia, Pennsylvania) using controlled vocabulary and keywords to reflect the population, intervention, comparison, and outcome elements, then translated into Embase (Elsevier, Amsterdam, Netherlands) and Cochrane Library (John Wiley & Sons, Inc., Hoboken, New Jersey). Major concepts derived from the key questions included (1) IHC, (2) preanalytic factors, and (3) validation. Limits were set to reflect the protocol inclusion/exclusion criteria and included the publication date range of January 1, 2013, through the date each search was run, and language limits to capture only full-text articles available in English due to time and financial constraints. The Cochrane search filter for humans6 was applied in Ovid MEDLINE and Embase, and letters, editorials, commentaries, and case studies were excluded. The database literature searches were initially run on April 9, 2019, and rerun on August 4, 2021, and again on October 24, 2022, to capture studies published since the initial searches were run. Supplemental searches to complement the database references were completed, and expert panel members were polled for relevant unpublished data at the onset of the project. More information, including specific search strategies, supplemental search sources used, dates of search activity, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram7 that details the systematic review process is available in the supplemental digital content (see Supplemental Figures 1 and 2).
Inclusion and Exclusion Criteria
Studies were selected for inclusion in the systematic review of evidence if they met the following criteria: (1) English-language articles or documents that addressed IHC assays and provided data or information relevant to one or more key questions; (2) study designs that included validation, method comparison, cohort or case-control studies, clinical trials, and systematic reviews, as well as qualitative information from consensus guidelines, regulatory documents, and US or international proficiency testing reports; and (3) articles and documents focused on the clinical use of IHC for identification of predictive and nonpredictive markers and analytic variables.
Articles were excluded from the systematic review if (1) they were published prior to January 1, 2012; (2) they were editorials, letters, commentaries, consensus documents, invited opinions, meeting abstracts without a full-text manuscript, narrative reviews, or case reports; (3) the full article was not available in English; (4) they were animal studies; or (5) they did not address any key question, and/or focused primarily on assay optimization, quality control or quality assurance, basic or nonhuman research, nontissue immunoassays, preanalytic and postanalytic variables, or clinical validation only.
Assessing Quality and Risk of Bias
Each included study underwent a risk of bias assessment and each recommendation was subjected to an aggregate assessment of the certainty of evidence. Refer to Supplemental Table 3 for definitions for the categories of certainty of evidence, Supplemental Tables 4 through 10 for the individual study risk of bias assessments, and Supplemental Table 11 for the aggregate certainty of evidence assessment.
Assessing the Strength of Recommendations
Development of recommendations required that the panel review the identified evidence and make a series of key judgments using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach.8,9 See Table 1 for the definitions of strength of recommendation. The panel used the GRADE Evidence to Decision framework to aid in assessment.10 Refer to the supplemental digital content for further details, including a summary of the benefits and harms of each guideline recommendation.
RESULTS
A total of 4464 studies met the search term requirements. Based on review of these abstracts, 671 articles met the inclusion criteria and qualified for full-text review. A total of 293 articles were included for additional evaluation for potential data extraction, and 137 articles were included in the evidence synthesis. Excluded articles were available as discussion or background references. Additional information about the systematic review is available in the supplemental digital content, including a Preferred Reporting Items for Systematic Reviews and Meta-Analyses chart outlining details of the systematic review.
The expert panel convened monthly via teleconference and in 1 face-to-face meeting to develop the scope, draft recommendations, review and respond to solicited feedback, and assess the certainty of evidence that supports the final recommendations. A nominal group technique was used for consensus decision making to encourage unique input with balanced participation among group members.
An open comment period was posted on the CAP website (www.cap.org) from August 4, 2021, through August 25, 2021, during which the draft recommendation statements were posted for public feedback. Refer to the supplemental digital content for more details. The expert panel approved the final recommendations with a supermajority vote. The advisory panel assisted the expert panel with reviewing the draft recommendations, guideline manuscript, and supplement. Finally, an independent review panel, masked to the expert panel and vetted through the conflicts of interest process, recommended approval by the CAP Council on Scientific Affairs.
Two strong recommendations, 1 conditional recommendation, and 12 good practice statements are offered. Recommendations carry more weight than good practice statements. Good practice statements offer additional guidance related to the recommendations but lack supporting evidence from published trials; because the benefit of the practice is inherently obvious, such studies are typically neither performed nor necessary. The statements are presented together for continuity and are summarized in Table 2.
Statements
1. Good Practice Statement
Laboratories must analytically validate all laboratory-developed IHC assays and verify all FDA-approved or FDA-cleared IHC assays before reporting results on patient tissues.
Note: A validation study design may include, but is not necessarily limited to, such means as the following:
Comparing the new assay’s results with the expected architectural and subcellular localization of the antigen.
Comparing the new assay’s results with the results of prior testing of the same tissues with a validated or verified assay in the same laboratory.
Comparing the new assay’s results with the results of testing the same tissues in another laboratory using a validated or verified assay.
Comparing the new assay’s results with results of a nonimmunohistochemical method.
Comparing the new assay’s results with the results from testing the same tissues in a laboratory that performed testing for a clinical trial.
Comparing the new assay’s results against positivity rates documented in published clinical trials.
Comparing the new assay’s results to IHC results from cell lines that contain known amounts of protein.
Comparing the new assay’s results on previously graded tissue challenges from a formal proficiency testing program (if available) with the graded responses.
This guideline statement was previously listed as a “recommendation” and is now described as a “good practice statement” in accordance with the GRADE methodology.11 The strength of evidence was adequate to support that analytic validation should be performed and that it should include determination of concordance and precision (eg, interrun, interinstrument, and interoperator) as part of the validation. The evidence was inadequate to assess the precision of IHC assays in practice or how validation should be designed and performed in accordance with the above listed approaches but did show that these approaches have been used. Hence, the panel designated this as a good practice statement rather than a recommendation. The panel agreed that analytic validation provides a net benefit for the overall performance and safety of IHC tests by contributing to the avoidance of potential harms related to analytic false positive and false negative test results.12
Laboratories are required by the Clinical Laboratory Improvement Amendments of 1988 (section 493.1253) to validate the performance characteristics of all assays used in patient testing in order to ensure that the results are accurate and reproducible.4 This includes establishment of the analytic validity of all non–FDA-cleared or approved (ie, “laboratory-developed”) tests.4 For qualitative assays such as IHC, validation usually requires comparing a new assay’s results with a reference standard and calculating estimates of analytic sensitivity and specificity; however, because there are no gold standard reference tests for most IHC assays, laboratories must use concordance as a surrogate for analytic sensitivity and specificity.13–16
This updated guideline includes 2 additional comparison options: comparing results with those obtained on the same tissues by a laboratory that performed testing for a clinical trial, and comparing positivity rates in the validation set with those published for a clinical trial. For further explanation of these new options, see statement 6 below.
The laboratory medical director has complete control over the design and performance of the validation or verification plan. IHC validation may include independent comparisons of a new test’s results to clinical outcomes, other validated IHC tests (intralaboratory or interlaboratory), or previously characterized tissue validation sets.14,17–25 Nonimmunohistochemical comparator tests may include, but are not limited to, in situ hybridization (ISH), flow cytometry, and molecular, cytogenetic, or microbiologic studies. Laboratories may use a combination of comparison methods when appropriate. When correlating the new test’s results with expected results, the validation set must include positive and negative tissues pertinent to each intended clinical use so that it fulfills the concept of “fit for purpose.” Normal tissues (with 100% positive staining expected) cannot comprise the entire validation set for markers primarily used in diagnosing neoplasms but may be used in conjunction with neoplastic and lesional tissues as appropriate. In some cases, a section of tissue may contain both antigen-positive cells and negative internal control cells and may therefore serve as both a positive and negative validation challenge. The laboratory medical director must determine the most appropriate selection of tissues in the validation set, but the validation set must not consist solely of the same tissues used for antibody optimization. Although not currently available for many markers, excess tissue previously used in a proficiency testing or interlaboratory comparison program could also be used for assay validation. Tissue from previously graded proficiency-testing challenges could be tested and the results compared with the graded responses from the program.
This statement applies to all assays in clinical use, including those for pathogen-specific antigens, such as cytomegalovirus and Helicobacter pylori, irrespective of the regulatory status of the primary antibody (eg, in vitro diagnostic, analyte-specific reagent).
This statement received 182 responses during the open comment period, with 89% (162 of 182) either agreeing with the statement as written or agreeing with suggested modifications. Suggested modifications to this good practice statement largely involved clarification of terminology and were addressed by both the creation of the glossary (see Supplemental Table 1) and a discussion of which combination of comparative validation methods was required or suggested. As stated elsewhere, the determination of appropriate and feasible methods is the responsibility of the laboratory medical director.
2. Strong Recommendation
For initial analytic validation or verification of every assay used clinically, laboratories should achieve at least 90% overall concordance between the new assay and the comparator assay or expected results.
The certainty of evidence was moderate.
The evidence supporting this statement comprises 5 studies that evaluated the diagnostic test characteristics of laboratory-developed tests (Table 3). This included 4 nonrandomized studies26–29 and 1 consecutive series of patients.30 The studies included 1 intermediate-quality study26 and 4 low-quality studies.27–30 The aggregate risk of bias across all 5 studies was serious, but the evidence was not further downgraded for any domain.
This recommendation essentially reaffirms the 2014 CAP Principles of Analytic Validation,1 setting the threshold to “pass” a new assay for clinical implementation. It also harmonizes validation requirements for all predictive markers: breast ER, PR, and HER2 assays are likewise held to 90% concordance, superseding the previously different concordance thresholds. Validation of predictive markers is further discussed in other guideline statements (statements 4–7). As a true gold standard comparator does not exist for most IHC assays, and because comparators may vary (see statement 1), the expert panel argued that “concordance” should be used to describe the similarity between the validation set and the comparator set, rather than “accuracy,” “sensitivity,” and “specificity.” The latter 3 terms are more appropriately applied when a reference standard is used.
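As a minimal sketch of the 90% pass criterion, assuming each validation challenge has already been reduced to a dichotomous call (positive or negative) for both the comparator (or expected result) and the new assay, overall concordance is the fraction of challenges on which the 2 calls agree. The `Challenge` record and the function names are hypothetical; exact fractions are used so that a result of exactly 90% is not lost to floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


# Hypothetical record for one validation challenge (tissue or case),
# with both calls dichotomized as positive (True) or negative (False).
@dataclass(frozen=True)
class Challenge:
    comparator_positive: bool  # comparator assay or expected result
    new_assay_positive: bool   # result from the assay being validated


# Statement 2: at least 90% overall concordance.
CONCORDANCE_THRESHOLD = Fraction(9, 10)


def overall_concordance(challenges: list[Challenge]) -> Fraction:
    """Fraction of challenges on which the new assay agrees with the comparator."""
    if not challenges:
        raise ValueError("validation set is empty")
    agreements = sum(c.comparator_positive == c.new_assay_positive for c in challenges)
    return Fraction(agreements, len(challenges))


def passes_concordance(challenges: list[Challenge]) -> bool:
    """True when overall concordance meets or exceeds the 90% threshold."""
    return overall_concordance(challenges) >= CONCORDANCE_THRESHOLD


# Example: 40 challenges with 4 discordant results gives 36/40 = 90%, which passes.
example = (
    [Challenge(True, True)] * 18
    + [Challenge(False, False)] * 18
    + [Challenge(False, True)] * 4
)
print(overall_concordance(example), passes_concordance(example))  # 9/10 True
```

Overall concordance alone does not show the direction of any discordance, which is why the positive and negative rates discussed next can be more informative.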
Although the systematic review did not identify substantive new literature, the numerical considerations and modeling discussed in the initial guideline document remain relevant.1 If validation challenges yield unexpected results (discordant with the comparator assay), those tissues or cases should be investigated. Calculating positive and negative concordance rates, and testing the discordant results for directional bias (using the McNemar test when sample size is appropriate), may be illustrative. The McNemar test assesses the significance of the difference between the 2 types of discordant results (false positives and false negatives) in a 2 × 2 contingency table. Refer to the initial guideline for more information.1 For example, if all discordant tissues are from the expected negative group, yielding false positive staining, the assay may need to be reoptimized for better specificity, with validation of the modified protocol. The same principle holds if discordance is concentrated among expected positive cases that yield negative results with the new assay, suggesting a need for reoptimization to improve sensitivity.
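The following sketch, assuming the 2 × 2 layout described above, computes positive and negative concordance and an exact (binomial) form of the McNemar test on the discordant cells. Treating positive concordance as agreement among comparator-positive challenges, and negative concordance as agreement among comparator-negative challenges, is an assumption for illustration; the glossary governs the formal definitions. The exact form is used here because discordant counts in validation sets of 20 to 40 challenges are usually small.

```python
from math import comb

# 2 x 2 table (comparator in rows, new assay in columns):
#   a = comparator +, new assay +   (concordant positive)
#   b = comparator +, new assay -   (discordant: new assay false negative)
#   c = comparator -, new assay +   (discordant: new assay false positive)
#   d = comparator -, new assay -   (concordant negative)


def concordance_rates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (positive concordance, negative concordance)."""
    positive = a / (a + b)  # agreement among comparator-positive challenges
    negative = d / (c + d)  # agreement among comparator-negative challenges
    return positive, negative


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p value for the discordant cells b and c.

    Under the null hypothesis of no directional bias, b follows a
    Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * one_tail)


# Example: all 20 expected positives stain, and 4 of 20 expected negatives
# stain (false positives). All discordance runs in one direction, but with
# only 4 discordant cases the exact test cannot reach p < .05.
print(concordance_rates(20, 0, 4, 16))  # (1.0, 0.8)
print(mcnemar_exact_p(0, 4))            # 0.125
```

A one-directional pattern like this one may still prompt investigation for specificity even when the number of discordant cases is too small for the test to be significant.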
As instrument platforms differ in mechanics of antigen retrieval (eg, time, temperature, proprietary retrieval buffers) and detection, an assay should be considered unique to each platform. If an assay is to be performed on multiple platforms, it should be validated separately on each one, with the requisite number of tissues or cases and at least 90% concordance. In laboratories with multiple “identical” instruments (same platform), a separate validation need not be performed for each instrument, provided that reproducibility has been established. Laboratories may consider applying the expected control tissue (especially a multi-tissue block or tissue microarray) to several or all validation tissues or cases, some tested on different instruments.
This recommendation received 181 responses during the open comment period, with 94% (171 of 181) either agreeing with the recommendation as written or agreeing with suggested modifications. Fifteen comments were submitted, only 3 advocating numerical changes in required concordance (2 higher, and 1 lower concordance level). Given these comments, and the lack of disparate literature, the concordance threshold of 90% is maintained. Other comments about definitions, investigation of the outlier results, and separate requirements for positive and negative concordance are incorporated into the glossary and the above discussion.
3. Good Practice Statement
For initial analytic validation of nonpredictive laboratory-developed assays, laboratories should test a minimum of 10 positive and 10 negative tissues. When the laboratory medical director determines that fewer than 20 validation cases are sufficient for a specific marker (eg, rare antigen), the rationale for that decision needs to be documented.
Note: The validation set should include high and low expressors for positive cases when appropriate and should span the expected range of clinical results (expression levels) for markers that are reported using either a semiquantitative or numerical scoring system.
Analytic validation of nonpredictive laboratory-developed assays ensures the accuracy of the test results and increases the reliability of the test’s performance. The results of these tests will have a direct impact on patient care, and each laboratory should endeavor to objectively assess the performance of the test in their laboratory. Including a range of tissue expression in the validation set is necessary to confirm a numerical scoring system.
This good practice statement outlines the minimum number of tissues to be used in the validation of a nonpredictive laboratory-developed assay. Use of fewer than the recommended number of tissues must be documented in the validation report. Using fewer tissues increases the risk that validation results will not meet the concordance benchmark of 90% and lowers the confidence that the assay will perform as expected on patient samples. Scoring systems that are semiquantitative or numerical must be validated using tissues with a range of expression that tests the extremes of the range, because any scoring system may be imprecise at the upper or lower limits of its range. Such imprecision is unlikely to be detected if only tissues in the middle of the scoring range are used in the validation set.
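A minimal sketch of the checks in statement 3, assuming the laboratory tracks counts of positive and negative tissues and labels positive tissues with a coarse expression level. The `ValidationSet` fields, the "low"/"high" labels, and storing the director's rationale as free text are hypothetical conveniences, not requirements of the guideline. The sketch treats a shortfall in either class as requiring documentation, and reduces "span the expected range" to the presence of both low and high expressors.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationSet:
    positives: int
    negatives: int
    semiquantitative: bool = False  # reported with a semiquantitative or numerical score?
    # Hypothetical coarse labels for the positive tissues, eg {"low", "high"}.
    positive_expression_levels: set[str] = field(default_factory=set)
    # Hypothetical free-text rationale documented by the laboratory medical director.
    documented_rationale: str = ""


def check_nonpredictive_set(vs: ValidationSet, min_each: int = 10) -> list[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    too_few = vs.positives < min_each or vs.negatives < min_each
    if too_few and not vs.documented_rationale.strip():
        problems.append(
            f"fewer than {min_each} positive or negative tissues without documented rationale"
        )
    if vs.semiquantitative and not {"low", "high"} <= vs.positive_expression_levels:
        problems.append("positive tissues do not include both low and high expressors")
    return problems


print(check_nonpredictive_set(ValidationSet(10, 10)))  # []
print(check_nonpredictive_set(ValidationSet(6, 10, documented_rationale="rare antigen")))  # []
```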
For assays whose result is nondichotomous (eg, Ki-67), laboratory medical directors should design a validation strategy that is based on the clinical use of the assay. For example, if the assay will be used to assign a prognostic grade to well-differentiated gastrointestinal neuroendocrine tumors, the medical director could design a validation that contains approximately 7 cases each of grade 1, grade 2, and grade 3 tumors that display Ki-67 proliferation indices of less than 3%, between 3% and 20%, and greater than 20%, respectively. If, however, the assay is designed to span the entire dynamic range of Ki-67 expression, then the medical director should include a set of at least 20 tumors that display proliferation indices that range from less than 1% (eg, uterine leiomyoma) to greater than 99% (eg, Burkitt lymphoma).
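The 2 Ki-67 designs described above can be checked with a short sketch, assuming proliferation indices are recorded as percentages. Grade cut points follow the ranges given in the text (less than 3%, 3% to 20%, greater than 20%); assigning exactly 3% and exactly 20% to grade 2 is our reading of "between 3% and 20%." The target of 7 cases per grade and all function names are illustrative.

```python
from collections import Counter


def gi_net_grade(ki67_percent: float) -> int:
    """Map a Ki-67 proliferation index (%) to grade 1, 2, or 3 using the cut points in the text."""
    if not 0 <= ki67_percent <= 100:
        raise ValueError("Ki-67 index must be between 0 and 100")
    if ki67_percent < 3:
        return 1
    if ki67_percent <= 20:
        return 2
    return 3


def grade_coverage(indices: list[float], per_grade: int = 7) -> dict[int, bool]:
    """Grading-oriented design: does each grade have at least `per_grade` cases?"""
    counts = Counter(gi_net_grade(x) for x in indices)
    return {grade: counts[grade] >= per_grade for grade in (1, 2, 3)}


def spans_dynamic_range(indices: list[float], min_cases: int = 20) -> bool:
    """Full-range design: at least `min_cases` tumors, from below 1% to above 99%."""
    return len(indices) >= min_cases and min(indices) < 1 and max(indices) > 99


grading_set = [1.0] * 7 + [10.0] * 7 + [45.0] * 7
print(grade_coverage(grading_set))  # {1: True, 2: True, 3: True}
```

Keeping the 2 checks separate mirrors the text: which one applies depends on the clinical use the medical director intends for the assay.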
This statement was not substantively changed from the original version.
We received a total of 182 responses to this statement during the open comment period. Of the comments, 93% (169 of 182) agreed with the statement with little to no modification. The majority of comments asked for clarification of specific parts of the statement. The note accompanying the statement was modified to further explain the type of scoring that may be used for some markers, and how to address the proper selection of validation tissues.
4. Good Practice Statement
For initial analytic validation of all laboratory-developed predictive marker assays, laboratories should test a minimum of 20 positive and 20 negative tissues. When the laboratory medical director determines that fewer than 40 validation tissues are sufficient for a specific marker, the rationale for that decision needs to be documented.
Note: The validation set should include high and low expressors for positive cases when appropriate and should span the expected range of clinical results (expression levels) for markers that are reported using either a semiquantitative or numerical scoring system.
The results of predictive assays are more clinically significant than those of nonpredictive markers because the results of predictive marker assays can be used to direct therapy exclusive of the histologic findings. Consequently, the laboratory director must design a validation strategy that provides greater confidence in the assay findings than is needed for nonpredictive markers. The larger number of tissues required for analytic validation of predictive markers compared with nonpredictive markers is predicated on the statistical observation that confidence intervals from a larger validation set are narrower than those from a smaller validation set (see table 5 from the original guideline1 and table A8 from Wolff et al15). As such, one can be more confident that the overall concordance observed in a validation set with 40 challenges reflects the relationship between the new assay and the comparator than the concordance observed with 20 challenges. Exceeding the number of tissue challenges stipulated above would further increase confidence in the assay. In fact, many well-resourced laboratories test more cases to increase the stringency of their analytic validation of predictive markers, especially when validating laboratory-developed predictive marker assays.
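To make the confidence-interval argument concrete, the sketch below computes an exact (Clopper-Pearson) 95% interval for observed concordance by bisection on the binomial distribution. Clopper-Pearson is one common choice and is an assumption here; we do not assume it is the method behind the tables cited above. The point it illustrates is only that the same observed concordance is estimated more tightly from 40 challenges than from 20.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(g, iterations: int = 100) -> float:
    """Bisection on [0, 1] for an increasing function g that changes sign."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for x concordant results out of n."""
    half = alpha / 2
    # Lower bound solves P(X >= x | p) = alpha/2; the left side increases with p.
    lower = 0.0 if x == 0 else _root(lambda p: (1 - binom_cdf(x - 1, n, p)) - half)
    # Upper bound solves P(X <= x | p) = alpha/2; alpha/2 - cdf increases with p.
    upper = 1.0 if x == n else _root(lambda p: half - binom_cdf(x, n, p))
    return lower, upper


# Same 90% observed concordance, 2 validation set sizes.
for x, n in [(18, 20), (36, 40)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{x}/{n}: 95% CI {lo:.3f} to {hi:.3f}")
```

In practice a statistics library routine for exact binomial intervals would replace the hand-rolled bisection; it is written out here only to keep the sketch self-contained.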
Because additional evidence was not found during the systematic review, this statement remains unchanged from the original guideline.
During the open comment period, 180 people commented, of whom 90% (162 of 180) agreed with the statement as written or agreed with suggested modifications. While unifying themes in the comments were not evident, approximately 5 of the responders suggested lowering the numbers of tissues required for predictive markers. However, the expert and advisory panels strongly believe that due to the clinical importance of predictive markers, the increased number of required challenges will provide the laboratory medical director added confidence that the assay result accurately reflects the presence or absence of the measured analyte.
5. Good Practice Statement
For initial analytic verification of all unmodified FDA-approved predictive marker assays, laboratories should follow the specific instructions provided by the manufacturer. If the package insert does not delineate specific instructions for assay verification, the laboratory should test a minimum of 20 positive and 20 negative tissues. When the laboratory medical director determines that fewer than 40 verification tissues are sufficient for a specific marker, the rationale for that decision needs to be documented.
Note: The verification set should include high and low expressors for positive cases when appropriate and should span the expected range of clinical results (expression levels) for markers that are reported using either a semiquantitative or numerical scoring system.
This new statement pertains to analytic verification, which is defined as the process by which a laboratory determines that an unmodified FDA-cleared or approved test performs according to the specifications set forth by the manufacturer when used as directed.
Some FDA-approved or FDA-cleared products contain explicit verification instructions in the package insert. Where verification procedures are not clearly stated, the laboratory should test a minimum of 20 positive and 20 negative cases. The expert panel chose this number of cases because the vast majority of FDA-cleared or FDA-approved immunohistochemical tests are classified as predictive markers. Given the clinical importance of these assays, as outlined in statement 4 above, the number of challenges should be higher than for nonpredictive markers. It should be emphasized that FDA-approved or FDA-cleared tests are designed to be used according to the package insert without modification. If any portion of the assay is changed, the test is then classified as a laboratory-modified or laboratory-developed test.
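The case-number rules in statements 3 through 5 can be summarized as a small decision function. This is a sketch under simplifying assumptions: package-insert verification instructions are reduced to a single per-class minimum (real inserts may specify more than a case count), and the enum and function names are hypothetical. It encodes the point above that changing any part of an FDA-approved or FDA-cleared assay makes it a laboratory-modified or laboratory-developed test.

```python
from enum import Enum
from typing import Optional


class AssayKind(Enum):
    NONPREDICTIVE_LDT = "nonpredictive laboratory-developed"
    PREDICTIVE_LDT = "predictive laboratory-developed"
    FDA_UNMODIFIED = "unmodified FDA-approved or FDA-cleared"


def classify(fda_approved_or_cleared: bool, modified: bool, predictive: bool) -> AssayKind:
    """Any change to an FDA-approved or FDA-cleared assay makes it laboratory-modified/developed."""
    if fda_approved_or_cleared and not modified:
        return AssayKind.FDA_UNMODIFIED
    return AssayKind.PREDICTIVE_LDT if predictive else AssayKind.NONPREDICTIVE_LDT


def minimum_per_class(kind: AssayKind, package_insert_minimum: Optional[int] = None) -> int:
    """Default minimum number of positive tissues, and separately of negative tissues.

    Statements 3 to 5 allow fewer only with a documented rationale.
    """
    if kind is AssayKind.FDA_UNMODIFIED:
        # Statement 5: follow the manufacturer's instructions; 20/20 if none are given.
        return package_insert_minimum if package_insert_minimum is not None else 20
    if kind is AssayKind.PREDICTIVE_LDT:
        return 20  # Statement 4: 20 positive and 20 negative tissues
    return 10  # Statement 3: 10 positive and 10 negative tissues


print(minimum_per_class(classify(True, True, True)))  # 20 (modified FDA predictive assay)
```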
One hundred seventy-nine respondents evaluated this statement during the open comment period, of whom 91% (162 of 179) agreed with little to no modification. While recurring suggestions were not evident in the comments, approximately 5 respondents suggested reducing the number of tissue challenges recommended for analytic verification. As with statement 4 above, both the expert and advisory panels agree that, because of the clinical impact of FDA-approved or FDA-cleared immunohistochemical tests, most of which are predictive markers, the number of recommended cases should be similar to that for non–FDA-approved or non–FDA-cleared tests.
6. Strong Recommendation
For initial analytic validation of laboratory-developed assays and verification of FDA-approved or FDA-cleared predictive immunohistochemical assays with distinct scoring schemes (eg, HER2, PD-L1), laboratories should separately validate or verify each assay–scoring system combination with a minimum of 20 positive and 20 negative tissues. The set should include challenges based on the intended clinical use of the assay.
The certainty of evidence was moderate.
This is a new recommendation and the evidence supporting it comprises a total of 48 analytic validity studies: 23 for companion diagnostic assays,31–53 22 for PD-L1,54–75 and 3 for HER2 (Table 4).76–78 The studies included 24 consecutive series of patients,* 14 nonrandomized studies,† 5 randomly selected series of patients,45,54,63,66,77 1 diagnostic study,37 3 observational studies,55,57,58 and 1 survey of commercially obtained tissues and cell lines.49 The total evidence was made up of 5 high-quality studies,33,37,59,63,77 23 intermediate-quality studies,‡ and 20 low-quality studies.§ The aggregate risk of bias across all 48 studies was serious, but the evidence was not further downgraded for any domain.
This recommendation pertains to predictive marker assays, such as HER2 and PD-L1, for which more than one scoring system is used. It also emphasizes the fundamental principle of IHC assay validation that the IHC readout (ie, the determination of the intensity, extent, quality, and cellular localization of immunohistochemical signal) is part of the analytic phase of testing,79 and, as such, the staining protocol and readout for both diagnostic and predictive markers should be validated as a single unit. This aspect of IHC validation embodies the concept of fit for purpose, whereby assay validation is designed around the clinical purpose of the assay and specifically assesses all tumor or tissue types relevant to that clinical application.
This recommendation was initially intended to address validation of PD-L1 immunohistochemical assays, given that assay validation was outside of the scope of the CAP Center guideline currently in development: “PD-L1 and Tumor Mutation Burden Testing of Patients with Lung Cancer for Selection of Immune Checkpoint Inhibitor Therapies.” After analysis of the results of the systematic review, the expert panel subsequently generalized this recommendation to any predictive marker assay in which more than one scoring system may be utilized.
It is important, in this context, to understand that the IHC readout is distinct from the interpretation of the result. For example, identifying 4 PD-L1-positive immune cells and 2 PD-L1-positive neoplastic epithelial cells in an area of a glass slide occupied by 100 head and neck squamous cell carcinoma cells (ie, IHC readout = combined positive score of 6) is interpreted as positive, while the same IHC readout in an esophageal squamous cell carcinoma would be interpreted as negative because of the distinct cutoffs in these 2 tumor types.
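The arithmetic behind the combined positive score in this example can be written out as a short Python sketch. It assumes the usual CPS definition (PD-L1-positive tumor cells plus PD-L1-positive immune cells, divided by viable tumor cells, multiplied by 100, with the result capped at 100). The `CUTOFFS` table is hypothetical and was chosen only to reproduce the positive/negative contrast described above, not to state clinical thresholds.

```python
# Hypothetical cutoffs for illustration only; real cutoffs depend on the
# assay, tumor type, and clinical indication.
CUTOFFS = {
    "head_and_neck_scc": 1,
    "esophageal_scc": 10,
}


def combined_positive_score(pos_tumor: int, pos_immune: int, viable_tumor: int) -> float:
    """IHC readout: PD-L1-positive cells per 100 viable tumor cells, capped at 100."""
    if viable_tumor <= 0:
        raise ValueError("viable tumor cell count must be positive")
    raw = 100 * (pos_tumor + pos_immune) / viable_tumor
    return min(raw, 100.0)


def interpret(cps: float, tumor_type: str) -> str:
    """Interpretation: compare the same readout with a tumor-specific cutoff."""
    return "positive" if cps >= CUTOFFS[tumor_type] else "negative"


cps = combined_positive_score(pos_tumor=2, pos_immune=4, viable_tumor=100)
print(cps)                                   # 6.0
print(interpret(cps, "head_and_neck_scc"))   # positive
print(interpret(cps, "esophageal_scc"))      # negative
```

Keeping `combined_positive_score` (the readout) separate from `interpret` (the interpretation) mirrors the distinction drawn in the text: the same readout of 6 yields different calls only once a tumor-specific cutoff is applied.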
Assay–scoring system combinations (ie, combinations of an IHC protocol and an IHC readout) should be validated according to the intended clinical use of the test, and the validation cohort should consist of at least 20 positive and 20 negative tissues. For example, laboratories typically perform only 1 HER2 IHC protocol, but multiple scoring systems are in clinical use. These include, but are not limited to, those delineated in the 2018 ASCO/CAP breast cancer guideline, which was reaffirmed in the 2023 ASCO/CAP HER2 breast testing guideline update,80 the 2016 CAP/ASCP/ASCO gastroesophageal adenocarcinoma (GEA) guideline,81 and the National Comprehensive Cancer Network practice guidelines for colorectal carcinoma (CRC), which are based in large part on the HERACLES clinical trial assay.82–84 Because the IHC readout implicit in these scoring systems differs (ie, circumferential membrane staining that is complete, intense, and in greater than 10% of carcinoma cells defines IHC 3+ in breast cancer; strong, basolateral or lateral membranous reactivity in greater than or equal to 10% of carcinoma cells defines IHC 3+ in resections of GEA; and strong basolateral or lateral [partial] membranous reactivity in greater than 50% of carcinoma cells defines IHC 3+ in metastatic CRC), these scoring systems should be validated separately. Similarly, GEA has separate scoring criteria for biopsies and resections, so, ideally, both biopsies and resections should be included in the validation set for HER2 IHC performed on GEA. As an important caveat, clinical trials for CRC have also been based on the scoring systems employed for breast cancer80 and GEA81; hence, it is at the discretion of the laboratory director which assay–scoring system combination should be used in this setting.
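To illustrate why a single HER2 protocol needs separate validation for each scoring system, the Python sketch below encodes only the IHC 3+ criteria quoted above. It is deliberately simplified and hypothetical: a readout is reduced to the percentage of carcinoma cells with strong membranous staining plus a staining pattern, other score categories and the GEA biopsy criteria are not modeled, and treating complete circumferential staining as also meeting the basolateral/lateral criteria is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Her2Readout:
    """Simplified, hypothetical HER2 IHC readout."""
    pct_strong: float  # % of carcinoma cells with strong (intense) membranous staining
    pattern: str       # "complete" (circumferential) or "lateral" (basolateral/lateral)


def is_3plus(readout: Her2Readout, scheme: str) -> bool:
    """Apply the IHC 3+ criterion of one scoring system to the same readout."""
    if readout.pattern not in {"complete", "lateral"}:
        raise ValueError(f"unknown pattern: {readout.pattern}")
    if scheme == "breast":
        # Complete, intense circumferential staining in >10% of carcinoma cells.
        return readout.pattern == "complete" and readout.pct_strong > 10
    if scheme == "gea_resection":
        # Strong basolateral or lateral staining in >=10% of carcinoma cells.
        return readout.pct_strong >= 10
    if scheme == "metastatic_crc":
        # Strong basolateral or lateral staining in >50% of carcinoma cells.
        return readout.pct_strong > 50
    raise ValueError(f"unknown scheme: {scheme}")


readout = Her2Readout(pct_strong=30, pattern="lateral")
for scheme in ("breast", "gea_resection", "metastatic_crc"):
    print(scheme, is_3plus(readout, scheme))
# breast False, gea_resection True, metastatic_crc False
```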
If the laboratory director intends to perform HER2 IHC on CRC cases using a previously validated assay–scoring system combination, the laboratory director has the discretion to extend the initial validation to CRC by assessing a representative sample of CRC cases. If a laboratory is initially validating a new HER2 assay and intends to use the same scoring criteria in breast and colon cancers, then both cancer types should be included in the set of 20 positive and 20 negative tissues that constitutes the validation. It is not the intent of this recommendation that every assay–scoring system–tumor type combination be subject to the requirement of 20 positive and 20 negative cases for each validation. A similar approach can be applied to the myriad assay–scoring system combinations currently employed for PD-L1 predictive marker testing, provided the validation design complies with the concept of fit for purpose.
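The composition rule described above can be expressed as a simple check on one assay–scoring system combination. This Python sketch is hypothetical: cases are represented as (tumor type, expected result) pairs, the 20/20 minimum comes from recommendation 6, and requiring only that each intended tumor type appear somewhere in the pooled set is one reading of the text, not a prescribed design.

```python
from typing import Iterable, List, Set, Tuple

# (tumor type, expected positive?) for each validation case
Case = Tuple[str, bool]


def check_combination_set(
    cases: Iterable[Case],
    intended_tumor_types: Set[str],
    min_pos: int = 20,
    min_neg: int = 20,
) -> List[str]:
    """Return a list of problems; an empty list means the set meets the sketch's rules."""
    cases = list(cases)
    problems = []
    n_pos = sum(1 for _, positive in cases if positive)
    n_neg = len(cases) - n_pos
    if n_pos < min_pos:
        problems.append(f"only {n_pos} positive cases (need {min_pos})")
    if n_neg < min_neg:
        problems.append(f"only {n_neg} negative cases (need {min_neg})")
    # Tumor types sharing the scoring criteria are pooled, but each must be represented.
    present = {tumor for tumor, _ in cases}
    for tumor in sorted(intended_tumor_types - present):
        problems.append(f"no {tumor} cases")
    return problems


pooled = ([("breast", True)] * 14 + [("colon", True)] * 6
          + [("breast", False)] * 12 + [("colon", False)] * 8)
print(check_combination_set(pooled, {"breast", "colon"}))  # []
```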
The reader will note that, among the various indications for PD-L1 testing, different cutoffs may be employed (eg, tumor proportion score ≥1% versus ≥50%, or combined positive score ≥1 versus ≥10). Whether these different thresholds within a scoring system require separate validations is at the laboratory director’s discretion.
For traditional HER2 assay–scoring system combinations, ISH results have typically been considered a sufficient comparator, in addition to comparing the results of the new test with those of a previously validated test in the same laboratory or a peer laboratory. Recently, new therapies have emerged that are linked to PD-L1 IHC (immune checkpoint inhibitors) and to HER2 IHC 1+/2+ ISH-negative results (“HER2 low”; trastuzumab deruxtecan). Neither PD-L1 IHC nor HER2-low testing in breast carcinoma has a readily available comparator assay. Thus, analytic validation of PD-L1 assays, and of the ability of HER2 assays to distinguish HER2 0 from HER2 1+/HER2 2+ ISH-negative results, is problematic. For initial validation of PD-L1 assays, the expert panel recommends comparing the new assay’s results with the results of testing the same tissues in another laboratory using a previously validated or verified assay. For validating the ability of a HER2 IHC assay to distinguish HER2 0 from HER2 low, there is insufficient information available at this time for most laboratories to design an appropriate assay validation for this emergent clinical marker.
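When the new assay’s calls are compared with calls on the same tissues from another laboratory, the comparison is commonly summarized as overall, positive, and negative percent agreement. The Python sketch below shows that arithmetic, assuming both sets of results have already been dichotomized at the relevant cutoff. The function name and example counts are hypothetical, and the acceptance criterion itself is the one the laboratory applies under statement 1 of the guideline.

```python
from typing import Dict, Sequence


def percent_agreement(new: Sequence[bool], comparator: Sequence[bool]) -> Dict[str, float]:
    """Overall (OPA), positive (PPA), and negative (NPA) agreement versus a comparator."""
    if len(new) != len(comparator) or not new:
        raise ValueError("need two equal-length, non-empty result lists")
    pairs = list(zip(new, comparator))
    both_pos = sum(1 for n, c in pairs if n and c)
    both_neg = sum(1 for n, c in pairs if not n and not c)
    comp_pos = sum(1 for c in comparator if c)
    comp_neg = len(comparator) - comp_pos
    return {
        "OPA": (both_pos + both_neg) / len(pairs),
        "PPA": both_pos / comp_pos if comp_pos else float("nan"),
        "NPA": both_neg / comp_neg if comp_neg else float("nan"),
    }


# Hypothetical example: 20 comparator-positive and 20 comparator-negative tissues;
# the new assay misses one positive.
comparator = [True] * 20 + [False] * 20
new = [True] * 19 + [False] + [False] * 20
print(percent_agreement(new, comparator))
```

In this example OPA is 39/40 (97.5%), PPA is 19/20 (95%), and NPA is 20/20 (100%); reporting PPA and NPA alongside OPA shows whether discordance is concentrated among positive or negative tissues.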
If direct comparison of tissues stained in another laboratory using a previously validated or verified assay is not feasible, another, less desirable method delineated in statement 1 is possible. The laboratory could assess concordance of the new assay by comparing the new assay’s results with published results from an associated clinical trial. This approach has been termed “indirect clinical validation.”
The concept of indirect clinical validation poses difficulties and should be employed with caution. The first issue is conflicting information among published clinical trials regarding anticipated positive rates for a given clinical indication. Indirect clinical validation also poses a significant risk of sample bias, as the rates of biomarker positivity in any given clinical trial are only point estimates, and the patient population tested in the trial may not reflect the local laboratory’s patient population. Even when a reliable estimate of biomarker positivity can be determined, this risk of sample bias means that an assay validation using this technique would need a larger sample size than the 20 positive and 20 negative cases that the expert panel recommends for the validation or verification of predictive markers. That said, indirect clinical validation, properly documented by the laboratory director, might reasonably support the clinical use of a given predictive marker. The imperfection of this validation design highlights the dire need for supplemental validation materials in the form of commercially available tissues, cell lines, or other constructs of known analyte concentration.
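The sample-size concern can be made concrete by putting a confidence interval around the local positivity rate before comparing it with a trial’s point estimate. The Python sketch below uses the Wilson score interval, a standard binomial interval chosen here for illustration (the guideline does not specify a method); the 30% trial positivity rate and the case counts are hypothetical. An interval that contains the trial rate shows only consistency, not case-level agreement.

```python
import math
from typing import Tuple


def wilson_interval(positives: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need 0 <= positives <= n and n > 0")
    p = positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


trial_rate = 0.30  # hypothetical published positivity rate
for positives, n in [(12, 40), (120, 400)]:
    lo, hi = wilson_interval(positives, n)
    consistent = lo <= trial_rate <= hi
    print(f"n={n}: {lo:.2f}-{hi:.2f}, width {hi - lo:.2f}, consistent={consistent}")
```

With 40 cases the interval spans roughly 18% to 45%, so a local rate quite different from the trial’s would still appear “consistent”; ten times as many cases narrows it to roughly 26% to 35%.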
At this time, IHC readout is largely done by pathologists. In the near term, image analysis is poised to supplement the pathologist readout. It seems obvious that readout based on image analysis algorithms must be validated, and pathologist readout should be held to the same standard.85 For biomarkers that are difficult to read, carefully optimized and validated image analysis algorithms will likely prove superior, as they are not subject to interobserver variability. As such, tests including image analysis readout may be more likely to “pass” the validation. If the use and appropriate validation of image analysis algorithms for predictive biomarkers is not feasible, the laboratory director should advocate for appropriate training and monitoring of all pathologists who will perform a readout of a given assay.
This recommendation received 179 responses during the open comment period, with 88% (157 of 179) either agreeing with the recommendation as written or agreeing with suggested modifications. Among 24 total comments, 6 asked for clarification on the number and types of cases to be included in the validation and 3 asked for clarification as to what form the validation of a scoring system should take. Three found the recommendation to be ambiguous, 3 found it impractical to implement, and 3 generally disagreed without obvious explanation as to why. Two comments affirmed the recommendation, 1 asked for consistency with use of the terms “test” and “assay,” and 1 asked whether full validations have to be performed on each instrument of a given platform. As to the latter, see the discussion of statement 1 above. Based on results from the open comment period, the recommendation was modified to explicitly state the number of expected positive and negative tissues to be included in the validation cohort and the term “clone” was updated to “assay,” as a particular antibody clone only constitutes 1 component of the assay.
7. Good Practice Statement
For laboratory-developed assays with both predictive and nonpredictive applications using the same scoring criteria, laboratories should treat these assays as predictive markers and test a minimum of 20 positive and 20 negative cases.
Note: See statement 4 for additional information.
Immunohistochemical assays may be used as both predictive and nonpredictive markers. When applied as a predictive assay, the results will determine the use of a specific and targeted therapy for the patient. Nonpredictive assays are used most commonly to aid in diagnosis of the cellular lineage, classification, and origin of neoplasms as well as identification of specific microbes and genetic alterations. A number of markers serve as both predictive and nonpredictive assays (eg, ER, CD117, CD20). Given that predictive assays will directly determine a specific patient therapy, it is considered good practice to validate assays that have both predictive and nonpredictive applications by utilizing the more stringent predictive marker validation guidelines when the same scoring criteria are used.
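The rule in statement 7 amounts to choosing the stricter minimum whenever one assay with a single scoring scheme serves both purposes. In this hypothetical Python sketch, the 20/20 predictive minimum comes from statement 4; the 10/10 nonpredictive minimum is assumed from statement 3 of the guideline, which is not reproduced in this section; and the separate-validation branch reflects the fit-for-purpose option discussed for this statement when the applications are scored differently.

```python
from typing import List, Set, Tuple

PREDICTIVE_MIN = (20, 20)     # positive, negative (statement 4)
NONPREDICTIVE_MIN = (10, 10)  # assumed from statement 3, not shown in this section


def validation_plan(uses: Set[str], same_scoring_criteria: bool) -> List[Tuple[str, int, int]]:
    """Return (label, min positive, min negative) for each validation to perform."""
    allowed = {"predictive", "nonpredictive"}
    if not uses or not uses <= allowed:
        raise ValueError(f"uses must be a non-empty subset of {allowed}")
    if uses == allowed and not same_scoring_criteria:
        # Different scoring criteria: the director may validate each use separately.
        return [("predictive", *PREDICTIVE_MIN), ("nonpredictive", *NONPREDICTIVE_MIN)]
    if "predictive" in uses:
        # Dual-use assays with the same scoring criteria are treated as predictive.
        return [("predictive", *PREDICTIVE_MIN)]
    return [("nonpredictive", *NONPREDICTIVE_MIN)]


print(validation_plan({"predictive", "nonpredictive"}, same_scoring_criteria=True))
```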
The systematic literature review by the expert panel did not provide sufficient evidence to qualify statement 7 as a recommendation; it therefore remains a good practice statement.
In keeping with the concept of fit for purpose, the laboratory director may decide to perform separate validations for antibodies that have both predictive and nonpredictive uses, so that the specific target ranges for each application can be better addressed. For example, ER as a predictive marker in breast carcinoma requires validation against a comparator across a wide range of expected results. However, the stringency needed to detect low-expressing ER-positive breast tumors may lead to false-positive ER results in nonpredictive applications. Therefore, performing separate validations that set the positive/negative cutoff values more appropriately for each application may result in fewer false-negative and false-positive results.
There were 179 respondents in the open comment period, of whom 91% (163 of 179) agreed with little to no modification; thus, this statement did not undergo substantive changes.
8. Good Practice Statement
Laboratories should use validation tissues that have been processed using the same fixative and processing methods as cases that will be tested clinically, when possible.
While the impact on a given IHC assay may be idiosyncratic, fixation and processing methods have been reported to affect certain epitopes in ways that may alter the reliability of the assay. It therefore stands to reason that validation tissues should be processed with the same fixative and processing methods as the cases used to optimize the assay and the cases that will be tested clinically. In the United States, the prevalent fixative is 10% neutral buffered formalin; however, numerous other fixatives, such as alcohol and Bouin solution, are used. If fixatives other than neutral buffered formalin are used, it is reasonable to expect laboratories to test a selected panel of common markers to demonstrate that the alternative fixative and processing methods yield immunohistochemical results equivalent to those obtained with neutral buffered formalin (see CAP Laboratory Accreditation Program [LAP] checklist ANP.22750 and ANP.22978).86
Of the 175 respondents during the open comment period, nearly all (98%; 171 of 175) agreed with the statement as is or with minimal modifications. Notable comments indicated that it has become increasingly difficult for laboratories to harvest control tissues “in house,” and that when laboratories turn to commercial suppliers, they may be less able to replicate their own fixation and processing methods. Retaining “when possible” in the statement was therefore considered important. Other comments varied, including suggestions to allow laboratories to validate the equivalence of fixatives with respect to assay performance and to require laboratories to document when this is not done, and why.
9. Conditional Recommendation
For analytic validation of IHC performed on cytologic specimens that are not fixed in the same manner as the tissues used for initial assay validation, laboratories should perform separate validations for every new analyte and corresponding fixation method before placing them into clinical service.
Note: Such cytologic specimens include (but are not necessarily limited to) the following:
Air-dried and/or alcohol-fixed smears.
Liquid-based cytology preparations.
Alcohol-fixed cell blocks.
Specimens collected in alcohol or alternative fixative media that are postfixed in formalin.
The certainty of evidence was moderate.
The evidence supporting this recommendation comprises 87 studies that evaluated the diagnostic test characteristics of IHC assays performed on cytologic specimens using a variety of IHC markers (Table 5): formalin-fixed, paraffin-embedded (FFPE) preparations (52 studies),44,48,87–136 Cellient cell blocks (2 studies),137,138 other cell blocks (1 study),40 smears (11 studies),139–149 liquid-based cytology (12 studies),150–161 or a combination of cell blocks, smears, and liquid-based cytology (9 studies).162–170 The studies included 2 systematic reviews,111,124 23 nonrandomized series of patients,** 52 consecutive series of patients,†† 3 diagnostic studies,99,150,162 3 observational studies,88,112,137 1 randomly selected series of patients,92 and 3 studies with other designs.89,108,140 The total evidence was made up of 2 high-quality studies,87,151 63 intermediate-quality studies,‡‡ and 22 low-quality studies.§§ The aggregate risk of bias across all 87 studies was moderate, and the evidence was not further downgraded for any domain.
Laboratories typically perform analytic validation of IHC assays using FFPE tissues before placing an assay into clinical service. IHC is frequently performed on cytologic specimens; however, these specimens may undergo fixation and processing methods that differ from those of the original validation substrate (ie, most commonly, FFPE histologic tissue), which can affect IHC test results (see also statement 8). Therefore, for every new analyte being placed into clinical service, laboratories should perform a separate validation for cytologic specimens that are processed differently from the specimens used for the initial assay validation. While this would not apply to cytologic specimens that are fixed and processed identically to the tissues used for the initial assay validation, it would apply to the vast majority of cytologic specimens, which involve alternative fixatives and/or processing techniques. Examples of such specimens include (1) direct smears and cytospin preparations that are air-dried and/or alcohol-fixed; (2) liquid-based cytology preparations (eg, ThinPrep, SurePath); (3) cell block preparations that use alcohol-based fixatives (eg, Cellient); and (4) specimens collected in transport media such as saline or Roswell Park Memorial Institute (RPMI) medium, or in alcohol-based fixatives (eg, CytoLyt), that are subsequently processed in formalin to create an FFPE cell block. It is worth reiterating that preanalytic variables, such as warm and cold ischemic time, may have profound effects on assay results; these variables should therefore be standardized to the extent possible.
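The scope of recommendation 9 can be sketched as a comparison between how a cytologic specimen is handled and how the initial validation tissues were handled. The Python sketch below is hypothetical and simplified: it treats any difference in fixative, preparation type, or pre-formalin collection medium as requiring a separate validation. This mirrors the examples above but does not replace the laboratory director’s judgment about what counts as identical processing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handling:
    """Hypothetical description of how a specimen is fixed and prepared."""
    fixative: str                            # e.g. "formalin", "alcohol", "air-dried"
    preparation: str                         # e.g. "paraffin block", "smear", "liquid-based"
    collection_medium: Optional[str] = None  # e.g. "saline", "RPMI", "CytoLyt"


def needs_separate_validation(specimen: Handling, initial: Handling, new_analyte: bool) -> bool:
    """Apply recommendation 9 to one analyte and one cytologic specimen type."""
    if not new_analyte:
        # The recommendation does not call for retrospective revalidation.
        return False
    # Any deviation from the initial validation's handling triggers a separate validation.
    return specimen != initial


ffpe_tissue = Handling("formalin", "paraffin block")
cytolyt_cell_block = Handling("formalin", "paraffin block", collection_medium="CytoLyt")
print(needs_separate_validation(cytolyt_cell_block, ffpe_tissue, new_analyte=True))  # True
```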
In the systematic literature review, several studies compared the concordance of IHC assays for both predictive and nonpredictive markers in cytologic cell block preparations with that in histologic tissues. While most studies do not specify the collection and transport media or the use of prefixatives before cell block processing,*** a handful of studies describe the use of media such as RPMI,87,91,93 with or without CytoRich Red,98 CytoLyt,97 and prefixatives such as 95% ethanol94 or a combination of ethanol and formalin.89,172 Most cell block preparations in the identified studies were fixed or postfixed in formalin, but some studies performed IHC assays on cell blocks prepared using Cellient, which uses a methanol-based fixative.106,137,138 Compared with histologic tissues, IHC assays performed on cell blocks showed varying sensitivity and specificity across the many IHC markers tested. One study comparing Ki-67 and E-cadherin IHC results between formalin-fixed and alcohol-fixed cell blocks showed significant differences in immunostaining.105 Another study using methanol-fixed material reported that nearly one-half of the antibodies failed initial validation under the IHC conditions established in the laboratory for FFPE material.138 An additional study evaluating PD-L1 expression on Cellient-fixed cell blocks reported more false-negative results, whereas formalin-fixed cell blocks showed more histology-concordant immunostaining, although the deleterious impact of alcohol fixation could be reversed to some extent by postfixation in formalin.106 In contrast, another study evaluating PD-L1 in cell blocks prepared from specimens processed with various prefixatives and fixatives (RPMI, saline, formalin, CytoLyt, Cellient) reported superior performance of Cellient cell blocks compared with the other methods, which showed increased background staining, with the poorest performance from specimens processed using CytoLyt.137 A study comparing p16 staining in cytologic specimens fixed directly in formalin versus those fixed in CytoLyt reported more false-negative results, with weaker staining overall, in the latter.104 Similar false-negative results were noted in a different study of p16 IHC on cytologic specimens (both cell blocks and ThinPrep) collected in CytoLyt.163 Several studies likewise evaluated IHC assays on cytologic smears and compared the results with those of FFPE tissues.
Performance and concordance of IHC assays on smears varied across studies: while some studies noted high correlation for some antibodies,††† others noted lower correlation for other assays.143,146,166 One study noted high reliability of methanol-fixed cytospin preparations but found false-negative results when the assay was performed on ethanol-fixed, Papanicolaou-stained cytospins or smears.148
A few studies compared results of IHC assays on liquid-based cytology with histologic FFPE tissues. While a handful of studies reported that some antibodies appear to perform comparably in liquid-based cytology when compared to their histologic FFPE counterparts,156,159 other studies have reported that other antibodies have lower specificity and may not provide reliable results.94,156,157,162
The widely variable results in the literature highlight the importance of separate assay validation for cytologic specimens that are not fixed identically to the tissues used for initial validation.
The previous guideline did not provide specific recommendations for validating cytologic specimens because of a lack of primary studies or systematic evidence reviews that would support guideline recommendations. However, the original guideline included an expert consensus opinion acknowledging that cytologic specimens may have different fixation and processing methods and that laboratories should therefore determine whether cytologic specimens have immunoreactivity equivalent to that of routinely processed, formalin-fixed tissue by testing a subset of commonly ordered markers on a set of the cytologic specimen types used for IHC staining and correlating the results with those of routinely processed tissues. The prior guideline further stated that, because separate validation of all markers on all potential cytologic specimens may not be feasible, laboratories could include a disclaimer in their reports that results should be interpreted with caution. Based on the current systematic literature review, new evidence highlights the importance of validating IHC protocols for cytologic specimens that are processed and/or fixed in any way that deviates from the fixation and processing of the tissue used for the initial assay validation. For instance, if the initial assay validation is performed on FFPE histologic tissues, then any cytologic specimen that is not collected directly into formalin for fixation and processed in a manner similar to histologic tissue would require a separate validation before clinical use. It is important to note that this recommendation applies only to new analytes that will be placed into clinical service; it does not call for retrospective validation of antibodies that have been previously validated and are currently in use on cytologic specimens.
One hundred seventy-eight respondents gave feedback on this recommendation during the open comment period, with 83% (147 of 178) either agreeing with the recommendation as written or agreeing with suggested modifications. There were 44 written comments, most of which sought clarification of which cytologic specimens are included in this recommendation. Nine written comments asked that any cytology cell block preparation processed as FFPE be considered part of the initial assay validation and not require separate validation. Eleven comments discussed the challenge of finding an adequate number of cytology specimens and the additional resources needed to fulfill the requirements for a full separate validation. Seven comments asked for the option of including a disclaimer in the report that would allow IHC results to be reported on cytologic specimens that have not been separately validated. This feedback was taken into consideration, and recommendation 9 was reworded to clarify the definition of cytologic specimens.
10. Good Practice Statement
A minimum of 10 positive and 10 negative cases is recommended for each validation performed on cytologic specimens, if possible. The laboratory medical director should consider increasing the number of cases if predictive markers are being validated. If the minimum of 10 positive and 10 negative cases is not feasible, the rationale for using fewer cases should be documented.
The criteria for determining the number of samples needed to validate an IHC assay depend largely on its intended use. The recommendations for histologic tissue samples in this guideline are delineated in statements 3–5. Cytologic specimens present some unique challenges, including a variety of fixation and processing methods that may affect the antigenicity of some antibodies, thus requiring a separate validation for every new analyte before clinical use (conditional recommendation 9). However, the availability of cytologic specimens for validation may be limited, especially when validating non-FFPE cytologic samples such as direct smears, cytospins, and liquid-based cytology preparations. Therefore, the panel determined that a validation set of 20 cytologic specimens (10 positive and 10 negative) would be appropriate in most instances. From a statistical standpoint, the more samples run in a validation set, the more likely the concordance estimate is to reflect the test’s true concordance, increasing confidence that the assay performs as expected. For predictive markers, the test result is used to guide specific therapeutic intervention or predict treatment response; thus, an even higher level of confidence is required. In this situation, the laboratory director should consider increasing the number of samples in the validation set (ideally to a minimum of 20 positive and 20 negative samples, as outlined in statement 4). Nevertheless, the panel recognized that it may be difficult for some laboratories to obtain the recommended minimum number of positive cytologic specimens when validating rare analytes. If the laboratory medical director determines that a validation set smaller than 20 samples is sufficient, the medical director must provide and document an objective rationale for this determination.
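The statistical point about set size can be illustrated with an exact (Clopper–Pearson) binomial confidence bound on concordance. This method is a standard choice swapped in for illustration, not one the guideline prescribes. The Python sketch below computes the lower bound by bisection using only the standard library; the function names are hypothetical.

```python
import math


def upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def clopper_pearson_lower(concordant: int, n: int, confidence: float = 0.95) -> float:
    """Lower limit of the two-sided exact confidence interval for concordance."""
    if n <= 0 or not 0 <= concordant <= n:
        raise ValueError("need 0 <= concordant <= n and n > 0")
    if concordant == 0:
        return 0.0
    target = (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: upper_tail increases with p
        mid = (lo + hi) / 2
        if upper_tail(concordant, n, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# All cases concordant: a 10 + 10 cytology set versus a 20 + 20 set.
for n in (20, 40):
    print(n, round(clopper_pearson_lower(n, n), 3))
```

Even with perfect agreement, 20 cases support a lower bound of only about 83% concordance, whereas 40 cases raise it to about 91%; this is one way to see why larger sets are advised when predictive markers are validated.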
The previous guideline did not provide specific recommendations for the number of samples required to validate cytologic specimens due to a lack of primary studies, systematic evidence reviews, or qualitative documents that provided evidence for guideline recommendations. However, an expert consensus opinion was included in the guideline stating that the laboratory medical director is responsible for determining the number of cytologic positive and negative cases and the number of predictive and nonpredictive markers to test. Based on the current systematic literature review, the panel recognized the importance of validating IHC protocols for cytologic specimens that are processed and/or fixed differently than the tissue used for the initial assay validation (typically FFPE histologic tissues) using criteria that closely mimic the recommendation for validating histologic tissues.
One hundred seventy-seven participants responded to this statement during the open comment period, with 81% (143 of 177) either agreeing with the statement as written or agreeing with suggested modifications. There were 36 written comments, mostly related to the challenge of finding an adequate number of cytology specimens for validation. Ten written comments suggested reducing the number of cases (eg, 5 positive and 5 negative), while 3 comments asked whether cell blocks fixed in formalin would require separate validation. Because IHC assays may need to be optimized for non–formalin-fixed cytologic specimens when using protocols optimized for FFPE tissues, the panel agreed that the number of recommended cases in the validation set should be similar to that of the initial assay validation. The feedback was taken into consideration with minor rewording of statement 10.
11. Good Practice Statement
If IHC is regularly done on decalcified tissues, laboratories should test a sufficient number of such tissues to ensure that assays consistently achieve expected results. The laboratory medical director is responsible for determining the number of positive and negative tissues and the number of predictive and nonpredictive markers to test.
Decalcifying solutions can compromise antigen integrity and may therefore alter the performance of IHC assays. In addition, practice varies widely not only in the length of time specimens undergo decalcification but also in the type of decalcifying solution employed, including whether strong or weak acids are used. Decalcified specimen types also differ in how often they require IHC assays (eg, bone marrow biopsy versus femoral head).
The systematic review data were ultimately considered insufficient to qualify statement 11 as a recommendation. The small number of studies identified173–178 and the relatively small number of analytes and specimens evaluated did not provide conclusive evidence for a recommendation; however, depending on the variables, some of these studies did show differences in IHC results after decalcification, providing overall support for testing a sufficient number of decalcified tissues in a validation set.
The impact of the inherent inconsistencies of decalcifying solutions and procedures on IHC assays was addressed in the original IHC validation guideline (statement 8, expert consensus opinion). The current statement 11 remains unchanged and is identical to the original statement 8.
Bone marrow biopsies, which are typically subjected to a defined decalcification protocol, regularly require IHC studies and should be incorporated into a validation set for markers that will be regularly used in the evaluation of hematopoietic disorders or metastatic deposits. These results can be correlated with expected results from similar non-decalcified specimens (such as bone marrow clot sections or lymph nodes). If a predictive marker has not been adequately validated on decalcified tissue, a disclaimer addressing the potential for false-negative IHC results on decalcified specimens should be included in the pathology report (LAP checklist ANP.22985).86
The expert panel received 180 responses during the public comment period, with 90% (162 of 180) agreeing or agreeing with modification. Among the minority who disagreed (n = 6), the most frequent suggestion was to include a decalcification IHC disclaimer in the pathology report in lieu of including decalcified tissues in a validation set. Four participants suggested that the guideline specify the actual number of specimens required for validation. The panel maintains that a sufficient number of decalcified tissues should be tested during validation if IHC is regularly performed on decalcified specimens. Furthermore, given the spectrum of variables inherent to decalcification protocols and their use within individual laboratories, the numbers and types of tissues tested can be left to the discretion of the laboratory medical director.
12. Good Practice Statement
Laboratories should confirm assay performance with at least 1 known positive and 1 known negative tissue when a new antibody lot is placed into clinical service for an existing validated assay (a control tissue with known positive and negative cells is sufficient for this purpose).
Verifying the performance of each new reagent lot is a standard part of good laboratory practice (LAP checklist ANP.22760).86 Factors that can cause between-lot variation in test results include changes in manufacturing conditions and suboptimal handling of reagents during shipping and storage. Verifying lot-to-lot performance is particularly important for markers reported quantitatively or semiquantitatively, where a treatment decision could be changed because of differences around a specific cut point for positivity.
The systematic review did not find strong evidence related to this issue.
This statement is nearly the same as statement 10 in the original guideline, but a proviso has been added that a control slide with both positive and negative cells may suffice. This statement does, however, differ from the draft statement sent for open comment. That version grouped a new antibody lot with other assay changes (see statement 13 below), for which 2 positive and 2 negative tissues were recommended.
The most common response by far was that opening a new antibody lot does not represent a change to the assay method and that 1 positive and 1 negative tissue is sufficient. Several reviewers also noted that a control slide that has both positive and negative cells can be used to ensure that the new antibody lot is performing as expected. The expert panel agreed with both suggestions. As a result, the text of this statement was left nearly unchanged from the original guideline.
13. Good Practice Statement
Laboratories should confirm assay performance with at least 2 known positive and 2 known negative tissues when an existing validated assay has changed in any one of the following ways:
Antibody dilution.
Antibody vendor (same clone).
Incubation or retrieval times (same method).
Assay performance must be reconfirmed whenever there has been a change to the assay methods. Reassessing assays with at least 2 positive and 2 negative cases is a reasonable approach to ensuring assay performance when relatively minor changes are made. If these tissues do not perform as expected, further investigation and testing are warranted. The wording of this statement gives the laboratory medical director flexibility to increase the number of challenges as needed. For example, a major change in antibody dilution or incubation times, as defined by the laboratory medical director, may warrant testing more than 2 positive and 2 negative cases. It may also be useful to include high and low expressors when these changes are made to predictive marker assays.
The systematic review did not find new evidence related to this issue, and this statement is identical to statement 11 in the original IHC validation guideline.
The draft sent for open comment included a single statement calling for testing of 2 positives and 2 negatives for a new antibody lot, a change in antibody vendor (same clone), a change in incubation or retrieval times, and adoption of a new statement from a standard-setting organization. Most respondents (77% [136 of 177]) agreed with the statement as written or with modifications, but many left comments about the new antibody lot. Based on those comments, a separate statement was created for new antibody lots (see statement 12 above). Several commenters stated that 2 positives and 2 negatives is excessive for the other changes, but the expert panel concluded that these changes warrant slightly more stringent procedures for ensuring assay performance and that 2 positives and 2 negatives is a reasonable approach.
Several comments noted that the last item in the draft list (adoption of a new statement from a standards organization) was unclear and, moreover, did not represent a change to an assay and therefore did not belong in this list. The expert panel agreed and voted to remove it.
14. Good Practice Statement
Laboratories should confirm assay performance by testing a sufficient number of tissues to ensure that assays consistently achieve expected results when any of the following have changed:
Fixative type.
Antigen retrieval method (eg, change in pH, different buffer, different heat platform).
Detection system.
Tissue-processing equipment.
Automated testing platform.
Environmental conditions of testing (eg, laboratory relocation, laboratory water supply).
The laboratory medical director is responsible for determining how many predictive and nonpredictive markers and how many positive and negative tissues to test.
This statement is essentially identical to statement 12 in the original IHC validation guideline. In contrast to current statements 12 and 13, which apply to changes that affect only 1 assay, this statement applies to changes that can affect every assay. Full revalidation of every assay in this situation is not practical, but an assessment is needed to ensure that results of testing under new conditions are comparable to the results of prior testing. The laboratory medical director must determine the extent of this testing based on the nature of the change(s). The scope of the assessment should increase if more than 1 assay condition is modified (eg, both antigen retrieval buffer and detection system). A representative group of predictive and nonpredictive markers could be selected to assess the impact of the change and whether more thorough testing is needed. It is recommended to select markers for testing that have different immunolocalizations (ie, nuclear, membranous, cytoplasmic) as appropriate for the laboratory. When feasible, comparing the results of staining after the change with the slides from initial assay validation may help to determine if the intensity of staining has changed.
CAP-accredited laboratories are required to verify the performance of instruments and equipment after relocation to ensure that they run according to expectations (LAP checklist COM.30550).86
The systematic review did not find strong evidence related to this issue.
We received comments from 178 respondents during the open comment period, with 92% (163 of 178) either agreeing with the statement as written or agreeing with suggested modifications. There were multiple comments that a minimum number of tissues should be specified here, but there were also several comments that reconfirming assay performance in this setting is unnecessary. The expert panel believes that reconfirming assay performance is necessary but that a single specific number of positive and/or negative tissues cannot be recommended, because the listed conditions vary in their potential impact on assay performance (eg, a minor change in pH or buffer is not the same as introduction of an entirely new testing platform). The panel believes that the laboratory medical director is in the best position to determine how many tissues to test based on the nature of the changes and the results of testing.
15. Good Practice Statement
Laboratories should run a full revalidation (equivalent to initial analytic validation) when the antibody clone is changed for an existing validated assay.
This statement is identical to statement 13 in the original IHC validation guideline. Although a limited reassessment of assay performance is sufficient when there are minor changes in assay conditions (eg, antibody dilution or incubation time), introduction of a different antibody clone represents a fundamental change to the assay and requires complete revalidation. This is because different antibody clones are typically raised against different epitopes on the target protein, and their performance characteristics may vary significantly. This phenomenon is exemplified by the expression of thyroid transcription factor 1 in carcinomas other than those of thyroid or pulmonary origin. Multiple studies using the SPT24 clone have shown low levels of expression in metastatic and primary CRCs, carcinomas of gynecologic origin, and glial neoplasms.179–181 By contrast, the 8G7G3/1 clone is uniformly negative in these tumor types. Similar data exist for CDX2.182
The systematic review did not find strong evidence related to this issue.
We received comments from 178 respondents during the open comment period, with 90% (161 of 178) either agreeing with the statement as written or agreeing with suggested modifications. Only 11 comments were received, most suggesting that a change in clone does not warrant full revalidation. The expert panel does not agree, for the reason stated above: the use of a different clone represents a fundamental change to the assay that warrants full revalidation.
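Statements 12 through 15 define tiers of reconfirmation that depend on what changed in an existing validated assay. The Python sketch below summarizes those tiers as a lookup table. It is illustrative only: `AssayChange`, `Reconfirmation`, `RULES`, and `meets_minimum` are hypothetical names, not part of this guideline or of any CAP tool. A count of `None` marks cases in which the guideline sets no fixed number: the laboratory medical director decides (statement 14), or a full revalidation is required (statement 15).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssayChange(Enum):
    """Hypothetical labels for the changes listed in statements 12 to 15."""
    NEW_ANTIBODY_LOT = "new antibody lot"
    ANTIBODY_DILUTION = "antibody dilution"
    ANTIBODY_VENDOR_SAME_CLONE = "antibody vendor (same clone)"
    INCUBATION_OR_RETRIEVAL_TIME = "incubation or retrieval times (same method)"
    FIXATIVE_TYPE = "fixative type"
    ANTIGEN_RETRIEVAL_METHOD = "antigen retrieval method"
    DETECTION_SYSTEM = "detection system"
    TISSUE_PROCESSING_EQUIPMENT = "tissue-processing equipment"
    AUTOMATED_PLATFORM = "automated testing platform"
    ENVIRONMENTAL_CONDITIONS = "environmental conditions of testing"
    ANTIBODY_CLONE = "antibody clone"


@dataclass(frozen=True)
class Reconfirmation:
    statement: int               # guideline statement number
    min_positive: Optional[int]  # None: no fixed count in the guideline
    min_negative: Optional[int]
    full_revalidation: bool      # True: equivalent to initial analytic validation
    note: str


_LOT = Reconfirmation(12, 1, 1, False,
                      "a control tissue with known positive and negative cells suffices")
_MINOR = Reconfirmation(13, 2, 2, False,
                        "minimum only; the medical director may require more cases")
_GLOBAL = Reconfirmation(14, None, None, False,
                         "director chooses markers and tissues; widen scope if >1 change")
_CLONE = Reconfirmation(15, None, None, True,
                        "full revalidation, as for a new assay")

# Hypothetical mapping of each change to its statement-level requirement.
RULES = {
    AssayChange.NEW_ANTIBODY_LOT: _LOT,
    AssayChange.ANTIBODY_DILUTION: _MINOR,
    AssayChange.ANTIBODY_VENDOR_SAME_CLONE: _MINOR,
    AssayChange.INCUBATION_OR_RETRIEVAL_TIME: _MINOR,
    AssayChange.FIXATIVE_TYPE: _GLOBAL,
    AssayChange.ANTIGEN_RETRIEVAL_METHOD: _GLOBAL,
    AssayChange.DETECTION_SYSTEM: _GLOBAL,
    AssayChange.TISSUE_PROCESSING_EQUIPMENT: _GLOBAL,
    AssayChange.AUTOMATED_PLATFORM: _GLOBAL,
    AssayChange.ENVIRONMENTAL_CONDITIONS: _GLOBAL,
    AssayChange.ANTIBODY_CLONE: _CLONE,
}


def meets_minimum(change: AssayChange, positives: int, negatives: int) -> Optional[bool]:
    """Compare a planned set of known positive/negative tissues with the stated minimum.

    Returns None when the guideline gives no fixed count (statements 14 and 15),
    because the plan then needs a director decision or a full validation instead.
    """
    rule = RULES[change]
    if rule.min_positive is None or rule.min_negative is None:
        return None
    return positives >= rule.min_positive and negatives >= rule.min_negative
```

For example, `meets_minimum(AssayChange.ANTIBODY_DILUTION, 2, 2)` passes the statement 13 minimum, whereas the same call for `AssayChange.DETECTION_SYSTEM` returns `None`, reflecting that the guideline leaves that scope to the laboratory medical director. The sketch does not attempt to combine several simultaneous changes, since the guideline only states that the scope of assessment should increase in that case.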
Limitations
A current limitation of IHC validation is the absence of calibrated reagents against which new assays can be compared. However, technologies are being developed that may provide calibrated standards in the near future.183,184 The development of these reagents would provide much-needed standards for clinical IHC assays.
In addition, the nonstandardized methods for preparing cytology and decalcified specimens limit laboratories’ ability to validate assays effectively on these specimen types. Development of standardized protocols for preparing cytology and decalcified specimens would provide needed standardization of assays performed on these specimen types.
CONCLUSIONS
Guidelines for validation and revalidation of IHC assays were originally published in 2014 and helped ensure the accuracy, reproducibility, and consistency of results. This guideline update addresses additional facets of IHC validation that have evolved since the publication of the original guideline. These guideline statements provide additional direction regarding validation of IHC assays performed on cytology specimens and of predictive marker assays that have distinct scoring systems.
Guideline Revision
This guideline will be reviewed every 4 years, or earlier in the event of publication of substantive and high-quality evidence that could potentially alter the original guideline recommendations. The status of the guideline can be found on www.cap.org.
Disclaimer
The CAP developed the Pathology and Laboratory Quality Center for Evidence-based Guidelines as a forum to create and maintain laboratory practice guidelines (LPGs). Guidelines are intended to assist physicians and patients in clinical decision making and to identify questions and settings for further research. With the rapid flow of scientific information, new evidence may emerge between the time an LPG is developed and when it is published or read. LPGs are not continually updated and may not reflect the most recent evidence. LPGs address only the topics specifically identified therein and are not applicable to other interventions, diseases, or stages of diseases. Furthermore, guidelines cannot account for individual variation among patients and cannot be considered inclusive of all proper methods of care or exclusive of other treatments. It is the responsibility of the treating physician or other health care provider, relying on independent experience and knowledge, to determine the best course of treatment for the patient. Accordingly, adherence to any LPG is voluntary, with the ultimate determination regarding its application to be made by the physician in light of each patient’s individual circumstances and preferences. CAP makes no warranty, express or implied, regarding LPGs and specifically excludes any warranties of merchantability and fitness for a particular use or purpose. CAP assumes no responsibility for any injury or damage to persons or property arising out of or related to any use of this statement or for any errors or omissions.
We thank the advisory panel members for their input and thoughtful review throughout the development of this guideline: Kimberly H. Allison, MD; Richard Brown, MD; Richard N. Eisen, MD; Rouzan Karabakhtsian, MD, PhD; Homa Keshavarz, PhD; Chad Livasy, MD; David Rimm, MD, PhD; Lori Schmitt, HT(ASCP) QIHC; Robert Schwartz, MD; and Thomas Summers, MD, MBA.
Footnotes
References 32–36, 38–41, 43, 47, 50, 52, 56, 59, 60, 64, 67, 68, 72–75, 78.
References 31, 42, 44, 46, 48, 51, 53, 61, 62, 65, 69–71, 76.
References 31, 32, 34–36, 38–44, 47, 49, 53, 56, 58, 61, 64, 69–71, 78.
References 45, 46, 48, 50–52, 54, 55, 57, 60, 62, 65–68, 72–76.
References 44, 48, 87, 90, 91, 93, 97, 100, 101, 138, 139, 145–147, 152–155, 158, 159, 163, 165, 168.
References 40, 94–96, 98, 102–107, 109, 110, 113–123, 125–136, 141–144, 148, 149, 151, 156, 157, 160, 161, 164, 166, 167, 169, 170.
References 40, 44, 88–90, 92, 93, 95, 96, 98, 99, 101–106, 109, 110, 112–131, 134, 135, 137, 139–142, 148–150, 152, 156, 157, 159–163, 165–170.
References 48, 91, 94, 97, 100, 107, 108, 111, 132, 133, 136, 138, 143–147, 153–155, 158, 164.
References 90, 96, 99, 101, 103–105, 171.
References 139, 141, 143, 149, 167, 169.
References
Author notes
Dr Goldsmith is the Guideline Expert Panel Chair
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the June 2024 table of contents.
Competing Interests
Authors’ disclosures of potential conflicts of interest and author contributions are found in the Appendix at the end of this article.