The authors announce the launch of the Consortium for Analytic Standardization in Immunohistochemistry, funded with a grant from the National Cancer Institute. As with other laboratory testing, analytic standards are important for many different stakeholders: commercial vendors of instruments and reagents, biopharmaceutical firms, pathologists, scientists, clinical laboratories, external quality assurance organizations, and regulatory bodies. Analytic standards are customarily central to assay development, validation, and method transfer into routine assays and are critical quality assurance tools.
The consortium's mission is to improve immunohistochemistry (IHC) test accuracy and reproducibility by integrating analytic standards into routine practice. To accomplish this mission, the consortium has 2 mandates: (1) to experimentally determine analytic sensitivity thresholds (lower and upper limits of detection) for selected IHC assays, and (2) to inform IHC stakeholders of what analytic standards are, why they are important, and how and for what purpose they are used. The consortium will then publish the data and offer analytic sensitivity recommendations where appropriate. These mandates will be conducted in collaboration and coordination with clinical laboratories, external quality assurance programs, and pathology organizations.
The data sources are a literature review and published external quality assurance data.
Integration of analytic standards is expected to (1) harmonize and standardize IHC assays; (2) improve IHC test accuracy and reproducibility, both within and between laboratories; and (3) dramatically simplify and improve methodology transfer for new IHC protocols from published literature or clinical trials to clinical IHC laboratories.
The new Consortium for Analytic Standardization in Immunohistochemistry (CASI) has been launched to build foundational evidence for integration of analytic standards into the routine practice of clinical and clinical research immunohistochemistry (IHC). Its mission is to improve clinical and clinical research IHC test accuracy and reproducibility by integrating analytic standards into the routine practice of IHC. This article summarizes the problem to be solved and how CASI will address it.
WHY CASI IS NEEDED
The absence of analytic standards is a highly unusual situation for a clinical diagnostic testing environment. (The term analytic standard refers to reference materials with known analyte concentrations, described later.) There is no precedent to our knowledge for an entire clinical laboratory testing industry to lack analytic standards, especially one so large, well established, and critically important for patient care as IHC. Many individuals and organizations recognized the need early on,1 but developing analytic standards proved technically difficult. Consequently, there is no link in IHC testing to the analytic performance of the original clinical trial assays. Figure 1 illustrates a challenge that the field of IHC testing currently faces.
Figure 1 is reproduced from a published study of estrogen receptor (ER) testing across laboratories in Canada.2 Six breast cancer biopsies from 2 laboratories are illustrated. Laboratory A has a highly sensitive ER assay, with a lower limit of detection (LOD) of 7310 molecules per cell equivalent (Figure 1, y-axis), as measured with recently developed ER calibrators. The LOD is the lowest analyte concentration that produces a visually detectable stain. Laboratory B, on the other hand, has an ER assay with an LOD of 74 790 (Figure 1, y-axis), an analytic sensitivity that is approximately 10-fold lower. Tumor cells require a 10-fold higher ER concentration to produce a visible brown color in laboratory B. Both laboratories are accredited and passed national proficiency testing surveys. Despite that, the samples are uniformly positive in laboratory A and very weak or negative in laboratory B.
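The size of this gap can be checked with simple arithmetic. The sketch below uses the two LOD values quoted above. The tumor ER concentration of 30 000 molecules per cell equivalent is a hypothetical value chosen only for illustration, and the function name is an assumption, not part of the cited study.

```python
# LOD values reported for the two laboratories in Figure 1
# (molecules per cell equivalent).
LOD_LAB_A = 7_310
LOD_LAB_B = 74_790


def is_visibly_stained(analyte_concentration: float, lod: float) -> bool:
    """A sample stains visibly only if its analyte concentration reaches the LOD."""
    return analyte_concentration >= lod


# Ratio of the two LODs: how much more analyte laboratory B needs.
fold_difference = LOD_LAB_B / LOD_LAB_A

# Hypothetical tumor ER concentration (assumed value, not from the study).
hypothetical_tumor_er = 30_000

stained_in_lab_a = is_visibly_stained(hypothetical_tumor_er, LOD_LAB_A)
stained_in_lab_b = is_visibly_stained(hypothetical_tumor_er, LOD_LAB_B)

print(f"Fold difference in LOD: {fold_difference:.1f}")
print(f"Laboratory A positive: {stained_in_lab_a}")
print(f"Laboratory B positive: {stained_in_lab_b}")
```

Any tumor whose ER concentration falls between the two LODs would be reported differently by the two laboratories, which is the discordance shown in Figure 1.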
The problem is not just that the IHC test results from the 2 laboratories are discordant but is, in fact, bigger and more serious. Because of the lack of analytic standards, there is no definitive evidence as to which laboratory is even right. In fact, it is possible that neither laboratory has the optimal analytic sensitivity corresponding to the highest clinical utility. The original clinical trial assay was conducted decades ago, using different primary antibodies, detection systems, and IHC instruments. The analytic sensitivity of the original clinical trial assays, showing patient benefit with hormonal therapy, is unknown. The clinical trial samples may be long gone. It is a mistake to assume that greater analytic sensitivity automatically corresponds with more accurate prediction of patient responsiveness to hormonal treatment for breast cancer. For example, responsiveness to atezolizumab in breast cancer was principally correlated with the least sensitive programmed death ligand-1 (PD-L1) IHC assay,3 as measured with PD-L1 calibrators.4 If the original clinical trial assays evaluating ER expression in breast cancer had been performed with analytic standards, and the LOD had been determined, then this would have facilitated a standardized method transfer from clinical trials to diagnostic use. Laboratories would be able to use such analytic standards to ensure that the assay in the laboratory matched that used in the clinical trial.
In all other fields of laboratory medicine, analytic standards are widely accepted not only for predictive and/or prognostic biomarker testing but also for all diagnostic purposes. Similarly, incorporation of analytic standards into IHC clinical practice is highly relevant for all IHC biomarkers, including diagnostic, prognostic, and predictive IHC assays.
INTRODUCING IHC CALIBRATION
Quantitative IHC analytic standards were recently developed5–9 and tested in 2 large studies.2,4 Calibrators are composed of purified analytes conjugated to a solid phase, at up to 10 defined analyte concentrations that are traceable to National Institute of Standards and Technology (NIST) Standard Reference Material (SRM) 1934.2 The solid phase is a clear cell-sized (7–8 μm) glass microbead. IHC staining of calibrators is performed exactly as for tissue samples, including deparaffinization, hydration, and antigen retrieval. Staining of calibrators is, therefore, like a solid-phase immunoassay except that it is performed on a microscope slide and the result is viewed under microscopy.
Figure 2 illustrates the process of measuring the LOD using calibrators. Figure 2, A, schematically shows a series of 10 calibrator microbeads with graded analyte concentrations (“levels”). When processed in an IHC test, the resulting color intensity on the calibrators is a function of the analyte concentration. The LOD can then be visually determined. The LOD is the lowest analyte concentration that still produces a visible color. In the example of Figure 2, A, level 5 represents the lower LOD, establishing the analytic sensitivity of the IHC assay. Figure 2, B, shows the same example but with actual images of microbeads. At analyte concentrations above the LOD, there is an initial linear increase in stain intensity followed by an analytic response plateau (maximum). A certificate of analysis provides an analyte concentration for level 5, expressed as the number of analyte molecules per cell equivalent, traceable to NIST SRM 1934. Figure 2 is only an example. The actual LOD and analytic response vary depending on the assay.
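The visual LOD determination described above can be expressed as a short sketch. It assumes that the 10 calibrator levels are ordered from lowest to highest analyte concentration and that a reader has recorded, for each level, whether stain is visible. The concentrations are hypothetical placeholders, not values from any certificate of analysis, and the function name is ours.

```python
from typing import Optional, Sequence


def lower_limit_of_detection(
    concentrations: Sequence[float], visible: Sequence[bool]
) -> Optional[float]:
    """Return the lowest calibrator concentration that still shows visible stain.

    Returns None if no calibrator level is visibly stained.
    """
    if len(concentrations) != len(visible):
        raise ValueError("Each calibrator level needs one visibility reading.")
    stained = [c for c, seen in zip(concentrations, visible) if seen]
    return min(stained) if stained else None


# Hypothetical concentrations for levels 1 to 10 (molecules per cell
# equivalent); real values would come from the certificate of analysis.
calibrator_levels = [
    2_000, 4_000, 8_000, 16_000, 32_000,
    64_000, 128_000, 256_000, 512_000, 1_024_000,
]

# Visual readout mirroring the example of Figure 2, A:
# levels 1 to 4 show no color, levels 5 to 10 are visibly stained.
visible_readout = [False] * 4 + [True] * 6

lod = lower_limit_of_detection(calibrator_levels, visible_readout)
print(f"Lower LOD: {lod} molecules per cell equivalent")
```

With this readout the function returns the concentration assigned to level 5, matching the example in Figure 2, A.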
When used in this way, calibrators provide a quantitative measurement of an IHC assay's analytic sensitivity. Calibrators can serve as the link between the assay in your laboratory and the original clinical trial assay. Knowing the LOD will facilitate disseminating new assays in a consistent fashion and monitoring assay performance on each patient sample. Calibrators enable direct comparisons of IHC assay analytic performance.
EXPLANATION OF ANALYTIC STANDARDS
In laboratory medicine, analytic standards are essential for accuracy and reproducibility in patient testing. Analytic standards include primary reference standards, secondary reference standards, and the use of traceable units of measure.
Primary reference standards are fully characterized materials with known analyte concentrations.10 Typically, primary reference materials are prepared at accredited reference laboratories such as the NIST, the Institute for Reference Materials and Measurements, the World Health Organization, etc. These agencies issue certificates of analysis with data that are used for the preparation of secondary reference standards. Clinical laboratories do not typically purchase or directly use primary reference standards. Instead, they are used by companies that prepare secondary reference standards (calibrators) for routine clinical laboratory use.
Secondary reference standards (calibrators) are found in nearly every clinical laboratory (except for IHC laboratories). Examples include calibrators for serum electrolytes, glucose, creatinine, and many others. They have assigned analyte concentrations derived from a calibration curve using a primary reference standard. In IHC, calibrators for Ki-67, PD-L1, human epidermal growth factor receptor 2 (HER2), and many others have recently been developed, all with concentrations traceable to NIST SRM 1934.2,4
Traceable units of measure are defined by the primary and secondary standards. In IHC, the units of measure are the number of analyte molecules per cell equivalent. Scoring readouts such as percentage positive cells, H scores, or a 0 to 3+ score do not represent traceable units; they are morphologic descriptions that are not formally linked to analyte concentration.
THE PROBLEM AND OPPORTUNITY FOR IMPROVED IHC TESTING
The use of analytic standards represents a bedrock principle in clinical laboratory medicine but, for technical reasons, has been absent from IHC testing. As a result, the rates of clinically inadequate IHC testing and interlaboratory IHC test discordance usually range between 10% and 30%, about 10 times higher than those of other clinical laboratory sections.11,12 The root cause of these discrepancies was previously reviewed.11 The clinical IHC laboratory is an outlier among the various clinical laboratory sections because it does not subscribe to the principles of metrology. Primary and secondary standards are foreign concepts in IHC testing. CASI's mission is to address this deficiency and improve both patient safety and patient outcomes. Patients benefit in 3 ways:
(1) Increased analytic accuracy and precision. Historical precedent reveals the magnitude of the opportunity associated with introducing analytic standards. For example, the efforts of the National Glycohemoglobin Standardization Program led to a dramatic improvement in hemoglobin A1c testing, revolutionizing diabetes diagnosis and management.13 Introduction of analytic standards for cholesterol testing lowered cholesterol testing error rates from 18% to less than 5%, with an estimated savings in health care–related costs of more than $100 million per year.10
(2) The evolution of new IHC tests. The new ability to objectively measure the staining cutoff (lower LOD) separating different diagnostic or therapeutic groups enables the development of IHC assays that might otherwise not have sufficient clinical sensitivity or specificity to be useful.
(3) Next-generation (calibrated) IHC. Pathologist readouts will be based on accurate analytic standards, not only providing for more confident interpretations but also creating a foundation for more quantitative, objective test result reporting.
THE PURPOSE OF IHC CALIBRATION
Primary and secondary standards provide a link between the original clinical trial assay and/or published diagnostic study and the daughter assays in thousands of laboratories across the globe for years afterwards. Figure 3 illustrates the concept, comparing test development of a typical serologic immunoassay (Figure 3, A) and an IHC assay (Figure 3, B). The original clinical trial assay derives its validity from the ability to distinguish patient samples for a particular diagnostic, prognostic, or treatment purpose. This patient-related link is represented in Figure 3, A and B, by the images of people denoted true positives and true negatives. For a serologic assay, patient blood samples are depicted by the racks of test tubes in Figure 3, A. For an IHC assay, patient tissue samples are depicted by the microscope slides in Figure 3, B. Both types of samples are analyzed in the assay under development, and test results are graphed. The x-axis in both graphs is the assay signal intensity, which is proportional to the analyte concentration. Both graphs show that the assay is capable of distinguishing the true-positive and true-negative patient groups, with minimal overlap. However, this distinction can be accomplished only if there is a well-defined cutoff for the assay signal intensity (x-axis).
The vertical dashed blue line (Figure 3, A and B) is the optimal assay signal intensity (and corresponding analyte concentration) distinguishing a positive from a negative test result. For the serologic immunoassay (Figure 3, A), the calibrator (illustrated as a test tube) is usually an artificial sample with a defined amount of analyte representing the cutoff between positive and negative samples. Such calibrators are included with the reagents as part of commercial kits. For IHC assays (Figure 3, B), there is no analytic calibrator. The concentration of the analyte (vertical dashed blue line) that distinguishes negative and positive samples (the LOD) is unknown and therefore illustrated with a question mark.
In laboratory testing, the most reliable method to measure and monitor LOD is through the use of calibrators. Because there have been no IHC calibrators until now, previous publications described IHC critical assay performance controls, to introduce a descriptive lower LOD when true quantification was not feasible.14 If properly designed and used, these and other types of control samples have been useful to indirectly assess analytic sensitivity. Although not quantitative, their use has significantly improved IHC assay performance.12,15,16
CALIBRATORS COMPENSATE FOR ASSAY DRIFT
In the life cycle of assays, from development to validation and implementation of the test, there is a significant risk of analytic drift. Analytic drift is inherent in all assays, not just IHC, and is due to random changes in the test environment, reagents, instrumentation, or protocol. Analytic drift applies to both commercial kits and laboratory-developed tests. Figure 4 illustrates the consequences of analytic drift in a typical clinical laboratory. For the serologic immunoassay (Figure 4, A), the calibrator normalizes those analytic factors, creating a reproducible threshold separating positive and negative samples. The normalization occurs because analytic drift affects both patient and calibrator test results similarly, thereby creating a consistent calibration threshold for classifying patient test results. Patient samples with test results greater than the calibrator (signal to calibrator ratio >1) are positive. The calibrator is the link between the original clinical trial assay or published study and the subsequently developed assays in thousands of laboratories across the globe.
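The normalizing effect of a calibrator can be illustrated with a minimal sketch. It assumes, for simplicity, that analytic drift scales the patient and calibrator signals by the same multiplicative factor. The text says only that drift affects both "similarly", so this model, the signal values, and the function names are illustrative assumptions.

```python
def signal_to_calibrator(sample_signal: float, calibrator_signal: float) -> float:
    """Ratio of the patient sample signal to the calibrator signal."""
    if calibrator_signal <= 0:
        raise ValueError("Calibrator signal must be positive.")
    return sample_signal / calibrator_signal


def classify(sample_signal: float, calibrator_signal: float) -> str:
    """A signal-to-calibrator ratio greater than 1 is reported as positive."""
    ratio = signal_to_calibrator(sample_signal, calibrator_signal)
    return "positive" if ratio > 1 else "negative"


# Hypothetical raw signals in arbitrary units.
patient_signals = {"patient_1": 1.5, "patient_2": 0.75}
calibrator_signal = 1.0

# Assumed drift: every raw signal in the run is halved.
drift_factor = 0.5

baseline = {
    name: classify(signal, calibrator_signal)
    for name, signal in patient_signals.items()
}
drifted = {
    name: classify(signal * drift_factor, calibrator_signal * drift_factor)
    for name, signal in patient_signals.items()
}

print("Without drift:", baseline)
print("With drift:   ", drifted)
```

Because the drift factor cancels in the ratio, the classification of each patient sample is unchanged, which is the behavior Figure 4, A, depicts for the serologic immunoassay.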
IHC, on the other hand, lacks calibrators. As a result, patient test results can vary, as illustrated in Figure 4, B and C. To illustrate, patient tissue samples mounted on slides are organized according to analyte concentration (left to right). Sample 1 is the lowest concentration and 5 the highest. If the IHC assay LOD (vertical blue dashed line) drifts, as illustrated from Figure 4, B, to Figure 4, C, then patient sample 3 has a different test result. Sample 3 is positive with the lower LOD (Figure 4, B) but negative with the higher LOD (Figure 4, C). The IHC staining result for sample 3 depends on whether its analyte concentration is above or below the LOD (dashed vertical line). A lower LOD means that more samples will be positive. This expected relationship was confirmed with ER testing.2 The absence of analytic standards is a major cause of interlaboratory test discrepancies.11,17 In IHC, analytic standards can be used to define the cutoff illustrated by the vertical dashed blue line in Figures 3 and 4.
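The effect of an uncontrolled LOD shift on patient results, as in Figure 4, B and C, can be sketched as follows. The five sample concentrations and the two LOD values are hypothetical numbers chosen so that sample 3 lies between them; they are not data from the cited ER study.

```python
from typing import Dict, List


def stain_results(
    sample_concentrations: Dict[int, float], lod: float
) -> Dict[int, bool]:
    """Mark each sample positive if its analyte concentration reaches the LOD."""
    return {
        sample: concentration >= lod
        for sample, concentration in sample_concentrations.items()
    }


# Hypothetical patient samples 1 to 5, ordered by increasing analyte
# concentration (molecules per cell equivalent).
samples = {1: 5_000, 2: 15_000, 3: 40_000, 4: 120_000, 5: 400_000}

lod_before_drift = 30_000  # lower LOD, as in Figure 4, B (assumed value)
lod_after_drift = 60_000   # higher LOD, as in Figure 4, C (assumed value)

before = stain_results(samples, lod_before_drift)
after = stain_results(samples, lod_after_drift)

# Samples whose reported result changes because the LOD drifted.
changed: List[int] = [s for s in samples if before[s] != after[s]]

print("Positive before drift:", [s for s, pos in before.items() if pos])
print("Positive after drift: ", [s for s, pos in after.items() if pos])
print("Samples with a changed result:", changed)
```

Only the sample whose concentration lies between the two LODs changes result, and the lower LOD yields more positive samples, consistent with the relationship described above.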
CLARIFICATIONS
Image analysis does not compensate for the lack of analytic standards. The absence of analytic standards affects the underlying image itself. Feeding an inaccurate image into an image analysis algorithm will produce an inaccurate result.
Cell lines and tissue samples are not analytic standards. The term analytic standard and the principles (of metrology) by which they are used to support accurate testing are well defined in the field of laboratory medicine.
Introduction of analytic standards addresses IHC analytic test reproducibility but has no bearing on preanalytic factors. Meticulous attention to adequate tissue sampling and processing must still be maintained.
DESCRIPTION OF A CASI STUDY
CASI's role is to support, coordinate, and/or conduct clinically relevant IHC studies linking clinical sensitivity and specificity with analytic sensitivity. Figure 5 schematically depicts the tool for each study—a slide that bears both a tissue microarray (TMA) and calibrators. The TMA will be composed of formalin-fixed, paraffin-embedded clinical samples that include both true-positive and true-negative samples for the diagnostic purpose under study. The calibrators are synthetic samples with defined analyte concentrations, typically ranging from 10 000 to more than 1 000 000 molecules per cell equivalent. These slides will be distributed free of charge to up to 100 IHC laboratories (per study) that wish to participate. The participants will process them in their IHC assay as per their normal protocol and mail the answers and slides back to CASI. The slides are then analyzed to determine clinical sensitivity and specificity (from the TMA) and analytic sensitivity (LOD, from the calibrators). Pathologists affiliated with CASI will also assess the TMA readout. CASI staff will quantify the calibrator staining to calculate the LOD for each participating laboratory.
For 2022, the first 4 projects are HER2, PD-L1, p53, and BRAF V600E. The comparator assay for HER2 is gene amplification. For p53 and BRAF, the comparator assay is mutation analysis by DNA sequencing. For PD-L1, the comparator assay is the US Food and Drug Administration–cleared PharmDx PD-L1 22C3 assay as performed by reference laboratories such as those affiliated with external quality assurance programs. In each instance, the comparator assay defines true-positive and true-negative tissue samples. Once the survey tool is prepared, it is provided to approximately 100 laboratories for testing, one slide each. The survey slides are returned to CASI for reading the TMA tissue samples and quantifying the analytic sensitivity for each participating laboratory. The data are analyzed, plotting diagnostic sensitivity and specificity as a function of analytic sensitivity. From this analysis, a range of analytic sensitivity is selected that yields the most accurate diagnostic test results.
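The analysis step described above can be sketched in a few lines. All laboratory names, LOD values, and TMA calls below are invented for illustration. The source does not state how the optimal range is chosen, so the use of Youden's index (sensitivity + specificity − 1) to rank laboratories is an assumption made here, and the sketch picks a single best point rather than a range.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class LabSurvey:
    lab: str
    lod: float                 # molecules per cell equivalent, from calibrators
    tma_calls: Sequence[bool]  # positive/negative call for each TMA core


def sensitivity_specificity(
    truth: Sequence[bool], calls: Sequence[bool]
) -> Tuple[float, float]:
    """Diagnostic sensitivity and specificity against the comparator assay."""
    if len(truth) != len(calls):
        raise ValueError("One call is required per TMA core.")
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("TMA needs both true-positive and true-negative cores.")
    return tp / (tp + fn), tn / (tn + fp)


# Comparator assay result for each of 8 hypothetical TMA cores.
comparator_truth = [True, True, True, True, False, False, False, False]

# Hypothetical survey returns from three laboratories.
surveys = [
    LabSurvey("lab_A", 8_000, [True, True, True, True, True, False, False, False]),
    LabSurvey("lab_B", 25_000, [True, True, True, True, False, False, False, False]),
    LabSurvey("lab_C", 75_000, [True, True, False, False, False, False, False, False]),
]

# One row per laboratory, sorted by LOD: (LOD, sensitivity, specificity).
summary = sorted(
    (s.lod, *sensitivity_specificity(comparator_truth, s.tma_calls))
    for s in surveys
)

# Assumed selection rule: highest Youden's index.
best_lod, best_sens, best_spec = max(summary, key=lambda row: row[1] + row[2] - 1)

for lod, sens, spec in summary:
    print(f"LOD {lod:>7.0f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"Best-performing LOD in this toy data set: {best_lod:.0f}")
```

In this toy data set the most sensitive assay produces a false positive and the least sensitive assay misses true positives, so the intermediate LOD performs best. With about 100 laboratories per study, the same table would support selecting a range of LODs rather than a single value.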
The above-described data will provide quantitative information characterizing interlaboratory variability in IHC testing. CASI will also identify the optimal LOD—the analyte concentration for the vertical blue dashed line in Figure 3, yielding the highest clinical sensitivity and specificity for each assay under study.
PARTICIPATING IHC LABORATORY FOLLOW-UP
A report will be generated for each participating laboratory summarizing its performance relative to its peers. CASI research surveys have no impact on laboratory accreditation. Resources permitting, each participating laboratory that did not do well will be offered assistance, such as suggested IHC protocols associated with best performance. Once the reason for the suboptimal performance is rectified, a repeat survey slide will be offered and improvement rates will be measured. Such improvement data may be useful for a future health economics impact assessment of this intervention.
ORGANIZATIONAL STRUCTURE
CASI is governed by a steering committee composed of appointed members and representatives of participating organizations (Figure 6). Steering committee responsibilities include ranking analytes for prioritization for study, designing study protocols, specifying TMA composition, data review, formulation of data-driven recommendations, and drafting of manuscripts. The committee also has oversight responsibilities for the allied functions. For example, the steering committee will either adopt existing criteria for TMA readout and interpretation or develop new criteria. The steering committee will also define the selection criteria for TMA samples, incorporating true-positive and true-negative samples. The steering committee may reach out to external consultants in this pursuit.
In addition to the steering committee, there are separate cores for statistical support, TMA readout and interpretation, TMA creation and validation, calibrator creation, reading of slides to generate data from TMAs and calibrators, and study tool distribution and collection. To the extent possible, whole slide imaging and automated data analysis will be incorporated into the workflow.
TRANSPARENCY AND COORDINATION
It is the authors' goal to conduct these studies in as inclusive and transparent a manner as possible. CASI intends to publicize each study, from inception and design, on a newly formed CASI Web site. CASI is interested in suggestions and collaborations, and will be seeking consulting expertise and systematically reviewing constructive suggestions.
NEXT STEPS
The introduction of objective, quantitative analytic standards may represent an inflection point for the field of IHC. The benefits extend beyond interlaboratory and intralaboratory test consistency and accuracy. The ability to evaluate and measure different analytic sensitivity cutoffs will likely create entirely new diagnostic opportunities that would otherwise not achieve sufficient interlaboratory reproducibility and accuracy to be relevant for clinical practice and patient care.
The authors are not aware of any previous instance in which an entire diagnostic testing industry lacked analytic standards. Correcting this situation is a large undertaking. It is our hope that others will concur on the importance of this mission and join our efforts. We are interested in working with individuals, teams, or organizations that can assist.
Finally, the authors urge those currently developing IHC assays, especially predictive IHC biomarker assays, to start integrating analytic standards into their clinical studies. CASI can advise on how calibrators can be made available as well as how and at which study phase they may be integrated into clinical studies.
REFERENCES
Author notes
The Consortium for Analytic Standardization in Immunohistochemistry (CASI) is supported, in part, by the National Cancer Institute of the National Institutes of Health under award number R44CA268484-01 (to Bogen).
Bogen is a principal at Boston Cell Standards, which holds patents and has filed patent applications on the technology for creating calibrators. The other authors have no relevant financial interest in the products or companies described in this article.