The original guideline, “Validating Whole Slide Imaging for Diagnostic Purposes in Pathology,” was published in 2013 and included 12 guideline statements. The College of American Pathologists convened an expert panel to update the guideline following standards established by the National Academy of Medicine for developing trustworthy clinical practice guidelines.
The objective was to assess evidence published since the release of the original guideline and provide updated recommendations for validating whole slide imaging (WSI) systems used for diagnostic purposes.
An expert panel performed a systematic review of the literature. Frozen sections, anatomic pathology specimens (biopsies, curettings, and resections), and hematopathology cases were included. Cytology cases were excluded. Using the Grading of Recommendations Assessment, Development, and Evaluation approach, the panel reassessed and updated the original guideline recommendations.
Three strong recommendations and 9 good practice statements are offered to assist laboratories with validating WSI digital pathology systems.
Systematic review of literature following release of the 2013 guideline reaffirms the use of a validation set of at least 60 cases, establishing intraobserver diagnostic concordance between WSI and glass slides and the use of a 2-week washout period between modalities. Although all discordances between WSI and glass slide diagnoses discovered during validation need to be reconciled, laboratories should be particularly concerned if their overall WSI–glass slide concordance is less than 95%.
In 2013, the Pathology and Laboratory Quality Center for Evidence-based Guidelines (the Center) of the College of American Pathologists (CAP) released a guideline on the validation of whole slide imaging (WSI) for diagnostic purposes.1 For the purposes of this guideline, validation is defined as a process that demonstrates WSI will perform as expected for its intended use and environment prior to using it for patient care. The specific aims are to ensure pathologists make accurate diagnoses to at least the same level as light microscopy and to identify and control interfering artifacts or technological risks that WSI could introduce to patient safety. The Center chose to develop this guideline in 2010 because of growing interest in using WSI for diagnostic purposes, coupled with an expected wider adoption and a lack of evidence-based guidelines for clinical laboratories to validate the technology. The fundamental question addressed by the original guideline panel was, “What needs to be done to validate a WSI system for diagnostic purposes before it is placed into clinical service?” Following a comprehensive literature search, the Australian National Health and Medical Research Council methodology was used to grade recommendations based on strength of evidence, consistency, clinical impact, generalizability, and applicability to WSI with oversight by a methodologist consultant. The Australian National Health and Medical Research Council methodology generated 4 categories of statements based on decreasing strength of evidence: recommendations, suggestions, expert opinions, or no recommendation offered.2 The 2013 guideline had 12 recommendations. In terms of impact since its online release on May 1, 2013, the guideline has received 253 separate citations in publications originating in 39 countries. 
Most of the citations have occurred in comparative studies, narrative reviews, validation studies, and conference papers in journals originating in the United States (52%; 131 of 253), United Kingdom (13%; 33 of 253), Italy (9%; 24 of 253), and Canada (9%; 22 of 253).3 The guideline has also had 11 871 downloads from Archives of Pathology & Laboratory Medicine (Allen Press Technical Support, email communication, September 13, 2020).
According to Center policy, guidelines are subject to review and revision every 4 years, or at 2 years should significant new evidence appear in the literature. By the end of 2017, the amount of WSI validation literature had substantially increased, demonstrating excellent diagnostic concordance between WSI and glass slide diagnoses. The US Food and Drug Administration granted 2 WSI systems approval for primary diagnosis, and several laboratories around the world had already been using WSI for this purpose.4–9 Although other digital pathology modalities exist (eg, email of static JPEG images, robotic static/dynamic telemicroscopy, real-time video telemicroscopy), the Center expert panel chose to focus on WSI because other WSI vendors have received or will receive Food and Drug Administration clearance by the 510(k) route, and WSI will form the basis of artificial intelligence/machine learning technology. In addition, the integration of laboratory information systems with digital pathology solutions is essential for patient safety, especially for high-volume digital reporting. Laboratory information system integration has been available for several years for WSI, but not for other digital pathology modalities.
This guideline update differs from the 2013 guideline in 2 respects. First, the CAP collaborated with the Association for Pathology Informatics and the American Society for Clinical Pathology on the development of the guideline. In addition to having representative members on the expert panel, both organizations assisted in dissemination of the open comment period. Second, the guideline revision process used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach as opposed to the Australian National Health and Medical Research Council system previously used. The GRADE approach is used by many organizations worldwide and is considered a standard in guideline development.10
As a result of using GRADE, recommendations are now classified as either strong or conditional. This differs from the 4-tiered strength of recommendations used in the original guideline. Additionally, GRADE introduces the concept of good practice statements (GPSs) for issues that are important and intuitive but lack evidence on which to base a recommendation. Although these changes impact both the strength and number of recommendations, the guideline update affirms almost all of the principles established in the original guideline.
This evidence-based guideline was developed and revised following the standards established by the National Academy of Medicine.11 A detailed description of the methods and the systematic review (including the quality assessment and complete analysis of the evidence) used to create this guideline can be found in the supplemental digital content (SDC) at https://meridian.allenpress.com/aplm in the April 2022 table of contents, which also contains 12 tables and 2 figures.
EXPERT PANEL COMPOSITION
The CAP, in collaboration with the Association for Pathology Informatics and the American Society for Clinical Pathology, convened multidisciplinary expert and advisory panels to revise the 2013 guideline. The expert panel included pathologists with considerable experience in WSI, 2 histotechnologists, and a research methodologist. The CAP approved the appointment of the members. Detailed information about the panel composition and the role of the panels can be found in the SDC.
CONFLICT OF INTEREST POLICY
The collaborators agreed upon a conflict of interest (COI) policy (effective November 2017), and members of the expert panel disclosed all financial interests from 12 months prior to appointment through the development of the guideline. Individuals were instructed to disclose any relationship that could be interpreted as constituting an actual, potential, or apparent conflict. Complete disclosures of the expert panel members are listed in the Appendix. Disclosures of interest judged by the oversight group to be manageable conflicts are as follows: M.B.: research grants, National Institutes of Health/National Cancer Institute (Bethesda, Maryland), and consultancies, Hologic, Inc (Bedford, Massachusetts) and ContextVision (Stockholm, Sweden); E.C.: ownership/partnership, Premier Laboratory, LLC (Boulder, Colorado); A.P.: boards/advisory boards, ContextVision (Stockholm, Sweden), PathPresenter (New York, New York), and Digital Pathology Association; L.P.: consultancies, Hamamatsu Photonics KK (Hamamatsu City, Japan), Leica Biosystems (Wetzlar, Germany), and Ibex Medical Analytics (Tel Aviv-Yafo, Israel), research grants, Ibex Medical Analytics (Tel Aviv-Yafo, Israel), Huron Digital Pathology (St Jacobs, Ontario, Canada), and Lunit (Gangnam-gu, Seoul, Korea); V.R.: research grants, Leica Biosystems (Wetzlar, Germany), Philips (Amsterdam, Netherlands), 3DHISTECH Ltd (Budapest, Hungary), and Paige (New York, New York). It would be challenging, if not impossible, to assemble a panel with the level of expertise required to write this guideline where no member has COIs related to WSI. The chair was required to have no COIs related to the subject matter of this guideline. The majority of the expert panel members (7 of 12) were assessed as having no relevant COIs. The CAP provided funding for the administration of the project; no industry funds were used in the development of the guideline. 
All panel members volunteered their time and were not compensated for their involvement, except for the contracted methodologist. See the SDC for complete information about the COI policy.
As per the 2013 guideline, the expert panel addressed the overarching question, “What needs to be done to validate (as defined above) a WSI digital pathology system for diagnostic purposes before it is placed in clinical service?” The panel considered intended uses of WSI, preparation types, numbers of cases, systems/components, personnel, and processes.
LITERATURE SEARCH AND COLLECTION
A comprehensive literature search for relevant evidence was completed by the CAP's medical librarian using Ovid MEDLINE and Elsevier Embase on June 26, 2018, encompassing the publication dates of January 1, 2012, through June 26, 2018, and supplemental searches were completed using the Cochrane Library. The search strategy used controlled vocabulary (ie, MeSH, Emtree) and keywords derived from the key questions. Database searches were supplemented with a search for unindexed literature, including a review of clinical trials and pertinent organizations' Web sites. Expert panel members were also polled for relevant unpublished data at the onset of the project. The literature searches were rerun on June 14, 2019, and July 15, 2020, to identify articles published from June 26, 2018, through July 15, 2020. Detailed information regarding the literature search is available in the SDC, including the search terms used.
Studies were selected for inclusion in the systematic review of evidence if they met the following criteria: (1) the study referred to WSI; (2) the study pertained to clinical use or investigative research; and (3) the study was a peer-reviewed, full-text article. Detailed information about the inclusion criteria is available in the SDC.
Articles were excluded from the systematic review if they were conference abstracts that were not published in peer-reviewed journals, qualitative studies, follow-up studies, mixed-methods studies, editorials, commentaries, narrative reviews, case reports, or letters; studies including fewer than 30 cases per study arm; studies in animal models or cell lines; full-text articles that were not available in English; studies that discussed only cytology cases; studies involving static and robotic digital imaging, purely technical components, only educational applications, or image analysis; or studies that did not address WSI validation. Despite literature that is now beginning to accrue on cytopathology as a use case for WSI, the evidence was considered to be immature relative to that for surgical pathology. As such, validation of WSI for cytopathology was considered out of scope for the guideline update pending additional research. Similar issues applied for peripheral blood smears and bone marrow aspirates, which were likewise out of scope. The guideline focuses on the use of WSI platforms by pathologists to make visual interpretations from images. It does not include recommendations on the use of automated image analysis systems. Detailed information about the exclusion criteria is available in the SDC.
The research methodologist performed a risk of bias assessment for all fully published studies meeting inclusion criteria. The methodologist assessed key indicators based on study design and methodologic rigor; a rating for quality of evidence (Supplemental Table 1) was designated. An overall GRADE rating was given for each recommendation by outcome. Refer to the SDC for further details.
ASSESSING THE STRENGTH OF RECOMMENDATIONS
Following the quality of evidence assessment, completion of the GRADE Evidence to Decision Framework,12 and discussion of the definitions and implications of strength of recommendations (Table 1), the expert panel designated the recommendations as either strong or conditional.
As per the initial guideline, this revision will be reviewed in 4 years, or earlier in the event of publication of substantive and high-quality evidence that could potentially alter the original guideline recommendations. If necessary, the entire expert panel will reconvene to discuss potential changes. When appropriate, the panel will recommend revision of the guideline to the CAP and the Association for Pathology Informatics and American Society for Clinical Pathology collaborators for review and approval.
The Center was developed by the CAP as a forum to create and maintain laboratory practice guidelines. Guidelines are intended to assist physicians and patients in clinical decision-making and to identify questions and settings for further research. With the rapid flow of scientific information, new evidence may emerge between the time a laboratory practice guideline is developed and when it is published or read. Laboratory practice guidelines are not continually updated and may not reflect the most recent evidence. Laboratory practice guidelines address only the topics specifically identified therein and are not applicable to other interventions, diseases, or stages of diseases. Furthermore, guidelines cannot account for individual variation among patients and cannot be considered inclusive of all proper methods of care or exclusive of other treatments. It is the responsibility of the treating physician or other health care provider, relying on independent experience and knowledge, to determine the best course of treatment for the patient. Accordingly, adherence to any laboratory practice guideline is voluntary, with the ultimate determination regarding its application to be made by the physician in light of each patient's individual circumstances and preferences. The CAP and its collaborators make no warranty, express or implied, regarding laboratory practice guidelines and specifically exclude any warranties of merchantability and fitness for a particular use or purpose. The CAP and its collaborators assume no responsibility for any injury or damage to persons or property arising out of or related to any use of this statement or for any errors or omissions.
A total of 1827 studies met the search term requirements and were carried forward for title and abstract review. Based on review of these abstracts, 173 articles met the inclusion criteria and were selected for full-text review. A total of 62 articles were included for data extraction. Each study was reviewed by 2 expert panel members at each phase. Studies with discordant reviews were referred to the chair for a final decision on inclusion or exclusion. Excluded articles were available as discussion or background references. Additional information about the systematic review is available in the SDC, including a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) table outlining details of the systematic review. Refer to the write-up for each recommendation for specific details about supporting evidence.
The panel convened 12 times (10 teleconferences and 2 face-to-face meetings) to develop the scope, draft recommendations, review and respond to solicited feedback, and assess the quality of evidence that supports the final recommendations. A nominal group technique was used for consensus decision-making to encourage unique input with balanced participation among group members. An open comment period was posted on the CAP Web site from June 24, 2019, to July 15, 2019, during which the 3 draft recommendations and 9 GPSs were posted for public feedback. A total of 146 comments were submitted from 154 participants, with all draft recommendations and GPSs receiving at least 90% outright agreement or agreement with some modification. Refer to the SDC for more details. The expert panel approved the final recommendations with a supermajority vote.
An independent review panel, masked to the expert panel and vetted through the COI process, recommended approval by the CAP Council on Scientific Affairs. The final recommendations are summarized in Table 2.
1. Strong Recommendation
The validation process should include a sample set of at least 60 cases for one application, or use case (eg, hematoxylin-eosin–stained sections of fixed tissue, frozen sections, hematology), that reflect the spectrum and complexity of specimen types and diagnoses likely to be encountered during routine practice. Note: the validation process should include another 20 cases to cover additional applications such as immunohistochemistry or other special stains if these applications are relevant to an intended use and were not included in the 60 cases mentioned above.
This recommendation was reaffirmed from the 2013 guideline.
The quality of evidence is moderate to support this recommendation.
The evidence base supporting this recommendation comprised 32 diagnostic studies13–44 and 1 systematic review.45 Data collection occurred prospectively in 6 studies15–17,20,26,41 and retrospectively in 18.* One study33 had both prospective data collection at 1 site and retrospective data collection using archived specimens at 2 other sites. Seven noninferiority studies18,22–24,29,32,34 were also included. The GRADE rating for concordance was strong recommendation, moderate quality evidence. Refer to Supplemental Tables 2 through 5 in the SDC for the individual study quality assessment, GRADE certainty rating, and evidence to decision summary for recommendation 1.
If the primary aim of validation is to demonstrate that a WSI system will perform as expected for a specific application such as primary diagnosis, it follows that the process should evaluate cases that reasonably represent the spectrum and proportion of diagnoses to be encountered in that application. The number of cases being evaluated should be sufficient to allow pathologists to establish trust in diagnoses made using WSI as well as to identify and mitigate risks associated with the technology. At the same time, the number of cases must strike a balance in terms of the amounts of time and resources required to complete the validation process. It is neither practical nor possible to evaluate a predetermined number of cases for every conceivable diagnosis during a validation study. To this end, the 2018 Royal College of Pathologists best practice recommendations for implementing digital pathology state that the sample size and duration of the validation process can vary according to specific circumstances and do not mention a particular number of cases to be evaluated.46
The number of cases recommended in the 2013 guideline and reaffirmed in this update is not intended to be a rigid number that laboratories must follow when validating WSI systems. It is logical that laboratories embarking on WSI validation will want some evidence-based guidance on a minimum number of cases they need to evaluate. The recommendation of at least 60 cases was determined from a systematic review of published validation studies showing that concordance between WSI and glass slide diagnoses is not improved or worsened when sets of more than 60 cases are used (Table 3; Figure). Systematic review of literature published between 2012 and 2018 (ie, after the 2013 guideline had been written) revealed 33 studies with case numbers ranging from 40 to 8069 where the reported end point was either intraobserver concordance† or discordance (from which concordance could be derived) between WSI and glass slides.‡ The weighted mean concordance, directly determined or derived, from these 33 studies was 95.2%, with a median of 95.0% and an interquartile range of 91.0% to 97.1%. On breaking these studies down into subgroups by number of cases, there were only 2 studies in which the number of cases was less than 60.20,21 The mean number of cases for these studies was 41, and the mean WSI to glass slide intraobserver concordance was 85% (82.8% and 86.7%, respectively). Studies with means of 64 and 231 cases each yielded an identical mean concordance of 93%. Likewise, those with a mean of 750 cases also yielded a mean concordance of 93%. As such, no new evidence emerged to support changing the 2013 recommendation of at least 60 cases. More studies using fewer than 60 cases would be required to refine the above recommendation concerning a minimum number of cases.
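To illustrate the kind of pooling described above, the following is a minimal sketch of a case-weighted mean concordance and a split of studies at the 60-case mark. The study tuples are hypothetical placeholders and are not data from the reviewed literature; weighting by number of cases is an assumption, because the weighting scheme is not specified here.

```python
# Hypothetical (number_of_cases, concordance_percent) pairs.
# These are illustrative placeholders, NOT values from the reviewed studies.
studies = [(40, 82.8), (64, 93.0), (231, 93.0), (750, 93.0)]


def weighted_mean_concordance(studies):
    """Mean concordance weighted by each study's case count (assumed weighting)."""
    total_cases = sum(n for n, _ in studies)
    if total_cases == 0:
        raise ValueError("at least one case is required")
    return sum(n * c for n, c in studies) / total_cases


def split_by_case_count(studies, threshold=60):
    """Separate studies with fewer than `threshold` cases from the rest."""
    small = [s for s in studies if s[0] < threshold]
    large = [s for s in studies if s[0] >= threshold]
    return small, large


pooled = weighted_mean_concordance(studies)
small, large = split_by_case_count(studies)
print(f"pooled concordance: {pooled:.1f}%")
print(f"studies under 60 cases: {len(small)}, studies with 60 or more: {len(large)}")
```

A laboratory could use the same approach to compare its own result with a pooled benchmark, keeping in mind that a simple case-weighted mean ignores differences in study design and case mix.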
This recommendation received 111 responses during the open comment period, of which 94.6% (n = 105) either agreed with the recommendation as written or agreed with suggested modifications. There were 22 comments. Some comments appeared to suggest that the number of cases was determined in an arbitrary manner, which, as described above, is not the case. Some comments suggested that 60 cases represented too few cases to adequately cover the complexity of cases encountered in clinical practice, whereas others suggested that the number was too large and impractical. The question of how one defines a “case” was also raised. Does a case include all parts and slides associated with a given accession number in a laboratory information system or can representative parts and/or slides be used? The PIVOTAL trial conducted by Philips to obtain Food and Drug Administration approval did not use all parts and slides for each of the 1992 cases that were evaluated.22 As mentioned above, the issue becomes one of balance between completeness and practicality with respect to the time and resources. It is the opinion of the expert panel that laboratories should be free to decide on how to define a case, with the proviso being that the selected material allows pathologists to conduct a reasonable and thorough assessment of the WSI system prior to introducing it into patient care. Finally, the issue of validating multiple WSI scanners distributed over a multisite network was raised and whether each scanner requires its own validation set of 60 cases. Laboratories may use their own judgment to determine whether the applications or use cases (eg, frozen sections, consultations, quality assurance or primary diagnosis) and expected case mix between sites are sufficiently different to warrant separate validations. 
Because the validation exercise is also intended to identify issues related to histology and their impact on image quality, assessing differences in the quality of histology between sites represents a critical consideration when deciding whether separate validations are required at each site. If WSI scanners from different vendors with different proprietary viewing software are being used across the network, it is reasonable to recommend that separate validation studies be performed for each system.
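As a planning aid, the case counts in recommendation 1 can be expressed as a small calculation. This is a minimal sketch that assumes the note is read as 20 further cases for each additional application not covered by the initial 60; the function name and the per-system dictionary are hypothetical and for illustration only.

```python
def minimum_validation_cases(additional_applications=0,
                             base_cases=60,
                             extra_per_application=20):
    """Minimum cases for one WSI system: 60 for the first application plus
    20 per additional application (assumed reading of the note)."""
    if additional_applications < 0:
        raise ValueError("additional_applications must be non-negative")
    return base_cases + extra_per_application * additional_applications


# Hypothetical network in which each vendor system is validated separately.
# Values are the number of additional applications (eg, immunohistochemistry).
systems = {"vendor_A_scanner": 1, "vendor_B_scanner": 0}

for name, extra in systems.items():
    print(name, minimum_validation_cases(extra))
```

These figures are minimums; laboratories remain free to include more cases where the expected case mix or histology quality differs between sites.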
2. Strong Recommendation
The validation study should establish diagnostic concordance between digital and glass slides for the same observer (ie, intraobserver variability). If concordance is less than 95%, laboratories should investigate and attempt to remedy the cause.
This recommendation was updated from the 2013 guideline.
The quality of evidence is moderate to support this recommendation.
The evidence base supporting this recommendation comprised 32 diagnostic accuracy studies13–44 and 1 systematic review.45 Data collection occurred prospectively in 6 studies15–17,20,26,41 and retrospectively in 18.§ One study had both prospective data collection at 1 site and retrospective data collection using archived specimens at 2 other sites.33 Seven noninferiority studies18,22–24,29,32,34 were also included. Noninferiority designs compare diagnoses made by glass slides and WSI during the validation process to ground truth diagnoses made with glass slides as part of patient care. This approach provides a measure of diagnostic correctness, assuming the ground truth is correct. The original diagnoses may have been made by pathologists other than those completing the validation, providing information on interobserver variability. Comparing original and validation diagnoses made by glass slides provides baseline diagnostic concordance when cases are reviewed by the same modality. Validation diagnoses made by WSI are compared with original and validation glass slide diagnoses, the aim being to determine whether WSI creates additional discordance on top of that seen with glass slides alone. Based on a statistically determined margin, WSI is assessed as being inferior or noninferior to glass slides.
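The noninferiority logic described above can be sketched numerically. The example below is a simplified illustration only: it compares major discordance rates of WSI and glass slide reads against reference diagnoses by using an unpaired Wald confidence interval, whereas published studies typically use paired or model-based analyses. The 4% margin, the function name, and the counts are assumptions chosen for illustration.

```python
import math


def noninferiority_check(wsi_major, glass_major, n_cases, margin=0.04, z=1.96):
    """Return (difference, upper_bound, is_noninferior).

    difference  = WSI major discordance rate minus glass major discordance rate
    upper_bound = upper limit of a two-sided 95% Wald interval (unpaired
                  simplification; assumes both arms read the same n_cases)
    WSI is called noninferior when upper_bound is below the margin.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    p_wsi = wsi_major / n_cases
    p_glass = glass_major / n_cases
    diff = p_wsi - p_glass
    se = math.sqrt(p_wsi * (1 - p_wsi) / n_cases
                   + p_glass * (1 - p_glass) / n_cases)
    upper = diff + z * se
    return diff, upper, upper < margin


# Hypothetical counts, not taken from any cited study.
diff, upper, ok = noninferiority_check(wsi_major=98, glass_major=92, n_cases=2000)
print(f"difference = {diff:.3%}, upper bound = {upper:.3%}, noninferior = {ok}")
```

The margin must be fixed before data collection; choosing it after the results are known would defeat the purpose of the design.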
The GRADE rating for diagnostic concordance was strong recommendation, moderate quality evidence. Refer to Supplemental Tables 6 through 9 in the SDC for the individual study quality assessment, GRADE certainty rating, and evidence to decision summary for recommendation 2.
As outlined in the 2013 guideline, validation is a process designed to demonstrate that new technology or instrumentation performs as expected for its intended use prior to its application for patient care. This recommendation refers to the initial validation of the hardware and software components of a WSI system that must function correctly for pathologists to render diagnostic interpretations. The central question to be addressed is whether the same pathologist makes the same interpretation of a given case regardless of whether it is reviewed by WSI or as glass slides. Concordance, defined in this context as intraobserver agreement between diagnoses made by glass slides and WSI, is an ideal end point. Discordance, defined as disagreement between diagnoses, may also be used. Discrepant diagnoses between modalities can be subclassified as major or minor, where major discrepancies are defined as those that would impact patient management. Arbitration of discrepant diagnoses can be done manually by individual pathologists or panels of pathologists,22 or more objectively via standardized discordance tables.47 The intent of this recommendation is to evaluate the performance characteristics of a new technology being introduced into a diagnostic environment and its influence on the ability of pathologists to make diagnoses relative to glass slides. The process is not intended to assess diagnostic correctness or to validate an individual pathologist's diagnostic competency. An interobserver study design compares the interpretations of 2 pathologists each reviewing the same case/slide by different modalities. This approach does not optimally isolate the influence of WSI, because discordant diagnoses could represent differences of opinion independent of the modality by which the slides were reviewed. 
Further, validation of a WSI system should be carried out by pathologists who have already been trained and are technically competent in the use of the specific WSI system being implemented (see below for discussion on GPS 6).
Laboratories engaged in validating WSI for patient care will logically seek advice concerning an acceptable pass/fail point for concordance/discordance with diagnoses made by traditional glass slide review. The ideal pass/fail number would be 100% concordance (or 0% discordance); however, this does not reflect the subjective nature of pathology as practiced with glass slides where interobserver and intraobserver variability is an established reality. The weighted mean percentage concordance across the 33 studies in our systematic review was 95.2%. This analysis formed the basis for the 95% concordance mark in this recommendation. Discordance between WSI and glass slides was reported in 24 studies, of which 5 studies20,23,39,40,48 classified the discordance as minor (average rate of 6.7%) and 7 studies18,20,23,25,29,32,36 classified the discordance as major (average rate of 4.2%). Eleven studies reported specifically on the concordance and minor discordance rate between WSI and glass slides, with minor discordance ranging from 1.4% to 10.1% in 7 studies,15,18,25,29,30,32,36 with an average minor discordance rate of 5.1%. It should be noted that validation studies using a noninferiority design have shown no significant difference in major discordance rates (ie, discrepant diagnoses that would affect patient management) between WSI and glass slides.18,22,24,29,32,34
This recommendation received 111 responses during the open comment period, with 94% (n = 104) agreeing with the recommendation either as written or with suggested modifications. Comments on the minimum concordance value suggested that 95% was too low, or conversely a discordance rate of 5% was unacceptably high, especially if the discordances are major and affect patient management. One comment suggested the need to capture rates of deferral of WSI cases to glass slide review during validation studies. Additional information suggested included the types of cases involved, the reasons for deferral, and whether deferral behavior was shown by all pathologists completing the validation or only a few pathologists who might be inherently reluctant to use WSI for diagnostic purposes. Capturing rates of deferral to glass slide review was a recommended quality management metric in the 2014 American Telemedicine Association clinical guidelines for telepathology.49
The 95% figure is not intended to be a pass/fail mark, where less than 95% means validation has failed and WSI should not be used for patient care. Achieving less than 95% concordance merely suggests the results are below average based on the peer-reviewed literature used to develop this guideline. As such, the panel's recommendation is that laboratories investigate and attempt to resolve systematic issues that may have contributed to a concordance rate of less than 95%. Possible areas to examine include, but are not limited to, the types of cases that were found to be problematic, whether the discordances are attributable to only 1 or 2 pathologists or a larger group of reviewers, and whether the discordant cases are related to correctable histology and/or scanner issues. One approach to exploring discordances for specific diagnoses that arose in an initial validation set would be to review additional cases of that type to better understand the issues involved. If factors contributing to discordant WSI diagnoses cannot be rectified to the satisfaction of the pathologists using the WSI system, those cases could routinely be deferred to glass slide review. Problematic specimen types could then become the focus of additional research as WSI technology evolves.
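The concordance calculation and the 95% benchmark discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: diagnoses are compared as exact text matches, whereas in practice concordance is determined by arbitration, and the record layout and function names are hypothetical. Breaking results down by pathologist supports one of the lines of investigation mentioned above.

```python
from collections import defaultdict


def concordance_rate(pairs):
    """Fraction of (glass_dx, wsi_dx) pairs from the same observer that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no paired diagnoses supplied")
    agree = sum(1 for glass_dx, wsi_dx in pairs if glass_dx == wsi_dx)
    return agree / len(pairs)


def concordance_by_pathologist(reads):
    """reads: iterable of (pathologist, glass_dx, wsi_dx) tuples."""
    grouped = defaultdict(list)
    for pathologist, glass_dx, wsi_dx in reads:
        grouped[pathologist].append((glass_dx, wsi_dx))
    return {p: concordance_rate(v) for p, v in grouped.items()}


def needs_investigation(rate, benchmark=0.95):
    """True when concordance falls below the benchmark; this is a prompt to
    investigate, not a pass/fail verdict."""
    return rate < benchmark


# Hypothetical reads: 60 cases, 4 of which are discordant.
reads = [("path_1", "benign", "benign")] * 30 \
      + [("path_2", "benign", "benign")] * 26 \
      + [("path_2", "benign", "atypical")] * 4

overall = concordance_rate((g, w) for _, g, w in reads)
print(f"overall concordance: {overall:.1%}")
print("investigate:", needs_investigation(overall))
print(concordance_by_pathologist(reads))
```

In this hypothetical set, all discordances come from a single reader, which would point toward reviewer-specific factors rather than a scanner or histology problem.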
3. Strong Recommendation
A washout period of at least 2 weeks should occur between viewing digital and glass slides.
This recommendation was reaffirmed from the 2013 guideline.
The quality of evidence is moderate to support this recommendation.
The evidence base supporting this recommendation comprised 14 studies.‖ Data collection occurred prospectively in 4 studies15–17,26 and retrospectively in 8.¶ One study had both prospective data collection at 1 site and retrospective data collection using archived specimens at 2 other sites.33 A noninferiority study34 was also included. Refer to Supplemental Tables 10 through 12 in the SDC for the individual study quality assessment, GRADE certainty rating, and evidence to decision summary for recommendation 3.
This recommendation is intended to address the issue of recall bias when cases are reviewed by 2 different modalities by the same observer. Most, if not all, pathologists can identify cases that they describe as “once seen, never forgotten.” Nonetheless, there is merit in attempting to control for bias that may lead to less than thorough review and/or rubber-stamping of diagnoses previously made by one modality when cases are reviewed by the second modality. Literature specifically designed to identify an optimal washout period for WSI validation studies is nonexistent, although studies based on comparing glass slide diagnoses made on the same cases at different time points have been performed. Campbell et al51 reported the influence of a 2- versus 4-week washout period on diagnoses made by 3 pathologists. The pathologists each reviewed groups of 60 glass slides from a set of 120. Following intervals of 2 and 4 weeks, the pathologists reviewed a second set of 60 slides that included a mix of cases seen or not seen in the initial review. The participants were asked to identify cases they had seen during the first review and were asked to rate their level of confidence with respect to recalling those cases. The pathologists correctly recalled 40% of the cases after 2 weeks and 31% of the cases after 4 weeks, indicating an appreciable degree of recall that could influence data collected during intraobserver studies.
A total of 14 studies (see Table 4) addressing the influence of washout duration on intraobserver concordance between glass slide and WSI diagnoses were included. The washout period ranged from less than 4 weeks to greater than 8 weeks, with 5 studies (35.7%) reporting a washout of 1 to 4 weeks and 9 studies (64.3%) using a washout of more than 8 weeks. Stratifying the concordance data from these studies by washout duration showed no influence of washout length on concordance. As such, no new evidence was identified on systematic review to support changing the washout period of at least 2 weeks recommended in the 2013 guideline.
This recommendation received 111 responses during the open comment period, 90% (n = 100) of which agreed with the recommendation as written or agreed with comments concerning modifications. Suggestions for washout periods other than the recommended 2-week minimum ranged from not using a washout period to increasing it to a minimum of 4 weeks. Some contributors believed that 2 weeks was too long and would create a barrier to using the guideline. Others believed 2 weeks was too short and would increase the likelihood of recall bias, although this concern is not supported by our systematic review. Those believing that a washout period was unnecessary also indicated that washout periods could remove the opportunity to identify cases in which not all tissue on a glass slide was captured during the scanning process. Although failure of WSI scanners to capture all tissue is a rare occurrence,52 it has been documented and could have potentially devastating consequences for patient care, with medicolegal repercussions.
As per the other recommendations in this guideline, the 2-week washout period serves as guidance for laboratories seeking evidence-based advice on this specific aspect of WSI validation. Laboratories are free to use washout periods of any duration they might deem more practical or better able to minimize the negative impact of recall bias on data being collected to inform the use of their WSI system for diagnostic work. One solution to the possibility of not detecting incomplete tissue capture by a WSI scanner would be to build in a technical quality control step whereby all scanned slides used in the validation study are reviewed by a scanning technologist in conjunction with the corresponding glass slides to ensure all tissue was captured. Detection of any such events should trigger communication with the WSI vendor to begin work on correcting the problem. Image analysis algorithms based on artificial intelligence may emerge as a preanalytical solution for detecting missing tissue on scanned slides before cases are assigned to pathologists for review.53
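A laboratory that records the dates on which each pathologist read each case by each modality could check the washout interval programmatically. The following is a minimal Python sketch, not a prescribed method: the function name `washout_violations`, the `read_dates` mapping, and its layout are assumptions made for illustration, and the 2-week default simply mirrors the minimum stated in recommendation 3.

```python
from datetime import date, timedelta

# Minimum washout from recommendation 3; a laboratory may choose another value.
MIN_WASHOUT = timedelta(weeks=2)


def washout_violations(read_dates, min_washout=MIN_WASHOUT):
    """Return case IDs whose two reads are closer together than min_washout.

    read_dates maps a case ID to (first_read_date, second_read_date) for the
    same pathologist, one date per modality (hypothetical layout).
    """
    return sorted(
        case_id
        for case_id, (first, second) in read_dates.items()
        if abs(second - first) < min_washout
    )


# Hypothetical example: case A was re-read after 14 days, case B after 7 days.
reads = {
    "A": (date(2022, 1, 3), date(2022, 1, 17)),
    "B": (date(2022, 1, 3), date(2022, 1, 10)),
}
flagged = washout_violations(reads)  # only case B falls short of 2 weeks
```

An interval of exactly 14 days is accepted here because the recommendation calls for at least 2 weeks.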
GOOD PRACTICE STATEMENTS
As mentioned in the introduction, GPSs are defined as statements having a “high level of certainty that the recommendation will do more good than harm (or the reverse), but where there is little direct evidence.”54,55 Unlike recommendations, GPSs are not evidence based. Nine of the recommendation statements from the 2013 guideline were reaffirmed and/or revised as GPSs.
GPS 1. All pathology laboratories implementing WSI technology for clinical diagnostic purposes should carry out their own validation studies.
GPS 2. Validation should be appropriate for and applicable to the intended clinical use and clinical setting of the application in which WSI will be used. Validation of WSI systems should involve specimen preparation types relevant to intended use (eg, formalin-fixed, paraffin-embedded tissue; frozen tissue; immunohistochemical stains). If a new application for WSI is contemplated, and it differs materially from the previously validated use, a separate validation for the new application should be performed.
GPS 3. The validation study should closely emulate the real-world clinical environment in which the technology will be used.
GPS 4. The validation study should encompass the entire WSI system. It is not necessary to separately validate each individual component (eg, computer hardware, monitor, network, scanner) of the system or the individual steps of the digital imaging process.
GPS 5. Laboratories should have procedures in place to address changes to the WSI system that could impact clinical results. This statement was revised from the 2013 guideline.
GPS 6. Pathologists adequately trained to use the WSI system must be involved in the validation process.
GPS 7. The validation process should confirm that all of the material present on a glass slide to be scanned is included in the digital image.
GPS 8. Documentation should be maintained recording the method, measurements, and final approval of validation for the WSI system to be used in the anatomic pathology laboratory.
GPS 9. Pathologists should review cases/slides in a validation set in random order. This applies to both the review modality (ie, glass slides or digital) and the order in which slides/cases are reviewed within each modality. This statement was revised from the 2013 guideline.
Responses during the open comment period indicated an average of 98% (range, 94%–100%) agreement with the GPSs as written or agreement with some modification. Several of the GPSs were revised from the 2013 guideline by the expert panel or were the subject of specific questions during the open comment period. Good practice statement 2, concerning validation of WSI for specific intended uses, states that separate validation is required for new applications that differ materially from the application for which the initial validation was performed. Examples include the following: If the initial validation was performed for primary diagnosis involving hematoxylin-eosin, histochemically and immunohistochemically stained paraffin sections, a separate validation is required if frozen section interpretation is the new application. Similarly, if the initial validation did not include biopsies requiring review at high resolution (eg, gastric biopsies with Helicobacter pylori), it would be prudent to conduct a separate validation with such cases, as they may represent limitations on the use of WSI.24,27 Good practice statement 3 covers the concept of validation studies emulating the real-world environment in which the technology is to be used. It is recognized that validation studies cannot perfectly replicate real-life diagnostic activities. Laboratories are free to incorporate whatever they feel would be appropriate to achieve this goal, including clinical information, serial/deeper hematoxylin-eosin levels, immunohistochemical stains, or ancillary test data. Good practice statement 4, concerning the validation of the entire WSI system as opposed to individual components, received comments on how to deal with situations where new monitors/displays were introduced after validation had been completed. These points also relate to GPS 5 and the need for procedures to address changes to the WSI system that could impact clinical results. 
(The panel developed a resource to help laboratories determine the actions needed for various changes to the WSI system. It can be accessed on the CAP's WSI guideline Web page: https://www.cap.org/protocols-and-guidelines/cap-guidelines/current-cap-guidelines/validating-whole-slide-imaging-for-diagnostic-purposes-in-pathology.) Good practice statement 6, concerning the need for pathologists to be adequately trained on the use of the WSI system prior to embarking on a validation study, received requests for a definition of adequately trained. Following systematic review, no evidence-based recommendations could be made about the type of training or the metrics used to determine technical competency of pathologists using WSI systems for diagnostic purposes. Further, expert panel members noted that some studies did not report whether user training had been provided. As such, adequate training is defined at the discretion of the laboratory medical director. The same applies for the number of pathologists participating in the validation process. Having all pathologists in a given institution complete a validation study certainly has its benefits; however, this may not be practical, and there is no evidence indicating that it is necessary. The expert panel will provide suggestions for training in a document made available on the CAP Web site following release of the guideline. Good practice statement 9, concerning the order of review of cases within and between the glass slide and WSI arms of a validation study, was discussed in detail by the expert panel. Although some studies in the systematic review of WSI validation literature indicate that such randomization of review order was performed,22 there is no specific evidence that changing the order actually influences the data collected. 
Furthermore, although it may seem intuitive that randomizing the order of cases between modalities will minimize recall bias, no studies were found in the pathology literature in which random and nonrandom review order were specifically compared in terms of intraobserver variability.
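One possible way to implement the randomization described in GPS 9 is sketched below in Python. This is an assumption-laden illustration, not a method endorsed by the panel: the function `randomized_review_plan`, the two-session layout, and the modality labels are hypothetical. Each case is randomly assigned the modality in which it is read first, and the order of cases is shuffled independently within each session.

```python
import random

MODALITIES = ("glass", "WSI")


def randomized_review_plan(case_ids, seed=None):
    """Return (session_1, session_2), each a list of (case_id, modality).

    Every case appears once per session, in a different modality each time.
    The second session is meant to follow the washout period.
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    session_1, session_2 = [], []
    for case_id in case_ids:
        # Randomly choose which modality is used for the first read.
        first, second = rng.sample(MODALITIES, 2)
        session_1.append((case_id, first))
        session_2.append((case_id, second))
    # Randomize the order of cases independently within each session.
    rng.shuffle(session_1)
    rng.shuffle(session_2)
    return session_1, session_2
```

Recording the seed alongside the validation documentation would allow the review order to be reconstructed later, which fits the documentation expectations in GPS 8.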
See Table 5 for a summary of the GPSs.
This guideline provides evidence-based recommendations for pathologists and laboratories implementing WSI for diagnostic purposes, including frozen sections, consultation, quality assurance reviews, and primary diagnosis. The term diagnostic purposes in the context of WSI refers to an activity where a pathologist interprets a scanned slide and that interpretation either contributes to or is the sole basis for information in a pathology report that is used for patient management. Although WSI is frequently used to review cases at tumor boards/multidisciplinary case conferences, tumor boards do not constitute diagnostic activities in and of themselves, even though they may lead to slide reviews that result in amendments to original pathology reports. As such, WSI systems do not require validation before they are used for tumor boards.
As with the initial version released in 2013, this update was generated by an expert panel that considered the necessary steps involved in validating these systems for diagnostic use. A systematic review of literature that appeared after the release of the 2013 guideline was conducted to identify and grade evidence that might inform each step in the validation process. When developing final recommendations and GPSs such as those in this guideline, professional bodies such as the CAP need to anticipate questions that will arise as laboratories embark on WSI validation. To place this guideline in proper context, several points must be acknowledged. Guidelines are not intended to be standards of care. Several other guidelines concerning the implementation of WSI for clinical use exist.46 Laboratories are free to deviate from published guidelines where the advice is either impractical or not applicable to their situation. Finally, the panel recognizes that the recommendations offered in this manuscript may be more stringent than what is required by the CAP Laboratory Accreditation Program. For accreditation purposes, laboratories should follow the requirements of their accreditation agencies.
The revised guideline contains 3 strong recommendations and 9 GPSs, all of which were included in the original 2013 guideline. Importantly, systematic review of literature following release of the original guideline reaffirms strong recommendations concerning the use of a validation set comprising at least 60 cases, establishing diagnostic concordance between WSI and glass slides for the same observer, and the use of a 2-week washout period. The mean diagnostic concordance between WSI and glass slides reported by studies assessed in our systematic review was 95.2%. Although all discordances between WSI and glass slide diagnoses discovered during validation studies need to be reconciled, laboratories should undertake a systematic review of their data if the overall WSI–glass slide concordance is less than 95%.
We thank the American Society for Clinical Pathology, the Association for Pathology Informatics, and their staff for their involvement and support. We also thank the advisory panel members for their guidance throughout the development of the guideline and for their thoughtful review of this work: Walter H. Henricks, MD; Jason Hipp, MD, PhD; Dennis O'Neill, MD; David McClintock, MD; Paul J. van Diest, MD, PhD; Chee Leong Cheng, MBBS; and Veronica Klepeis, MD, PhD.
Authors' disclosures of potential conflicts of interest and author contributions are found in the Appendix at the end of this article.
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the April 2022 table of contents.