Context.—There is increasing interest in using whole slide imaging (WSI) for diagnostic purposes (primary and/or consultation). An important consideration is whether WSI can safely replace conventional light microscopy as the method by which pathologists review histologic sections, cytology slides, and/or hematology slides to render diagnoses. Validation of WSI is crucial to ensure that diagnostic performance based on digitized slides is at least equivalent to that of glass slides and light microscopy. Currently, there are no standard guidelines regarding validation of WSI for diagnostic use.
Objective.—To recommend validation requirements for WSI systems to be used for diagnostic purposes.
Design.—The College of American Pathologists Pathology and Laboratory Quality Center convened a nonvendor panel from North America with expertise in digital pathology to develop these validation recommendations. A literature review was performed in which 767 international publications that met search term requirements were identified. Studies outside the scope of this effort and those related solely to technical elements, education, and image analysis were excluded. A total of 27 publications were graded and underwent data extraction for evidence evaluation. Recommendations were derived from the strength of evidence determined from 23 of these published studies, open comment feedback, and expert panel consensus.
Results.—Twelve guideline statements were established to help pathology laboratories validate their own WSI systems intended for clinical use. Validation of the entire WSI system, involving pathologists trained to use the system, should be performed in a manner that emulates the laboratory's actual clinical environment. It is recommended that such a validation study include at least 60 routine cases per application, comparing intraobserver diagnostic concordance between digitized and glass slides viewed at least 2 weeks apart. It is important that the validation process confirm that all material present on a glass slide to be scanned is included in the digital image.
Conclusions.—Validation should demonstrate that the WSI system under review produces acceptable digital slides for diagnostic interpretation. The intention of validating WSI systems is to permit the clinical use of this technology in a manner that does not compromise patient care.
In the last decade, digital imaging in pathology has been significantly impacted by the development and application of whole slide imaging (WSI) technology.1–7 The automated WSI scanner is a robotic microscope capable of digitizing an entire glass slide, using software to merge or stitch individually captured images into a composite digital image. The critical components of an automated WSI device (system) include the hardware (scanner composed of an optical microscope and digital camera connected to a computer), software (responsible for image creation and management, viewing of images, and image analysis where applicable), and network connectivity. Whole slide imaging technology has evolved to the point where digital slide scanners are currently capable of automatically producing high-resolution digital images within a relatively short time. The virtual image may represent an entire glass slide or a user-selected area of the glass slide, and is often referred to as a whole slide image or digitized slide. Upon retrieval of the digital file, the captured image of the slide can be viewed on a computer monitor without the use of an actual microscope. The software interface used to view digital slides simulates the operation of light microscopy. Several types of WSI scanners have been developed by vendors, all capable of producing automated, high-speed, high-resolution whole slide digital images.8
Whole slide imaging technology has several advantages over conventional microscopy, such as portability (ie, images are often accessible anywhere and at any time), ease of sharing and retrieval of archival images, and the ability to make use of computer-aided diagnostic tools (eg, image analysis). Whole slide imaging has been successfully used for education (eg, digital slide teaching sets), quality assurance (eg, proficiency testing, archiving), research, image analysis, and diagnostic purposes. Jara-Lazaro and colleagues9 reviewed several articles wherein validation studies using WSI were performed and concluded that these digital systems generally show good concordance with glass slides. In one particular study in which digital and glass slides from 600 cases were compared, the results showed a diagnostic accuracy of 94% with WSI versus 99% with light microscopy.10 Several studies in recent years have demonstrated that primary histopathologic diagnoses can be rendered digitally using WSI.11–18 Discrepancies in diagnoses between digital and glass slides reported in these publications were attributed to image quality, rare instances of missed tissue on the digital image, inadequate clinical metadata, and pathologists' lack of experience using the WSI system. Specific microscopic details (eg, organisms, nuclear atypia, apoptosis, mitotic figures, eosinophil granules) were sometimes noted to be difficult to identify because of poor image resolution at high magnification or went undetected (eg, minute focus of prostate adenocarcinoma) in the digital image.19–22 Some investigators also observed that reviewing a virtual slide took longer than examining a glass slide.
The growing worldwide success of WSI is attributed to advances in image quality, improved technology of WSI scanners, increased computational power of computers, better network connectivity, and relative ease of slide reproduction and distribution. Rendering routine pathologic diagnoses using a WSI system is feasible if the image represents an accurate digital reproduction of the scanned glass slide that can be saved, archived, reviewed, and later retrieved without degradation of the image. At present, adoption of WSI for rendering pathologic diagnoses has been used primarily for second opinions (ie, review via teleconsultation) and for telepathology of frozen sections.22–25 To date, WSI has been used for making primary (ie, initial or immediate) diagnoses on a routine clinical basis in only limited practice settings outside of the United States.26 Wider adoption of WSI in pathology practice is anticipated to occur following further technical advancements, better workflow, integration of these systems with the laboratory information system, promotion of reimbursement for technical services, lowered costs, standardization for use in clinical practice, clarification of regulations, and pathologist acclimatization.2,9,27 Also impacting the clinical use of WSI, the US Food and Drug Administration has indicated that it considers WSI systems class III (highest risk) medical devices and has advised that they should be regulated as such.28–30
Validation, in the context of new technology or instrumentation, refers to a process that aims to demonstrate that the new method performs as expected for its intended use and environment prior to its application for patient care. Therefore, validation is recommended to determine that a pathologist can use a WSI system to render an accurate diagnosis with the same or better level of ease as with a traditional microscope and without interfering artifacts or technological risks to patient safety.31 Although limited validation studies have been published using WSI, there are currently no standard guidelines available to help with validating WSI for diagnostic clinical use in the laboratory. Guidelines available for using other digital pathology systems (eg, Food and Drug Administration–approved Papanicolaou test imaging systems) are not applicable to validating WSI systems. A white paper32 produced by the Digital Pathology Association in 2011 provided a high-level overview of some of the factors to be considered when validating a digital pathology system. Validation guidelines for digital pathology systems have been developed for the regulated nonclinical environment.33 This evidence-based guideline presents recommendations developed by the College of American Pathologists (CAP) Pathology and Laboratory Quality Center for validating WSI when used for diagnostic purposes in pathology.
The CAP Pathology and Laboratory Quality Center convened an expert panel consisting of members with expertise and experience in digital pathology relevant to using WSI for clinical purposes. Members included practicing US and Canadian pathologists and CAP staff. The CAP approved the appointment of the project chair Liron Pantanowitz and expert panel members. All expert panel members complied with the CAP conflicts of interest policy (in effect April 2010), which required disclosure of financial or other interests that may have an actual, potential, or apparent conflict throughout the project. Refer to the Appendix for disclosures.
The charge to the panel was “to recommend validation requirements for whole slide imaging systems used for diagnostic purposes.” The central question that the panel addressed was, “What should be done to validate a whole slide digital imaging system for diagnostic purposes before it is placed in clinical service?” The intent of the practical recommendations published herein is to guide pathology laboratories in the validation of their own WSI systems for clinical use.
Systematic Literature Review and Analysis
A computerized literature search was conducted in the electronic databases Ovid MEDLINE, CSA Illumina Conference Papers Index, and Google Scholar for relevant articles from January 2000 through January 2012. The search used the following terms: whole slide imaging OR virtual or digital microscopy OR digital pathology OR teleconsultation OR telemicroscopy AND validation; alternate terms digitized slide and whole slide scanner were also used. Reference lists from identified articles were scrutinized for articles not identified in the above search.
Eligible Study Designs
In addition to journal articles, the search identified published abstracts presented at various conferences, including international meetings. The initial search was not limited to the English language, and one Russian article was included for the full text review.
Published studies were selected for full text review if they met the following criteria:
the study referred to WSI, and
the study pertained to clinical use or investigative research.
All clinical fields (eg, pathology, veterinary) were allowed.
Publications involving static and robotic digital imaging, purely technical components, only educational applications, and image analysis were excluded.
Outcomes of Interest
The primary outcome of interest in evaluating selected publications was the correlation between WSI (digitized slides) and glass slides, particularly with respect to accuracy, concordance, average diagnostic certainty, and sensitivity and specificity in the context of validation requirements. Accuracy refers to an agreement between the originally reported final (“true”) diagnosis and the diagnosis drawn from the WSI or glass slide. Concordance between digitized and glass slides refers to an agreement in the diagnosis made when viewing slides with these 2 modalities.
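The distinction between accuracy and concordance, as defined in this section, can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the panel's methodology: the record layout, the field names, the sample diagnoses, and the use of exact string matching to decide agreement are all assumptions. Published studies typically adjudicate agreement by expert review rather than by literal comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseReading:
    """One pathologist's readings of one case (hypothetical record layout)."""
    reference_dx: str  # originally reported final ("true") diagnosis
    glass_dx: str      # diagnosis rendered from the glass slide
    wsi_dx: str        # diagnosis rendered from the digitized slide


def accuracy(readings, modality):
    """Fraction of cases in which the given modality matches the reference.

    `modality` is the attribute name to compare: "glass_dx" or "wsi_dx".
    """
    if not readings:
        raise ValueError("no readings supplied")
    hits = sum(getattr(r, modality) == r.reference_dx for r in readings)
    return hits / len(readings)


def concordance(readings):
    """Fraction of cases in which glass and WSI diagnoses match each other."""
    if not readings:
        raise ValueError("no readings supplied")
    agree = sum(r.glass_dx == r.wsi_dx for r in readings)
    return agree / len(readings)


# Invented example data, for illustration only.
readings = [
    CaseReading("adenocarcinoma", "adenocarcinoma", "adenocarcinoma"),
    CaseReading("benign", "benign", "atypical"),
    CaseReading("melanoma", "nevus", "nevus"),
    CaseReading("gastritis", "gastritis", "gastritis"),
]

print("WSI accuracy:  ", accuracy(readings, "wsi_dx"))
print("Glass accuracy:", accuracy(readings, "glass_dx"))
print("Concordance:   ", concordance(readings))
```

The third invented case shows why the two measures are reported separately: the glass and digital readings agree with each other (concordant) yet both differ from the reference diagnosis (inaccurate).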
Quality Assessment and Grading of Evidence
The literature review was performed in duplicate by 2 members of the expert panel. A third reviewer was involved if the 2 were not able to reach consensus. A contracted methodologist (AL) and CAP staff (LF) performed final data extraction. Each study was assessed for strength of evidence, which consists of level of evidence, quantity, size of the effect, statistical precision, and quality (risk of bias). The quality assessment of the studies was performed by using the Whiting et al34 instrument. The other components of evidence, such as consistency, clinical impact, generalizability, and applicability to digital pathology, were also considered when determining the strength of evidence.35 (Refer to Table 1 in supplemental material file at www.archivesofpathology.org in the December 2013 table of contents.) The overall grade of each recommendation was obtained by rating all components of the evidence. The overall grade indicates the strength of the body of evidence to assist the users of clinical practice guidelines in making appropriate and informed clinical judgments.35 (Refer to Table 2 in the supplemental digital content.)
In the evidence evaluation criteria used, Grade A or B evidence supports recommendations, the term we use for guidance based on a body of evidence that can be trusted to guide clinical practice in all or most situations. Grade C evidence is insufficient to support a recommendation; instead we use the term suggestion, for which care should be taken in application. Suggestions may also reflect guidance in cases in which evidence is conflicting or inconclusive. Grade D evidence is weak and does not provide support for either recommendations or suggestions. However, the guideline authors may choose to provide guidance in the form of an expert consensus opinion where they believe that guidance will result in improved patient care, even in cases where the evidence is low or lacking. (Refer to Table 3 in the Supplemental digital content.) In this guideline, guidance includes recommendations, suggestions, and expert consensus opinion; there were no instances of no recommendation offered. (For complete evidence reviews of all guideline statements, refer to Tables 4 through 11 in the supplemental digital content.)
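The relationship between evidence grade and the type of guidance offered can be summarized as a simple lookup. The Python sketch below is a simplification for illustration, not the panel's grading instrument: the function name, the enum, and the `panel_offers_consensus` flag are hypothetical, and the sketch assumes that an expert consensus opinion is the panel's option only when evidence is Grade D.

```python
from enum import Enum


class Guidance(Enum):
    RECOMMENDATION = "recommendation"
    SUGGESTION = "suggestion"
    EXPERT_CONSENSUS_OPINION = "expert consensus opinion"
    NO_RECOMMENDATION = "no recommendation"


def guidance_for(grade, panel_offers_consensus=False):
    """Map an overall evidence grade (A-D) to the kind of guidance it supports.

    `panel_offers_consensus` models the panel's choice to issue an expert
    consensus opinion when the evidence is weak or lacking (assumed here to
    apply to Grade D only).
    """
    grade = grade.strip().upper()
    if grade in ("A", "B"):
        return Guidance.RECOMMENDATION
    if grade == "C":
        return Guidance.SUGGESTION
    if grade == "D":
        if panel_offers_consensus:
            return Guidance.EXPERT_CONSENSUS_OPINION
        return Guidance.NO_RECOMMENDATION
    raise ValueError(f"unknown evidence grade: {grade!r}")


for g in ("A", "B", "C", "D"):
    print(g, "->", guidance_for(g).value)
print("D (panel consensus) ->", guidance_for("D", panel_offers_consensus=True).value)
```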
This guideline will be reviewed every 4 years, or earlier in the event of publication of substantive and high-quality evidence that could potentially alter the original guideline recommendations. If necessary, the entire panel will reconvene to discuss potential changes. When appropriate, the panel will recommend revision of the guideline to CAP for review and approval.
Conflict of Interest Policy
Prior to acceptance on the expert panel, potential members completed the CAP conflict of interest disclosure process, whose policy and form require disclosure of material financial interest in, or potential for benefit of significant value from, the guideline's development or its recommendations (Appendix). On the disclosure form, potential members listed any relationship that could be interpreted as constituting an actual, potential, or apparent conflict. Two potential members were not appointed based on this policy. The CAP provided funding for the administration of the project; no industry funds were used in the development of the guideline. Panel members volunteered their time and were not compensated for their involvement.
The CAP developed the Pathology and Laboratory Quality Center as a forum to create and maintain evidence-based practice guidelines and consensus statements. Practice guidelines and consensus statements reflect the best available evidence and expert consensus supported in practice. They are intended to assist physicians and patients in clinical decision making and to identify questions and settings for further research. With the rapid flow of scientific information, new evidence may emerge between the time a practice guideline or consensus statement is developed and when it is published or read. Guidelines and statements are not continually updated and may not reflect the most recent evidence. Guidelines and statements address only the topics specifically identified therein and are not applicable to other interventions, diseases, or stages of diseases. Furthermore, guidelines and consensus statements cannot account for individual variation among patients and cannot be considered inclusive of all proper methods of care or exclusive of other treatments. It is the responsibility of the treating physician or other health care provider, relying on independent experience and knowledge, to determine the best course of treatment for the patient. Accordingly, adherence to any practice guideline or consensus statement is voluntary, with the ultimate determination regarding its application to be made by the physician in light of each patient's individual circumstances and preferences. The CAP makes no warranty, express or implied, regarding guidelines and statements and specifically excludes any warranties of merchantability and fitness for a particular use or purpose. The CAP assumes no responsibility for any injury or damage to persons or property arising out of or related to any use of this statement or for any errors or omissions.
Expert Panel Literature Review and Analysis
A total of 767 studies met the search term requirements. Each study underwent an inclusion-exclusion, dual independent review conducted by staff, chair, and a third member referee when staff and chair review did not agree. The 112 articles that remained were reviewed in full, independently, by 2 of the expert panel members, who each rated and scored the articles on their relevance to clinical validation of WSI systems for diagnostic use (refer to Figures 1 and 2 in the supplemental digital content). Twenty-seven studies received a strong enough score to be considered for data extraction and review by the contracted methodologist. The expert panel performed a preliminary data extraction in the following areas: year of publication, country of origin, publication type, application of study, subspecialty of study, number of pathologists (or individuals), number of cases, validation method, reported concordance, and outcome measurement. Data verification was performed by CAP staff and 4 more studies were removed at that point, providing a total of 23 references for final recommendations.12,13,17,20–24,36–50 Excluded articles were available as discussion or background references.
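The study counts reported in this paragraph can be checked with simple arithmetic. The Python sketch below only restates the figures given in the text (767, 112, 27, and 23); the stage labels and the helper function are illustrative assumptions.

```python
# Counts reported in the text, in screening order. Labels are paraphrased.
STAGES = [
    ("met search term requirements", 767),
    ("retained for full-text review", 112),
    ("scored high enough for data extraction", 27),
    ("kept after data verification", 23),
]


def attrition(stages):
    """Return (stage label, number of studies removed on reaching that stage)."""
    removed = []
    for (_, before), (label, after) in zip(stages, stages[1:]):
        if after > before:
            raise ValueError(f"count increases at stage: {label}")
        removed.append((label, before - after))
    return removed


removed = attrition(STAGES)
for label, n in removed:
    print(f"{n:>4} removed before '{label}'")
```

The final step removes 4 studies, which matches the statement that 4 more studies were removed during data verification.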
All publications selected for data extraction, spanning a decade, involved studies of WSI systems for clinical use. More publications arose from European countries compared with the United States. Various commercially available WSI devices were used in these clinical studies. The panel was cognizant of the fact that during the last 10 years several advances in WSI technology have been made, so that it is now technically possible to scan slides much faster and to produce images of higher resolution than formerly. Most of these studies attempted to simulate actual working conditions. In this context, WSI was used for varied clinical applications: primarily for surgical pathology, with fewer studies devoted to the use of WSI for interpreting frozen sections, and some for reviewing gynecologic cytopathology cases. Whole slide imaging of histopathology cases was validated for several subspecialties (eg, dermatopathology) and specific uses (eg, grading fibrosis in liver biopsies, identifying Helicobacter pylori on gastric biopsies). In general, these studies included both common and diagnostically challenging cases from different anatomic locations. The average number of cases selected for these validation studies was 95 cases/study (range, 10–633 cases), and the average number of evaluators used to view and interpret whole slide digital images was 7 individuals/study (range, 2–26 individuals). In most of these publications, validation was performed by qualified pathologists, except for one cytology study in which cytotechnologists were employed and one article in which trainees were reported to also participate.
Whole slide imaging has been implemented in several niche settings for clinical service, particularly for remote viewing of intraoperative frozen sections and for second opinion teleconsultation. The overall reported concordance rate between diagnoses made with WSI systems compared with glass slides ranged from 73% to 98%. For different types of slide preparations (eg, hematoxylin-eosin sections of fixed tissue, frozen sections, cytology), our meta-analysis showed no significant difference in the accuracy between WSI and glass slides. The methods selected to validate WSI systems for these different clinical applications included a comparison of the diagnoses based upon the participants' interpretation of the digital image with (1) that rendered by examination of the original glass slide with a conventional light microscope, (2) the diagnosis that was issued in the original pathology report and/or arrived at by expert consensus, or (3) both of these methods. Measurements used by these researchers for the different validation studies included (1) diagnostic concordance between the digital and glass slide for each individual participant (ie, intraobserver variability), (2) diagnostic concordance between the digital image diagnosis and that provided by consensus or a reference pathologist (ie, interobserver variability), or (3) both intraobserver and interobserver variability. In 2 validation studies, investigators also measured the time required to reach a diagnosis, and in 1 frozen section study, the deferral rate for WSI system interpretations was documented.
Consensus Development Based on Evidence
The panel convened 19 times (18 by teleconference and 1 face-to-face meeting) to develop the scope, draft recommendations, review and respond to solicited feedback, and assess the strength of evidence that supported the final recommendations. Nominal group technique was used by the panel for consensus decision making to encourage unique input with balanced participation among panel members. An open comment period was held from July 22, 2011, through August 21, 2011, during which draft recommendations were posted on the CAP Web site. Based upon public feedback (132 respondents; 531 comments), all but 2 recommendations achieved more than 80% agreement (refer to “Outcomes” in Supplemental digital content for full details). The expert panel modified the recommendations based on the feedback and final quality of evidence. Then, an independent review panel, masked to the expert panel and vetted through the conflict of interest process, provided final review of the manuscript, and recommended it for approval by the CAP Transformation Program Office Steering Committee. The final guidelines are summarized in Table 1.
1. All pathology laboratories implementing WSI technology for clinical diagnostic purposes should carry out their own validation studies. (Expert Consensus Opinion)
A large number of variables may affect the performance and usability of WSI systems. If each institution or practice considering the implementation of WSI technology performs its own validation of a WSI system prior to clinical use, this should provide reasonable assurance that these systems will perform as anticipated in its validated setting. Although users should adhere to the manufacturer's recommended protocol for implementing WSI systems, verification solely as suggested by the manufacturer that a WSI device works is insufficient. The laboratory should validate and document the performance of its WSI system in its own specific laboratory environment prior to clinical use. No published data have specifically addressed this issue.
2. Validation should be appropriate for and applicable to the intended clinical use and clinical setting of the application in which WSI will be employed. Validation of WSI systems should involve specimen preparation types relevant to the intended use (eg, formalin-fixed paraffin-embedded tissue, frozen tissue, immunohistochemical stains, cytology slides, hematology blood smears). (Recommendation)
Note: If a new intended use for WSI is contemplated, and this new use differs materially from the previously validated use, a separate validation for the new use should be performed.
A validation study is necessary for each clinical application in order to demonstrate that the WSI system will perform as expected for each intended diagnostic purpose. This is because different types of specimen preparations (eg, formalin-fixed paraffin-embedded tissue, frozen sections, cytology smears) are subject to different artifacts, and pathologists rely on different morphologic features when evaluating these different specimen types. Moreover, each preparation type may require different WSI capabilities in order for a pathologist to make an accurate primary diagnosis.38,51–53 In other words, a validation study used to support the diagnostic use of digitized slides for routine surgical pathology may not necessarily apply to the use of digitized frozen section slides (eg, frozen section slides may have more tissue folds, more mounting medium, more pale staining, or bubbles, any of which could, for example, affect the focusing algorithms of the scanner). It was the consensus opinion of the expert panel, supported by literature evidence (Table 2), that the specimen preparation type was a much more important performance variable than the source of the tissue or the specific analyte being assessed. Thus, a single validation study may suffice to cover a group of similar intended uses, as long as the overall process of preparation and interpretation is the same. For example, when reading digitized immunohistochemistry slides, the study need only validate that digital slides are able to capture the expected chromogen color(s), intensity, and localization on each slide. Each and every stain does not need to be individually validated, so long as it represents the same type of sample preparation. Whole slide imaging should not be used for clinical purposes other than the one validated, unless a validation for that purpose is undertaken.
3. The validation study should closely emulate the real-world clinical environment in which the technology will be used. (Recommendation)
The validation study should be conducted in a manner that mimics how the WSI system will be used in the specific laboratory's work environment (ie, the study should mimic how the system is going to be used after “going live”) (Table 3). The design of a validation study should accordingly take into account the WSI system's intended use at the institution. Hence, if multiple slides are typically reviewed as part of an existing diagnostic process using traditional light microscopy, then all of the slides for such cases should be compared using glass and digital modalities, rather than just preselected “representative” slides. This is important, because approval of the WSI system will be limited to the conditions under which validation occurred. To provide another example, if the WSI system is intended for frozen sections of one or more organ systems, then each case examined during validation should compare the diagnosis rendered at frozen section using a traditional microscope against the diagnosis rendered when examining the same frozen section slides as whole slide images, provided that the pathologist rendering the diagnosis on the digital images has access to the same information as when using a traditional microscope (eg, clinical information, specimen location, other pertinent gross examination information as appropriate). If rapid digitization of glass slides is required for clinical use (eg, for frozen sections), then the validation process should include a determination as to whether the WSI system of choice can facilitate accurate diagnosis within the same specified turnaround time parameters. Interlaboratory validation is unnecessary if the equipment is intended to be used at a single location, but use of the equipment between laboratories requires validation that mimics the intended workflow across facilities.
4. The validation study should encompass the entire WSI system. (Recommendation)
Note: It is not necessary to validate separately each individual component (eg, computer hardware, monitor, network, scanner) of the system nor the individual steps of the digital imaging process.
A WSI system comprises a slide scanner, computer hardware, software, network, and viewing monitor. Each of these components may impact digital image quality and therefore interpretation. This includes the WSI instrument (eg, scanning resolution, range of z-axis focus), computers (eg, processing speed, memory), network connectivity (eg, bandwidth, firewalls), and workstation display (eg, monitor size, settings, resolution, luminance). Although each of the aforementioned components is important for optimal functioning and usability of the WSI system, there is currently no substantial evidence to indicate that each individual component or step in the imaging process (eg, image acquisition, storage, viewing) needs to be validated separately. Nevertheless, in some published studies selected for analysis, investigators did report validating certain components of their WSI system (eg, Internet connectivity, configuration of computers and monitors used). Our meta-analysis of those studies, however, showed no significant difference in the accuracy of WSI and glass slides when compared with the reference standard (Table 4). Therefore, it is recommended that the validation study should encompass the entire WSI system. The objective of validating the entire WSI system is to ensure that participants confirm that the images they are viewing are in focus and of acceptable quality on their monitors.
Although image management, confidentiality, and security are important, policies regarding image storage and purging of image files are not part of the validation process to assure system performance, and are therefore best determined by individual laboratory needs. Nevertheless, laboratories need to be aware that improper image storage may result in loss of images (eg, overwritten or deleted images), images that are unable to be retrieved, or altered image quality and integrity of data (eg, compression), or may limit the ability to share images. Furthermore, the quality of a digitized slide may reflect limitations of glass slide preparation and stained material (eg, tissue folds and air bubbles), which may impair scanning. If an unanticipated failure of the WSI system would have a significant negative impact on its intended use, then the laboratory should develop an alternative mechanism (eg, resort to its downtime procedure) to examine cases (eg, rescan with another backup device or view glass slides using a conventional light microscope if available). Unanticipated failures can include a pathologist's determination that the digital slide is inadequate for interpretation, inability to properly scan the glass slide (eg, slide is broken, tissue too thick), and/or the WSI system not functioning as expected.
5. Revalidation is required whenever significant change is made to any component of the WSI system. (Expert Consensus Opinion)
A completed validation study should provide a means to demonstrate that the WSI system validated can be used for the intended diagnostic purpose. However, whenever there is a significant change to the WSI system (eg, completely new type of scanner is used, major hardware or software upgrade) that may potentially affect the interpretation of digital slides, the validation process should be repeated with these new changes incorporated in the WSI system to demonstrate that it can still be used as intended. When an additional WSI system of the same make and model as a previously validated scanner is to be used in a laboratory that shares the same network, image management software, and intended clinical use, a separate validation study using a smaller set of cases (eg, 20 cases; refer also to recommendation 7) may be adequate to detect any significant differences in scanner functionality. Minor changes can be managed through a facility's change management procedure. No published data have specifically addressed this issue.
6. A pathologist(s) adequately trained to use the WSI system must be involved in the validation process. (Recommendation)
It is essential that the validation process include a pathologist(s) who will actually be using the WSI system to make diagnoses. The purpose of involving a pathologist who has already been trained to use the WSI system is to ensure that the WSI system can be used to make accurate diagnoses from digitized slides. Although the validation process need not involve all pathologists who might use the WSI system, studies involving multiple pathologists were found to provide the most robust and accurate method of assessing digital imaging technology.36 Although it is important that users be trained to use this technology, the personnel required for WSI and how they should be trained are outside the scope of this document. In only some published validation studies was there documentation that pathologists were appropriately trained on using the WSI system. Our analysis showed that when such training was imparted to pathologists, there tended to be greater accuracy when interpreting a WSI (95% with training versus 79% without training), slightly better concordance between WSI and glass slides (89% with training versus 84% without training), and a shorter interpretation time (4.9 ± 1.6 minutes with training versus 11.5 ± 2.5 minutes without training) (Table 5). The validation team may also include other pathology staff (eg, laboratory managers, histotechnologists, trainees), information technology personnel, and/or consultants. Operators (eg, image technicians) who will be asked to scan slides and manage acquired digital images should also be included in the validation process.
7. The validation process should include a sample set of at least 60 cases for one application (eg, hematoxylin-eosin–stained sections of fixed tissue, frozen sections, cytology, hematology) that reflects the spectrum and complexity of specimen types and diagnoses likely to be encountered during routine practice. (Recommendation)
Note: The validation process should include another 20 cases for each additional application (eg, immunohistochemistry, special stains).
The sample size should be adequate to ensure that pathologists can potentially uncover any problems with the WSI system. Providing a reasonable number of cases for observers to view will also benefit their training/experience with the WSI system. Published studies reported using different numbers of cases for evaluation. When an average of 20 cases (range, 10–46 cases) was used, the studies showed a significantly lower accuracy (77%) and concordance (75%) when WSI was compared with glass slides. When an average of 60 cases (range, 52–90 cases) was used, the studies showed an improved accuracy (90%) and far better concordance (95%) comparing WSI with glass slides. However, when investigators used an average of 200 cases (range, 100–633 cases) in their published studies, although the accuracy improved (100%), the concordance between WSI and glass slides (91%) was actually lower (Table 6). Therefore, the panel determined that a validation study should include a sample set of at least 60 cases for one application, which would not be too onerous for any laboratory to perform. If a laboratory intends to use its validated WSI system for another supplemental application (eg, to evaluate immunostains or fluorescence stains as well as hematoxylin-eosin–stained sections), then another 20 cases of this additional application will need to be validated. The lower number for the subsequent applications is justified by the fact that many elements of the system will already have been validated, and the new validation is focused more on issues that may be unique to the new specimen preparation type. It is important that the type of cases used in the validation reflect the spectrum and complexity of specimen types (eg, biopsies and resections) and diagnoses (eg, easy and difficult cases) likely to be encountered during that laboratory's routine operation. 
This can be accomplished, for example, by retrospectively or prospectively selecting a consecutive series of archived cases for which the participants are blinded to the original diagnoses. Laboratories should avoid selecting only their best cases for a validation study. A case selected for validation may include variable parts (one or multiple) and/or number of slides (one or many). The panel unanimously agreed that it was impractical for laboratories to validate WSI tools for each and every organ system, specific disease, diagnosis, or microscopic finding prior to clinical adoption, as has been recommended by some authors.25
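The case-count floor in recommendation 7 and its note reduces to simple arithmetic, sketched below. This is a minimal illustration only: the function name, constants, and application labels are hypothetical and not part of the guideline, and the totals are minimums, not targets.

```python
# Case-count floor from recommendation 7: at least 60 cases for one
# application, plus another 20 for each additional application.
PRIMARY_MIN_CASES = 60
ADDITIONAL_MIN_CASES = 20


def minimum_cases(primary: str, additional: list[str]) -> dict[str, int]:
    """Return the minimum number of validation cases per application.

    `primary` and `additional` are free-text labels chosen by the
    laboratory; they are illustrative, not a fixed vocabulary.
    """
    plan = {primary: PRIMARY_MIN_CASES}
    for application in additional:
        if application in plan:
            raise ValueError(f"duplicate application: {application}")
        plan[application] = ADDITIONAL_MIN_CASES
    return plan


plan = minimum_cases("H&E fixed tissue", ["immunohistochemistry", "special stains"])
total_cases = sum(plan.values())
print(plan, total_cases)
```

The counts say nothing about case mix; the selected cases still need to reflect the spectrum and complexity of the laboratory's routine practice.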
8. The validation study should establish diagnostic concordance between digital and glass slides for the same observer (ie, intraobserver variability). (Suggestion)
For validation purposes, it is necessary to measure the difference (outcome) between making diagnoses with digital slides and with glass slides. Discrepancy rates for second-opinion glass slide review have been reported to range from 1.4% to 30%.42 Also, it has been shown that interobserver variability is often not due to cases being viewed using different technologies, but related to actual differences in diagnostic interpretation.24 Therefore, the panel advocated that it is more important (as a validation criterion) that a pathologist doing the validation be able to reproduce the same diagnosis with both modalities (ie, intraobserver variability) than that the pathologist match the diagnosis provided for the same cases by another pathologist, “expert,” or group of pathologists (ie, interobserver variability) (Table 7). The aim of the validation study is to achieve a high concordance rate between diagnoses made using glass versus digital slides. An acceptable (pass/fail) concordance rate for pathologists is best determined by the good medical judgment of the pathologist. For discrepancies that may arise during validation, it is important to evaluate and address the root cause of the problem (eg, poor quality of histology slides).
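As an illustration of how intraobserver concordance might be tallied, the sketch below computes, for each pathologist, the fraction of cases in which the glass and digital diagnoses agree, and lists the discordant cases for root-cause review. The function name, data layout, and example diagnoses are assumptions made for this example. Treating a case-insensitive exact string match as concordance is a simplification, because the guideline does not prescribe how agreement is scored.

```python
from collections import defaultdict


def intraobserver_concordance(readings):
    """Compute per-pathologist agreement between glass and digital reads.

    `readings` is an iterable of (pathologist, case_id, glass_dx, digital_dx)
    tuples. Diagnoses are compared as normalized strings, which is a
    simplification: a real study would need an agreed rule for what
    counts as concordant.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    discordant = []
    for pathologist, case_id, glass_dx, digital_dx in readings:
        total[pathologist] += 1
        if glass_dx.strip().lower() == digital_dx.strip().lower():
            agree[pathologist] += 1
        else:
            # Keep the pair so the root cause can be investigated later.
            discordant.append((pathologist, case_id, glass_dx, digital_dx))
    rates = {p: agree[p] / total[p] for p in total}
    return rates, discordant


# Hypothetical readings: each pathologist is compared only with himself/herself.
readings = [
    ("A", "case-01", "Tubular adenoma", "tubular adenoma"),
    ("A", "case-02", "Invasive ductal carcinoma", "Ductal carcinoma in situ"),
    ("B", "case-01", "Tubular adenoma", "Tubular adenoma"),
    ("B", "case-02", "Invasive ductal carcinoma", "Invasive ductal carcinoma"),
]
rates, discordant = intraobserver_concordance(readings)
print(rates, discordant)
```

No pass/fail threshold is built in, consistent with the panel's position that an acceptable concordance rate is a matter of medical judgment.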
9. Digital and glass slides can be evaluated in random or nonrandom order (as to which is examined first and second) during the validation process. (Recommendation)
Opinions differ as to whether the order in which cases are presented and the modalities used (glass versus digital) in a validation study should be random or fixed. Some pathologists believe that digital slides should be viewed before glass slides, if the latter are to be considered the gold standard for making diagnoses. Others have suggested that it is best to randomize the order of cases evaluated to minimize recall bias, which may confound results.21,42 To date, a few validation studies have opted to evaluate digital and glass slides in random order.20,21,37,39,46 However, the order of viewing virtual versus glass slides has been shown in one study not to have had any effect on interpretation.54 Our meta-analysis of selected articles showed no marked difference in concordance when comparing glass with digital slides viewed in random versus nonrandom allocation. Therefore, our panel felt that laboratories can decide to evaluate their cases in either random or nonrandom order (as to which is examined first and second) for a validation study (Table 8).
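Either allocation scheme is straightforward to document. The sketch below assigns, for each case, which modality is viewed first, either at random or in a fixed digital-first order. The function name, case identifiers, and seed value are hypothetical; the seed only makes the allocation reproducible for the study record.

```python
import random


def assign_viewing_order(case_ids, randomize=True, seed=None):
    """Map each case to the (first, second) modality to be viewed.

    With randomize=False every case is viewed digital-first, one of the
    fixed orders described in the text.
    """
    rng = random.Random(seed)
    order = {}
    for case_id in case_ids:
        first = rng.choice(["glass", "digital"]) if randomize else "digital"
        second = "digital" if first == "glass" else "glass"
        order[case_id] = (first, second)
    return order


case_ids = [f"case-{n:02d}" for n in range(1, 61)]
random_order = assign_viewing_order(case_ids, randomize=True, seed=2013)
fixed_order = assign_viewing_order(case_ids, randomize=False)
```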
10. A washout period of at least 2 weeks should occur between viewing digital and glass slides. (Recommendation)
A washout period refers to the time interval between viewing the same case/slide using a different (glass or digital) modality. It is important to take into consideration that pathologists may recall pathology images for lengthy periods after reviewing a case, particularly a difficult one. This can be overcome by allowing ample time between viewings of the same case using different modalities. On the other hand, with long washout periods a pathologist's experience and/or diagnostic criteria could change over time.47 Few studies have reported washout periods while examining WSI and glass slides. Those researchers used washout periods ranging from 1 to 2 to approximately 3 weeks.37,42,45,47 No study compared outcomes across washout periods of different durations. Our literature review indicated that a washout period of at least 2 weeks showed good accuracy and concordance between WSI and glass slides (Table 9). Because of limited published data, the effect of other washout periods on accuracy and concordance between WSI and glass slides remains unclear. Until further published evidence becomes available, our panel resolved that a washout period of at least 2 weeks was supported and practical for the purposes of a validation study.
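A laboratory could check the 2-week washout against its review log as sketched below. The record layout, function name, and dates are assumptions made for this example. An interval of exactly 14 days is treated as satisfying "at least 2 weeks".

```python
from datetime import date, timedelta

MIN_WASHOUT = timedelta(weeks=2)  # recommendation 10: at least 2 weeks


def washout_violations(review_dates, minimum=MIN_WASHOUT):
    """Return case IDs whose two reads are closer together than `minimum`.

    `review_dates` maps case_id -> (first_read_date, second_read_date).
    """
    too_soon = []
    for case_id, (first_read, second_read) in review_dates.items():
        if abs(second_read - first_read) < minimum:
            too_soon.append(case_id)
    return too_soon


# Hypothetical review log for one pathologist.
review_dates = {
    "case-01": (date(2013, 3, 4), date(2013, 3, 18)),  # 14 days apart
    "case-02": (date(2013, 3, 4), date(2013, 3, 11)),  # 7 days apart
}
violations = washout_violations(review_dates)
print(violations)
```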
11. The validation process should confirm that all of the material present on a glass slide to be scanned is included in the digital image. (Expert Consensus Opinion)
A digitized slide is produced by scanning an entire glass slide or a user-selected area of the glass slide. Rendering an accurate pathologic diagnosis using such a whole slide digital image is feasible only if the image represents an accurate digital reproduction of the scanned glass slide. Therefore, the panel deemed it very important that the validation process make sure that all material on a glass slide is present in the digital image to be used for diagnostic work. If a particular slide is poorly stained or not in focus, the pathologist will notice this and take corrective actions to ensure an accurate diagnosis. However, if diagnostic tissue present on a glass slide is absent from the digital image, the evaluating pathologist will not know this and thus will not have an opportunity to correct the error. Therefore, it is important for the validation study to specifically address this issue. No published data have specifically addressed this issue, so this is considered an expert consensus opinion. It is likewise important that the validation process ensure that the digital slide being viewed is actually from the glass slide of the case that was scanned (eg, barcode use, slide label scanned along with the material on the glass slide). If protected health information is to be used with the WSI system, then the software, hardware, and policies surrounding its use must comply with requirements of the final security rule under the Health Insurance Portability and Accountability Act of 1996.55
12. Documentation should be maintained recording the method, measurements, and final approval of validation for the WSI system to be used in the clinical laboratory. (Expert Consensus Opinion)
Validation requires confirmation by providing documented evidence that the requirements for a WSI system, when operated within established parameters, have been fulfilled. Documentation should therefore be maintained by the laboratory recording the method, measurements, and final approval of validation for the WSI system in its clinical laboratory. This should also include documentation of training of all intended users of the system. Final documentation of the validation should be approved by the medical director of the laboratory or his/her designee. If laboratories use WSI systems for making diagnoses, it is also recommended that a statement be included in the pathology report indicating that a WSI system was used. There were no published peer-reviewed data on documentation to analyze.
Validation of WSI is recommended to maximize the likelihood that pathologists using this technology to view digitized glass slides can consistently make the same interpretation as they would from viewing the glass slides using a conventional microscope. Validation should address both technical and interpretative components, and must be specific for the intended clinical use. These 12 guidelines will hopefully provide laboratories with a practical guide for validating their own WSI systems for diagnostic work. This guideline was intended to facilitate the safe use of WSI systems in laboratories. Validation of WSI systems will improve their clinical use in pathology by helping pathologists and laboratories determine their effectiveness, thereby reducing the potential risk of misdiagnosis due to artifacts or other unmitigated problems with this technology. Clinical validation should also serve to meet compliance with emerging regulations that pertain to WSI for clinical diagnostic use. However, users of WSI systems for clinical practice should watch for new regulations from agencies such as the Food and Drug Administration that might place different requirements on the validation process for this technology. These recommendations offer, to the best of our knowledge, the first rigorously developed guidelines for pathology laboratories to use in the validation process of WSI systems for diagnostic purposes. However, as WSI systems and their applications in clinical practice continually evolve, so too should their validation process. Future recommendations regarding the validation of related digital pathology systems and applications (eg, image analysis) are anticipated.
The College of American Pathologists Center thanks Tony Smith, MLS, ECMS (AIIM), and James MacDonald, BS, IT for their help in developing these guidelines.
For author conflict of interest disclosures, see the Appendix.