Context

Whole slide imaging (WSI) produces a virtual image that can be transmitted electronically. This technology has clinical applications in situations in which glass slides are not readily available.

Objective

To examine the results of a validation study performed using the draft version of the WSI clinical validation guideline recently released by the College of American Pathologists.

Design

Ten iScan Coreo Au scanners (Ventana Medical Systems, Tucson, Arizona) were validated, 6 with one set of 100 cases and 4 with a different set of 100 cases, for 1000 case examinations. The cases were selected consecutively from the following case types: internal consultations and malignancies and cases with frozen sections, special stains, and/or immunohistochemistry. Only key slides were scanned from each case. The slides were scanned at ×20 magnification. Pathologists reviewed the cases as both glass slides and WSI, with at least a 3-week washout period between viewings.

Results

Intraobserver agreement between glass slides and WSI was present for 786 (79%) of the 1000 cases. Major discrepancies occurred in 18 cases (1.8%). κ statistics compiled for the subset of cases (n = 504; 50%) with concern for neoplasia showed excellent agreement (κ = 0.8782). Individual scanners performed similarly to one another. Analysis of the results revealed an area of concern: small focal findings.

Conclusions

The results were felt to validate the use of WSI for the intended applications in our multiinstitutional laboratory system, although scans at ×20 magnification may be insufficient for cases hinging on small focal findings, such as microorganisms and inflammatory processes.

Whole slide imaging (WSI) is an emerging technology that allows pathology glass slides to undergo robotic scanning with reconstruction of a complete digital replica that can be viewed on a computer screen. The technology allows the findings on a glass slide to be readily available on a server that can be accessed remotely through the Internet, enabling viewers to rapidly gain access to images produced at a distant site or stored in an archive. This capability has already led to extensive adoption of the technology for educational and research purposes, but clinical applications have been relatively slow to develop. Concerns about the diagnostic quality of the whole slide images have been paramount, but issues of cost and information security have also contributed to reluctance to adopt the technology more widely.1  As the technology becomes less expensive, especially for server storage and transmission capacity, and the potential advantages of rapid, remote access to diagnostic materials have become more apparent, interest in clinical use has steadily increased.

Much of the research into the clinical utility of WSI has focused on the use of the technology for remote consultations2–4 and for frozen section interpretation,5–8 contexts in which timely transport of glass slides to remote locations is a significant problem. In these settings, suboptimal image quality or suboptimal interpretation of WSI may be acceptable because the alternative would be an inability to offer a valuable pathology service. However, a number of investigators have also looked into the question of whether WSI could replace conventional microscopes even in settings where glass slides would otherwise be readily available. Many studies have focused on only one organ system,9–21 but cumulatively, they seem to indicate that WSI is broadly comparable to glass slides for most surgical pathology applications. The most comprehensive WSI studies conducted to show the feasibility of replacing microscopes with computer monitors across many specimen types, essentially replicating an entire hospital workload, have shown a high rate of concordance between glass slide and digital image diagnoses,22–26 leading to preparation for an all-digital future for surgical pathology.27

These developments have prompted regulatory concern.1 The US Food and Drug Administration (FDA) has decided to classify whole slide imagers as medical devices requiring approval. Manufacturers are currently in the process of conducting studies for the FDA. The College of American Pathologists (CAP) has also responded by producing a validation guideline through its Pathology and Laboratory Quality Center.28 The final guideline has only recently been published, but preliminary draft consensus statements have been available since July 2011, and those draft consensus statements served as the basis for the design of this validation study. The principal elements of the guideline are as follows: validation should be performed on 60 to 100 cases that represent the spectrum of intended applications, using intraobserver variation as the primary form of analysis, with observers viewing consecutive cases representative of the material intended for scanning on both glass slides and whole slide images, with at least a 2-week washout period between viewings.

Intraobserver variability is the preferred measure of performance selected by CAP for the measurement of WSI against conventional microscopy. Clearly, however, not every disagreement between glass slide and WSI diagnosis is attributable to shortcomings in scan quality or interpretation. Intraobserver variability is a well-known phenomenon in surgical pathology when comparing interpretations of the same glass slide(s) over time. This raises the thorny issue of determining how much intraobserver variability is attributable to WSI and how much is due to the difficulty of interpreting the cases regardless of the mode used to view the slides. Because interobserver variability is also typically increased in “hard” cases, it may be possible to use interobserver variability as a proxy for underlying case difficulty and, thereby, to estimate the portion of intraobserver variability attributable to WSI. This study was designed to enable cross-analysis of intraobserver and interobserver variability for this purpose.

At our institution, the decision to invest in WSI technology was made to facilitate research and educational missions, as well as to improve the efficiency of our clinical workflow. The hospital system includes a large academic tertiary center with numerous subspecialist anatomic pathologists and a pathology residency program, as well as 4 smaller community hospitals in outer portions of the metropolitan area that have only a few pathologists located at each site. Most laboratory operations are centralized at the main hospital, including the processing of surgical pathology blocks and hematoxylin-eosin staining, special staining, and immunohistochemistry. The WSI technology is intended to allow the central laboratory to make same-day staining and processing requests more readily available to outlying hospitals, as well as to enable remote consultations with experts at the main hospital and archiving of cases for rapid access anywhere in the system for the purpose of case review or education.

Although we are not using WSI as a replacement for glass slides for diagnostic purposes, in accord with the current FDA position, we felt it would be prudent to undertake a comprehensive validation of each scanner, 10 in total purchased by our hospital system, before using the scanner to create WSI from clinical case slides. The validation focused on case types likely to lead to pathologist rereview or archiving that would be facilitated by WSI, including intradepartmental consultations and malignancies and cases with frozen sections, special stains, and/or immunohistochemistry. This report summarizes our experience using the CAP guideline for validation of WSI technology in our laboratory.

Our hospital system validated 10 iScan Coreo Au whole slide scanners and accompanying Virtuoso viewing software purchased from Ventana Medical Systems (Tucson, Arizona; formerly BioImagene, Sunnyvale, California). The validation was performed in 3 phases: 2 scanners (20%) were validated in phase 1, 4 scanners (40%) in phase 2, and 4 more scanners (40%) in phase 3. For each phase, 100 cases were scanned on each scanner. The cases were selected consecutively from the following case types that we anticipated scanning in the future: internal consultations and malignancies and cases with frozen sections, special stains, and/or immunohistochemistry. Only key slides were scanned from each case; for instance, only one hematoxylin-eosin slide of 3 for a biopsy case (33%) or only 1 or 2 slides showing the malignancy in a resection case, similar to our anticipated use of the scanners as a consultative and archiving tool. In cases with special stains or immunohistochemistry stains, all stained slides were scanned. The representative slides were selected by the first author (M.J.T.) at the outset of the study. In total, 2 sets of 100 cases were identified. The first set was used in all 3 phases to validate 2 scanners each time (6 total), and the second set was added in phases 2 and 3 to validate 2 scanners each time (4 total; scanners 4, 6, 8, and 10).

The slides were scanned at ×20 magnification (0.46 μm/pixel) at the standard quality setting (using representative points to determine the planes of focus rather than recalibrating the plane at every point). Most slides were scanned using automated tissue detection and focus point assignments. Some slides, such as those with limited tissue, negative immunohistochemistry stains, and extensive pen marks, required manual adjustment. Eight of the scanners (80%) were operated at the same location by the first author (M.J.T.). Two other scanners (20%) were operated at remote locations, each by a different operator. The slide images were transferred to a single Dell (Round Rock, Texas) PowerEdge server with 16 GB RAM and a 1-terabyte hard drive. The first author (M.J.T.) collated all of the scans in Virtuoso, and no significant differences were observed among scans performed at different locations. The whole slide images were made available for viewing via Virtuoso through the hospital intranet. The intranet encompassed all hospitals in the system, and no significant differences were observed when viewing the images at different sites. A server software error was present during phase 1 that caused intermittent, marked slowness of image uploading. That error was identified and corrected before phases 2 and 3. Many different computers were used for viewing slide images. All computers exceeded the minimum specifications for running Virtuoso, including the Microsoft (Redmond, Washington) Windows XP operating system and 2 GB RAM. Most monitors had a screen resolution of 1280 × 1024 pixels, but some were higher. Each pathologist used his or her everyday workstation, and special monitors and settings were not employed.
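To put the scanning resolution in perspective, the following minimal sketch estimates the pixel dimensions and uncompressed size of a scan at the 0.46 μm/pixel resolution used here; the 10 × 10-mm tissue area and the 24-bit RGB color depth are illustrative assumptions rather than figures reported by this study:

    # Rough estimate of whole slide image size at 0.46 um/pixel (24-bit RGB assumed).
    def wsi_dimensions(width_mm, height_mm, um_per_pixel=0.46):
        width_px = int(width_mm * 1000 / um_per_pixel)
        height_px = int(height_mm * 1000 / um_per_pixel)
        return width_px, height_px

    w, h = wsi_dimensions(10, 10)              # assume a 10 x 10 mm tissue area
    print(w, h)                                # ~21,700 x 21,700 pixels
    print(w * h * 3 / 1e9, "GB uncompressed")  # ~1.4 GB at 3 bytes/pixel, before compression

Even a modest tissue area therefore yields an image on the order of a gigabyte before compression, which is why server storage and transmission capacity figure prominently in the cost of WSI.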

In phase 1, there were 22 reviewing pathologists, all of whom were practicing anatomic pathologists; subspecialty cases, including hematopathology, neuropathology, medical kidney, and transplant biopsies, went only to pathologists actively signing out those case types. In phase 2, there were 4 practicing anatomic pathologists (18%), 8 anatomic pathology or hematopathology fellows (36%), and 10 second-year or above pathology residents (45%) involved in the validation. In phase 3, there were 8 practicing anatomic pathologists (44%), 1 anatomic pathology fellow (6%), and 9 second-year or above anatomic pathology residents (50%) involved in the validation. In phases 2 and 3, subspecialty cases were given to senior residents or fellows with the most experience in those areas. Overall, 57 pathologists participated, including 5 (9%) who participated in 2 separate phases. Each pathologist was provided with a brief history for each case, which was, essentially, the clinical information provided on the original requisition form. Each pathologist received half of his or her cases as glass slides and half as whole slide images, with at least 3 weeks (21 days) elapsing before the cases were returned to be viewed again with the other modality. The glass slides given to the pathologists for review were the same representative slides that had been previously scanned for WSI. Pathologists were trained in the use of Virtuoso before the validation but had little other previous experience with WSI. They were made aware that cases were scanned at ×20 magnification and that the ×40 setting was digitally magnified but did not contain more detail than the ×20 image. They were instructed beforehand not to memorize the cases. Each pathologist provided a diagnosis, date, and time spent for each case, although the time was inadvertently not recorded for a small subset of cases. A comment section was also available for each case, but comments were not mandatory.

κ statistics were calculated with the unweighted method of Cohen on a publicly available statistical calculation Web site (VassarStats29 ). The study was approved by the institutional review board.
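For transparency, the following minimal sketch illustrates the unweighted Cohen κ calculation, together with an approximate large-sample 95% confidence interval. The category labels and paired reads are hypothetical; the study itself used the VassarStats calculator rather than this code:

    from collections import Counter
    import math

    def cohen_kappa(reads_a, reads_b):
        """Unweighted Cohen kappa for paired categorical reads, with an approximate 95% CI."""
        n = len(reads_a)
        categories = set(reads_a) | set(reads_b)
        # Observed agreement: fraction of cases assigned the same category both times.
        p_o = sum(a == b for a, b in zip(reads_a, reads_b)) / n
        # Chance agreement expected from the marginal frequencies of each category.
        freq_a, freq_b = Counter(reads_a), Counter(reads_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
        kappa = (p_o - p_e) / (1 - p_e)
        # Approximate large-sample standard error; CI = kappa +/- 1.96 * SE.
        se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
        return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)

    # Hypothetical paired reads (glass slide vs WSI) on a 2-tier benign/malignant scale.
    glass = ["benign", "malignant", "benign", "malignant", "benign", "benign"]
    wsi   = ["benign", "malignant", "benign", "benign", "benign", "benign"]
    print(cohen_kappa(glass, wsi))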

Intraobserver agreement between glass slide diagnosis and WSI diagnosis was compiled as the primary validation measure. The threshold for considering a difference in diagnosis a disagreement or variance was deliberately kept low to maximize the number available for analysis. For instance, common variances included normal gastric mucosa versus reactive gastropathy and/or chronic gastritis, as well as differences of one grade or stage in the diagnosis of medical liver biopsies. A small subset of major variances that would result in significantly divergent treatment and follow-up, such as neoplasm versus no neoplasm or 2 different neoplasm classes, was also identified. Table 1 shows the intraobserver variance rates by phase, scanner, and pathologist group (practicing versus trainees).

Table 1.

Rates of Intraobserver Agreement by Study Phase, Whole Slide Scanner Used, and Degree of Experience of the Reviewing Pathologist


Comparison of intraobserver variability and interobserver variability was undertaken for each phase to gain a better understanding of how much of the disagreement documented in the validation could be attributed to WSI technologic challenges. Table 2 shows how often each pathologist agreed with his or her own reads, as well as how often each agreed with the pathologist who viewed the same case during the same phase. Overall, in cases in which one pathologist was self-consistent across both modalities, the other pathologist showed intraobserver disagreement, and the self-consistent interpretation matched one of the 2 discrepant reads, the self-consistent pathologist agreed with the other pathologist's glass slide interpretation 59% of the time (91 of 153) and with the WSI interpretation 41% of the time (62 of 153). Although 100 of the cases (50%) were viewed by 6 pathologists each, and the other 100 (50%) were viewed by 4 pathologists each, the data are categorized by phase because of the many permutations of agreement and disagreement observed across cases. Only 24 of the 100 cases (24%) viewed by 6 pathologists had complete agreement across all phases, and only 32 of the 100 cases (32%) viewed by 4 pathologists had complete agreement across phases 2 and 3.

Table 2.

Intraobserver Agreement Versus Interobserver Agreement by Study Phases, n = 500


κ statistics for agreement were not readily applied to many of the cases because the diagnoses could not be easily divided into discrete categories. Subgroups were identified that would be amenable to κ analysis of glass slide versus WSI interpretations by the same observer, as displayed in Table 3. The largest subset consisted of cases in which confirming or ruling out malignancy was a significant part of the diagnosis (46 of 100 cases [46%] examined in all 3 phases and 57 of 100 cases [57%] examined in 2 phases, 504 case examinations total). Those cases were mostly general surgical pathology cases to diagnose or rule out carcinoma (79 cases [16%]), but a few were for lymphoma (n = 9; 2%), neuropathology tumors (n = 8; 2%), melanoma (n = 4; 1%), or soft tissue neoplasms (n = 3; 0.6%). This set was analyzed in 2 tiers and 4 tiers. The 4-tier categorization included (1) benign, with no neoplasms; (2) benign or low malignant potential neoplasms; (3) suspicious for malignancy; and (4) malignant tumors. The κ statistic for all cases in the 4-tier categorization was 0.8264 (95% confidence interval [CI], 0.7797–0.8731; excellent). The 2-tier categorization consisted of (1) benign (no tumor and benign tumors) and (2) malignant, with the suspicious cases excluded. The κ statistic for all cases in the 2-tier categorization was 0.8782 (95% CI, 0.8327–0.9237; excellent). Because all consecutive cases with special stains were included, the sets included numerous medical liver biopsies (11 of 100 cases [11%] examined in all 3 phases and 14 of 100 cases [14%] examined in 2 phases, 122 case examinations total). κ statistics were analyzed for stage and grade. Cases in which a reviewing pathologist did not assign a stage or grade were excluded from κ analysis. Because most medical liver cases were for follow-up of hepatitis C, a 5-category system was applied (grade or stage 0, 1, 2, 3, or 4), with equivocal grading or staging assigned to the higher category. The κ statistic for all cases with grading was 0.6457 (95% CI, 0.4966–0.7948; good). The κ statistic for all cases with staging was 0.6347 (95% CI, 0.5115–0.7579; good). Finally, because of frequent requests for special stains or immunohistochemistry to rule out organisms in gastric and gastroesophageal junction biopsies, many of those cases were included in the sets (17 of 100 cases [17%] examined in all 3 phases and 8 of 100 cases [8%] examined in 2 phases, 134 case examinations total). The κ statistic for the presence or absence of organisms (Helicobacter pylori or Candida species) was 0.5186 (95% CI, 0.2461–0.7911; moderate), and the κ statistic for the presence or absence of intestinal metaplasia was 0.6792 (95% CI, 0.5368–0.8216; good).
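As a concrete illustration of how the 4-tier reads collapse into the 2-tier analysis described above, the following sketch (with hypothetical category labels, reusing the cohen_kappa helper from the earlier sketch) maps paired reads to the 2-tier scale and drops any pair containing a "suspicious for malignancy" read:

    # Hypothetical mapping from the 4-tier categories to the 2-tier benign/malignant scale.
    FOUR_TO_TWO = {
        "benign, no neoplasm": "benign",
        "benign or low malignant potential neoplasm": "benign",
        "suspicious for malignancy": None,   # excluded from the 2-tier analysis
        "malignant tumor": "malignant",
    }

    def two_tier_pairs(glass_reads, wsi_reads):
        """Collapse paired 4-tier reads to 2 tiers, dropping pairs with a suspicious read."""
        pairs = []
        for g, w in zip(glass_reads, wsi_reads):
            g2, w2 = FOUR_TO_TWO[g], FOUR_TO_TWO[w]
            if g2 is not None and w2 is not None:
                pairs.append((g2, w2))
        return pairs

    # cohen_kappa(*zip(*two_tier_pairs(glass_reads, wsi_reads))) then gives the 2-tier kappa.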

Table 3.

Intraobserver κ Statistics for Neoplasia, Medical Liver Biopsies, and Gastroesophageal Biopsies


Table 3 shows that no particular scanner had a consistent, clear tendency toward lower κ agreement levels. Because of the relatively few cases per scanner, however, the individual κ statistics show marked variability and carry limited statistical weight. Table 3 also presents a breakdown of κ statistics by experience level, which shows that trainees (residents and fellows) appeared to have modestly lower levels of intraobserver agreement than did practicing pathologists.

κ statistics for interobserver variability were also calculated, using the 2 pathologists who looked at the same cases in each phase as the 2 observers. This allows the interobserver κ statistics for glass slides to be compared with those for WSI. Note that the WSI in this comparison were generated from the same glass slides by different scanners. The interobserver κ results are displayed in Table 4. The WSI showed consistently lower interobserver κ results for every category. The largest differences in the κ statistics were seen in grading activity for liver biopsies and organism identification in gastric and gastroesophageal junction biopsies.

Table 4.

Interobserver κ Statistics, With 2 Observers of the Same Cases During Each Phase, for Neoplasia, Medical Liver Biopsies, and Gastroesophageal Biopsies


Although comments were optional, many comments were recorded. Analysis of the comments points to areas of concern with WSI technology. Of the 137 comments recorded, 49 (36%) were about poor image quality at high magnification, 20 (15%) were about poorly scanned slides not well-visualized even at low power, 26 (19%) were about system slowness, and 42 (31%) were miscellaneous comments. The most common specific complaints about poor visualization at high power related to concerns about scanning for organisms (18 comments [13%]) and distinguishing types of inflammatory cells (5 comments [4%]). Most (15 of 20 comments; 75%) of the complaints about unviewable WSI pertained to immunohistochemistry slides with limited tissue and little or no positive immunoperoxidase staining. The scanners consistently failed to find the plane of focus for those slides even with manual selection of the scanning areas and focus points. Many of the complaints about system slowness were from phase 1 (14 of 26 comments; 54%), attributable to the server malfunction. Complaints about slowness decreased in phases 2 and 3 with a normally functioning server.

Using the feedback from comments and the observation that particular cases repeatedly caused difficulty for multiple observers, it was possible to identify a significant trend in problematic cases: small, focal findings in WSI repeatedly caused difficulties. The problems were of 2 kinds: (1) failure to notice a clinically significant finding, and (2) inability to diagnose the finding correctly and with confidence even though it was identified. Figure 1, A through D, shows side-by-side comparisons of clinically significant focal findings that were missed by multiple (but not all) observers on WSI. The findings are clearly visible in the WSI on high power, as shown in Figure 1, so insufficient visual scanning of the image by the pathologist appears to be the source of the problem. Figure 2, A through D, shows side-by-side comparisons of organism stains that observers struggled to accurately interpret at high power. The observers looked at those foci on the WSI but frequently came to different conclusions than they did with glass slides and expressed diagnostic uncertainty when filling out the validation forms. The WSI images were screen captures from the Virtuoso software display, which were not modified except for cropping. The glass slide images were captured by a Leica Microsystems (Wetzlar, Germany) DFC495, 8-megapixel CCD [charge-coupled device] microscope-mounted camera set to produce 3264 × 2448 pixel images with 24 bits/pixel. In addition to cropping, the contrast and gamma correction were modified to closely match the color characteristics of the corresponding WSI image.

Figure 1.

Comparison of focal findings repeatedly not identified on whole slide images. All images show a comparison of a microscope-mounted camera image taken at ×20 magnification (left, A and C) with a screen capture of a corresponding whole slide image at ×20 (right, B and D) from the same hematoxylin-eosin–stained slide. A and B, The top images show a small focus of gastric invasive adenocarcinoma only present in one of several biopsies on the slide. C and D, The bottom images show one of only a few foci of crypt abscess formation and acute inflammatory activity in a colon biopsy.

Figure 2.

Comparison of organism stains at high power. All images show a comparison of a microscope-mounted camera image taken at ×60 magnification corresponding to 0.073 μm/pixel in the original (left, A and C) with a screen capture of a corresponding whole slide image at ×40 “virtual zoom” from an image scanned at ×20 corresponding to 0.46 μm/pixel in the original (right, B and D). The top images show an immunohistochemical stain for Helicobacter pylori in a gastric biopsy with hematoxylin counterstain. Pathologists consistently interpreted the glass slides as positive (A), but struggled with the whole slide images (B) and either interpreted them as inconclusive or dismissed the staining altogether. C and D, The bottom images show a Gomori methenamine silver stain for fungal elements in an esophageal biopsy. Pathologists regarded the staining as negative or nondiagnostic on glass slides (C) but often interpreted the whole slide image (D) as positive or at least suspicious for Candida.


Viewing times were recorded for both glass slides and WSI for 929 of the 1000 case reviews (93%). The average time spent viewing the glass slides was 177 seconds, as opposed to 250 seconds for WSI; the median times were 132 seconds for glass slides and 210 seconds for WSI. Excluding the results for phase 1, which were affected by the intermittent server malfunction, times were recorded for 752 of 800 case reviews (94%) in phases 2 and 3. The average time in phases 2 and 3 was 181 seconds for glass slides versus 235 seconds for WSI, and the median times in this subset were 134 seconds for glass slides and 203 seconds for WSI.

Although there is now a fairly explicit guideline from the CAP regarding the design of validation studies,28 it is by no means clear what constitutes an acceptable amount of intraobserver variability. At this time, each laboratory must decide for itself whether its own results are sufficient to demonstrate that WSI performs comparably enough to glass slides to allow for its use as a substitute in designated situations. One aim of this article is to make more intraobserver variability data available so that more-specific benchmarks might be established in the future.

Previously published large studies23–25 of WSI technology for a diverse set of specimens have shown high levels of intraobserver agreement between glass slides and WSI. This study differs somewhat from the others in that the validation was not designed to determine the comparability of glass slides to WSI in complete consecutive cases with the implied or actual intention of complete substitution of WSI for glass slides. This study was instead focused on a subset of slides from a subset of cases most likely to be representative of the materials scanned in a laboratory that continues to make primary diagnoses by glass slides. In particular, this study used key representative slides from consecutive cases from the following categories: intradepartmental consultations and malignancies and cases with frozen sections, special stains, and/or immunohistochemistry. This selection strategy for cases significantly increased the “difficulty” of the validation slides. The 200 cases chosen for inclusion were culled from more than 900 cases (22%) during the selected period. This increased concentration of challenging cases, along with fewer redundant slides examined, likely contributed to the relatively low intraobserver agreement rates seen in this study (79%; 786 of 1000). Only a small portion of that variance rate, however, would be considered a major discrepancy (1.8%; 18 of 1000). Other laboratories, working under different conditions and using different case types in their validations, will find different results. This is the reason for individualized validation for each laboratory, as currently recommended by the CAP. Results may also vary depending on the information technology infrastructure and staff support available at a given institution. Not only the scanners but also the servers, the intranet and/or Internet connections, and the personal computers of the pathologists have a role in the performance of the system. Our server issues in phase 1, for instance, may have contributed to lower intraobserver agreement because of pathologist frustration with technical aspects of the system not related to the scanners.

The validation study design, using only 2 sets of 100 cases repeatedly examined on 10 WSI scanners through 3 phases, facilitates many useful comparisons. Although intraobserver agreement remained fairly steady across scanners, a tendency toward slightly lower intraobserver agreement can be seen in later phases of the validation. This is felt to be attributable to changes in the composition of the pathologists viewing the cases. Phase 1 used only practicing pathologists, and all cases were viewed by pathologists who actually sign out similar cases in everyday practice. In phases 2 and 3, pathology residents and fellows participated extensively, and pathologists with specialized expertise in areas like neuropathology, medical kidney, and transplantation were not participating. An increase in intraobserver variability would be expected under those conditions. Comparison of the agreement rates and κ statistics for trainees versus practicing pathologists bears that out.

The latter 2 phases also used a second validation set of 100 cases not used in phase 1, which could also have contributed to the slightly lower performance of those phases. However, comparison of the agreement rates and κ statistics for the scanners using the second validation set (scanners 4, 6, 8, and 10) did not show a striking difference between sets, which provides limited support to the notion that sets of cases with similar selection criteria will perform similarly in validation.

Taking into account the differences between validation phases described above, examination of the intraobserver agreement rates and κ statistics across the phases shows that the scanners performed similarly to one another with reproducible results when using the same glass slides. Full validation of 100 cases was performed for each scanner in this study because the CAP draft guideline was ambiguous regarding the need for full validation of each individual scanner. The recently published final guideline28  allows for more-limited validation (20 slides each) of additional scanners of the same make and model. The results of this study provide support for this more-limited validation requirement. Although other details changed between the draft guideline we used and the final published guideline, this is felt to be the only change that would have significantly altered our validation approach and findings had we used the final guideline instead. The reductions in case numbers required in the final version as opposed to the draft and the reduction of the duration of the washout phase from a minimum of 3 weeks to a minimum of 2 weeks should considerably reduce the difficulty of conducting a validation relative to what we experienced. Overall, the CAP guideline requires a high standard of validation that is not typical for anatomic pathology. No doubt, the reason for this was the consideration that, at some point in the future, WSI might replace glass slides with cases signed out entirely based on digital review. We are not currently planning to use WSI in that manner, and such thorough validation may not have been undertaken before the introduction of the draft CAP guideline. Now that the guideline exists, however, it would be wise to consider it a minimum standard for validation because it is likely that the CAP laboratory inspection checklist will change to incorporate that guideline in the future.

The design of this study facilitated the use of interobserver variability as a means of estimating the amount of intraobserver variability attributable to WSI technology issues. Calculation of interobserver κ results for glass slides and WSI for each phase, not surprisingly, showed that the glass slides had higher κ results, indicating more-consistent interobserver agreement. The consistently poorer performance of WSI supports the intuitive supposition that loss of image detail during scanning exacerbates disagreement. The evaluation of organisms in gastroesophageal biopsies and the grading of liver biopsies showed the steepest decrease in interobserver κ between glass slides and WSI. Based on pathologist comments during the study, it seems likely that the decline reflected difficulty in clearly seeing small organisms and inflammatory cells on high power when viewing WSI images created at ×20 magnification.

The data analysis for this study also employed a novel method for evaluating intraobserver variances by using another observer as the gold standard. When one pathologist disagreed with him- or herself on a given case, but the other pathologist had intraobserver agreement, the interpretation of the self-consistent pathologist was used as the “correct” diagnosis for comparison of the 2 discrepant interpretations. Using that method, many (41%; 62 of 153) of the intraobserver variances were shown to be likely due to variances in interpretation of the glass slides, rather than the WSI, because the WSI interpretation was closer to both interpretations of the self-consistent pathologist. Assuming that an equal number of the discrepancies in the WSI cases were attributable to factors that were also present in glass slides, it is possible to estimate that most (approximately 62 of 91 [68%]) of the intraobserver variances found during the study do not derive from technical limitations of WSI. Although imperfect, this method helps to dispel the notion that most intraobserver discrepancies observed during validation are attributable to WSI technology shortcomings.

It is certainly true that WSI has limitations. Identification of problematic cases that frequently produced intraobserver disagreement or diagnostic uncertainty, in addition to an analysis of the comments submitted by validating pathologists, demonstrated one problematic area in particular: the nonidentification or misidentification of small focal findings. Often, the diagnostically significant foci were visible, in retrospect and with knowledge of the diagnosis, in the WSI images. Misinterpretation in those cases resulted from insufficient attention to the critical foci. Several different likely explanations for this phenomenon present themselves. Because these were not “real” cases with consequences to patients in the event of misdiagnosis, pathologists did not have a very strong incentive to look at the cases carefully. The pathologists performing the validation also had limited experience with WSI and may not have been able to “scan” the slides efficiently, leading to failure to thoroughly examine the images. Both of those problems would presumably resolve as pathologists gained more experience with WSI and applied the same diligence to the images as to glass slides. Another possibility is that WSI viewed on a monitor are more disorienting and difficult to comprehensively scan than glass slides under a microscope. If that were the case, more experience would not entirely resolve the problem. Studies of image perception using WSI show greater sophistication of search patterns by more-expert pathologists,30–32 demonstrating that skills learned from conventional microscopy translate to WSI. However, that does not prove that WSI can be searched as efficiently as glass slides.

Also of concern are the discrepant interpretations that occurred in cases in which the pathologists did see the focal findings of concern on WSI but were unable to reproduce their glass slide diagnosis. In most cases, that was attributable to a lack of image clarity at magnification above ×20, which is an inherent limitation of the technology. Although “virtual” zoom is allowed in the Virtuoso software, if the slides are scanned at ×20, the ×40 image does not contain more information than the ×20 image and becomes pixelated and unclear. Microorganisms, small cells such as inflammatory cells, and nuclear details in larger cells cannot be seen well. In the case of microorganisms, validating pathologists gave both false-positive and false-negative interpretations for several of the problematic cases and frequently expressed an inability to make a confident diagnosis. Other investigators have noticed similar problems.4,17,21,22 The use of WSI scanned at ×20 magnification alone in situations in which patient care might hinge on high-power interpretations would not be advisable. Pathologists should be cognizant of the limitations of WSI and insist on examination of the original glass slides in such situations. Previous work indicates that scanning at ×40 magnification with 9 stacked levels may come very close to the performance of glass slides, even for Helicobacter pylori organisms.33 Thus, WSI has the potential to overcome that obstacle, but at the cost of greatly increased scanning time and image size.
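To put that tradeoff in rough numbers (assuming, as is typical but not reported in this study, that ×40 scanning halves the pixel size to approximately 0.23 μm/pixel): halving the pixel size quadruples the number of pixels per focal plane, and capturing 9 stacked planes multiplies the data by another factor of 9, so a ×40, 9-plane scan involves on the order of 2² × 9 = 36 times the raw image data of a single-plane ×20 scan, with a correspondingly longer scanning time.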

Finally, this study demonstrated that examination of WSI takes considerably more time than conventional microscopy does. We found a 30% increase in time spent per case (235 seconds versus 181 seconds). Presumably, greater experience with WSI would reduce the time discrepancy by making pathologists more adept. However, much of the delay is no doubt attributable to the need to stream images over a network connection.

Overall, these results are felt to support the uses of WSI originally envisioned at the outset of our validation. Possible exceptions would be special stains and immunohistochemistry for microorganisms, as well as biopsies for inflammatory conditions; further evaluation using ×40 scanning magnification seems warranted for those applications. Based on these findings, however, the use of WSI alone, with no reference to glass slides, for the kinds of challenging cases selected for these study sets could be problematic. There is no currently accepted standard for what level of performance decrement for WSI relative to glass slides is “acceptable” for primary diagnosis purposes. Hopefully, the ongoing publication of WSI studies will lead to more definitive standards.17 In the meantime, the CAP guideline provides a useful basis for the design of validation studies.

References

1. Cornish TC, Swapp RE, Kaplan KJ. Whole-slide imaging: routine pathologic diagnosis. Adv Anat Pathol. 2012;19(3):152–159.
2. Al Habeeb A, Evans A, Ghazarian D. Virtual microscopy using whole-slide imaging as an enabler for teledermatopathology: a paired consultant validation study. J Pathol Inform. 2012;3:2.
3. López AM, Graham AR, Barker GP, et al. Virtual slide telepathology enables an innovative telehealth rapid breast care clinic. Hum Pathol. 2009;40(8):1082–1091.
4. Wilbur DC, Madi K, Colvin RB, et al. Whole-slide imaging digital pathology as a platform for teleconsultation: a pilot study using paired subspecialist correlations. Arch Pathol Lab Med. 2009;133(12):1949–1953.
5. Evans AJ, Chetty R, Clarke BA, et al. Primary frozen section diagnosis by robotic microscopy and virtual slide telepathology: the University Health Network experience. Hum Pathol. 2009;40(8):1070–1081.
6. Fallon MA, Wilbur DC, Prasad M. Ovarian frozen section diagnosis: use of whole-slide imaging shows excellent correlation between virtual slide and original interpretations in a large series of cases. Arch Pathol Lab Med. 2010;134(7):1020–1023.
7. Gould PV, Saikali S. A comparison of digitized frozen section and smear preparations for intraoperative neurotelepathology. Anal Cell Pathol (Amst). 2012;35(2):85–91.
8. Slodkowska J, Pankowski J, Siemiatkowska K, Chyczewski L. Use of the virtual slide and the dynamic real-time telepathology systems for a consultation and the frozen section intra-operative diagnosis in thoracic/pulmonary pathology. Folia Histochem Cytobiol. 2009;47(4):679–684.
9. Al-Janabi S, Huisman A, Vink A, et al. Whole slide images for primary diagnostics of gastrointestinal tract pathology: a feasibility study. Hum Pathol. 2012;43(5):702–707.
10. Al-Janabi S, Huisman A, Vink A, et al. Whole slide images for primary diagnostics in dermatopathology: a feasibility study. J Clin Pathol. 2012;65(2):152–158.
11. Al-Janabi S, Huisman A, Willems SM, Van Diest PJ. Digital slide images for primary diagnostics in breast pathology: a feasibility study. Hum Pathol. 2012;43(12):2318–2325.
12. Camparo P, Egevad L, Algaba F, et al. Utility of whole slide imaging and virtual microscopy in prostate pathology. APMIS. 2012;120(4):298–304.
13. Chargari C, Comperat E, Magné N, et al. Prostate needle biopsy examination by means of virtual microscopy. Pathol Res Pract. 2011;207(6):366–369.
14. Gage JC, Joste N, Ronnett BM, et al. A comparison of cervical histopathology variability using whole slide digitized images versus glass slides: experience with a statewide registry. Hum Pathol. 2013;44(11):2542–2548.
15. Jen KY, Olson JL, Brodsky S, Zhou XJ, Nadasdy T, Laszik ZG. Reliability of whole slide images as a diagnostic modality for renal allograft biopsies. Hum Pathol. 2013;44(5):888–894.
16. Krishnamurthy S, Mathews K, McClure S, et al. Multi-institutional comparison of whole slide digital imaging and optical microscopy for interpretation of hematoxylin-eosin-stained breast tissue sections. Arch Pathol Lab Med. 2013;137(12):1733–1739.
17. Massone C, Soyer HP, Lozzi GP, et al. Feasibility and diagnostic agreement in teledermatopathology using a virtual slide system. Hum Pathol. 2007;38(4):546–554.
18. Molnar B, Berczi L, Diczhazy C, et al. Digital slide and virtual microscopy based routine and telepathology evaluation of routine gastrointestinal biopsy specimens. J Clin Pathol. 2003;56(6):433–438.
19. Nielsen PS, Lindebjerg J, Rasmussen J, Starklint H, Waldstrom M, Nielsen B. Virtual microscopy: an evaluation of its validity and diagnostic performance in routine histologic diagnosis of skin tumors. Hum Pathol. 2010;41(12):1770–1776.
20. van der Post RS, van der Laak JA, Sturm B, et al. The evaluation of colon biopsies using virtual microscopy is reliable. Histopathology. 2013;63(1):114–121.
21. Velez N, Jukic D, Ho J. Evaluation of 2 whole-slide imaging applications in dermatopathology. Hum Pathol. 2008;39(9):1341–1349.
22. Campbell WS, Lele SM, West WW, Lazenby AJ, Smith LM, Hinrichs SH. Concordance between whole-slide imaging and light microscopy for routine surgical pathology. Hum Pathol. 2012;43(1):1739–1744.
23. Fónyad L, Krenács T, Nagy P, et al. Validation of diagnostic accuracy using digital slides in routine histopathology. Diagn Pathol. 2012;7:35.
24. Gilbertson JR, Ho J, Anthony L, Jukic DM, Yagi Y, Parwani AV. Primary histologic diagnosis using automated whole slide imaging: a validation study. BMC Clin Pathol. 2006;6:4.
25. Jukic DM, Drogowski LM, Martina J, Parwani AV. Clinical examination and validation of primary diagnosis in anatomic pathology using whole slide digital images. Arch Pathol Lab Med. 2011;135(3):372–378.
26. Bauer TW, Schoenfield L, Slaw RJ, Yerian L, Sun Z, Henricks WH. Validation of whole slide imaging for primary diagnosis in surgical pathology. Arch Pathol Lab Med. 2013;137(4):518–524.
27. Stathonikos N, Veta M, Huisman A, van Diest PJ. Going fully digital: perspective of a Dutch academic pathology lab. J Pathol Inform. 2013;4:15.
28. Pantanowitz L, Sinard JH, Henricks WH, et al; College of American Pathologists Pathology and Laboratory Quality Center. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137(12):1710–1722.
29. Lowry R. VassarStats: Web site for statistical computation. http://vassarstats.net/. Accessed March 31, 2014.
30. Krupinski EA, Tillack AA, Richter L, et al. Eye-movement study and human performance using telepathology virtual slides: implications for medical education and differences with experience. Hum Pathol. 2006;37(12):1543–1556.
31. Mello-Thoms C, Mello CA, Medvedeva O, et al. Perceptual analysis of the reading of dermatopathology virtual slides by pathology residents. Arch Pathol Lab Med. 2010;136(5):551–562.
32. Treanor D, Lim CH, Magee D, Bulpitt A, Quirke P. Tracking with virtual slides: a tool to study diagnostic error in histopathology. Histopathology. 2009;55(1):37–45.
33. Kalinski T, Zwönitzer R, Sel S, et al. Virtual 3D microscopy using multiplane whole slide images in diagnostic pathology. Am J Clin Pathol. 2008;130(2):259–264.

Author notes

Presented in part at the annual meeting of the United States and Canadian Academy of Pathology; March 3, 2014; San Diego, California.

The authors have no relevant financial interest in the products or companies described in this article.