Context.—Whole slide imaging (WSI) is now used for educational purposes, for consultation, and for archiving and quantitation of immunostains. However, it is not routinely used for the primary diagnosis of hematoxylin-eosin–stained tissue sections.
Objective.—To compare WSI using the Aperio digital pathology system (Aperio Technologies, Inc, Vista, California) with optical microscopy (OM) for the interpretation of hematoxylin-eosin–stained tissue sections of breast lesions.
Design.—The study was conducted at 3 clinical sites; 3 breast pathologists interpreted 150 hematoxylin-eosin–stained slides at each site, 3 times each by WSI and 3 times each by OM. For WSI, slides were scanned using an Aperio ScanScope and interpreted on a computer monitor using Aperio ImageScope software and Aperio Spectrum data management software. Pathologic interpretations were recorded using the College of American Pathologists breast checklist. WSI diagnoses were compared with OM diagnoses for accuracy, precision (interpathologist variation), and reproducibility (intrapathologist variation). Results were considered accurate only if the interpretation matched exactly between WSI and OM. The proportion of accurate results reported by each pathologist was expressed as a percentage for the comparison of the 2 platforms.
Results.—The accuracy of WSI for classifying lesions as not carcinoma or as noninvasive (ductal or lobular) or invasive (ductal, lobular, or other) carcinoma was 90.5%. The accuracy of OM was 92.1%. The precision and reproducibility of WSI and OM were determined on the basis of pairwise comparisons (3 comparisons for each slide, resulting in 36 possible comparisons). The overall precision of WSI was 90.5% in comparison with 92.1% for OM; reproducibility of WSI was 91.6% in comparison with 94.5% for OM, respectively.
Conclusions.—In this study, we demonstrated that WSI and OM have similar accuracy, precision, and reproducibility for interpreting hematoxylin-eosin–stained breast tissue sections. Further clinical studies using routine surgical pathology specimens would be useful to confirm these findings and facilitate the incorporation of WSI into diagnostic practice.
The practice of surgical pathology relies on image-based light microscopy diagnosis, which enables the detailed evaluation of the cytologic and architectural features of hematoxylin-eosin (H&E)–stained tissue sections. Recent technological advances have allowed for the acquisition and storage of high-quality digital images. Several commercial platforms are available for scanning H&E-stained tissue sections and generating digital whole slide images (WSI) for viewing and interpretation.1 The field of digital pathology is rapidly evolving toward the creation of a digital environment for interpreting and managing the pathologic information contained in a glass slide.
Several applications of digital pathology are currently being embraced by pathologists. Whole slide imaging is commonly used for pathology education in medical school, pathology workshops, and slide conferences.2–7 It is also gaining popularity for multidisciplinary tumor board presentations at several institutions, and a digital breast histopathology atlas is available online.8 Whole slide imaging is also useful for remotely reading intraoperative frozen sections, for evaluating surgical pathology quality assurance programs, and for proficiency testing.9–14 Another routine application in many clinical pathology laboratories is to analyze immunostained slides to determine the protein expression of markers such as estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2, which are routinely evaluated for selecting patients with breast cancer for targeted therapy.15 Whole slide imaging is also becoming popular for obtaining second opinions from expert pathologists.16,17 Several applications for WSI exist in surgical pathology, but it is not routinely used to render primary histologic diagnoses. Whole slide imaging's diagnostic utility and acceptability by pathologists depend on its performance compared with conventional optical microscopy (OM). Interpretive concordance is crucial for the adoption of the WSI platform in routine surgical pathology practice. To this end, we conducted a multi-institutional study to determine the accuracy, precision, and reproducibility of WSI using the Aperio Digital Pathology System (Aperio Technologies, Inc, Vista, California) compared with those of conventional OM; the same tissue sections were used, representing a variety of benign and malignant breast lesions.
MATERIALS AND METHODS
This study was conducted at 3 Clinical Laboratory Improvement Amendments–qualified clinical sites (2 major academic cancer centers and 1 large community hospital) with institutional review board–approved protocols and waiver of informed consent from the patients. The Aperio Digital Pathology System was used to scan and store WSI. This system included a ScanScope XT digital slide scanner, Spectrum software (to store, manage, and access digital slides), ImageScope software (to view digital slide images), a digital slide repository server, and viewing workstations, including liquid crystal display monitors. The ScanScope XT instrument is a purpose-built slide scanner, whereas the server and monitors were off-the-shelf commercial products provided by Aperio as standard components. The images were captured at a magnification of ×20 or ×40 with a ×20/0.75 NA Plan Apo objective. The resolution of the captured images was 0.50 μm/pixel for ×20 and 0.25 μm/pixel for ×40 magnification. The eSlide file formats included standard pyramid-tiled TIFF (SVS) with JPEG2000 or JPEG compression.
At each site, pathologists used their own OMs. At each site, 3 board-certified pathologists with special expertise in breast pathology participated, 1 of whom served as the site's principal investigator. One or more laboratory technicians or research assistants at each site, under the direction of the principal investigator, pulled, labeled, and scanned slides to create digital slide images. At each site, 150 good-quality H&E-stained breast tissue sections (optimally cut and stained, without artifacts) were used. These cases represented normal tissue and benign and malignant breast lesions: consecutively accessioned breast specimens from routine workflow at 1 site and archived specimens in the pathology laboratory at 2 sites. For the 2 sites at which slides were chosen from the archives, the principal investigator at the site chose a cross-section of cases representative of the different possible categories in the College of American Pathologists breast checklist. The cases were chosen to maximize the possibility that cases representing each possible interpretation in the checklist were selected for the study. For both archival and routine workflow cases, one representative well-cut and well-stained H&E tissue section of each accession was used in the study.
We used H&E-stained tissue sections prepared from core needle biopsies, surgical excisions, and mastectomy specimens. Digital WSI were obtained by scanning H&E-stained slides of breast tissue with the Aperio ScanScope instrument; digital images were managed by Aperio Spectrum and evaluated using Aperio ImageScope software on a computer monitor. Each slide was diagnosed 3 times using WSI and 3 times using OM by each pathologist. After the slides had been evaluated using one modality, the label and sequence were randomized for interpretation by the alternate modality to avoid interpretation bias. A washout period of at least 7 days was required between reads.
A diagnostic recording method was created to record the primary histologic diagnosis on the basis of the College of American Pathologists breast cancer checklist. Table 1 shows the College of American Pathologists breast checklist that was used to record the results in the study. Adhering to this checklist enabled consistent recordings and comparisons using WSI and OM. A diagnosis comparison method was also created to allow any 2 diagnoses to be compared in a consistent manner. The raw data captured from the studies were collected in an XML database, as created and updated using the diagnosis recording tool.
We compared the WSI and OM interpretations for accuracy, precision (interpathologist variation), and reproducibility (intrapathologist variation) at classifying breast tissue as no evidence of carcinoma, noninvasive carcinoma, or invasive carcinoma. All interpretations made using WSI were compared, pairwise, with those made using OM. The percentage of agreement was computed as WSI accuracy. Similarly, all OM interpretations were compared, pairwise, with all other OM diagnoses of the same slide. The percentage of agreement was computed as the measurement of OM accuracy. All interpretations made using WSI by each pathologist were compared, pairwise, with all other WSI interpretations of the same slide made by other pathologists. The percentage of agreement was computed as the WSI interpathologist variation (precision). Similarly, all OM interpretations made by each pathologist were compared, pairwise, with all other OM interpretations of the same slide made by other pathologists. The percentage of agreement was computed as the OM interpathologist variation (precision). All WSI interpretations were compared, pairwise, with all other WSI interpretations made of the same slide by the same pathologist. The percentage of agreement was computed as the WSI intrapathologist variation (reproducibility). Similarly, all OM interpretations were compared, pairwise, with all other OM interpretations made of the same slide by the same pathologist. The percentage of agreement was computed as the OM intrapathologist variation (reproducibility). We recorded only the interpretation of the slides using the College of American Pathologists breast checklist; the time taken for the interpretation using WSI or OM was not recorded in this study.
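The pairwise agreement measures described above can be sketched in a few lines of Python. This is a minimal illustration only, not the study's diagnosis comparison method: the tuple layout (pathologist, modality, category), the function names, and the example reads for a single slide are all hypothetical.

```python
from itertools import combinations, product

# Each read is a hypothetical (pathologist, modality, category) tuple.

def percent_agreement(pairs):
    """Percentage of read pairs whose diagnostic categories match exactly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to compare")
    matches = sum(1 for a, b in pairs if a[2] == b[2])
    return 100.0 * matches / len(pairs)

def cross_modality_pairs(reads):
    """Every WSI read paired with every OM read of the same slide."""
    wsi = [r for r in reads if r[1] == "WSI"]
    om = [r for r in reads if r[1] == "OM"]
    return list(product(wsi, om))

def inter_pathologist_pairs(reads, modality):
    """Same-modality pairs made by different pathologists (precision)."""
    same = [r for r in reads if r[1] == modality]
    return [(a, b) for a, b in combinations(same, 2) if a[0] != b[0]]

def intra_pathologist_pairs(reads, modality):
    """Same-modality pairs made by the same pathologist (reproducibility)."""
    same = [r for r in reads if r[1] == modality]
    return [(a, b) for a, b in combinations(same, 2) if a[0] == b[0]]

# Invented reads for one slide: 3 pathologists, 3 reads per modality.
# Every read is "invasive" except one WSI read by pathologist P1.
slide_reads = []
for pathologist in ("P1", "P2", "P3"):
    for _ in range(3):
        slide_reads.append((pathologist, "WSI", "invasive"))
        slide_reads.append((pathologist, "OM", "invasive"))
slide_reads[0] = ("P1", "WSI", "noninvasive")

wsi_vs_om = percent_agreement(cross_modality_pairs(slide_reads))
wsi_precision = percent_agreement(inter_pathologist_pairs(slide_reads, "WSI"))
wsi_reproducibility = percent_agreement(intra_pathologist_pairs(slide_reads, "WSI"))
om_reproducibility = percent_agreement(intra_pathologist_pairs(slide_reads, "OM"))
```

In this invented example, a single discordant WSI read lowers the WSI-versus-OM agreement as well as both WSI precision and WSI reproducibility, while the OM figures stay at 100%. The sketch follows the wording of the description above (same-pathologist pairs are excluded from the interpathologist measure); how the study's own tool grouped pairs is an assumption here.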
We also assessed Cohen κ statistics to summarize the interrater agreement between WSI and OM with respect to the 3 groups of no carcinoma, noninvasive carcinoma, and invasive carcinoma among the readers.
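For readers unfamiliar with the statistic, the following Python sketch computes Cohen κ for two sets of categorical reads of the same slides using the standard definition, κ = (observed agreement − chance agreement) / (1 − chance agreement). The function name, category labels, and example reads are hypothetical and are not study data; the study's averaging of κ across the 3 readings is not reproduced here.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of slides on which the two reads agree.
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)

# Invented reads of 6 slides by one pathologist on each platform.
wsi_reads = ["none", "none", "noninvasive", "invasive", "invasive", "invasive"]
om_reads = ["none", "none", "noninvasive", "noninvasive", "invasive", "invasive"]

kappa = cohen_kappa(wsi_reads, om_reads)
```

Here the two platforms agree on 5 of 6 slides (observed agreement 0.833), chance agreement is 1/3, and κ works out to 0.75.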
RESULTS
This study included 150 H&E-stained breast tissue slides from the routine workflow at 1 clinical site and the pathology archives at the other 2 sites. The determination of accuracy of WSI in comparison with OM resulted from making 9 comparisons per slide per pathologist at each site, amounting to a total of 81 comparisons per slide for all the pathologists at all sites. The accuracy of OM was determined by comparing the reads made using OM with each other, including 3 pairwise comparisons per slide per pathologist at each site, resulting in 36 possible comparisons. The accuracy of WSI compared with OM ranged from 85.8% to 97.2%; that of OM compared with itself ranged from 86.7% to 98.7%. These results are summarized in Table 2 and depicted in Figures 1 and 2. Precision (interpathologist variation) was determined by comparing the 3 reads per slide per pathologist using WSI and OM; this resulted in 36 pairwise comparisons. The precision of each pathologist for WSI ranged from 83.5% to 96.4%; that of OM ranged from 86.7% to 98.7%. These results are summarized in Table 3. Reproducibility of WSI ranged from 83.5% to 96.1% and that of OM ranged from 91.8% to 97.3% (Table 4).
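The comparison counts quoted above are consistent with simple combinatorics if one assumes that the 9 reads of a slide per modality (3 pathologists, 3 reads each) are pooled. The Python sketch below shows that arithmetic; the pooling assumption and the variable names are illustrative and are not stated explicitly in the study.

```python
from math import comb

PATHOLOGISTS_PER_SITE = 3   # from the study design
READS_PER_MODALITY = 3      # each pathologist read each slide 3 times per platform

# Reads of one slide on one platform, pooled over the site's pathologists.
reads_per_slide = PATHOLOGISTS_PER_SITE * READS_PER_MODALITY

# WSI versus OM: every WSI read is paired with every OM read.
wsi_vs_om_per_pathologist = READS_PER_MODALITY * READS_PER_MODALITY
wsi_vs_om_per_slide = reads_per_slide * reads_per_slide

# Same platform: unordered pairs of distinct reads.
same_platform_per_pathologist = comb(READS_PER_MODALITY, 2)
same_platform_per_slide = comb(reads_per_slide, 2)
```

Under this assumption the values are 9 and 81 for the cross-platform comparisons and 3 and 36 for the same-platform comparisons, matching the counts given in the text.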
The accuracy, precision, and reproducibility of WSI diagnoses compared with OM reads, for all pathologists, are summarized in Table 5. Overall, the WSI reads corresponded with the OM reads at a rate of 90.5%, and the OM reads corresponded with themselves at a rate of 92.1%. For precision, 92.1% of the pairwise OM comparisons were the same, and 90.5% of the WSI comparisons were the same. For reproducibility, 94.5% were similar for OM and 91.6% for WSI.
The κ statistics averaged across 3 readings indicating the agreements between the diagnosis rendered using WSI and that rendered using OM were 0.89, 1.00, and 0.95 for the 3 pathologists in site 1; 0.95, 0.82, and 0.87 in site 2; and 0.79, 0.79, and 0.78 in site 3. Figure 3 illustrates these averaged κ statistics. These results indicate substantial agreement between the 2 modalities for the interpretation of H&E-stained tissue sections of different types of breast lesions.
A detailed analysis of the cases that were associated with variability in the readings using the same platform or between the 2 platforms enabled us to determine the types of lesions that resulted in discrepant diagnoses. Discrepancy in interpretations between reads using either WSI or OM by the same pathologist and between pathologists occurred with selected lesions such as florid hyperplasia, atypical ductal hyperplasia, ductal carcinoma in situ with or without microinvasion, and papillary lesions. Figure 4, A through C, illustrates some of the cases that led to discrepancy between the reads using either WSI or OM. Focal changes of in situ and invasive carcinoma in a tissue section were prone to be missed using either of the 2 platforms. The discrepancy in interpretations among different reads by the same pathologists and by different pathologists at each site occurred with both platforms. However, such discrepancies occurred slightly more often with WSI than with OM, resulting in marginally superior results of accuracy, precision, and reproducibility of OM in comparison with WSI.
COMMENT
Conventional OM allows the examiner to view fields of a stained tissue section by manually panning across the slide at different magnifications to evaluate the histologic features of the tissue. Several factors must be considered when making a histologic interpretation from a slide, including the nature and quality of the image, the experience of the examiner, and the ease of navigation. When viewed at high magnification, a single stained tissue section on a slide has thousands of individual fields that are potentially available for inspection. In the reading process, the tissue section is first viewed at low magnification, and areas to examine at higher magnification are then selected. A digital pathology system provides magnification and navigation capabilities similar to those of OM, with the help of a navigation aid such as a mouse. However, the means of displaying images differs in that a computer monitor is used instead of the eyepiece of an OM. The mechanisms for navigating the image also differ between the 2 platforms: physical knobs operated by hand are used with OM, whereas a computer keyboard and mouse are used with WSI.
The Aperio Digital Pathology System used in our multi-institutional study was perceived by the study pathologists to be user friendly and easy to become familiar with. The much larger field of vision at any given magnification, a wider range of magnification, and the ease of measuring and annotating any given area of interest in the tissue section were recognized as distinct advantages of the digital platform. The image quality of the H&E-stained tissue sections was similar to that of traditional light microscopy, with insignificant differences in sharpness and resolution. Minor reductions in sharpness or depth of focus with WSI did not pose significant problems and were not considered confounding factors for rendering pathologic interpretations in the study. However, WSI was perceived to be more time consuming than was OM, perhaps because of the inherent learning curve associated with the new technology. Because the exact time taken to render a histologic interpretation was not recorded in this study, a strict comparison between the 2 platforms could not be made. The time factor was recognized by all study pathologists as a potential limitation that may discourage pathologists from adopting the digital platform in routine surgical pathology practice.
In this large multi-institutional study, we demonstrated that WSI and OM have similar performance, with substantial equivalence, in interpreting H&E-stained tissue sections of benign and malignant breast lesions. The overall accuracy, precision, and reproducibility of the histologic interpretations were only marginally higher with OM than with WSI. The discrepancies could have resulted from failure to identify focal changes in tissue sections or from general variability in categorization. The latter issue led to discrepancies in diagnosing florid hyperplasia, atypical ductal hyperplasia, ductal carcinoma in situ with microinvasion, and papillary lesions and is well recognized in the field of breast pathology. The subjectivity of interpreting breast lesions when quantitative criteria are used (atypical ductal hyperplasia versus ductal carcinoma in situ and papilloma versus papilloma with atypia or papillary carcinoma) has been well addressed in the pathology literature.18–20 The cases that were discrepant in our study were the types of lesions that generally cause variability in interpretation. It is to be noted that variability in the histologic interpretations, which resulted from failure to recognize focal tissue changes or changes in the final categorization between reads, occurred in both WSI and OM. However, these variations occurred more often with WSI than with OM; thus, OM performed slightly better than did WSI overall.
Several previous studies have compared the performance of pathologists using WSI and OM. These studies used tissue from different anatomic sites, such as the breast, prostate, skin, lungs, and stomach.21–26 As in our study, the interpretive agreement between the 2 platforms was high: more than 90%. The concordance between WSI and OM was reported to be lower for nonneoplastic lesions and inflammatory lesions in previous studies. This was previously attributed to the difficulty in identifying finer details such as the nature of inflammatory cells, nuclear atypia, and the presence of apoptotic bodies because of reduced image resolution at higher magnifications with older imaging technologies. The inferior image quality of digital images was cited by many investigators in these previously reported studies as the major cause of WSI's slightly poorer diagnostic performance. We did not find that image quality was a major factor in diagnostic discrepancy between the 2 platforms in the current investigation. The same H&E-stained tissue sections resulted in diagnostic variation using both platforms. The cause of the higher discrepancy rate for WSI than for OM in this study is not entirely clear but is most likely related to the use of a new modality for interpretation instead of the familiar and routinely used light microscope.
On the basis of the available evidence from studies of WSI and OM for making primary histologic diagnoses, WSI seems to be at least a suitable adjunct to OM for making a primary histologic diagnosis. The evolving roles of WSI for the remote interpretation of frozen sections, intraoperative histologic diagnosis, and surgical management of patients, with a high diagnostic accuracy, are encouraging. Similarly, the increasing popularity of WSI for second opinion consultation of expert pathologists is testimony to the fact that WSI can be used successfully for interpreting stained tissue sections. Familiarizing pathologists with digital pathology and the incremental step-by-step use of WSI for interpreting specimens such as core biopsies in the first step may facilitate the incorporation of a digital platform for routine surgical pathology workflow.
This multi-institutional study is one of the largest systematic studies of WSI compared with OM for making histologic interpretations of breast tissue stained with H&E. This study provides valuable data on the comprehensive evaluation of accuracy, precision, and reproducibility of WSI compared with OM. Although we found the performance of the pathologists using WSI to be comparable with that using OM, OM was marginally superior to WSI. However, our study did not address the time taken to diagnose using WSI compared with OM, which needs to be considered in future studies. Other discrete components of digital pathology, such as scanning, data storage, and integration with existing laboratory informatics systems, should be studied in the future as well. Several reviews have addressed the benefits and issues related to incorporating digital pathology into an anatomic pathology practice.27–32 Our large validation study demonstrates substantial equivalence between the 2 platforms in key interpretive areas. Prospective studies using a design similar to that of our large multi-institutional study are needed to further validate our data regarding the comparison of WSI with OM in different tissue types.
References
Author notes
Presented at the annual meeting of the United States and Canadian Academy of Pathology; March 2012; Vancouver, British Columbia, Canada.
Competing Interests
Dr Liang is a clinical consultant for Aperio Technologies Inc, Vista, California. The other authors have no relevant financial interest in the products or companies described in this article.