Context.—

Despite several studies focusing on the validation of whole slide imaging (WSI) across organ systems or subspecialties, the use of WSI for specific primary diagnosis tasks has been underexamined.

Objective.—

To assess pathologist performance for the histologic subtyping of individual sections of ovarian carcinomas using a light microscope and WSI.

Design.—

A panel of 3 experienced gynecologic pathologists provided reference subtype diagnoses for 212 histologic sections from 109 ovarian carcinomas based on optical microscopy review. Two additional attending pathologists provided diagnoses and also identified the presence of a set of 8 histologic features important for ovarian tumor subtyping. Two experienced gynecologic pathologists and 2 fellows reviewed the corresponding WSI images for subtype classification and feature identification.

Results.—

Across pathologists specialized in gynecologic pathology, concordance with the reference diagnosis for the 5 major ovarian carcinoma subtypes was significantly higher for a pathologist reading on a microscope than for each of 2 pathologists reading on WSI. Differences were primarily due to more frequent classification of mucinous carcinomas as endometrioid with WSI. Pathologists had generally low agreement in identifying histologic features important to ovarian tumor subtype classification with either optical microscopy or WSI. This result suggests the need for refined histologic criteria for identifying such features. Interobserver agreement was particularly low for identifying intracytoplasmic mucin with WSI. Inconsistencies in evaluating nuclear atypia and mitoses with WSI were also observed.

Conclusions.—

Further research is needed to specify the reasons for these diagnostic challenges and to inform users and manufacturers of WSI technology.

More than 90% of ovarian malignancies consist of epithelial cancers,1  which are further classified into 5 major subtypes, namely high-grade serous carcinoma (HGSC), low-grade serous carcinoma (LGSC), clear cell carcinoma (CCC), endometrioid carcinoma (EC), and mucinous carcinoma (MUC). Some less common variant subtypes include seromucinous, undifferentiated, mixed carcinomas, and carcinosarcoma. Accumulated data indicate that these subtypes are distinctly different diseases,2  each with its own characteristic epidemiology, risk factors, prognosis,3  precursor lesions,4  molecular events during oncogenesis, biomarker expression patterns,2  clinical behavior, and response to chemotherapy.2,5  These findings have fueled enthusiasm for the development of targeted therapies directed at specific ovarian carcinoma subtypes,6  aiming to improve the survival rate of ovarian cancer, which has remained stagnant for the last 20 years.7,8  Accurate subtype classification has accordingly become more clinically important.

Ovarian carcinoma subtype classification is considered a challenging task for a number of reasons, including the relatively low incidence of ovarian cancer (breast cancer has a 10-fold higher incidence in the United States8), its heterogeneous nature, evident in admixtures of different morphologic patterns side by side within an individual neoplasm,9 and the rarity of some of the subtypes.10 As a result, the reproducibility of ovarian carcinoma subtype classification was historically poor.11,12 However, findings from numerous studies10,13–18 during the last decade or so have demonstrated improved reproducibility as the histologic criteria for the 5 major subtypes have been refined and validated using immunohistochemical and molecular techniques.

One area that has been underexamined is the histologic subtype classification of ovarian carcinomas using digital pathology review. Advances in whole slide imaging (WSI)19 and the recent US Food and Drug Administration clearances of the first 2 WSI systems for primary diagnosis have accelerated efforts to incorporate digital pathology into clinical practice.20 However, there are limited data on the use of WSI for this task. Ordi et al21 conducted a validation study of WSI for primary diagnosis in gynecologic pathology and concluded that diagnosis of gynecologic specimens by WSI is accurate and may be introduced into routine diagnosis. However, that study included a total of 4 malignant ovarian cases, which clearly does not constitute an adequate sampling of ovarian carcinoma subtypes. The large study by Mukhopadhyay et al,22 which provided evidence for the first WSI system authorized by the US Food and Drug Administration for primary diagnosis use,23 included 30 malignant ovarian cases without any further information on the subtype breakdown. Of those 30 cases, 6 had major discrepancies between WSI and the reference standard.22 This result suggests that a larger study specific to ovarian cancer assessment with WSI using a representative cohort of ovarian carcinoma subtypes is warranted.

Another underexamined aspect is the reproducibility of ovarian histologic subtyping across pathologist experience and subspecialization. In the United States, the degree of specialization in anatomic pathology practice varies widely—in some centers pathologists practice across the entire spectrum of anatomic pathology, and in others there are varying degrees of subspecialization, with pathologists focusing on a few or even a single specific area of subspecialization, such as gynecologic pathology. Kommoss et al24  reported discrepancies in histologic subtyping between general and specialist gynecologic pathologists in 28% (128 of 454) of patients. However, the study was conducted prior to the updated guidelines by the World Health Organization (WHO), which separated the category of LGSC and addressed a number of problematic areas in subtyping. Patel et al25  also reported significant discrepancies between general and gynecologic pathologists for histologic subtyping of 58 ovarian carcinomas. The Patel et al study did not include any LGSC or MUC cases.

In this manuscript we present a multisite observer study that examined the histologic subtyping of a representative cohort of ovarian carcinomas by a group of pathologists diverse in experience and subspecialty. Pathologist reviews were conducted with both a microscope and digital review of WSI-scanned slides. For each modality, concordance rates were calculated between the subtype classifications by observers and those by a panel of experienced gynecologic pathologists. Comparisons were then made between the resulting concordance rates. Even though several studies have focused on validation of WSI across multiple organ systems or subspecialties,21,22,26–29 this is the first study that focuses specifically on validation of WSI for a subspecialty primary diagnosis task. Subgroup analyses included concordance of observers across cases for which the expert classification was unanimous versus nonunanimous, thus exploring the impact of diagnostic difficulty on findings. In addition to the subtype diagnosis, the study examined interpathologist variability in the identification of morphologic features relevant for ovarian subtyping, which was one of the major sources of disagreement in a study of discrepant cases.44

Ovarian Carcinoma Cohort

This study included microscope and digital review of 212 individual sections from 109 ovarian carcinoma cases. As previously described in Seidman et al,10  the cohort was created by accruing 60 consecutive cases from the gynecology and gynecologic oncology service of a large community and tertiary care hospital, and then enriching for the less common subtypes (non–high-grade serous carcinomas), also as they appeared consecutively. The cases included 69 single-section and 40 multisection cases (144 sections, 2–6 sections per case), for a total of 213 sections derived from formalin-fixed, paraffin-embedded tumor tissue and stained with hematoxylin and eosin. The sections were selected by 1 of the experts as representative of the primary ovarian tumor. The average number of sections per case varied across subtypes (1.7 for HGSC, 2.0 for LGSC, 3.1 for MUC, 1.6 for EC, and 1.9 for CCC). One of the tissue slides could not be scanned and was excluded from this study.

WSI Acquisition

The remaining 212 tissue glass slides (sections) in the cohort were digitized at ×40 using a Hamamatsu NanoZoomer 2.0 HT (Hamamatsu Photonics, Bridgewater, New Jersey) located at the Tissue Array Research Program, Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland. Basic scanner characteristics31 included a scanning resolution of 0.23 μm, a halogen 3250-K light source, an Olympus 20× UPlan Apo objective lens with 0.75 numerical aperture, and a 3-CCD camera, in which 3 separate charge-coupled devices (CCDs) each take a separate measurement of red, green, or blue.

Study Design

Pathologists from 5 different sites with varying experience in gynecologic pathology participated in this study. The study was designed for the review of individual sections (section-based review) instead of a case-based review. This strategy reflects a trade-off between maintaining the search aspect of a pathology review of an entire case versus narrowing the search area to single sections to reduce interobserver variability caused by differential weighting of the presence of histologic features in different tumor areas. The latter was particularly relevant in this study because of the intratumoral heterogeneity known to exist in ovarian cancer.9 Sections were reviewed in a randomized order with a control to avoid sections from the same cases being read consecutively. Even though immunohistochemistry has been shown to improve the interobserver reproducibility of ovarian carcinoma classification,13 it was not used in this study to refine classification because the focus was on differences between modalities for the histologic review of hematoxylin-eosin–stained tissue.
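The study's randomization procedure was not specified beyond the constraint above; one simple way to implement such a constrained shuffle is rejection sampling, sketched below in illustrative Python (the function and parameter names are hypothetical, not from the study).

```python
import random

def randomize_sections(sections, case_of, max_tries=1000, seed=7):
    """Shuffle sections, rejecting any ordering in which 2 sections
    from the same case would be read back to back (illustrative
    sketch; `case_of` maps a section ID to its case ID)."""
    rng = random.Random(seed)
    order = list(sections)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(case_of[a] != case_of[b] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid ordering found within max_tries")
```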

The study was conducted in 3 phases. In the first phase, 3 experienced gynecologic pathologists (EGyn1, EGyn2, and EGyn3), all of whom were coauthors of the 2014 WHO guidelines for the classification of gynecologic tumors,32 reviewed each slide independently in a randomized order on a microscope and provided a subtype diagnosis. The majority consensus diagnosis by this panel (a subtype diagnosis agreed upon by at least 2 of the 3 pathologists) defined the reference diagnosis for the histologic subtype for each section.
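For concreteness, the majority rule can be written as a few lines of illustrative Python (a minimal sketch, not the study's analysis code); sections on which all 3 experts disagree have no consensus and are flagged as undetermined, as described under Statistical Analysis.

```python
from collections import Counter

def reference_diagnosis(expert_calls):
    """Return the subtype agreed upon by at least 2 of 3 experts,
    or None (undetermined) when all 3 calls differ."""
    subtype, votes = Counter(expert_calls).most_common(1)[0]
    return subtype if votes >= 2 else None

print(reference_diagnosis(["HGSC", "HGSC", "LGSC"]))  # HGSC (majority)
print(reference_diagnosis(["MUC", "EC", "HGSC"]))     # None (undetermined)
```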

In the second phase of the study, pathology review on the microscope was expanded to include the identification of 8 histologic features important for ovarian subtype classification. The features are defined below. This review was conducted by 1 of the 3 pathologists involved in the initial classification who was also available for this phase of the study (EGyn1) and 2 additional board-certified pathologists from a different academic hospital. One had 3 years of clinical practice in gynecologic pathology (GynP), and the other had specialized in hematopathology (HemP) and had completed general anatomic pathology training 20 years prior. All 3 pathologists reviewed all 212 tissue slides, provided a subtype classification, and reported the presence or absence of each of the 8 histologic features.

In the third phase of the study, 4 pathologists conducted a digital review of all 212 WSIs in randomized order and were instructed to report on the presence or absence of any of the 8 histologic features and to also provide a subtype diagnosis for each WSI. This group of pathologists consisted of 1 of the pathologists in the first group (EGyn1), an experienced gynecologic pathologist who was also a coauthor of the WHO guidelines (EGyn4), a fellow in gynecologic pathology (FeGynP), and a fellow in surgical pathology (FeSurgP).

Histologic Feature Review

The following 8 histologic features were selected after literature review and consultation with the senior author for their ability to distinguish between ovarian carcinoma subtypes. (1) High-grade atypia, based on 3-fold nuclear size variation: this feature is considered the primary feature for distinguishing between high-grade and low-grade serous carcinomas.33 (2) High mitotic count (benchmarked at >12 per 10 high-power fields [×40 objective], although this was not specifically counted): this feature is considered the secondary feature for distinguishing between high-grade and low-grade serous carcinomas when there is doubt about the nuclear grade.33 (3) Intracytoplasmic mucin: this feature supports the diagnosis of MUC.34 (4) Hyalinized stroma: this feature can support the diagnosis of CCC rather than HGSC.34 (5) Clear cell architectural patterns (hobnail, tubulocystic, or tubulopapillary patterns): these patterns define the diagnosis of CCC by architecture but do not include the cytoplasmic clearing.35 (6) Sarcomatous component, characterized by malignant features typical of sarcomas, with or without more specific differentiation: this feature supports the diagnosis of carcinosarcoma rather than HGSC.36 (7) Squamous differentiation in the form of morules or aggregates of spindle-shaped epithelial cells32: this feature favors EC rather than MUC or HGSC.34 (8) Morphology suggestive of endometriosis: this feature favors EC rather than MUC34 and, in low-stage ovarian carcinomas, often takes the form of endometriotic cysts showing an intact endometrial lining, stroma, and scattered foci of fresh or old hemorrhage. Features diagnostic of endometriosis were not required to be present.

The features above were not quantified, and therefore a small focus of a feature was sufficient to designate it as present. For this review, pathologists searched for each of the 8 features and reported any that were present in a given section.

Digital Review

For digital review, WSI images were displayed to the observers in each site on the same model and type of monitor to reduce variability associated with the use of different monitors. The monitor was an LED-backlit LCD HP Z24x 24-inch monitor (Hewlett-Packard, Palo Alto, California) with basic specifications, including native resolution of 1920 × 1200 at 60 Hz, 16:10 aspect ratio, and 1.07 billion color support. Each monitor was calibrated using an Eye-One calibration kit (X-Rite, Tewksbury, Massachusetts).

A customized interface based on the Philips Pathology Education Tutor (previously marketed as Path XL Tutor, Philips North America Corporation, Andover, Massachusetts) was used for WSI review and for collecting observer responses from a questionnaire that was shown alongside each WSI. Within the software, users were able to pan and view WSIs at multiple magnifications (4× up to 40×, native WSI resolution). The same computer and graphical interface were used by all observers. The questionnaire provided to observers instructed them to perform 2 tasks. For the first task, the instruction was to “Please select all histologic features present in this section.” Response choices included the 8 features described above along with a “None of these” option. For the second task, observers were instructed: “Assuming the current section is the only section available, please review and select the subtype of the tumor.” Response choices included HGSC, LGSC, MUC, EC, CCC, seromucinous carcinoma, carcinosarcoma, malignant Brenner tumor, undifferentiated, or mixed. The Figure shows a screenshot of the graphical interface used in the digital review phase of the study.

Screenshot of the review questionnaire used in the study for collecting pathologist ovarian subtype classifications.

Observer Training

All pathologists were reminded of the instructions for handling problematic areas as outlined in Seidman et al.10  The 2 nonexpert pathologists in the second group (GynP and HemP) were provided with training in the form of a PowerPoint (Microsoft, Redmond, Washington) presentation showing multiple fields of view from each subtype along with textual descriptions and visual illustrations of relevant histologic features.

The training for the pathologists in the third group in the use of WSI consisted of a 2-step procedure conducted using the same interface on the same workstation. For the first step of training, observers were presented with digital textbook descriptions of ovarian subtypes along with images of subtype-specific histologic features and had access to 3 WSIs of each subtype for review practice. The second step of observer training consisted of an interactive practice session on 20 full-size WSIs within the digital viewing software, where observers were asked to repeat their review until the answer provided matched the reference standard. In addition to practice with the diagnostic task, this session familiarized observers with the controls of the digital interface.

Statistical Analysis

Comparison of Pathology Review for Subtype Classification Using Microscope and WSI

This analysis compared the concordance rates with the expert panel's subtype classifications for observers using the microscope against those for observers using WSI. Two different approaches were used to calculate concordance with the reference standard by the expert panel. The first approach was to derive a reference subtype classification, determined as the consensus subtype classification for each section agreed upon by at least 2 of 3 expert panelists. For this analysis, concordance between a pathologist and the reference standard was defined as the percent of sections in the cohort for which the subtype classification by an observer was the same as the reference subtype classification. Differences in concordance were then calculated across observers using a microscope and observers using WSI. This approach required consensus by at least 2 of 3 pathologists on subtype classification; however, in some instances there was no consensus by the 3 experts (each one provided a different classification). Those cases, hereafter referred to as undetermined, had to be excluded from this analysis.
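A minimal sketch of the first approach in illustrative Python (not the study's analysis code), assuming observer and reference calls are stored as parallel lists with None marking undetermined sections:

```python
def concordance(observer_calls, reference_calls):
    """Percent of sections for which the observer's subtype call matches
    the consensus reference; undetermined sections (None) are excluded."""
    pairs = [(o, r) for o, r in zip(observer_calls, reference_calls) if r is not None]
    return 100.0 * sum(o == r for o, r in pairs) / len(pairs)
```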

In order to account for uncertainty in deriving a reference diagnosis and avoid excluding any cases, a second approach to calculate concordance with the reference standard was used. The second approach calculated the average concordance of observers with each of the 3 expert panelists and compared the average concordance rates for pathologists using a microscope to those of pathologists using WSI. Concordance of observers with each individual panelist was also calculated. The 95% CIs on concordance and differences in concordance rates were derived using bootstrap resampling. Bootstrap resampling was performed on sections for paired concordance measures and on sections and observers for calculating 95% CIs on average concordance.
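The following is a sketch of the bootstrap procedure for the paired case (resampling sections with replacement, observers fixed), under the same data layout as above; the two-level resampling over sections and observers used for average concordance is analogous. This is illustrative Python under stated assumptions, not the study's analysis code.

```python
import random

def bootstrap_ci_diff(calls_a, calls_b, reference, n_boot=2000, seed=1):
    """Percentile 95% CI on the difference in concordance between 2
    observers; the same resampled sections are scored for both
    observers to preserve the pairing."""
    idx = [i for i, r in enumerate(reference) if r is not None]
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]
        a = 100.0 * sum(calls_a[i] == reference[i] for i in sample) / len(sample)
        b = 100.0 * sum(calls_b[i] == reference[i] for i in sample) / len(sample)
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]
```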

Expert panelist EGyn1, who also reviewed cases with WSI, was excluded from both analyses but was included in the interobserver analysis for feature identification described below. Intraobserver, intermodality analysis was not reported because the only pathologist who reviewed with both a microscope and WSI was the senior author of the study (EGyn1), whose familiarity with this particular cohort of carcinomas could bias findings and hinder their generalizability.

Interobserver Agreement for Identification of Histologic Features

Interobserver agreement using a specific modality (microscope or WSI) for the identification of each histologic feature was quantified using positive percent agreement, defined as the number of sections for which both pathologists identified a certain feature divided by the number of sections for which either of them identified that feature, expressed as a percentage.
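Restating this definition in notation (our restatement, with a the number of sections in which both pathologists reported the feature, and b and c the numbers in which only the first or only the second pathologist reported it):

\[
\mathrm{PPA} = \frac{a}{a + b + c} \times 100\%
\]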

Distribution of Ovarian Carcinoma Subtypes in Cohort

Three expert gynecologic pathologists (EGyns 1–3) conducted microscope reviews to provide a reference standard determination of subtype for all sections in the ovarian carcinoma cohort (Table 1). The table shows the breakdown of sections that had unanimous versus majority consensus diagnosis. Results show that of 212 sections, 56 (26.4%) had nonunanimous agreement, which, considering the expertise of the observers, indicates a challenging data set. Sections from seromucinous, EC, and MUC cases had the highest rates of nonunanimous agreement (4 of 8 [50.0%], 7 of 16 [43.8%], and 14 of 43 [32.6%], respectively). Subtype could not be determined (these sections are hereafter referred to as undetermined) for 12 of 212 sections. In these sections, all 3 experts reported different subtype classifications.

Analysis of Concordance With Expert Panel Reference Subtype Classification

To determine a baseline of concordance with the expert panelists, 2 pathologists, GynP and HemP, reviewed glass slides of the ovarian carcinoma cases on a microscope. Concordance with the expert-based reference for GynP was 73.5% across all subtypes and increased to 75.8% when limiting the assessment to the 5 major subtypes (Table 2). Concordance with the reference for slides for which the expert panel was unanimous (84.0%) was substantially higher than for slides for which the panel determined subtype by majority (nonunanimous) consensus (46.4%). Concordance with the expert-based reference was significantly lower for HemP (55.5% across all subtypes, 55.1% across the 5 major subtypes) than for GynP (73.5% and 75.8%, respectively). The difference in concordance rate was 18.0% (95% CI, 10.0%–26.0%) across all subtypes and 20.7% (95% CI, 12.4%–29.2%) across the 5 major subtypes.

We then tested concordance with the expert-based reference for the pathologists reading WSI. Concordance rates with the reference subtype classification by EGyn4, FeGynP, and FeSurgP (also shown in Table 2) were 68.7%, 68.5%, and 58.5%, respectively, across all subtypes, and 67.8%, 66.3%, and 57.3%, respectively, across the 5 major subtypes. Similar to the pathologists reviewing on a microscope, concordance was substantially higher for sections that had unanimous consensus (76.8%, 75.0%, and 63.9% for the 3 pathologists) than for sections that had majority consensus (48.2%, 51.8%, and 44.6%, respectively). FeGynP's concordance with the expert-based reference was significantly higher than that of FeSurgP across all subtypes (difference, 10.0%; 95% CI, 2.0%–17.5%) as well as across the 5 major subtypes (difference, 9.0%; 95% CI, 1.1%–17.4%). This finding could be partly attributed to FeGynP's greater experience with the diagnosis of ovarian carcinomas and to the fact that FeGynP was trained under 2 of the expert panelists.

We wished to determine whether there were systematic differences in the propensity of microscope review or WSI review to give better agreement with the expert-based reference. We therefore compared the rates of concordance with the expert-based reference subtypes for GynP reading with a microscope to EGyn4 and FeGynP reading WSI (Table 3). Results are shown here for pathologists for whom, despite their difference in expertise, histotyping of ovarian tumors is a core skill. It can be observed that GynP (attending with subspecialty in gynecologic pathology) reading with a microscope had higher concordance with the expert panel than the pathologists with comparable experience reading on WSI (EGyn4 and FeGynP), but the differences only achieved statistical significance when limiting our analysis to the 5 major subtypes.

The specific subtype classifications provided by GynP, EGyn4, and FeGynP are shown in Table 4. It can be observed that the biggest contributor to differences in concordance rates between GynP (microscope) and EGyn4 or FeGynP (WSI) was MUC classification. GynP classified 29 of the 43 expert-based mucinous cases (67.4%) as MUC, whereas EGyn4 and FeGynP classified only 19 (44.2%) and 20 (46.5%), respectively, of these cases as MUC. Most of the misclassified MUCs were classified as ECs, with a rate of 11 of 43 (25.6%) for GynP, 17 of 43 (39.5%) for EGyn4, and 22 of 43 (51.2%) for FeGynP.

Analysis of Average Concordance With Individual Panel Experts for Subtype Classification

Average concordance rates with the 3 individual experts by observers using a microscope or WSI were used as a complementary metric to account for uncertainty in defining a consensus subtype classification. Findings shown in Table 5 were comparable to those shown above. Average concordance rates with the expert panel were higher for GynP reading on a microscope (69.5% across all sections and 74.0% across the 5 major subtypes) than for EGyn4 (64.8% and 66.1%) or FeGynP (62.7% and 63.5%) reading on WSI, with the differences being statistically significant when the analysis was limited to the 5 major subtypes (Table 6). As a point of reference, interexpert concordance was 76.7% across all sections and 81.6% across the 5 major subtypes. Differences in concordance with individual experts between GynP and EGyn4 or GynP and FeGynP (Table 6) were all positive but varied in terms of whether the result was statistically significant.

Analysis of Interobserver Agreement for the Identification of Histologic Features

The rates of agreement between pathologists for identifying the 8 selected histologic features using a microscope are shown in Table 7. Positive percent agreement (PPA) was on the order of 70% for identifying high-grade nuclear atypia and on the order of 60% for identifying high mitotic count across all pairs of pathologists. For the identification of intracytoplasmic mucin, PPA was 74.6% between the expert EGyn1 and GynP and was lower between EGyn1 and HemP (51.2%) and between GynP and HemP (60.3%). For the remaining histologic features, PPA between EGyn1 and GynP ranged from 50% for identifying clear cell architectural patterns to only 10% for identifying features suggestive of endometriosis, and varied widely across the other paired comparisons.

Positive percent agreement results in identifying the 8 selected histologic features between the pathologists using WSI are shown in Table 8. It is generally observed that PPA varied widely across different pairs of pathologists. For the identification of high-grade nuclear atypia, PPAs between EGyn1 and EGyn4 and between EGyn1 and FeGynP were 50.3% and 58.9%, respectively; in comparison, PPA for this feature between EGyn1 reading on a microscope and GynP was 71.4%. Similarly, PPAs for identifying high mitotic count between EGyn1 and EGyn4, and between EGyn1 and FeGynP, were 39.4% and 41.1%; in comparison, the PPA for this feature between EGyn1 and GynP on the microscope was 62.0%. For identifying the presence of intracytoplasmic mucin, PPAs between EGyn1 and EGyn4, and between EGyn1 and FeGynP, were 54.1% and 36.5%, respectively; in comparison, the PPA for this feature between EGyn1 and GynP on a microscope was 74.6%. For the remaining features, PPAs across pairs of pathologists varied widely, similar to the variation observed for pathologist pairs reading on a microscope.

In this manuscript, we have presented the results of a reader study with pathologists from multiple sites examining pathologist performance for the histologic subtyping of ovarian carcinomas using both a microscope and digital review. We included comparisons of pathologist performance across subspecialties and for pathologists with varying levels of experience. In addition to subtype diagnosis, the study examined pathologist performance for identifying histologic features important for ovarian subtyping.

Results of the study showed that the concordance with the expert reference subtype classification of a pathologist specialized in gynecologic pathology reading on a microscope was significantly higher than the concordance of an expert gynecologic pathologist or a fellow in gynecologic pathology reading on WSI. This result was primarily due to the substantially lower propensity of the 2 pathologists reading on WSI to diagnose mucinous carcinomas; they classified as mucinous 19 and 20 of the 43 mucinous carcinomas (44.2% and 46.5%, respectively) classified by the expert panel, compared with 29 of 43 (67.4%) by the pathologist reading on a microscope. Most of the missed MUCs were classified as EC, probably because of the difficulty in identifying the presence of intracytoplasmic mucin, as determined from our feature analysis. Another challenge found with the pathologists using WSI was distinguishing between high-grade and low-grade serous carcinomas. High-grade serous carcinomas as classified by the expert-based reference were diagnosed as low-grade serous carcinomas at a rate of 9 of 78 (11.5%) for the expert gynecologic pathologist and 11 of 78 (14.1%) for the fellow on WSI, as opposed to a rate of only 6 of 78 (7.7%) for the gynecologic pathologist reading on a microscope. These differences are probably due to difficulty in identifying high-grade nuclear atypia and high mitotic count, as evident in the results of our histologic feature analysis.

An important finding from this study regarding subtype classification was the substantial difference in pathologist concordance with the reference diagnosis, regardless of modality, between sections that were unanimously classified by the panel of experts and those sections that were classified only by majority consensus. For instance, GynP on microscope had 84% concordance on unanimous sections and only 46% on nonunanimously classified sections, whereas for EGyn4 and FeGynP on WSI those concordance rates were 76.8% versus 48.2% and 75.0% versus 51.8%, respectively. This result reflects the spectrum of diagnostic complexity that exists among cases and its impact on pathologist performance. The College of American Pathologists guideline for the validation of WSI recommended the inclusion of "easy and difficult cases"37; however, only a few studies have reported on the distribution of challenging cases among study cases.30,38 As such, most studies provide no information regarding the presence and proportion of challenging cases among their data sets, making it difficult to make comparisons between findings in different research studies or to determine whether an observed diagnostic performance provides a realistic measure of expected performance in clinical practice, where challenging cases are not uncommon. Considering the probable difference in diagnostic complexity between such cases, it is important for studies examining observer performance for a variety of diagnostic tasks, including the evaluation of tools for clinical decision support30 or computer-aided diagnosis,39–41 to use data sets representing such differences.

The analysis of pathologist performance for identifying histologic features resulted in 2 main findings. One was that pathologists had relatively low agreement in identifying some important features for ovarian histologic subtyping with both viewing modes. That was apparent for identifying clear cell architectural patterns, sarcomatous component, squamous differentiation, and hyalinized stroma, for which positive agreement between experienced pathologists was ≤50%, and particularly apparent for identifying features suggesting endometriosis, for which agreement was as low as 10%. Even though the correct classification rate of endometrioid carcinomas by experienced pathologists was relatively high, disagreement on identifying or evaluating squamous differentiation or features suggestive of endometriosis might have contributed to the misclassification of MUCs as ECs. As was determined during a consensus meeting between experts discussing discrepant ovarian carcinoma diagnoses,44 reasons for such disagreements include the use of different thresholds for defining the presence of a feature (such as the minimum amount of sarcomatous component or squamous differentiation that must be present to be diagnostically important), whether there is any threshold at which a feature becomes a dichotomous classifier, and placing different amounts of importance on the presence of features in different tissue areas. This is an area for which specific guidelines for evaluating histologic features are lacking. The heuristics that lead to making a diagnosis of a mixed tumor are also very likely to differ between pathologists and contribute to diagnostic disagreement. On the other hand, agreement on the classification of tumors as CCC was high, despite low percent agreement on identification of clear cell architectural patterns or hyalinized stroma. This is unlikely to be surprising to pathologists, who know that tumor classification seldom relies directly on identification of discrete pathognomonic features and usually involves a synthesis of many features in a complex process influenced by experience. Still, better interobserver reproducibility in feature identification would likely contribute to improved concordance for diagnostic tasks, such as ovarian subtyping. More research is needed to determine thresholds and related criteria for linking histologic features to certain subtype diagnoses.

The second finding from our analysis of pathologist performance for feature identification was that there were challenges in identifying intracytoplasmic mucin, and to a lesser degree high-grade nuclear atypia and high mitotic count, with WSI. In the case of mucin (Table 8), the PPAs between experts EGyn1 and EGyn4 and between EGyn1 and the fellow in gynecologic pathology reading with WSI were 54.1% and 36.5%, respectively, whereas the PPA for this feature between EGyn1 reading on a microscope and GynP was 74.6%. Comparably lower were the interobserver agreement rates for identifying high-grade nuclear atypia and high mitotic count, 2 histologic features with significance across many tissue types. These findings are consistent with findings reported elsewhere on challenges identifying mucin or mitoses with WSI.42 

Our study was not designed to tease out the possible reasons for the challenges in identifying histologic features or for classifying mucinous carcinomas observed in our study. One possible reason could relate to interobserver variability. In this study, we had a limited number of pathologists who may have used different criteria for subtype classification or for defining the presence of features. Moreover, the study was not fully crossed, such that pathologists who reviewed using a microscope did not repeat their review using WSI. Likewise, the pathologists reading on WSI were not as experienced in the use of WSI compared with the use of a microscope, again potentially confounding our reported results. Experience with WSI in daily practice varied among the participants; none are using WSI for routine diagnosis of ovarian tumors. Focused training on reading with WSI for specific tasks, such as ovarian subtyping, could potentially improve pathologist performance or interobserver variability. Another possible reason for the observed challenges in identifying features with WSI could relate to limitations in the WSI scanning process, such as issues with color reproducibility, resolution, and depth of focus, or heuristic factors, such as a propensity to review less than the whole slide or impatience due to lag time. Our study was not designed to differentiate limitations coming from the reader from those associated with the WSI process; however, identifying WSI limitations for specific diagnostic tasks would be particularly beneficial to pathologists using this technology as well as to manufacturers who could target further improvements.

One might consider that our study's use of commercial off-the-shelf monitors contributed to the observed performance: it is possible that a medical-grade monitor could result in better feature identification. However, all pathologists in our study used monitors of the same make and model, calibrated in the same way, thus emulating the uniform specifications of a medical-grade monitor. A recent study43 reported that there was no difference in the identification of mitotic figures or Helicobacter pylori burden between a medical-grade and a commercial monitor. That study did not evaluate the impact of display on identifying the histologic features examined here, nor did it include the particular monitor used in our study. The findings of this study strongly suggest that more research is needed to identify the impact of pathologist skill and experience, user training, WSI image quality, and display on feature identification and, more generally, on the ability to make a primary diagnosis. Moreover, given that WSI is likely to become part of the clinical workflow in the future, another issue to be addressed is how the lack of concordance in diagnosis affects patient management.

Regarding the impact of subspecialty, it was observed that the agreement of a nonspecialist in gynecologic pathology with the expert-based reference was significantly lower than the corresponding agreement of specialists with comparable years of clinical experience. This result is consistent with findings by Patel et al25 on a smaller data set that consisted of only HGSC, CCC, and EC. We also observed that agreement between a fellow in general surgical pathology reading on WSI and the expert reference for overall histologic subtyping was lower by about 9% (without statistical significance) than the corresponding agreement between the fellow in gynecologic pathology and the expert-based reference. A likely confounding factor is that the gynecologic pathology fellow was a trainee of the expert panel pathologists. The study has a limited number of observers in each category, so the findings related to experience and subspecialization should not be generalized.

It should be noted that the diagnostic discordances observed in this study for either modality could be exaggerated due to the section-based, rather than case-based, review, and the use of a cohort that was enriched with the less common subtypes (non–high-grade serous carcinomas) in terms of both case numbers and numbers of selected sections per case.

In summary, our study found that the diagnosis of mucinous carcinomas and, to a lesser degree, the differential diagnosis between high-grade and low-grade serous carcinomas were challenging for pathologists reading with WSI, probably because of the difficulty in identifying intracytoplasmic mucin and inconsistencies in identifying high-grade nuclear atypia or high mitotic count. Further research is needed to clarify the reasons for these challenges and to determine if these issues are fundamental to WSI or related to pathologist inexperience with the WSI review process. It was also found that pathologists had generally low agreement in identifying histologic features important to ovarian subtype classification, by either microscope or WSI. This result suggests the need for refined histologic criteria for identifying such features. Finally, the substantial differences in pathologist concordance between cases deemed challenging and those that were more typical argue for the systematic inclusion and reporting of pathologist performance across cases spanning the range of diagnostic complexity.

References

1. Kristensen GB, Tropé C. Epithelial ovarian carcinoma. Lancet. 1997;349(9045):113–117.
2. Köbel M, Kalloger SE, Boyd N, et al. Ovarian carcinoma subtypes are different diseases: implications for biomarker studies. PLoS Med. 2008;5(12):1749–1761.
3. Wentzensen N, Poole EM, Trabert B, et al. Ovarian cancer risk factors by histologic subtype: an analysis from the Ovarian Cancer Cohort Consortium. J Clin Oncol. 2016;34(24):2888.
4. Kurman RJ, Shih IM. Molecular pathogenesis and extraovarian origin of epithelial ovarian cancer—shifting the paradigm. Hum Pathol. 2011;42(7):918–931.
5. Prat J. Ovarian carcinomas: five distinct diseases with different origins, genetic alterations, and clinicopathological features. Virchows Arch. 2012;460(3):237–249.
6. Banerjee S, Kaye S. The role of targeted therapy in ovarian cancer. Eur J Cancer. 2011;47:S116–S130.
7. Parker SL, Tong T, Bolden S, Wingo PA. Cancer statistics, 1997. CA Cancer J Clin. 1997;47(1):5–27.
8. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2017. CA Cancer J Clin. 2017;67(1):7–30.
9. McCluggage WG. Morphological subtypes of ovarian carcinoma: a review with emphasis on new developments and pathogenesis. Pathology. 2011;43(5):420–432.
10. Seidman JD, Vang R, Ronnett BM, Yemelyanova A, Cosin JA. Distribution and case-fatality ratios by cell-type for ovarian carcinomas: a 22-year series of 562 patients with uniform current histological classification. Gynecol Oncol. 2015;136(2):336–340.
11. Lund B, Thomsen H, Olsen J. Reproducibility of histopathological evaluation in epithelial ovarian carcinoma: clinical implications. APMIS. 1991;99(1–6):353–358.
12. Cramer S, Roth L, Mills S, et al. Sources of variability in classifying common ovarian cancers using the World Health Organization classification: application of the pathtracking method. Pathol Annu. 1993;28:243.
13. Köbel M, Bak J, Bertelsen BI, et al. Ovarian carcinoma histotype determination is highly reproducible, and is improved through the use of immunohistochemistry. Histopathology. 2014;64(7):1004–1013.
14. Köbel M, Kalloger SE, Baker PM, et al. Diagnosis of ovarian carcinoma cell type is highly reproducible: a transcanadian study. Am J Surg Pathol. 2010;34(7):984–993.
15. Kommoss S, Gilks CB, du Bois A, Kommoss F. Ovarian carcinoma diagnosis: the clinical impact of 15 years of change. Br J Cancer. 2016;115(8):993–999.
16. Gilks CB, Ionescu DN, Kalloger SE, et al. Tumor cell type can be reproducibly diagnosed and is of independent prognostic significance in patients with maximally debulked ovarian carcinoma. Hum Pathol. 2008;39(8):1239–1251.
17. Peres LC, Cushing-Haugen KL, Anglesio M, et al. Histotype classification of ovarian carcinoma: a comparison of approaches. Gynecol Oncol. 2018;151(1):53–60.
18. Barnard ME, Pyden A, Rice MS, et al. Inter-pathologist and pathology report agreement for ovarian tumor characteristics in the Nurses' Health Studies. Gynecol Oncol. 2018;150(3):521–526.
19. Farahani N, Parwani AV, Pantanowitz L. Whole slide imaging in pathology: advantages, limitations, and emerging perspectives. Pathol Lab Med Int. 2015;7:23–33.
20. Zarella MD, Bowman D, Aeffner F, et al. A practical guide to whole slide imaging: a white paper from the Digital Pathology Association. Arch Pathol Lab Med. 2018;143(2):222–234.
21. Ordi J, Castillo P, Saco A, et al. Validation of whole slide imaging in the primary diagnosis of gynaecological pathology in a University Hospital. J Clin Pathol. 2015;68(1):33–39.
22. Mukhopadhyay S, Feldman MD, Abels E, et al. Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). Am J Surg Pathol. 2018;42(1):39.
23. US Food and Drug Administration. FDA allows marketing of first whole slide imaging system for digital pathology [press release]. 2018.
24. Kommoss S, Pfisterer J, Reuss A, et al. Specialized pathology review in patients with ovarian cancer: results from a prospective study. Int J Gynecol Cancer. 2013;23(8):1376–1382.
25. Patel C, Harmon B, Soslow R, et al. Interobserver agreement in the diagnosis of ovarian carcinoma types: impact of sub-specialization. Lab Invest. 2012:75. New York, NY: Nature Publishing Group.
26. van der Post RS, van der Laak JA, Sturm B, et al. The evaluation of colon biopsies using virtual microscopy is reliable. Histopathology. 2013;63(1):114–121.
27. Al-Janabi S, Huisman A, Vink A, et al. Whole slide images for primary diagnostics of gastrointestinal tract pathology: a feasibility study. Hum Pathol. 2012;43(5):702–707.
28. Campbell WS, Hinrichs SH, Lele SM, et al. Whole slide imaging diagnostic concordance with light microscopy for breast needle biopsies. Hum Pathol. 2014;45(8):1713–1721.
29. Eccher A, Neil D, Ciangherotti A, et al. Digital reporting of whole-slide images is safe and suitable for assessing organ quality in preimplantation renal biopsies. Hum Pathol. 2016;47(1):115–120.
30. Gavrielides MA, Miller M, Hagemann IS, et al. Clinical decision support for ovarian carcinoma subtype classification: a pilot observer study with pathology trainees. Arch Pathol Lab Med. 2020;144(7):869–877.
31. Keay T, Conway CM, O'Flaherty N, Hewitt SM, Shea K, Gavrielides MA. Reproducibility in the automated quantitative assessment of HER2/neu for breast cancer. J Pathol Inform. 2013;4:19.
32. Kurman RJ, Carcangiu ML, Herrington CS, Young RH. WHO Classification of Tumours of Female Reproductive Organs. 4th ed. Lyon, France: International Agency for Research on Cancer; 2014.
33. Malpica A, Deavers MT, Lu K, et al. Grading ovarian serous carcinoma using a two-tier system. Am J Surg Pathol. 2004;28(4):496–504.
34. Soslow RA. Histologic subtypes of ovarian carcinoma: an overview. Int J Gynecol Pathol. 2008;27(2):161–174.
35. McCluggage W. My approach to and thoughts on the typing of ovarian carcinomas. J Clin Pathol. 2008;61(2):152–163.
36. Loizzi V, Cormio G, Camporeale A, et al. Carcinosarcoma of the ovary: analysis of 13 cases and review of the literature. Oncology. 2011;80(1–2):102–106.
37. Pantanowitz L, Sinard JH, Henricks WH, et al. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137(12):1710–1722.
38. Williams BJ, Hanby A, Millican-Slater R, Nijhawan A, Verghese E, Treanor D. Digital pathology for the primary diagnosis of breast histopathological specimens: an innovative validation and concordance study on digital pathology validation and training. Histopathology. 2018;72(4):662–671.
39. Wei JW, Tafe LJ, Linnik YA, Vaickus LJ, Tomita N, Hassanpour S. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep. 2019;9(1):1–8.
40. Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases. J Pathol Inform. 2016;7:29.
41. Gavrielides MA, Gallas BD, Lenz P, Badano A, Hewitt SM. Observer variability in the interpretation of HER2/neu immunohistochemical expression with unaided and computer-aided digital microscopy. Arch Pathol Lab Med. 2011;135(2):233–242.
42. Williams BJ, Brettle D, Aslam M, et al. Guidance for remote reporting of digital pathology slides during periods of exceptional service pressure: an emergency response from the UK Royal College of Pathologists. J Pathol Inform. 2020;11:12.
43. Norgan AP, Suman VJ, Brown CL, Flotte TJ, Mounajjed T. Comparison of a medical-grade monitor vs commercial off-the-shelf display for mitotic figure enumeration and small object (Helicobacter pylori) detection. Am J Clin Pathol. 2018;149(2):181–185.
44. Gavrielides MA, Ronnett BM, Vang R, Sheikhzadeh F, Seidman JD. Selection of representative histologic slides in interobserver reproducibility studies: insights from expert review for ovarian carcinoma subtype classification. J Pathol Inform. 2021;12:15.

Author notes

The authors have no relevant financial interest in the products or companies described in this article.