Context.—

Seegene Medical Foundation, one of the major clinical laboratories in South Korea, developed SeeDP, an artificial intelligence (AI)–based postanalytic daily quality control (QC) system that reassesses all gastrointestinal (GI) endoscopic biopsy (EB) slides for incorrect diagnoses.

Objective.—

To review the operational records and clinical impact of SeeDP since its launch in March 2022.

Design.—

Operational records of SeeDP were retrieved for the period of March 1, 2022, to February 28, 2023. Among cases scanned during 40 working days (March 10, 2022, to May 4, 2022), all discordant cases encountered by 2 pathologists were reviewed. Cases of SeeDP-assisted revised diagnoses were collected and compared with cases recognized using conventional methods.

Results.—

Occasional scanner failures and various types of aberrant errors compromised QC coverage: only 67.7% (572 254 of 844 906) of all submitted EB slides were scanned, and a further 0.8% of the scanned slides were excluded from the AI analysis. The AI predictions differed from the pathologists’ diagnoses in 42 760 of the 557 672 gastrointestinal EB slides (7.7%) successfully assessed by the AI models; however, a detailed review of discordant slides revealed that true misdiagnoses accounted for only 5.5% (25 of 454) of the disagreements. Compared with conventional error recognition methods, SeeDP detected more misdiagnoses (14 versus 7) within a significantly shorter time (average, 3.6 versus 38.7 days; P < .001), including 1 signet ring cell carcinoma initially diagnosed as gastritis.

Conclusions.—

AI-based daily QC systems are plausible solutions to guarantee high-quality pathologic diagnosis by enabling rapid detection and correction of misdiagnosis.

Quality control (QC) is a process that maintains a desired quality of products by comparison of a sample of the output with the specification.1 In surgical pathology, the product corresponds to the pathology report, whose quality can be evaluated with respect to its accuracy, timeliness, and completeness.2 Case review is one of the best-established methods of QC in pathology because it allows for the reassessment of the diagnostic accuracy and completeness of the reports. The current evidence-based guideline for interpretive diagnostic error reduction in surgical pathology and cytology established by the College of American Pathologists recommends a timely review of pathology cases to improve patient care.3 Although the guideline does not specify the number of cases that should be reviewed, the QC program checklist provided by the Korean Society of Pathologists suggests the following options: reviewing 1 in every 50 cases, 1% of all cases, or 25 cases monthly.4 Because the systematic review conducted to formulate the College of American Pathologists guideline identified an overall major discrepancy rate of 6.3% (25th–75th percentile, 1.9%–10.6%) for surgical pathology diagnosis, reviewing as many cases as possible would be beneficial. However, the increased demand for a second review could negatively affect pathologists’ daily practice. The increased workload of pathologists is a globally recognized phenomenon.5,6 This excessive workload, particularly in South Korea, which has the highest rate of physician consultations among the member nations of the Organization for Economic Co-operation and Development,7 appears to be more pronounced in independent laboratories handling specimens from small clinics and hospitals that are not equipped with a laboratory and pathology center.8 Although pathologists in teaching hospitals can rely on a sign-out system to ensure an extensive dual review,9,10 a feasible and effective QC method is urgently needed for pathologists working at independent pathology laboratories.

Therefore, the Seegene Medical Foundation (SMF; Seoul, South Korea), one of the major clinical laboratories in South Korea, has sought a solution grounded in artificial intelligence (AI) to address this overload. The SMF AI Research Center, through a joint research project with the Graduate School of Data Science, Korea Advanced Institute of Science and Technology (Daejeon, South Korea), developed SeeDP, an AI-based daily QC system that double-checks all gastrointestinal (GI) endoscopic biopsy (EB) slides for possible incorrect diagnoses by comparing pathologists’ diagnoses and AI assessments (Figure 1).11  This study aimed to review the operational records and clinical impact of SeeDP since its launch in March 2022.

Figure 1.

Schematic flow of SeeDP. On day 1, a pathologist examines the glass slides, issues a diagnosis via the laboratory information system (LIS), and returns the slides. The slides are subsequently scanned, and the artificial intelligence (AI) model classifies each case as negative (N), low-grade dysplasia (D), or malignancy (M, including high-grade dysplasia). When the pathologists log into the LIS the next morning (day 2), they encounter a list of cases where the AI assessment and their diagnosis differ. Abbreviations: ADC, adenocarcinoma; CG, chronic gastritis; Dx, diagnosis; HP, hyperplastic polyp; Hp-CAG, Helicobacter pylori–associated chronic active gastritis; LG, low-grade dysplasia; M/D, moderately differentiated; TA, tubular adenoma.


Overview of SeeDP

SeeDP comprises AI models, an automated work process that sends a newly scanned slide to the AI model and retrieves its pathologic diagnosis, and a web-based interface to visualize the result of the AI prediction and determine whether it differs from the original diagnosis. SeeDP analyzes a whole slide image (WSI) as follows. First, SeeDP segments a WSI of the hematoxylin and eosin–stained slides into 256 × 256-pixel patches and directs the patches to a patch classifier, where each patch is designated as N, negative; D, low-grade (LG) dysplasia; or M, malignant, including high-grade dysplasia. The results of the patchwise classification are aggregated and supplied to a WSI classifier to establish the WSI diagnosis.
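The two-stage design described above can be summarized in a short sketch. The following Python snippet is illustrative only and is not the SMF implementation; the model object, label indices, and the simple aggregation heuristic are assumptions standing in for the trained patch and WSI classifiers.

```python
from collections import Counter
from typing import Iterable, List

LABELS = ("N", "D", "M")  # negative, low-grade dysplasia, malignant (including high-grade dysplasia)

def classify_patches(patches: Iterable, patch_model) -> List[str]:
    """Run an (assumed) patch-level classifier on every 256 x 256 tile."""
    return [LABELS[patch_model.predict(patch)] for patch in patches]

def aggregate_to_wsi(patch_labels: List[str], noise_threshold: int = 3) -> str:
    """Toy whole slide image (WSI) rule: report the most severe label whose patch
    count exceeds a small noise threshold. SeeDP instead feeds the aggregated
    patch results to a trained WSI classifier."""
    counts = Counter(patch_labels)
    for severe in ("M", "D"):
        if counts.get(severe, 0) >= noise_threshold:
            return severe
    return "N"
```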

The automated work process is activated when a newly scanned slide file is stored. It searches the laboratory information system (LIS) database (DB) and retrieves the pathology report of the slide using the name (slide number) as a key. First, the organ is identified from the report to determine the AI model that should be called (colorectal or gastric), and the pathologic diagnosis is subsequently parsed and categorized. For example, a diagnosis is classified as M if it contains keywords such as carcinoma or malignancy. Diagnostic reports suggestive of the remaining lesions that do not fall under the aforementioned 3 categories, including neuroendocrine tumors, mesenchymal tumors, and lymphoma, are categorized as U (uncategorized). Because the AI models classify a case only as N, D, or M, all U cases inevitably result in AI-pathologist discordance.
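A minimal sketch of this keyword-based categorization is shown below; the keyword lists and function name are illustrative assumptions, and the actual parsing logic is described in reference 11.

```python
def categorize_report(report_text: str) -> str:
    """Map a free-text pathology report to N, D, M, or U (illustrative rules only)."""
    text = report_text.lower()
    if any(k in text for k in ("carcinoma", "malignancy", "high grade dysplasia")):
        return "M"
    if any(k in text for k in ("low grade dysplasia", "tubular adenoma")):
        return "D"
    if any(k in text for k in ("gastritis", "colitis", "hyperplastic polyp")):
        return "N"
    return "U"  # eg, neuroendocrine tumors, mesenchymal tumors, lymphoma

# A gastritis report maps to N, so an AI assessment of D or M would be flagged as discordant.
print(categorize_report("Chronic active gastritis with intestinal metaplasia"))  # -> N
```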

Diagnosing a WSI requires 2 AI models (the patch classifier and the WSI classifier), and separate models are needed for the colorectum (including the terminal ileum) and stomach. Therefore, SeeDP is equipped with 4 different AI models (2 patch classifiers and 2 WSI classifiers). Both the patch and WSI classifiers are based on convolutional neural networks, achieving balanced accuracy rates of 92.8% (stomach) and 94.8% (colorectum) using unbalanced data sets representative of the SMF Pathology Center’s daily GI EB slide composition. More details on the AI models, the parsing logic for organ determination and diagnostic categorization, and the implementation of the web interface are described elsewhere.11 

Laboratory Aspects of SeeDP Operation

In 2018, the SMF AI Research Team purchased a Panoramic 250 Flash III scanner (3DHISTECH, Budapest, Hungary) to develop SeeDP. Tests to estimate the maximum scan capacity of one scanner (ie, how many slides a scanner can digitize) were performed in 2021 to determine the number of scanners required to operate SeeDP for all GI EB slides at the SMF Pathology Center. We found that scanning a rack containing 25 slides took approximately 30 minutes (data not shown). This theoretically equates to 400 slides in an 8-hour period. However, actual tests revealed that approximately 300 slides could be digitized during continuous 8-hour scanner operation. We reasoned that the removal and loading of racks and/or the movement of slides from the racks to the camera consumed some time. Based on the 8-hour experiment, we concluded that a scanner operating continuously around the clock could digitize up to 900 slides, and we therefore set the maximum daily scan capacity per scanner at 900 slides. Two additional Panoramic 250 Flash III scanners were introduced to scan up to 2700 slides per day, because the average number of EB slides processed daily in 2020 was approximately 2500. To achieve the expected scan capacity, the following laboratory process was established. During the daytime (0830–1730), a designated laboratory worker collects all EB-like slides and loads them onto the scanners. The daytime staff leaves the laboratory after preparing the racks for the night-shift workers. Additionally, the night-shift workers, originally responsible for specimen reception and gross examination, allocate some of their time to loading the prepared racks and handling slides found to be stuck during their working hours (2400–0800).
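The capacity estimate above reduces to simple arithmetic; the short calculation below merely restates the figures from the preceding paragraph.

```python
slides_per_rack, minutes_per_rack = 25, 30
theoretical_8h = slides_per_rack * (8 * 60 // minutes_per_rack)  # 400 slides in 8 hours
observed_8h = 300                      # throughput actually measured over 8 hours
per_scanner_daily = observed_8h * 3    # 900 slides per scanner per 24-hour day
fleet_capacity = per_scanner_daily * 3 # 2700 slides per day with 3 scanners
print(theoretical_8h, per_scanner_daily, fleet_capacity)  # 400 900 2700
```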

Information Technology Aspect of SeeDP Operation

Three workstations were installed in the scanner room, located on the third floor of SMF’s main building, with one assigned to each scanner. To store the digitized slides, perform the AI operation, and host the SeeDP service, 3 servers, one for each purpose, were introduced into the server room, located on the fifth floor of SMF’s main building. The storage server is equipped with a 450-terabyte solid-state drive (VAST Data, New York City, New York), and the AI server is equipped with 2 A100 Tensor Core GPUs (NVIDIA, Santa Clara, California) and a third-generation Intel Xeon CPU. When a new slide file is created in a workstation, it is copied to the storage server via 1-Gbps Ethernet. When the file is transferred to the storage server, the bar code identifier of the slide extracted from the file name is used to query the SMF LIS DB, and the data retrieved are stored in the DB of the SeeDP server. The data retrieved include the date and serial number of specimen reception, the names of the patient and the pathologist who signed out the case, and the pathology report, all of which are displayed on the web user interface of SeeDP. As described above, each WSI is segmented into 256 × 256-pixel patches, and the AI model on the AI server, that is, gastric or colorectal, is executed depending on the organ identified in the pathology report. Empirically, approximately 1–2 minutes are required to transfer a scanned file to the storage server and 20–30 seconds for the AI model inference. When the AI assessment is completed, the patchwise and WSI classification results are stored in the SeeDP DB, and the patches are immediately deleted to conserve storage. Similarly, the files scanned at the workstations are deleted regularly.
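The hand-off described in this paragraph can be sketched as follows. This is a simplified illustration only: the file-handling details, the database interfaces, and the `infer` call are hypothetical stand-ins for the actual SMF integration.

```python
import re
import shutil
from pathlib import Path

def handle_new_slide(wsi_path: Path, storage_dir: Path, lis_db, seedp_db, models) -> None:
    # 1. Copy the freshly scanned file from the workstation to the storage server.
    stored = storage_dir / wsi_path.name
    shutil.copy2(wsi_path, stored)

    # 2. Use the bar code identifier embedded in the file name to query the LIS DB.
    slide_id = re.sub(r"\.[^.]+$", "", wsi_path.name)
    report = lis_db.fetch_report(slide_id)        # hypothetical LIS query
    if report is None:
        return  # no final diagnosis yet, so discordance cannot be assessed

    # 3. Choose the gastric or colorectal model from the organ in the report and run inference.
    organ = "stomach" if "stomach" in report.organ.lower() else "colorectum"
    wsi_label, patch_labels = models[organ].infer(stored)   # assumed model interface

    # 4. Keep only the classification results; patches are discarded after inference,
    #    and workstation copies are purged on a regular schedule to conserve storage.
    seedp_db.save(slide_id, report, wsi_label, patch_labels)
```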

Pathology Department Aspect of SeeDP Operation

After a successful trial-run phase for the slides of 4 pathologists (September 15, 2021, to December 14, 2021),11  SeeDP was officially launched for all 19 pathologists in the SMF Pathology Center on March 7, 2022. In a regular center meeting held on March 8, 2022, SeeDP was introduced and the trial-run results were shared. Pathologists were encouraged to check for discordant cases every morning, particularly when the pathologist’s diagnosis or AI assessment was categorized as M. To entrench this practice in the routine daily workflow, the LIS was updated to automatically open the SeeDP web interface when pathologists logged in. When a misdiagnosis was discovered during the SeeDP-assisted second review, pathologists were instructed to notify the information management team at the SMF Pathology Center so that the case could be included in the list of SeeDP-assisted revised diagnoses. Notably, this was the usual procedure, because any diagnostic revision or correction normally involves the information management team. Because the SMF Pathology Center serves small clinics or hospitals throughout the country that lack a laboratory, providing a thorough explanation to clients before diagnosis revision/correction is mandatory. Additionally, the information management team is responsible for rigorously monitoring any instance of misdiagnosis, to maintain high-quality pathologic diagnosis as our competitive advantage.

Data Extraction and Analysis

This retrospective review of the operational records of SeeDP was approved by the Institutional Review Board of the SMF (IRB number: SMF-IRB-2020–007). Records of the SeeDP analysis for specimens received between March 1, 2022, and February 28, 2023, including the total number of slides scanned and assessed by the AI models, cases with discordance between the AI prediction and pathologist’s diagnosis, and whether a WSI was reviewed by relevant pathologists, were obtained by querying the SeeDP DB. Among cases scanned during 40 working days (March 10, 2022, to May 4, 2022), all discordant slides of 2 pathologists (S.-Y.Y. and Y.S.K.) were reviewed to explore the causes and patterns of AI-pathologist discordance. A case was determined to be misdiagnosed upon consensual agreement between the primary (S.-Y.Y.) and secondary (Y.S.K.) reviewers. The total number of EB slides derived from the stomach, terminal ileum, and colorectum was used to compute the QC coverage of SeeDP, defined as the proportion of analyzed slides among the total number of GI EB slides. However, because the SMF LIS DB does not allow text queries of pathology reports, we calculated the total number of slides derived from endoscopic examinations based on the procedure codes of specimens originally used for claiming service fees from the national health insurance. This number differed from the total number of GI EB slides because it included specimens derived from the duodenum and esophagus. Because slides of specimens sampled from the duodenum and esophagus were grossly indistinguishable from the GI EB slides, it was reasonable to assume that the scan rate of the duodenal and esophageal slides was similar to that of the GI slides. Therefore, we computed the scan coverage, which is defined as the proportion of EB slides scanned, as a substitute for QC coverage. Finally, a list of SeeDP-assisted revised diagnoses and misdiagnoses identified using conventional methods was obtained from the information management team. The following 3 types of error recognition routes were regarded as conventional during our routine workflow.

Review of Recut Slides for Patient Referral

Because most of our clients are small clinics incapable of performing endoscopic submucosal dissection or surgical resection, they need to send their patients to advanced general hospitals when a pathologically significant lesion is detected. When an order is made to recut the slides for patient referral, they undergo a second review by pathologists before being dispatched to the client.

Random Review

As a requirement of the QC program of the Korean Society of Pathologists, each pathologist at the SMF Pathology Center reviews 13 EB cases and 12 non-EB cases every month. Slides are randomly picked by the laboratory staff when they collect the returned slides and store them in cabinets, and the information management team distributes the slides to the pathologists.

Review Requests From Clinicians

Our clients occasionally request pathologists to review cases when their diagnoses differ from the clinician’s impression.

Statistical Analysis

The mean times required to recognize misdiagnoses by SeeDP and conventional routes were compared using Welch 2-sample t tests. Data were analyzed using the statistical software package R, version 4.2.1 (R Project for Statistical Computing; https://www.r-project.org).
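The comparison reported below uses a standard Welch test; the snippet shows an equivalent call in Python using SciPy (the study itself used R 4.2.1, and the arrays below are hypothetical placeholders, not the study data).

```python
from scipy import stats

conventional_days = [30, 35, 41, 38, 44, 39, 42]         # hypothetical recognition intervals
seedp_days = [2, 1, 5, 3, 4, 6, 2, 3, 5, 4, 1, 7, 3, 4]  # hypothetical recognition intervals

# equal_var=False requests Welch's 2-sample t test (unequal variances)
t_stat, p_value = stats.ttest_ind(conventional_days, seedp_days, equal_var=False)
print(f"Welch t = {t_stat:.2f}, P = {p_value:.4f}")
```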

Repeated Scanner Failure Hindered the Achievement of 100% Scan Coverage

In 2022, an average of 2908 EB slides were produced daily at the SMF Pathology Center. We anticipated that all GI EB slides could be scanned and analyzed using SeeDP within 2 days, given the maximum daily scan capacity of 2700 slides. However, we discovered that we could not cover all the GI EB slides. Only 67.7% (572 254) of the 844 906 endoscopic examination–derived slides received between March 1, 2022, and February 28, 2023, were scanned (no multiple counts for serial, deeper, or recut sections). Analysis of monthly statistics revealed that scan coverage was satisfactory during the first 2 months but began to deteriorate when 2 of the scanners went out of service simultaneously (June 21, 2022, to July 3, 2022) (Figure 2). Therefore, slides from infrequent SeeDP users were excluded from scanning to maintain “effective” QC coverage. Selecting infrequent users was possible because SeeDP records whether a given pathologist has opened a slide. However, the delay in scanning persisted even after excluding these slides, such that some slides from June remained unscanned in mid-July. To ensure the timeliness of the QC functionality, the work process was modified to promptly scan newly returned slides and leave the backlog of older slides unscanned. Since then, scan coverage has never reached the highest level achieved in April 2022 (99.2%, 56 074 of 56 530), fluctuating at approximately 60%. One scanner (No. 3) has repeatedly become unavailable (September 1–September 15, September 28–November 28), further reducing the scan coverage to as low as 39.7% (October 2022, 31 949 of 80 504).

Figure 2.

Monthly scan coverage. The number of total endoscopic biopsy (EB) slides, slides scanned, and proportion of scanned slides (scan coverage) every month are displayed. Additional sections were not counted multiple times; thus, serial and deeper sections did not increase the number of slides until the sections were derived from the same slide. The availability of the 3 scanners (Nos. 1–3) is denoted below the main graph. Red indicates periods of failure (from June 21 to July 3, September 1 to September 15, and September 28 to November 28).


Technical and Human Errors That Further Compromised QC Coverage

We also discovered various types of errors that excluded scanned slides from the QC coverage of SeeDP (Figure 3). Scan errors, in which a slide was deemed invalid because the AI models could not analyze it, occurred in 4447 of the 562 203 GI EB slides scanned (0.79%). Overall, 77 slides (0.014%) were returned and scanned before the final diagnosis, resulting in failure of SeeDP to retrieve the pathology reports. This type of error, which occurred when a pathologist ordered additional sections and forgot to retain the original slides, compromised QC coverage because once SeeDP failed to categorize the pathologic diagnosis as N/D/M/U, discordance with the AI assessment could not be evaluated. Similarly, 6 slides (0.0011%), for which AI inference halted unexpectedly, were not covered by SeeDP because of the lack of an AI assessment. Because one slide was affected by 2 types of errors, discordance between the pathologist’s diagnosis and the AI assessment could be evaluated in 557 672 GI EB slides. The results for 17 slides (0.0030%) were not sent to the relevant pathologists because the pathologists who signed out the cases could not be properly identified or retrieved. Tracking the work history of these cases revealed that this type of error occurs when changes in specimen reception or slide assignment information result in blank entries in the LIS DB, so that SeeDP encounters a blank field when attempting to identify the pathologist who signed out the case. Thus, we concluded that 99.2% (557 675 of 562 203) of the scanned GI EB slides were properly covered by the SeeDP QC system.

Figure 3.

Various error modes encountered during SeeDP’s quality control work process. Errors inherent to the work process of SeeDP and external errors are aligned to the right and left sides, respectively. Abbreviations: AI, artificial intelligence; DB, database; Dx, diagnosis; EB, endoscopic biopsy; EG, esophagogastric; GI, gastrointestinal; LIS, laboratory information system; R, rectum; TI, terminal ileum.


Prevalence and Patterns of AI-Pathologist Discordance

The AI prediction differed from the pathologist’s diagnosis in 42 760 of the 557 672 GI EB slides (7.7%) in which the AI assessment and pathologist’s diagnosis were compared. A review of the discordant slides was deemed crucial for identifying the exact prevalence and causes of the discordance because SeeDP users complained that most of the discrepancies were attributable to SeeDP misclassification. Therefore, all discordant slides of the 2 pathologists (S.-Y.Y. and Y.S.K.) were retrieved from the slides scanned during 40 working days (March 10, 2022, to May 4, 2022). During this period, 7304 GI EB slides signed out by the pathologists were successfully analyzed using SeeDP. AI-pathologist discordance was reported in 454 slides (6.2%), which corresponded to 11 or 12 slides requiring a second review per working day. In total, 452 slides were reviewed after excluding 2 neuroendocrine tumor cases, which inevitably caused AI-pathologist discordance as described above. Consequently, we discovered that the pathologist’s misdiagnosis accounted for only 5.5% (25 of 452) of disagreements (Table 1).

Table 1.

Patterns and Prevalence of Artificial Intelligence (AI)–Pathologist Discordance (N = 452)


In total, 4.0% of discordant slides (18 of 452) were considered false discrepancies because they were linked to technical errors. For example, the AI assessed 11 slides as N because the lesions were unidentifiable on inappropriately scanned slides.

In total, 2.2% of discordant slides (10 of 452) were interpreted as understandable because the diagnoses were debatable or challenging enough that the AI’s differing assessment was defensible. For example, a colon biopsy revealed LG tubular adenoma (TA), which the AI assessed as N. Although one reviewer agreed with the AI’s opinion, interpreting the nuclear stratification observed in a few glands as a consequence of inflammation-related reactive changes, the other reviewer, who signed out the case, declined to change her decision even after additional review. She reasoned that nuclear stratification and enlargement clearly demarcated the glands from the adjacent glands and that significant inflammation was absent. We concluded that AI-pathologist discordance in this type of controversial case should be tolerated.

Finally, the remaining discordant slides, which constituted the majority (88.3%, 399 of 452), were attributed to unacceptable misjudgments of the AI models. We further categorized these intolerable error patterns into the following 5 types (Supplemental Table; see the supplemental digital content, containing 1 table and 1 figure, at https://meridian.allenpress.com/aplm in the July 2025 table of contents).

Cellular Changes in the Basal Layer of N Misinterpreted as D

Cells with enlarged, hyperchromatic, and/or stratified nuclei were interpreted as D irrespective of their location and context. For example, patches containing a basal layer of serrated colorectal lesions or metaplastic gastric mucosa were classified as D (Supplemental Figure, A and B).

False-Negative Calls of WSI Classifiers

This was the most frequent pattern of misjudgment (49.0%, 201 of 410) and was caused by false-negative calls of the WSI classifiers despite correct identification of the D/M patches (Supplemental Figure, C through E).

Busy Stroma Interpreted as M

This occurred when patches with desmoplastic and/or inflamed stroma were interpreted as M. For example, the AI models interpreted 42 cases of chronic active gastritis and 3 cases of colitis as M (Supplemental Figure, F and G).

Failure to Identify Small Lesions

This happened when the lesion was too small to be readily localized (Supplemental Figure, H).

Heterogeneous Assessment Over the Well-Differentiated Adenocarcinoma Area

A single tumor area of gastric well-differentiated adenocarcinoma was occasionally highlighted by alternating D and M patches (Supplemental Figure, I and J). This heterogeneous result of patch classification prevented the WSI classifier from assessing such a case as M.

Pathologists Showed Limited Enthusiasm for the SeeDP-Assisted Second Review

Figure 4 shows the monthly trend of the proportion of discordant slides reviewed by relevant pathologists (review rate). Although some pathologists were quite enthusiastic about the SeeDP-assisted second review, we observed that most full-time pathologists whose slides had been consistently scanned from March 2022 to February 2023 had not actively used SeeDP. The monthly review rate of the nonenthusiastic reviewers ranged from 5.9% (August 2022, 180 of 3040) to 33.0% (March 2022, 572 of 1732), fluctuating around the annual review rate of 10.0% (3168 slides reviewed of 31 700 discordant slides).

Figure 4.

Monthly review rate. Each line represents the proportion of discordant slides reviewed by each of the 12 full-time pathologists whose slides had been consistently scanned from March 2022 to February 2023. Monthly average review rates, computed after excluding 2 enthusiastic reviewers as outliers, are presented as a thick red line and as a table below the graph.


Clinical Significance of the AI-Assisted Daily QC System

To explore whether SeeDP managed to facilitate rapid error correction, lists of SeeDP-assisted revised diagnoses and of misdiagnoses of GI EB cases recognized by conventional routes were obtained, and the time expended to correct the misdiagnoses was compared. From March 2022 to February 2023, 7 misdiagnosed GI EB cases were detected by conventional methods, which required an average of 38.7 (±6.2) days to identify the errors (Table 2). Because the interval of recognition was prolonged and most errors were clinically insignificant, not all cases were revised; revised diagnoses were issued for only 3 cases. One case initially diagnosed as gastric TA, LG, was revised to Helicobacter pylori–associated chronic active gastritis with intestinal metaplasia, erosion, and regenerative atypia after the error was recognized during a review of recut slides for patient referral. Another case, initially diagnosed as a traditional serrated adenoma of the colon, was revised to a hyperplastic polyp because the error was identified upon a review requested by the clinician. Finally, a case of colonic TA, LG, initially diagnosed as a hyperplastic polyp, was revised after the error was discovered in a random review.

Table 2.

Misdiagnoses Recognized by Conventional Routesa


During the same period, SeeDP-assisted diagnostic revision occurred in 14 cases within a mean interval of 3.6 (±2.1) days (Table 3). This interval was significantly shorter than the mean of 38.7 days required by the conventional routes to identify errors (P < .001; Figure 5). Because detection was rapid, all recognized misdiagnoses were revised. Notably, 1 case of signet ring cell carcinoma, initially diagnosed as H pylori–associated chronic active gastritis, was rescued within 7 days.

Figure 5.

Time interval (days) for recognizing misdiagnoses using conventional methods and SeeDP.

Table 3.

SeeDP-Assisted Revised Diagnoses


During the past few years, the advent of so-called digital pathology and the remarkable success of AI in computer vision have raised pathologists’ expectations of the myriad ways in which AI could potentially aid them in the future. Although some methods, including diagnostic support12–15 or objective image analysis for patient stratification,16–18 have been brought to fruition successfully, AI-assisted QC of histopathologic diagnosis has long been considered plausible, albeit with limited supporting evidence.19,20 In fact, most previous attempts have focused solely on the preanalytic aspect of histopathologic diagnosis, namely AI-assisted detection of poor-quality WSIs.21,22 Ibex Medical Analytics (Tel Aviv, Israel) reported the first cases of an AI-based QC system that works in the postanalytic phase by performing a second read of all signed-out slides of breast and prostate cancer.23,24 However, the reports on AI-assisted revised diagnoses were primarily anecdotal because they disclosed only some rescued cases and omitted details, such as the time required to correct the original diagnoses. To the best of our knowledge, this is the first report of an AI-assisted postanalytic QC system for GI EB slides, with detailed data on its deployment, maintenance, and clinical impact.

When SeeDP was launched using 3 scanners, we expected all GI EB slides to be scanned within 2 or 3 days. Therefore, the default setting of the SeeDP web viewer was set to show the slides scanned during the last 3 days. However, we achieved an unsatisfactory scan coverage of 67.7% during the past year. It should be mentioned that the volume of slides generated by ancillary tests (additional sections, immunohistochemical or special stains) was insufficient to compromise the scanning of new slides, because orders for ancillary tests accounted for only 1.9% (15 817 of 844 906) of the total EB slides during the study period. We were informed that most cases of scanner malfunction were caused by damage to scanner parts due to overuse. Because the SMF Pathology Center processes a large volume of specimens, requiring continuous operation of the scanners, this could have substantially degraded the durability of the scanners. Furthermore, the malfunction event that required the longest repair time, from September 28 to November 28, was due to a ransomware attack, which is hardly a foreseeable occurrence. Collectively, the overall experience indicates that we should have been equipped with greater scan capacity, sufficient to allow the scanners to rest and to cope with unexpected events. This experience will inform our plan to introduce new scanners with higher throughput.

The exclusion of slides from QC coverage owing to aberrant errors that occurred after scanning was also an unexpected finding (Figure 3). The most prevalent error type was scan failure, which occurred in 0.79% (4447 of 572 254) of scanned slides. Although this scan error rate was substantially lower than those of previous studies,25,26 it should be clarified that we considered a WSI erroneous when no patches extracted from the image passed the noise filter designed to eliminate white space or poorly focused areas, whereas previous studies estimated the error rate using manual QC. Considering that 11 of the 18 false-discrepancy slides were attributed to inappropriately scanned slides, our actual scan error rate was probably higher than 0.79%. In addition to improving the performance of the noise filter, this type of error ultimately requires a new work process that notifies the staff in charge of the list of poorly scanned slides so that the slides can be rescanned and appropriately evaluated by SeeDP. This new process could also rescue cases affected by an AI model error due to an unexpected halt in AI inference. The second most common type of error was attributed to the pathologist’s mistake of returning the slides while waiting for additional sections. This and other errors caused by missing data can be collectively prevented by coordinated action with the LIS. For example, SeeDP can perform AI prediction only on WSIs accompanied by a final diagnosis and can notify the system manager of the presence of missing entries in the LIS DB. Therefore, we intend to address this issue in the second-generation LIS for SMF, which is currently under development.
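As an illustration of the kind of patch-level noise filter discussed above (the actual SeeDP criteria are not specified beyond rejecting white space and poorly focused areas; the thresholds and the focus proxy below are assumptions):

```python
import numpy as np

def patch_is_informative(patch_rgb: np.ndarray,
                         white_fraction_max: float = 0.85,
                         sharpness_min: float = 15.0) -> bool:
    """Return True if an H x W x 3 uint8 patch looks like usable tissue rather than blank glass or blur."""
    gray = patch_rgb.astype(float).mean(axis=2)
    white_fraction = (gray > 0.9 * 255).mean()   # proportion of near-white background
    sharpness = np.var(np.diff(gray, axis=0))    # crude focus proxy (edge variance)
    return white_fraction < white_fraction_max and sharpness > sharpness_min

# A WSI would be treated as a scan error if no extracted patch passes this filter.
```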

The frequent misjudgment of AI initiated this retrospective analysis of the performance of SeeDP; therefore, it was not surprising that nearly 90% of AI-pathologist discordance was attributed to misclassification by AI. It should be emphasized that this aligns with the abovementioned studies by Ibex Medical Analytics, where 90.9% of cancer alerts for prostate lesions and 75% for breast lesions required no diagnostic amendment.23,24  At the same time, however, it was natural for pathologists to be disappointed with the performance of SeeDP when they realized that more than 9 out of 10 review requests were false calls. A significant decline in the review rate during the first 2 months (from 33.7% to 11.3%) could be attributable to this dissatisfaction. It should be noted that we could not force pathologists to participate in the SeeDP-suggested second review; they would spend time on review only when they felt it was necessary and beneficial. It was unfortunate that we failed to motivate them, giving rise to the low average review rate of 10.0% during the study period. This low review rate explains why misdiagnoses recognized by conventional routes could not be corrected by SeeDP in advance; although SeeDP’s assessment differed from the initial pathologic diagnosis in 4 of the 7 cases, none of the discordant cases were reviewed.

A detailed analysis of AI misclassification highlights the limitations of our current approach to interpreting histologic images. All pathologists would agree that a square patch of several hundred pixels is an illegitimate unit of N/D/M classification, as we cannot always confidently classify a patch as N/D/M. The Panoramic 250 Flash III generates WSIs with a resolution of approximately 0.24 μm per pixel, equivalent to ×40 optical magnification. At this resolution, a 256 × 256 patch contains only a few colonic crypts. Therefore, we used a lower-magnification view to make the patches more appropriate for N/D/M classification. This was achieved with the deep zoom functionality of the OpenSlide library,27 which extracted patches from the WSIs that were downsampled by a factor of 4. This approach is comparable to a pathologist relying solely on a ×10 lens when interpreting slides and can partly explain type 4 misclassification.
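The patch-extraction step described above can be reproduced with OpenSlide's deep zoom helper; the snippet below is a simplified sketch (the file name is hypothetical, and the grid loop stands in for SeeDP's own extraction code). Because adjacent deep zoom levels differ by a factor of 2, the level two steps below the top level corresponds to a 4× downsample.

```python
from openslide import OpenSlide
from openslide.deepzoom import DeepZoomGenerator

slide = OpenSlide("example_gastric_biopsy.mrxs")            # hypothetical slide file
dz = DeepZoomGenerator(slide, tile_size=256, overlap=0, limit_bounds=True)

level = dz.level_count - 3          # two levels below full resolution -> 4x downsampled
cols, rows = dz.level_tiles[level]  # tile grid dimensions at this level
for row in range(rows):
    for col in range(cols):
        tile = dz.get_tile(level, (col, row))   # PIL image, up to 256 x 256 pixels
        # ...pass the tile to the patch classifier...
```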

Type 1 misclassification further highlights the inadequacy of patch classification by revealing that designating a patch as N or D often depends on its location (basal or superficial) and the appearance of neighboring patches. This contextual interpretation cannot be reflected in the current approach, where patches are individually classified first, and the WSI diagnosis is formulated based on these classifications. As exemplified in type 5 misclassification, gastric well-differentiated adenocarcinoma cases pose another challenge to the current approach because their atypia is often too subtle to be recognized as M at the patch level. These 2 types of misclassification have been described in previous studies in which AI models for GI EBs were implemented similarly to ours.12,13,28  Conversely, type 3 misclassification was not pronounced in other studies. Song et al12  reported that inflamed granulation tissue and inflammatory infiltrates caused false positivity in a gastric cancer model; however, they were excluded from the common failure patterns of their model. This issue might have been alleviated by establishing a balanced data set regarding the degree of inflammation, preventing AI models from learning inflammation and desmoplasia as evidence of malignancy.

Collectively, both types 1 and 3 misclassification prevented WSI classifiers from fully trusting the results of patch classification. From this perspective, the presence of D or M patches does not guarantee a WSI diagnosis, necessitating additional efforts to establish the correct WSI classification. Consequently, the WSI classifiers were somehow trained to compensate for the errors in patch classification by ignoring correctly identified D/M patches, constituting type 2 misclassification. Although these errors are understandable from the AI models’ perspective, they appear so unacceptable to pathologists that they have lost faith in SeeDP. Besides establishing new data sets as mentioned above, we believe that an alternative way of assessing WSI needs to be investigated to improve the performance of AI models. We are currently attempting state-of-the-art techniques such as weakly supervised or semisupervised learning,29,30  so that pathologists can rely on SeeDP’s assessment and willingly agree to review discordant slides.

Despite all the limitations, SeeDP managed to detect more misdiagnoses (14 versus 7) within a significantly shorter time (average, 3.6 versus 38.7 days; P < .001), including 1 signet ring cell carcinoma initially diagnosed as gastritis. It should be highlighted that SeeDP worked synergistically with conventional QC methods to correct misdiagnoses. As mentioned above, SeeDP failed to detect 3 of the 7 misdiagnoses recognized by conventional methods. Given that AI models with 100% accuracy cannot exist and reviewing 100% of discordant slides cannot be enforced, we should not expect SeeDP to replace conventional methods. We are ultimately pursuing a 2-track case review system in which all recently diagnosed cases are first reviewed by AI, and conventional QC methods based on human reviewers further work on cases missed by AI.

As described above, our retrospective review reveals several limitations of SeeDP. First, SeeDP’s AI models were trained and tested using WSIs from a single institution (SMF Pathology Center); because they lack external validation and sufficient optimization, they require further validation in the future. Furthermore, the actual accuracy of the AI models remains questionable because only a limited number of slides were reviewed in the present analysis. Finally, QC coverage was compromised by various technical errors and insufficient pathologist compliance. Further efforts are needed to improve model performance through thorough external and prospective validation and to systematically manage QC coverage by addressing the various error modes and motivating pathologists.

In conclusion, AI-based daily QC systems are plausible solutions for guaranteeing high-quality pathologic diagnosis because they enable rapid detection and correction of misdiagnoses. Nevertheless, further efforts are required to systematically manage QC coverage by addressing various technical issues and improving AI performance.

We wish to thank Seo Yeon Kim, chief of the information management team of the Pathology Center, Seegene Medical Foundation, for providing a list of SeeDP-assisted revised diagnoses and misdiagnoses recognized using conventional routes. We would also like to thank Editage (www.editage.co.kr) for English language editing and formatting.

References

1. Quality control. In: Oxford English Dictionary. Oxford, United Kingdom: Oxford University Press; 2024.
2. Nakhleh RE. What is quality in surgical pathology? J Clin Pathol. 2006;59(7):669–672.
3. Nakhleh RE, Nosé V, Colasacco C, et al. Interpretive diagnostic error reduction in surgical pathology and cytology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center and the Association of Directors of Anatomic and Surgical Pathology. Arch Pathol Lab Med. 2016;140(1):29–40.
4. The Korean Society of Pathologists/Korean Society of Cytopathology. Pathology red book for quality assurance. https://www.pathology.or.kr/html/?pmode=boardview&MMC_pid=251&seq=21816. Accessed June 20, 2023.
5. Metter DM, Colgan TJ, Leung ST, Timmons CF, Park JY. Trends in the US and Canadian pathologist workforces from 2007 to 2017. JAMA Netw Open. 2019;2(5):e194337.
6. Cho U, Kim TJ, Kim WS, Lee KY, Yoon HK, Choi HJ. Current state of cytopathology residency training: a Korean national survey of pathologists. J Pathol Transl Med. 2023;57(2):95–101.
7. Organization for Economic Co-operation and Development (OECD). Health at a glance 2021: OECD indicators. Paris, France: OECD Publishing; 2021. https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance-2021_ae3016b9-en. Accessed September 1, 2023.
8. Yoon HK, Diwa MH, Lee YS, et al. How overworked are pathologists? An assessment of cases for histopathology and cytopathology services. Basic Appl Pathol. 2009;2(4):111–117.
9. Safrin RE, Bark CJ. Surgical pathology sign-out: routine review of every case by a second pathologist. Am J Surg Pathol. 1993;17(11):1190–1192.
10. Weydert JA, De Young BR, Cohen MB. A preliminary diagnosis service provides prospective blinded dual-review of all general surgical pathology cases in an academic practice. Am J Surg Pathol. 2005;29(6):801–805.
11. Ko YS, Choi YM, Kim M, et al. Improving quality control in the routine practice for histopathological interpretation of gastrointestinal endoscopic biopsies using artificial intelligence. PLoS One. 2022;17(12):e0278542.
12. Song Z, Zou S, Zhou W, et al. Clinically applicable histopathological diagnosis system for gastric cancer detection using deep learning. Nat Commun. 2020;11(1):4294.
13. Park J, Jang BG, Kim YW, et al. A prospective validation and observer performance study of a deep learning algorithm for pathologic diagnosis of gastric tumors in endoscopic biopsies. Clin Cancer Res. 2021;27(3):719–728.
14. Eloy C, Marques A, Pinto J, et al. Artificial intelligence–assisted cancer diagnosis improves the efficiency of pathologists in prostatic biopsies. Virchows Arch. 2023;482:595–604.
15. Retamero JA, Gulturk E, Bozkurt A, et al. Artificial intelligence helps pathologists increase diagnostic accuracy and efficiency in the detection of breast cancer lymph node metastases. Am J Surg Pathol. 2024;48:846–854.
16. Park S, Ock CY, Kim H, et al. Artificial intelligence–powered spatial analysis of tumor-infiltrating lymphocytes as complementary biomarker for immune checkpoint inhibition in non-small-cell lung cancer. J Clin Oncol. 2022;40(17):1916–1928.
17. Reichling C, Taieb J, Derangere V, et al. Artificial intelligence-guided tissue analysis combined with immune infiltrate assessment predicts stage III colon cancer outcomes in PETACC08 study. Gut. 2020;69(4):681–690.
18. Lee HJ, Cho SY, Cho EY, et al. Artificial intelligence (AI)–powered spatial analysis of tumor-infiltrating lymphocytes (TIL) for prediction of response to neoadjuvant chemotherapy (NAC) in triple-negative breast cancer (TNBC). J Clin Oncol. 2022;40:595.
19. Rakha EA, Toss M, Shiino S, et al. Current and future applications of artificial intelligence in pathology: a clinical perspective. J Clin Pathol. 2021;74(7):409–414.
20. Kim I, Kang K, Song Y, Kim TJ. Application of artificial intelligence in pathology: trends and challenges. Diagnostics (Basel). 2022;12(11):2794.
21. Haghighat M, Browning L, Sirinukunwattana K, et al. Automated quality assessment of large digitised histology cohorts by artificial intelligence. Sci Rep. 2022;12(1):5002.
22. Kohlberger T, Liu Y, Moran M, et al. Whole-slide image focus quality: automatic assessment and impact on AI cancer detection. J Pathol Inform. 2019;10:39.
23. Pantanowitz L, Quiroga-Garza GM, Bien L, et al. An artificial intelligence algorithm for prostate cancer diagnosis in whole slide images of core needle biopsies: a blinded clinical validation and deployment study. Lancet Digit Health. 2020;2(8):e407–e416.
24. Sandbank J, Bataillon G, Nudelman A, et al. Validation and real-world clinical application of an artificial intelligence algorithm for breast cancer detection in biopsies. NPJ Breast Cancer. 2022;8(1):129.
25. Patel AU, Shaker N, Erck S, et al. Types and frequency of whole slide imaging scan failures in a clinical high throughput digital pathology scanning laboratory. J Pathol Inform. 2022;13:100112.
26. Montezuma D, Monteiro A, Fraga J, et al. Digital pathology implementation in private practice: specific challenges and opportunities. Diagnostics (Basel). 2022;12:529.
27. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: a vendor-neutral software foundation for digital pathology. J Pathol Inform. 2013;4:27.
28. Song Z, Yu C, Zou S, et al. Automatic deep learning-based colorectal adenoma detection system and its similarities with pathologists. BMJ Open. 2020;10:e036423.
29. Qu L, Liu S, Liu X, et al. Towards label-efficient automatic diagnosis and analysis: a comprehensive survey of advanced deep learning-based weakly supervised, semi-supervised and self-supervised techniques in histopathological image analysis. Phys Med Biol. 2022;67(20).
30. Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25:1301–1309.

Author notes

Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the July 2025 table of contents.

This research was supported by the Seegene Medical Foundation, South Korea, under the project “Research on Developing a Next Generation Medical Diagnosis System Using Deep Learning” (Grant Number: G01180115).

All authors report employment with Seegene Medical Foundation. Jang, Park, and Ko have intellectual property interests relevant to the work that is the subject of this paper. The authors have no other relevant financial interest in the products or companies described in this article.

Portions of this manuscript were presented as a poster at the 35th European Congress of Pathology; Dublin, Ireland; September 10, 2023.
