Squamous cell carcinoma (SCC) is a histologic type of cancer that exhibits various degrees of keratinization. Identifying lymph node metastasis in SCC is crucial for prognosis and treatment strategies. Although artificial intelligence (AI) has shown promise in cancer prediction, applications specifically targeting SCC are limited.
To design and validate a deep learning model tailored to predict metastatic SCC in radical lymph node dissection specimens using whole slide images (WSIs).
Using the EfficientNetB1 architecture, a model was trained on 6587 WSIs (2413 SCC and 4174 nonneoplastic) from several hospitals, encompassing esophagus, head and neck, lung, and skin specimens. The training exclusively relied on WSI-level labels without annotations. We evaluated the model on a test set consisting of 541 WSIs (41 SCC and 500 nonneoplastic) of radical lymph node dissection specimens.
The model exhibited high performance, with receiver operating characteristic curve areas under the curve between 0.880 and 0.987 in detecting SCC metastases in lymph nodes. Although true positives and negatives were accurately identified, certain limitations were observed. These included false positives due to germinal centers, dust cell aggregations, and specimen-handling artifacts, as well as false negatives due to poor differentiation.
The developed artificial intelligence model presents significant potential in enhancing SCC lymph node detection, offering workload reduction for pathologists and increasing diagnostic efficiency. Continuous refinement is needed to overcome existing challenges, making the model more robust and clinically relevant.
Squamous cell carcinoma (SCC) is a histologic type of cancer commonly observed in areas such as the skin, head and neck, esophagus, lung, and uterine cervix. The detection of lymph node metastases is a critical step in the diagnosis and staging of various types of malignant tumors, including SCC.1 Pathologists play a vital role in this process, spending much of their time meticulously inspecting each dissected lymph node for metastases under the microscope. This substantial workload reflects the importance of the task in cancer diagnosis and staging, but it also leaves room for efficiency improvements. An artificial intelligence (AI) model capable of accurately identifying lymph node metastases could significantly reduce this workload, freeing pathologists to focus on other important tasks and potentially improving the overall efficiency of the diagnostic process.
The emergence of AI has brought about significant changes in various fields, including pathology.2,3 AI, particularly deep learning, has shown promise in assisting diagnostic work, with several studies demonstrating its efficacy in cancer detection and diagnosis.4 Several recent reports5,6 have described AI models that achieved an area under the curve (AUC) of 0.99 in identifying lymph node metastases in breast cancer patients. Other work7 has reported AI models that detected lymph node metastasis of bladder cancer with AUCs ranging from 0.97 to 0.99. Additionally, we previously reported8 a multi-organ adenocarcinoma classification AI model that achieved AUCs ranging from 0.91 to 0.98 in identifying lymph node metastases. However, the application of AI to detecting lymph node metastases in SCC remains limited. Current AI models for lymph node metastasis detection in SCC have shown varying degrees of success, with accuracy rates ranging from 74.4% to 86%.9,10 A more efficient and accurate AI model is needed to assist pathologists in detecting lymph node metastases in SCC.
In this study, we trained deep learning models using weakly supervised learning to predict metastatic SCC in radical lymph node dissection specimens. The training data set consisted of 6587 whole slide images (WSIs) of esophagus, head and neck, lung, and skin specimens, of which 2413 were SCC and 4174 were nonneoplastic. No manually drawn annotations were available; only WSI-level labels were used. We evaluated the model on a test set of 541 WSIs (41 SCC and 500 nonneoplastic) of radical lymph node dissection specimens, on which it achieved a receiver operating characteristic curve AUC (ROC-AUC) of 0.94 [CI 0.88–0.98].
METHODS
Clinical Cases and Pathologic Records
In this study, a total of 7128 hematoxylin-eosin–stained histopathologic specimen slides of human SCC and nonneoplastic lesions were collected, after histopathologic review by surgical pathologists, from the surgical pathology files of 5 hospitals (International University of Health and Welfare, Mita Hospital [Tokyo, Japan] and the 4 Kamachi Group Hospitals [Wajiro, Shinkuki, Shinkomonji, and Shinmizumaki Hospitals, Fukuoka, Japan]) and from The Cancer Genome Atlas (TCGA). WSIs from TCGA were downloaded from the TCGA website. Prior to the experimental procedures, each WSI diagnosis (SCC or nonneoplastic) was first checked by a pathologist and then double-checked by a senior pathologist. All WSIs were scanned at a magnification of ×20 using the same Leica Aperio AT2 Digital Whole Slide Scanner (Leica Biosystems, Tokyo, Japan) and were saved in SVS file format with JPEG 2000 compression. Because the slides were scanned at ×20, they can be digitally downsampled to any lower magnification, such as ×10.
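Going from ×20 to ×10 halves the resolution, so each ×10 pixel summarizes a 2 × 2 block of ×20 pixels. As a minimal sketch, assuming a grayscale image held as nested Python lists, block averaging looks like this; the function name `downsample_2x` is hypothetical, and a real pipeline would read the SVS file through a WSI library rather than hold a whole slide in memory this way.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale pixel intensities


def downsample_2x(img: Image) -> Image:
    """Halve resolution (e.g. x20 -> x10) by averaging each 2x2 pixel block.

    A trailing odd row or column that does not fill a full block is dropped.
    """
    out_h, out_w = len(img) // 2, len(img[0]) // 2
    out: Image = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block_sum = (
                img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]
            )
            row.append(block_sum / 4.0)
        out.append(row)
    return out


if __name__ == "__main__":
    x20 = [
        [0, 4, 8, 8],
        [4, 0, 8, 8],
        [255, 255, 10, 20],
        [255, 255, 30, 40],
    ]
    print(downsample_2x(x20))  # [[2.0, 8.0], [255.0, 25.0]]
```

Averaging, rather than keeping every other pixel, reduces aliasing; which resampling filter the authors' pipeline used is not stated.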
Data Set
Hospitals that provided histopathologic specimen slides were anonymized by randomly assigning a letter (A, B, C, D, or E). Table 1 breaks down the distribution of the training sets from 5 domestic hospitals (hospitals A, B, C, D, and E) and from TCGA. Validation sets were selected randomly from the training sets; their sizes are given in Table 1. The distribution of the test set and the SCC origins (primary organs) from 4 domestic hospitals (hospitals A, C, D, and E) is summarized in Table 2. None of the training or test set WSIs were manually annotated. The training algorithm used only the WSI labels, which were extracted from the histopathologic diagnostic reports after review by surgical pathologists; the only information available for training was therefore whether a WSI contained SCC or nonneoplastic lesions, with no information about the location of the cancerous lesions.
Deep Learning
We adopted the partial fine-tuning transfer learning approach11 to train our models. This technique efficiently fine-tunes an existing pretrained model by updating only the affine parameters of batch normalization layers and the final classification layer. The base model we used was EfficientNetB1,12 initialized with pretrained weights from ImageNet. Our training methodology is the same as the one used by Tsuneki et al.13
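The partial fine-tuning rule can be stated as a filter over the model's parameters. The framework-agnostic sketch below assumes each parameter is described by its layer name, layer type, and parameter name; the `Param` type, the function `partial_finetune_trainable`, and the layer names in the example are hypothetical. Batch-normalization running statistics (moving mean and variance) are not gradient-trained affine parameters, so the filter does not select them; whether they were refreshed during training is not stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Param:
    layer: str       # layer name (hypothetical names in the example)
    layer_type: str  # e.g. "Conv2D", "BatchNormalization", "Dense"
    name: str        # parameter name within the layer


BN_AFFINE = {"gamma", "beta"}  # learnable scale and shift of batch normalization


def partial_finetune_trainable(params: List[Param], head_layer: str) -> List[Param]:
    """Select the parameters that receive gradient updates.

    Only batch-norm affine parameters (gamma, beta) and every parameter of
    the final classification layer are selected; convolution kernels keep
    their pretrained (ImageNet) values.
    """
    selected = []
    for p in params:
        is_bn_affine = p.layer_type == "BatchNormalization" and p.name in BN_AFFINE
        is_head = p.layer == head_layer
        if is_bn_affine or is_head:
            selected.append(p)
    return selected


if __name__ == "__main__":
    model_params = [
        Param("stem_conv", "Conv2D", "kernel"),
        Param("stem_bn", "BatchNormalization", "gamma"),
        Param("stem_bn", "BatchNormalization", "beta"),
        Param("stem_bn", "BatchNormalization", "moving_mean"),
        Param("stem_bn", "BatchNormalization", "moving_variance"),
        Param("block1a_dwconv", "DepthwiseConv2D", "depthwise_kernel"),
        Param("classifier", "Dense", "kernel"),
        Param("classifier", "Dense", "bias"),
    ]
    for p in partial_finetune_trainable(model_params, head_layer="classifier"):
        print(p.layer, p.name)
```

Because only a small fraction of the network's weights is updated, this style of transfer learning is cheap to train and less prone to overfitting a modest data set than full fine-tuning.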
For completeness, we outline the process here. To apply the convolutional neural network to WSIs, we divided the slides into square tiles extracted from tissue areas. For each WSI, we identified and removed most of the white background by applying Otsu's method14 to a grayscale version of the WSI. During training, we initially performed random balanced sampling of tiles from these regions, aiming to keep the labels equally represented in each training batch. To achieve this, we arranged WSIs in a shuffled queue and alternated between selecting WSIs with SCC and nonneoplastic labels. Once a WSI was chosen, we randomly sampled a batch of tiles from it to create a balanced batch. To preserve balance across WSIs, we oversampled from them, ensuring that the model trained on tiles from all WSIs in each epoch. We switched to hard mining of tiles once performance on the validation set had not improved for 2 epochs. For hard mining, we alternated between training and inference. During inference, the convolutional neural network analyzed all tissue regions in the WSI in a sliding-window manner, selecting the k = 5 tiles with the highest probability of being SCC if the WSI was nonneoplastic and the k = 5 tiles with the lowest probability of being SCC if the WSI was SCC. This step helped identify examples that were challenging for the model.
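The sampling and hard-mining steps can be sketched as follows. This is one reading of the description, assuming each batch is filled by alternating SCC and nonneoplastic WSIs drawn from per-label shuffled queues; `LabelQueue`, `balanced_batch`, `hard_mine`, the `(wsi_id, tile_index)` tile handle, and the `tiles_per_wsi` parameter are hypothetical, and the model inference that produces tile probabilities is not shown.

```python
import heapq
import random
from typing import Dict, List, Tuple

Tile = Tuple[str, int]  # hypothetical tile handle: (wsi_id, tile_index)


class LabelQueue:
    """Shuffled queue of WSI ids that share one label.

    When the queue runs out it is refilled and reshuffled, so the class with
    fewer WSIs is revisited more often (oversampled) while every WSI is still
    visited once per pass.
    """

    def __init__(self, wsi_ids: List[str], rng: random.Random) -> None:
        self._ids = list(wsi_ids)
        self._rng = rng
        self._queue: List[str] = []

    def next(self) -> str:
        if not self._queue:
            self._queue = self._ids[:]
            self._rng.shuffle(self._queue)
        return self._queue.pop()


def balanced_batch(
    queues: Dict[int, LabelQueue],
    n_tiles: Dict[str, int],
    batch_size: int,
    tiles_per_wsi: int,
    rng: random.Random,
) -> List[Tuple[Tile, int]]:
    """Fill one batch by alternating SCC (label 1) and nonneoplastic (label 0) WSIs.

    Assumes every WSI has at least one tissue tile.
    """
    batch: List[Tuple[Tile, int]] = []
    label = 1
    while len(batch) < batch_size:
        wsi = queues[label].next()
        k = min(tiles_per_wsi, batch_size - len(batch), n_tiles[wsi])
        for idx in rng.sample(range(n_tiles[wsi]), k):
            batch.append(((wsi, idx), label))
        label = 1 - label  # alternate labels to keep the batch balanced
    return batch


def hard_mine(tile_probs: List[float], wsi_label: int, k: int = 5) -> List[int]:
    """Indices of the k tiles selected for hard mining from one WSI.

    Nonneoplastic WSI (label 0): the k tiles with the highest SCC probability.
    SCC WSI (label 1): the k tiles with the lowest SCC probability.
    """
    indices = range(len(tile_probs))
    if wsi_label == 0:
        return heapq.nlargest(k, indices, key=lambda i: tile_probs[i])
    return heapq.nsmallest(k, indices, key=lambda i: tile_probs[i])
```

For example, with tile probabilities `[0.1, 0.9, 0.4, 0.95, 0.2, 0.6, 0.05]` and `k=3`, `hard_mine` returns `[3, 1, 5]` for a nonneoplastic WSI and `[6, 0, 4]` for an SCC WSI.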
To obtain a prediction for a WSI from the test set, we first detected all tissue regions using Otsu's thresholding method14, excluding the white background. We then applied the model in a sliding-window fashion with a fixed-size stride and took the maximum probability among all tiles as the prediction for the WSI.
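Test-time prediction combines Otsu thresholding, sliding-window tiling, and max aggregation. The pure-Python sketch below defaults to the 224-pixel tile and stride stated in the Results; the `model` callable stands in for the trained network, and the `min_tissue_fraction` cutoff for deciding whether a tile contains enough tissue is a hypothetical parameter that the text does not specify.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Gray = List[List[int]]  # 8-bit grayscale image, rows of values 0..255


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Otsu's method: the threshold t that maximizes the between-class
    variance when pixels <= t and pixels > t form the two classes."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def tile_origins(height: int, width: int, tile: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Top-left corners of all full tiles in a sliding window."""
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield y, x


def predict_wsi(
    gray: Gray,
    model: Callable[[Gray], float],
    tile: int = 224,
    stride: int = 224,
    min_tissue_fraction: float = 0.5,  # hypothetical cutoff
) -> float:
    """WSI-level SCC probability = maximum tile probability over tissue tiles."""
    threshold = otsu_threshold([v for row in gray for v in row])
    best = 0.0
    for y, x in tile_origins(len(gray), len(gray[0]), tile, stride):
        patch = [row[x:x + tile] for row in gray[y:y + tile]]
        # Tissue is darker than the white glass background.
        tissue = sum(v <= threshold for r in patch for v in r) / (tile * tile)
        if tissue >= min_tissue_fraction:
            best = max(best, model(patch))
    return best
```

Max aggregation means a single confidently positive tile is enough to flag the whole slide, which suits metastasis screening but also explains why isolated false-positive tiles can turn a negative WSI into a false positive.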
We used the Adam optimizer15 with a batch size of 32 and a learning rate of 0.001 during fine-tuning, and binary cross-entropy as the loss function. Early stopping was based on the model's performance on a validation set: training stopped automatically if the validation loss did not improve for 10 epochs. The model with the lowest validation loss was chosen as the final model.
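The loss and stopping rule can be written out explicitly. Below is a minimal framework-free sketch, assuming validation loss is computed once per epoch; `ValLossEarlyStopper` is a hypothetical helper (Keras offers a comparable built-in callback, `tf.keras.callbacks.EarlyStopping`, with `monitor="val_loss"` and `patience=10`).

```python
import math
from typing import Optional, Sequence


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


class ValLossEarlyStopper:
    """Signal a stop when validation loss has not improved for `patience`
    epochs, and remember the epoch with the lowest validation loss."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best_loss = math.inf
        self.best_epoch: Optional[int] = None
        self.epochs_without_improvement = 0

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


if __name__ == "__main__":
    stopper = ValLossEarlyStopper(patience=10)
    val_losses = [0.60, 0.50, 0.45] + [0.46] * 20  # toy values
    for epoch, loss in enumerate(val_losses):
        if stopper.update(epoch, loss):
            break
    print(epoch, stopper.best_epoch)  # 12 2
```

In this toy run, training stops at epoch 12 after 10 epochs without improvement, and the checkpoint from epoch 2 (lowest validation loss) would be kept as the final model.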
Software and Statistical Analysis
Compliance and Ethical Standards
The experimental protocol was approved by the ethical board of International University of Health and Welfare (No. 19-Im-007) and Kamachi Group Hospitals (No. 173). All research activities complied with all relevant ethical regulations and were performed in accordance with relevant guidelines and regulations in the hospitals mentioned above.
Availability of Data and Material
The data sets generated and/or analyzed during the current study are not publicly available because of specific institutional requirements governing privacy protection but are available from the corresponding author on reasonable request. The data sets that support the findings of this study are available from International University of Health and Welfare, Mita Hospital (Tokyo, Japan), and Kamachi Group Hospitals (Fukuoka, Japan), but restrictions apply to the availability of these data, which were used under a data use agreement that was made according to the Ethical Guideline for Medical and Health Research Involving Human Subjects as set by the Japanese Ministry of Health, Labour, and Welfare, and so are not publicly available. However, the data are available from the authors upon reasonable request for private viewing and with permission from the corresponding medical institutions within the terms of the data use agreement and if compliant with the ethical and legal requirements as stipulated by the Japanese Ministry of Health, Labour, and Welfare.
RESULTS
High AUC Performance on WSI SCC Histopathology Images
We trained the model using weakly supervised learning so that only the WSI labels were needed. The model is based on the EfficientNetB1 architecture. We used WSIs at a magnification of ×10 and applied the model in a sliding-window fashion with an input tile size of 224 × 224 pixels and a stride of 224. The training set used for weakly supervised learning is summarized in Table 1. Subsequently, we evaluated the model on test sets obtained from 4 domestic hospitals, as outlined in Table 2. Figure 1 shows the ROC curve. Tables 3 and 4 summarize the results using the following metrics: ROC-AUC, log loss, accuracy, sensitivity, and specificity (using a probability threshold of 0.5). Figures 2, A through D; 3, A and B; 4, A through F; and 5, A through D, show representative true-positive, true-negative, false-positive, and false-negative cases, respectively.
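The reported metrics can be computed from WSI-level probabilities as below. In this minimal sketch, ROC-AUC uses its equivalence with the Mann-Whitney statistic (the probability that a positive WSI outscores a negative one), and treating a probability exactly equal to 0.5 as positive is an assumption, since the text specifies only the 0.5 threshold. The toy labels and scores are illustrative, not study data, and confidence intervals are not computed.

```python
import math
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true: Sequence[int], y_score: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with clipping to avoid log(0)."""
    total = 0.0
    for y, s in zip(y_true, y_score):
        s = min(max(s, eps), 1.0 - eps)
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(y_true)


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_scc = s >= threshold  # assumption: ties go to the positive class
        if predicted_scc and y == 1:
            tp += 1
        elif predicted_scc:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0]            # toy data, not study results
    scores = [0.9, 0.4, 0.6, 0.2, 0.1]
    print(roc_auc(labels, scores))      # 0.8333333333333334
    print(threshold_metrics(labels, scores))
```

ROC-AUC is threshold-free, whereas accuracy, sensitivity, and specificity depend on the chosen cutoff, which is why both kinds of metrics are reported.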
A receiver operating characteristic curve with area under the curve (AUC) obtained on the test set.
Representative example from the test set of a true-positive prediction of metastatic squamous cell carcinoma in a case of radical lymph node dissection (lymphadenectomy). In the whole slide images of radical lymph node dissection specimens, the heatmap images (B and D) show true-positive predictions of metastatic squamous cell carcinoma cells corresponding, respectively, to the hematoxylin-eosin histopathology (A and C). In (A), only 1 lymph node (enclosed in a black frame) was positive for metastatic squamous cell carcinoma (C). The heatmap image (B) shows true-positive predictions consistent with the areas of metastatic squamous cell carcinoma invasion in the same lymph node (D), and correctly shows no false-positive predictions in any of the lymph node areas outside (D). The heatmap uses a jet color map where blue indicates low probability and red indicates high probability (original magnifications ×2 [A and B] and ×10 [C and D]).
True-Positive SCC Prediction of Radical Lymph Node Dissection (Lymphadenectomy) WSIs
The detection of lymph node metastases is a critical component in the diagnosis and staging of various types of malignant tumors, including SCC. A lymphadenectomy (radical lymph node dissection) is a surgical procedure performed to evaluate evidence of metastatic cancer and is a crucial step in cancer staging. Accurate staging is paramount in determining the most appropriate treatment strategy for patients, yet examining dissected lymph nodes for metastases under the microscope is time-consuming and requires a high level of expertise. In our study, the AI model achieved high ROC-AUCs (0.880–0.987). Additionally, it exhibited low log loss (0.105–0.162) and high accuracy (0.850–0.904), sensitivity (0.837–1.000), and specificity (0.845–0.899) (Table 3). In Figure 2, A and B, the heatmap image (Figure 2, B) shows true-negative predictions for the lymph nodes without metastasis. Furthermore, our model successfully detected lymph node metastasis in a representative SCC case (Figure 2, C and D).
True-Negative SCC Prediction of Radical Lymph Node Dissection (Lymphadenectomy) WSIs
Our model correctly predicted no metastatic SCC in lymph nodes without evidence of cancer metastasis (Figure 3, A and B). Figure 3, A, shows 7 lymph nodes without metastasis, ranging from small to large, and 1 fragment of connective tissue. The heatmap image shows no prediction of metastatic carcinoma in any of the lymph nodes (Figure 3, B).
Representative example from the test set of a true-negative metastatic squamous cell carcinoma classification of radical lymph node dissection (lymphadenectomy). A, Hematoxylin-eosin–stained image shows 7 lymph nodes and 1 piece of tissue made up of connective tissue (thought to be serous membrane) without evidence of metastatic squamous cell carcinoma. B, The heatmap image shows true-negative prediction of metastatic squamous cell carcinoma. The heatmap uses a jet color map where blue indicates low probability and red indicates high probability (original magnification ×2).
False-Positive SCC Prediction of Radical Lymph Node Dissection (Lymphadenectomy) WSIs
The cases in Figure 4, A through F, show no evidence of metastatic SCC, yet our model made false-positive predictions of SCC in them for several reasons (Figure 4, A, C, and E). In Figure 4, A, a germinal center (insets of Figure 4, A and B) caused false-positive predictions. In Figure 4, C, a dust cell aggregation caused a false-positive prediction (insets of Figure 4, C and D). Figure 4, E and F, shows lymphoid tissue with crush artifacts from specimen-handling procedures that caused false-positive predictions (insets of Figure 4, E and F). However, false-positive tiles made up only a very small portion of the tissue in all cases. Among the 63 false-positive cases, crush artifacts were the most prevalent cause, accounting for 29 cases (46.0%), followed by germinal centers in 16 cases (25.4%), dust cell aggregations in 4 cases (6.3%), and vessel-related false positives in 3 cases (4.8%). Unexplained false positives accounted for 11 cases (17.5%).
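The false-positive percentages are shares of the 63 false-positive WSIs. A quick arithmetic check, using only the counts stated in the paragraph, confirms that the categories sum to 63 and reproduce the reported percentages:

```python
# False-positive causes among the false-positive WSIs, as counted in the text.
fp_causes = {
    "crush artifact": 29,
    "germinal center": 16,
    "dust cell aggregation": 4,
    "vessel-related": 3,
    "unexplained": 11,
}

total = sum(fp_causes.values())
# Percentage of all false positives, rounded to one decimal place.
shares = {cause: round(100 * n / total, 1) for cause, n in fp_causes.items()}

print(total)   # 63
print(shares)
```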
Three representative examples from the test set of false-positive predictions of metastatic squamous cell carcinoma in cases of radical lymph node dissection (lymphadenectomy). Hematoxylin-eosin–stained images (A, C, and E) show no sign of metastatic squamous cell carcinoma. The heatmap images (B, D, and F) show false-positive predictions of squamous cell carcinoma. Insets of (A) and (B) show a germinal center causing a false-positive prediction. Insets of (C) and (D) show a dust cell aggregation causing a false-positive prediction. Insets of (E) and (F) show lymphoid tissue with crush artifacts due to specimen-handling procedures causing false-positive predictions. The heatmap uses a jet color map where blue indicates low probability and red indicates high probability (original magnifications ×4 [A through F] and ×20 [A through F insets]).
False-Negative SCC Prediction of Radical Lymph Node Dissection (Lymphadenectomy) WSIs
In Figure 5, A, 2 lymph nodes are identified, and metastatic foci of cancer are observed in one of them. The tumor cells lack keratinization features such as keratin pearls and individual cell keratinization, consistent with poorly differentiated SCC (Figure 5, C and D). However, the heatmap image shows no SCC prediction (Figure 5, B).
Representative example from the test set of metastatic squamous cell carcinoma false-negative prediction on a case of radical lymph node dissection (lymphadenectomy). This case (A) has metastatic poorly differentiated squamous cell carcinoma foci in the boxed areas (C and D, higher-magnification images of areas in right and left boxed areas, respectively) but not in other areas. The heatmap image (B) exhibited no positive squamous cell carcinoma prediction. The heatmap uses a jet color map where blue indicates low probability and red indicates high probability (hematoxylin-eosin, original magnifications ×4 [A and B] and ×20 [C and D]).
DISCUSSION
This study aimed to develop an AI model to detect lymph node metastases in SCC using weakly supervised learning without manual annotations. The model, based on the EfficientNetB1 architecture, was trained using WSIs of esophagus, head and neck, lung, and skin specimens obtained from several hospitals. With ROC-AUCs in the range of 0.880–0.987, it demonstrated good performance in detecting SCC metastases in lymph nodes.
We collected the hematoxylin-eosin–stained samples from 5 medical institutions and from TCGA to maximize the histopathologic variability and range of sample quality in our training set (Table 1). We did not include radical lymph node dissection specimens in the training set because we wanted to train the model on primary-organ specimens and then predict metastatic SCC in lymph nodes. Furthermore, we ensured a diverse distribution of WSIs from esophagus, head and neck, lung, and skin in our training set (Table 1).
Despite numerous studies on the detection of lymph node metastases in adenocarcinoma, research specifically focusing on SCC remains limited.5,18–20 This is mainly due to the narrower range of organs in which SCC arises, the smaller number of SCC cases, and the lower incidence of lymph node metastasis compared with adenocarcinoma.21,22 However, as in adenocarcinoma, determining lymph node metastasis and accurately staging SCC are essential for predicting patient prognosis and informing appropriate treatment strategies.
The implementation of AI models such as ours could have far-reaching implications for the field of pathology. Our model could enhance the efficiency of the diagnostic process by accurately identifying true-positive and true-negative cases. The expected benefits include reducing pathologists' workload and improving the efficiency and accuracy of cancer staging when the model is used alongside human diagnosis as a double-check. This could lead to more accurate staging, helping ensure that patients receive the most appropriate treatment for their condition.
Nonetheless, our study had limitations. Although the model demonstrated high performance, it produced some false-positive and false-negative predictions. Certain conditions, such as dust cell aggregation and crush artifacts from specimen-handling procedures, resulted in false-positive predictions (Figure 4). We identified 2 major patterns contributing to false-positive predictions of metastatic SCC. The first involved structures such as lymphoid follicles, blood vessels, and hyalinized stroma, whose architecture and coloration differ between periphery and center. At low magnification these features can resemble keratin pearls, potentially leading to a misdiagnosis of metastatic SCC. The second pattern was characterized by clusters of dust cells or chromatin smearing (crush artifact), which can be mistaken for chromatin-dense cell clusters and thereby erroneously suggest malignancy. Notably, most false positives were confined to a single tile, and no images showed extensive false-positive regions. Because these anomalies can be easily recognized at low magnification, we believe the model still adequately serves its screening function. Further refinement using techniques such as reinforcement learning or additional annotation may enhance diagnostic accuracy.
Moreover, poorly differentiated SCC proved challenging for the model to detect, leading to false negatives (Figure 5). All 3 false-negative cases were lymph node metastases from lung cancer. Although the primary tumors were diagnosed as poorly differentiated SCC, the pathologic reports noted partial positivity for TTF-1 in addition to p40. This raises the possibility that these cases correspond to adenosquamous carcinoma, which may explain why our current model did not detect them.
These challenges underscore the necessity for continued improvement and refinement of the model’s algorithm to boost its predictive accuracy and reliability. Future work should target these shortcomings to enhance the model. Special attention should be given to its application in clinical settings, with considerations around integration with existing systems, user experience, and cost-effectiveness. Rigorous validation should be carried out to ensure the model’s accuracy and reliability in real-world conditions.
In our model, the positive predictive value was 0.376 (0.284–0.478), and the negative predictive value was 0.993 (0.984–1.000). Generally, a higher positive predictive value is preferred for models used as screening tests. In this study, the test cohort included 41 positive cases and 500 negative cases, with negative cases approximately 12.2 times more frequent. Because the positive predictive value falls as the proportion of positive cases falls, even at fixed sensitivity and specificity, this imbalance could be a contributing factor to the lower positive predictive value observed. In future research, we plan to expand the number of facilities and cases to create multiple cohorts with a more balanced distribution of positive and negative cases, allowing a more robust comparison and validation of the model's accuracy.
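The predictive values follow from the confusion matrix, and their dependence on class balance follows from Bayes' rule. Combining counts stated in this article (41 SCC and 500 nonneoplastic test WSIs, 63 false positives, and 3 false negatives), and assuming they come from the same evaluation, gives TP = 38 and TN = 437, which reproduce the reported PPV and NPV point estimates. The 50% prevalence in the last line is a hypothetical cohort used only to show the effect of balancing; confidence intervals are not reproduced.

```python
from typing import Tuple


def ppv_npv(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Positive and negative predictive values from confusion-matrix counts."""
    return tp / (tp + fp), tn / (tn + fn)


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: PPV for a given sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Counts stated in this article; assumed to come from one evaluation.
N_POS, N_NEG, FP, FN = 41, 500, 63, 3
TP, TN = N_POS - FN, N_NEG - FP  # 38 and 437 (derived, not reported)

ppv, npv = ppv_npv(TP, FP, TN, FN)
sens, spec = TP / N_POS, TN / N_NEG
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")              # PPV = 0.376, NPV = 0.993
print(f"negatives per positive = {N_NEG / N_POS:.1f}")  # 12.2
# Hypothetical: same sensitivity and specificity on a cohort with 50% prevalence.
print(f"PPV at 50% prevalence = {ppv_at_prevalence(sens, spec, 0.5):.3f}")  # 0.880
```

Under these assumptions, the same sensitivity (about 0.927) and specificity (0.874) would yield a PPV of about 0.88 in a balanced cohort, in line with the explanation that class imbalance drives the low observed PPV.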
The current model, developed through machine learning, specifically identifies lymph node metastases of SCC, distinguishing them from nonneoplastic lesions. It therefore differs from a general model designed to detect lymph node metastases from any cancer (including adenocarcinoma, SCC, and others) and may be overly focused on detecting SCC. In previous work, we created a model adept at recognizing lymph node metastases from multi-organ adenocarcinomas.8 We plan to combine these models into a universal detector of lymph node metastases, regardless of cancer origin and histologic type. This model is an important step in that direction, and our ongoing research aims to further refine and develop this method.
In conclusion, our study demonstrates the significant potential of AI models for detecting lymph node metastases in SCC, offering enhanced efficiency and workload reduction. Nevertheless, it also reveals ongoing challenges and the need for further refinement to achieve optimal clinical accuracy and reliability. Our research thus constitutes an essential stepping stone toward increasingly robust and reliable AI models for detecting lymph node metastases in SCC.
References
Competing Interests
Kanavati and Tsuneki are employees of Medmain Inc. The authors have no relevant financial interest in the products or companies described in this article.