Increasing implementation of whole slide imaging together with digital workflow and advances in computing capacity enable the use of artificial intelligence (AI) in pathology, including breast pathology. Breast pathologists often face a significant workload, with diagnosis complexity, tedious repetitive tasks, and semiquantitative evaluation of biomarkers. Recent advances in developing AI algorithms have provided promising approaches to meet the demand in breast pathology.
This review provides an update on AI in breast pathology. We examined the successes and challenges of current and potential AI applications in diagnosing and grading breast carcinomas and other pathologic changes, detecting lymph node metastasis, quantifying breast cancer biomarkers, predicting prognosis and therapy response, and predicting potential molecular changes.
We obtained data and information by searching and reviewing the literature on AI in breast pathology in PubMed and drew on our own experience.
With the increasing application in breast pathology, AI not only assists in pathology diagnosis to improve accuracy and reduce pathologists’ workload, but also provides new information in predicting prognosis and therapy response.
Pathology is the gold standard for disease diagnosis. Pathologists are committed to assisting clinicians in the precise treatment of various diseases, especially malignant tumors. The advance in individualized treatment requires higher standards for pathologic diagnosis. The pathologic diagnosis of tumors involves many aspects of analysis and interpretation, such as morphologic identification of tumor cells, evaluation of mitotic counts, determination of lymph node metastasis, interpretation of various biomarkers, etc. However, the accuracy of pathologic diagnosis is limited by a shortage of pathologists, differences in pathologists’ diagnostic skill, and the availability of ancillary studies.1–4
The emergence of whole slide imaging together with a digital workflow has brought many changes to pathology practice, including digital primary sign-out, remote consultations, and archiving slides for teaching, research, and conferences. More importantly, a digital workflow with whole slide images (WSIs) paves the way for implementing AI algorithms in routine pathology practice. AI can reduce tedious workload for pathologists, improve their efficiency and accuracy, and provide new information on disease prognosis and therapy response. Machine learning (ML), a subfield of AI, develops algorithms that learn recurring data patterns from a large data set of cases and then match new cases to the learned patterns. In recent years, deep learning (DL) algorithms have gained increasing attention in big data and image processing. DL is a subset of ML that uses artificial neural networks composed of multiple layers (input layer, hidden layers, and output layer) to extract progressively higher-level features from data. The neural network learns data patterns by generating multiple hidden variables from data and also learns hierarchical representations of sophisticated data patterns that cannot be easily identified by humans. DL includes supervised learning, semisupervised learning, unsupervised learning, and transfer learning.5–8 The deep convolutional neural network, one type of DL algorithm, has shown superiority in image recognition and analysis and has been used in image-based detection and segmentation to identify and quantify different cells. It can also provide accurate prognosis and identify potential drug target indicators.9,10 Currently, research combining ML with pathologic images is an active topic in medicine.11–13
Breast pathologists often face a significant workload, with diagnosis complexity and tedious repetitive tasks, such as the quantification of biomarkers and the evaluation of lymph node metastasis. These tasks are time-consuming, labor-intensive, and subject to interobserver variability. Recent advances in digital pathology workflow integrated with AI have provided promising approaches to solve these challenges. In this review, we summarize current and potential AI applications in diagnosing and grading breast carcinomas, predicting prognosis and therapy response, quantifying breast cancer biomarkers, detecting lymph node metastasis, and predicting potential molecular changes.
AI-BASED DETECTION AND CLASSIFICATION OF BREAST CARCINOMA AND OTHER LESIONS
With the advent of the era of precision medicine, the accurate classification, grading, and determination of the tumor extent of breast cancers have become important for clinical management. Histopathologic diagnosis of breast cancer is the basis for clinical management, and accurate classification of breast cancer is crucial for treatment options. Recently, researchers have developed ML/DL algorithms to detect and classify breast cancers. Han et al14 proposed a novel DL model to automate multiclassification of breast cancer histopathologic types, such as ductal carcinoma, lobular carcinoma, mucinous carcinoma, papillary carcinoma, etc. The model was validated on a large-scale data set with a high level of performance (average accuracy of 93.2%). Cruz-Roa et al15 built a convolutional neural network (CNN) model to classify image blocks (“patches”) from breast cancer WSIs containing invasive ductal carcinoma, and then used a ConvNet classifier to estimate the extent of invasive foci and degree of infiltration on entire WSIs. The study used manually annotated region labels from 400 slides from multiple institutions to train the model and validated it on 200 annotated slides obtained from The Cancer Genome Atlas (TCGA) with a pixel-level F1 score of 75.86%. The model was able to detect invasive carcinoma regions on WSIs with high accuracy, even when tested on a validation set from a different cohort. They found that lesions of invasive carcinoma mixed with in situ carcinoma were most challenging, but the performance could be improved by training a more complex algorithm on a data set with more in situ carcinomas.
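The patch-based workflow described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline of Cruz-Roa et al: the slide dimensions, patch size, stride, threshold, and the stand-in probabilities are all hypothetical, and a real system would obtain each patch probability from a trained CNN.

```python
from typing import List, Tuple


def tile_origins(width: int, height: int, patch: int, stride: int) -> List[Tuple[int, int]]:
    """Return the top-left corner of every patch that fits fully inside the slide."""
    return [
        (x, y)
        for y in range(0, height - patch + 1, stride)
        for x in range(0, width - patch + 1, stride)
    ]


def flagged_fraction(patch_probs: List[float], threshold: float = 0.5) -> float:
    """Fraction of patches whose carcinoma probability reaches the threshold."""
    if not patch_probs:
        raise ValueError("patch_probs must not be empty")
    flagged = sum(1 for p in patch_probs if p >= threshold)
    return flagged / len(patch_probs)


# Hypothetical slide of 1000 x 600 pixels cut into non-overlapping 200-pixel patches.
origins = tile_origins(width=1000, height=600, patch=200, stride=200)

# Placeholder for a CNN: pretend the left part of the slide looks invasive.
patch_probs = [0.9 if x < 400 else 0.1 for (x, y) in origins]

print(len(origins), "patches;", flagged_fraction(patch_probs), "flagged")
```

Mapping the per-patch probabilities back to their origins gives the kind of heat map from which the extent of invasive foci can be estimated.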
Several AI platforms (algorithms) are commercially available to detect/screen breast lesions from breast core biopsy specimens. One example is the GALEN Breast algorithm (IBEX; Figure 1). The algorithm can screen entire breast core needle biopsy WSIs to produce heat maps for different breast lesions, including invasive carcinoma (ductal and lobular), in situ carcinoma (ductal and lobular), atypical hyperplasia (ductal and lobular), and benign findings, such as sclerosing adenosis and fat necrosis. The algorithm was based on an ensemble of CNNs trained on more than 2 million labeled image patches that were extracted from manual annotations on 2153 hematoxylin-eosin (H&E) slides. A separate data set of 436 breast biopsies was used to test the algorithm’s performance, and the results demonstrated an area under the curve (AUC) of 0.99 for the detection of invasive carcinoma and an AUC of 0.98 for the detection of ductal carcinoma in situ (DCIS). The algorithm differentiated well between subtypes/grades of invasive and in situ carcinoma, with an AUC of 0.97 for invasive ductal carcinoma versus invasive lobular carcinoma and an AUC of 0.92 for high-grade DCIS versus low-grade DCIS/atypical ductal hyperplasia.16
IBEX GALEN Breast detects invasive lobular carcinoma in breast core biopsy. A, Hematoxylin-eosin (H&E) slide. B, Annotated images with invasive lobular carcinoma area highlighted in red (H&E, original magnifications ×0.5 [A] and ×10 [B]).
AI-BASED GRADING OF BREAST CARCINOMA
The Nottingham histologic grading system is the most widely used grading system for invasive breast carcinoma and is well correlated with prognosis and outcome. The Nottingham grading system includes 3 components: tubule formation, nuclear pleomorphism, and mitotic activity. Pathologists usually evaluate these 3 components manually, and interobserver variability does exist.17 Several studies have shown that DL neural networks can improve the accuracy of histologic grading assessment.18–21
Mitotic activity is an important component of the Nottingham grading system and a key predictor of breast carcinoma aggressiveness. Mitotic activity is mainly determined by manual count, which is time-consuming and labor-intensive. Recently, studies have reported AI-assisted mitotic count in breast cancer.22–24 Nateghi et al23 proposed a fully automated system to count mitoses. They first constructed a DL model to detect regions of interest (ROIs) by selecting hotspot regions from WSIs, then trained deep neural networks to identify mitoses in the selected ROIs. Their results showed the model significantly improved the accuracy of tumor proliferation assessment. Li et al24 proposed an AI model to identify mitosis by using a novel multistage DL framework with the following components: (1) a deep segmentation network for dividing mitotic regions when only weak markers were present, (2) a deep detection network for locating mitoses using information from the upper and lower regions, and (3) a validation network that improved the detection accuracy by eliminating false-positive mitoses. They validated their method on the 2012 International Conference on Pattern Recognition (ICPR) grand challenge data set and the 2014 ICPR MITOS-ATYPIA challenge data set, achieving the highest F-score of 0.832 on the 2012 ICPR grand challenge data set and an F-score of 0.572 on the 2014 ICPR MITOS-ATYPIA challenge data set. Sebai et al25 developed the “MaskMitosis” DL framework to estimate mitosis on the 2012 ICPR grand challenge and 2014 ICPR MITOS-ATYPIA challenge data sets, and their method outperformed all state-of-the-art mitosis detection approaches on the 2014 ICPR data set by achieving an F-score of 0.475.
Mahmood et al26 developed a mitotic cell detection method based on Faster region CNN (Faster R-CNN) and deep CNNs using ICPR 2012 and ICPR 2014 (MITOS-ATYPIA-14) data sets and demonstrated that their method achieved the results of 0.876 precision, 0.841 recall, and 0.858 F1-measure for the ICPR 2012 data set, and 0.848 precision, 0.583 recall, and 0.691 F1-measure for the ICPR 2014 data set, which were higher than those obtained using previous methods.
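The F-scores quoted for mitosis detection are harmonic means of precision and recall. As a quick consistency check, and only as a sketch, the snippet below recomputes the F1-measure from the precision and recall values reported above for Mahmood et al; the function name `f1_score` is our own and not taken from any cited work.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as reported in the text.
icpr_2012 = f1_score(precision=0.876, recall=0.841)
icpr_2014 = f1_score(precision=0.848, recall=0.583)

print(f"ICPR 2012 F1: {icpr_2012:.3f}")
print(f"ICPR 2014 F1: {icpr_2014:.3f}")
```

The recomputed values round to 0.858 and 0.691, matching the reported F1-measures. The lower 2014 score is driven mainly by recall (0.583), that is, by missed mitoses rather than by false detections.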
Nuclear pleomorphism is another component of the Nottingham grading system. Morphologic features of nuclear pleomorphism include nuclear size, nuclear contour, nucleolus, vesicular nuclei, and chromatin clump. AI algorithms have the potential to objectively assess these features with improved reproducibility compared with pathologists’ visual assessment.27
Few studies have evaluated tubule formation, another component of the Nottingham grading system. One study developed a DL classifier to automatically identify tubule nuclei from whole slide imaging and then obtain the ratio of tubule nuclei to overall number of nuclei (a tubule formation indicator) to correlate with the corresponding Oncotype DX (Genomic Health, Redwood City, California) risk categories. The study demonstrated a good correlation between a larger tubule formation indicator and a low Oncotype DX score.28 The Oncotype DX assay assesses the expression levels of 21 genes involved in the pathways of proliferation, invasion, and estrogen and human epidermal growth factor receptor 2 (HER2) signaling to generate a recurrence score (RS) that predicts the likelihood of recurrence and chemotherapy benefit.29,30 Oncotype DX RS has been widely accepted in clinical practice across the United States and Canada to guide decisions on adjuvant systemic chemotherapy if a patient has ER+, HER2− breast cancer.31
Furthermore, Elsharawy et al22 proposed a supervised CNN model to render an AI grade of breast cancers based on nuclear features. Both AI grade and routine Nottingham grade were used to train models to evaluate their correlation with molecular changes and prognosis. Their results showed that AI grade was helpful in identifying genetic changes with important prognostic significance. Wang et al17 developed and validated a novel DL-based histologic grading model, trained on WSIs, to analyze Nottingham grade 2 breast carcinomas. Their results demonstrated prognostic significance for stratifying patients with Nottingham grade 2 breast cancers.
AI-BASED DETECTION OF LYMPH NODE METASTASIS
Accurate assessment of axillary lymph node metastasis in breast cancer patients is crucial for their clinical management because lymph node metastasis status is significantly correlated with prognosis. Lymph node metastasis evaluation is time-consuming and labor-intensive. Although identifying macrometastasis is straightforward, it may be challenging to manually detect micrometastasis or isolated tumor cells. Recent studies have demonstrated AI algorithms can improve the accuracy and efficiency of lymph node assessment.32–34
From 2015 to 2016, a research challenge competition (CAMELYON16) was launched to develop automated solutions for detecting lymph node metastases using a training data set of WSIs with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemistry (IHC). Algorithm performance was evaluated in an independent test set of 129 WSIs (49 with and 80 without metastases). The AUC for the submitted algorithms ranged from 0.556 to 0.994, and the best algorithm (AUC = 0.994) performed significantly better than the pathologists.35 Recently, Huang et al36 developed an AI-assisted lymph node assessment workflow using CNNs trained on 5907 lymph node images. The results demonstrated that the algorithm identified metastatic lymph nodes in gastric cancer with an AUC of 0.9936, and the algorithm was also highly robust (AUC = 0.9829) on cross-site evaluation. Steiner et al34 reported the results of 6 pathologists who interpreted 70 lymph node WSIs according to 2 assessment methods (with and without AI assistance). The AI-assisted assessment demonstrated higher accuracy than AI alone or visual assessment by pathologists alone. The AI-assisted assessment significantly improved the sensitivity of micrometastasis detection (91% versus 83%, P = .02). Furthermore, the mean interpretation time per image was significantly shorter for AI-assisted assessment than for visual assessment alone. Liu et al32 developed and validated an AI algorithm using 270 lymph node images from 2 centers as a training set and 129 lymph node images as a validation set. The algorithm’s optimal AUC reached 0.99, whereas the pathologist’s optimal performance AUC was 0.88.
Several lymph node metastasis AI algorithms are commercially available. One example is the Visiopharm lymph node metastasis detection app. The app includes 4 steps: tissue detection, metastasis detection, measurement, and calculation. The process can be automated using directly streamed WSIs from an image management system and batch analysis. The results include annotated WSIs, calculated maximum length, and area of metastasis (Figure 2). Our unpublished data have demonstrated that the AI algorithm detected all 44 metastases (19 macrometastases, 25 micrometastases, 1 with isolated tumor cells) out of 233 lymph nodes with a sensitivity of 100%, specificity of 41.3%, positive predictive value of 28.4%, and negative predictive value of 100%, indicating its utility as a screening modality in routine clinical practice.
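The screening statistics above follow from a 2 × 2 confusion table. The sketch below shows the arithmetic under stated assumptions: the text gives 44 detected metastases with none missed among 233 lymph nodes, but it does not give the false-positive and true-negative counts. The values of 111 and 78 used here are back-calculated by us on a per-node basis so that they reproduce the reported percentages; they are not source data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    tp: int  # metastatic nodes flagged by the algorithm
    fp: int  # benign nodes flagged by the algorithm
    tn: int  # benign nodes correctly left unflagged
    fn: int  # metastatic nodes missed by the algorithm


def screening_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV as fractions between 0 and 1."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# tp and fn come from the text; fp and tn are ASSUMED (back-calculated, see above).
counts = ConfusionCounts(tp=44, fp=111, tn=78, fn=0)
metrics = screening_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts, specificity is 78/189 and positive predictive value is 44/155. This illustrates the trade-off of a screening tool: no metastasis is missed, but most flagged nodes still need a pathologist to rule out disease.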
Representative images of lymph node metastasis detected by artificial intelligence (AI) algorithm. A, One annotated lymph node with macrometastasis from an invasive lobular carcinoma case. B, High magnification of the lymph node with macrometastasis from invasive lobular carcinoma. C, One annotated lymph node with micrometastasis. D, One annotated lymph node with isolated tumor cells (hematoxylin-eosin, original magnifications ×2 [A], ×40 [B], ×10 [C], and ×20 [D]). Arrows indicate metastatic carcinomas.
AI-BASED BREAST CANCER BIOMARKER QUANTIFICATION
Tissue biomarkers are surrogates for diagnosing disease, predicting prognosis, and selecting patients for targeted therapy. Protein biomarkers are commonly tested using IHC, and nucleic acid biomarkers are tested using in situ hybridization (ISH) or polymerase chain reaction. Clinically, tissue biomarker stains are manually examined and interpreted. With the wide implementation of digital pathology, tissue biomarkers are increasingly assessed using AI tools, which can provide more objective and reproducible results.37–40 In breast pathology, these tissue biomarkers include estrogen receptor (ER), progesterone receptor (PR), HER2/neu, Ki-67, and programmed death ligand-1 (PD-L1) IHCs, as well as ISH analysis of HER2/neu.39–49
Estrogen Receptor and Progesterone Receptor Immunohistochemistry
American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) guidelines recommend that all primary and recurrent/metastatic breast carcinomas should be tested for ER and PR expression, which has both prognostic and predictive values.50–53 ER/PR expression is tested by IHC and usually assessed by pathologists’ manual scoring, which has interobserver/intraobserver variability.54–62 Computational assessment of ER/PR IHC using AI algorithms provides opportunities to improve precision performance (Figure 3). AI algorithms not only demonstrate excellent correlation with pathologists’ manual scoring but also yield higher reproducibility than pathologists’ scoring.39,41–43,63–66 Furthermore, AI algorithms can be coupled with a digital pathology laboratory information system to provide an automated workflow.67,68 Like manual scoring, AI algorithms provide the ratio of positively staining tumor cells. Additionally, AI algorithms can further divide tumor cells into different staining intensity and then calculate an H-score by multiplying the percentages of nuclei with their corresponding staining intensity. Challenges still exist in evaluating ER/PR IHC using automated AI algorithms. False-positive results can be caused by intermixed benign glands in tumor area and DCIS components in invasive carcinoma. False-negative results can be caused by faint IHC staining that is not detected by AI algorithms.69 Therefore, a pathologist’s final review to confirm AI algorithms’ analysis is necessary.
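The H-score mentioned above is conventionally the intensity-weighted sum of the percentages of weakly (1+), moderately (2+), and strongly (3+) stained nuclei, giving a range of 0 to 300. The following is a minimal sketch of that arithmetic; the function names and the example percentages are hypothetical and are not taken from any specific algorithm cited here.

```python
def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Intensity-weighted sum of staining percentages, ranging from 0 to 300."""
    percentages = (pct_weak, pct_moderate, pct_strong)
    if min(percentages) < 0 or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def percent_positive(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """Share of tumor nuclei with any staining, as used in manual scoring."""
    return pct_weak + pct_moderate + pct_strong


# Hypothetical case: 10% weak, 20% moderate, 30% strong, 40% unstained nuclei.
print(h_score(10, 20, 30))           # 1*10 + 2*20 + 3*30
print(percent_positive(10, 20, 30))  # 10 + 20 + 30
```

Two tumors with the same percentage of positive cells can have quite different H-scores, which is why per-cell intensity classification adds information beyond the positivity ratio.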
Example of estrogen receptor (ER) quantification by artificial intelligence algorithm. The left panels (A, C, and E) show ER immunohistochemistry staining, and the right panels (B, D, and F) show cell segmentation and ER quantification with pseudocolors (blue, negative staining; red, positive staining). Invasive carcinoma is automatically detected and outlined by the algorithm. A and B, ER with <1% positive staining. C and D, ER with 25% positive staining. E and F, ER with 90% positive staining (immunostains of ER, original magnification ×10).
HER2 Immunohistochemistry
HER2 is another important prognostic and therapeutic biomarker in breast pathology. Up to 20% of breast carcinomas harbor HER2 protein overexpression/gene amplification.70–73 The 2018 ASCO/CAP guideline recommends that HER2 status should be determined for all invasive breast carcinomas by HER2 IHC and/or by ISH.74 Like ER/PR IHCs, HER2 IHC is usually evaluated by pathologists’ manual scoring, which often shows interobserver variability.74–76 According to the guidelines, HER2 IHC is classified into negative (0 and 1+), equivocal (2+), and positive (3+) based on HER2 membranous staining intensity together with the percentage of tumor cells with membranous staining. Studies have demonstrated AI algorithms provide equally accurate, but more objective and reproducible assessments than manual scoring.44–48,77 Different approaches have been applied in HER2 IHC AI algorithms. Some AI algorithms segment tumor cells, classify each cell into a different staining category, and then calculate to obtain a final score. Other AI algorithms evaluate the connectivity of HER2-stained membrane to determine HER2 IHC score44 (Figure 4). Studies have used such AI algorithms to accurately determine HER2 IHC score and to discriminate HER2+ and HER2− cases.44,78–80 Most AI algorithms couple with WSIs or ROI images to evaluate HER2 IHC. An AI-assisted microscope has been developed by equipping a conventional microscope with an HER2 scoring AI algorithm and an augmented reality module. This AI-assisted microscope enables pathologists to obtain real-time HER2 IHC results for each view field and improves the consistency and accuracy of pathologist scoring.81 As the scientific understanding of HER2 in breast cancer is evolving, it may become necessary to further stratify HER2 negative breast cancers into HER2 IHC 0 and HER2-low (HER2 IHC 1+ and HER2 IHC 2+/ISH negative) categories.82–84 It is suggested that AI may play an important role in classifying these categories.
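The categories described above can be expressed as a small decision rule. This is an illustrative sketch only: it takes an already-determined IHC score (0 to 3) and an optional ISH result, and it treats IHC 2+ with amplification on ISH as HER2-positive, which is our reading of standard reflex testing rather than a statement from the text. The function name and return labels are hypothetical.

```python
from typing import Optional


def her2_category(ihc_score: int, ish_amplified: Optional[bool] = None) -> str:
    """Map an IHC score (0-3) and an optional ISH result to a HER2 category."""
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("ihc_score must be 0, 1, 2, or 3")
    if ihc_score == 3:
        return "HER2-positive"
    if ihc_score == 2:
        if ish_amplified is None:
            return "equivocal (ISH needed)"
        # Assumption: IHC 2+ is resolved by ISH.
        return "HER2-positive" if ish_amplified else "HER2-low"
    if ihc_score == 1:
        return "HER2-low"
    return "HER2 IHC 0"


for score, ish in [(0, None), (1, None), (2, None), (2, False), (2, True), (3, None)]:
    print(score, ish, "->", her2_category(score, ish))
```

The rule itself is trivial; the hard part, and the place where AI algorithms are expected to help, is assigning the IHC score reproducibly at the low end of the staining range.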
Human epidermal growth factor receptor 2 (HER2) immunohistochemistry (IHC) and the connectivity analyzed by artificial intelligence (AI) algorithm. A and B, One case with HER2 IHC 0. C and D, One case with HER2 IHC 1+. E and F, One case with HER2 IHC 2+. G and H, One case with HER2 IHC 3+. HER2 IHC (A, C, E, and G); HER2 connectivity (green color line) detected by AI algorithm (B, D, F, and H) (immunostain of HER2, original magnifications ×5 [A and B] and ×10 [C through G]).
HER2 ISH
HER2 gene amplification is examined by HER2 ISH, including fluorescence ISH (FISH), silver ISH, chromogenic ISH, and dual ISH.86,87 Studies have demonstrated excellent concordance between HER2 ISH and IHC in determining HER2 status.88 AI algorithms have been developed to assist pathologists in detecting, classifying, and counting HER2 signals in cells of interest, and excellent concordance between manual scoring and AI scoring of HER2 FISH has been detected.89 The benefits of using AI to analyze HER2 FISH include improved efficiency and productivity because manual counting of HER2 FISH signals is time consuming.89–93
Ki-67 Immunohistochemistry
Ki-67 is a surrogate marker for cell proliferation and is expressed in all cell cycle phases except G0.94–97 Ki-67 expression is related to breast tumor grade and biologic behavior, with high expression associated with worse prognosis.97–100 A cyclin-dependent kinase 4 and 6 (CDK4/6) inhibitor was recently approved for early-stage ER+, HER2− breast cancer with high risk of recurrence and a Ki-67 score of 20% or higher.101,102 The Ki-67 IHC MIB-1 pharmDx assay was also approved as a companion diagnostic test for this indication.
Ki-67 expression is assessed by IHC and scored as the percentage of tumor cells stained by Ki-67 antibody. Several methodologies are used to score Ki-67 IHC, including visual estimation, manual counting of ROIs, and manual comprehensive counting of the whole slide. Visual estimation has notable interobserver variability.96,97,103 Similarly, interobserver variability also exists in selecting ROIs for counting.104,105 Manual comprehensive counting of the whole slide improves reproducibility, but it is very time-consuming and impractical in routine clinical practice.106
Automated scoring methods using AI algorithms have shown promise for assessing Ki-67 IHC.37,107–113 AI algorithms automatically detect tumor nuclei based on their shape and size, identify positively stained tumor nuclei based on their color, and then calculate the positivity rate. However, automated AI algorithms have pitfalls. First, AI algorithms can confuse tumor cells with surrounding stromal cells, inflammatory cells, or artefacts. Second, AI algorithms can misclassify lesions, such as by including in situ components as invasive carcinoma for analysis. Therefore, pathologist intervention or extensive supervised learning for sophisticated cell segmentation and classification may be necessary. Recently, new AI algorithms have been developed to more accurately score Ki-67 using dual IHC (Ki-67 plus cytokeratin stain for labeling tumor cells) or sequential IHC stains followed by virtual image reconstruction.114 Nevertheless, AI algorithms are capable of analyzing the whole slide efficiently, which may result in higher accuracy than counting on ROIs.115
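The positivity rate and the first pitfall described above can be illustrated with a small calculation. All counts below are hypothetical; the 20% cutoff is the Ki-67 threshold cited earlier for the CDK4/6 inhibitor indication. The sketch shows how wrongly counting unstained stromal or inflammatory nuclei as tumor cells dilutes the index.

```python
def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Percentage of counted nuclei that stain positive for Ki-67."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


CUTOFF = 20.0  # Ki-67 threshold (percent) cited in the text

# Hypothetical slide: 230 positive nuclei among 1000 true tumor nuclei.
correct = ki67_index(230, 1000)

# Same slide, but 300 unstained non-tumor nuclei are misclassified as tumor.
diluted = ki67_index(230, 1000 + 300)

print(f"correct segmentation: {correct:.1f}%  meets cutoff: {correct >= CUTOFF}")
print(f"stromal contamination: {diluted:.1f}%  meets cutoff: {diluted >= CUTOFF}")
```

In this made-up case the segmentation error alone moves the result across the cutoff, which is why dual staining with cytokeratin or pathologist review of the tumor mask matters near the threshold.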
PD-L1 IHC
PD-L1 IHC is an emerging test to select triple-negative breast cancer (TNBC) patients for immunotherapy. Two PD-L1 assays (SP142 and 22C3) had been approved as companion diagnostic tests in breast cancer; however, SP142 was recently withdrawn because of a failed clinical trial.116–121 Currently, only the 22C3 test is used for breast cancer. PD-L1 22C3 (Agilent Technologies, Santa Clara, California) uses a combined positive score (CPS) to assess PD-L1 expression. The CPS score is calculated as the number of PD-L1–stained cells (tumor cells, lymphocytes, and macrophages) divided by the total number of tumor cells multiplied by 100.121
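The CPS formula above translates directly into code. The sketch below is illustrative: the function name and the cell counts are hypothetical. Capping the result at 100 follows the convention we understand to apply to the 22C3 assay and is an addition not stated in the text.

```python
def combined_positive_score(
    pdl1_tumor_cells: int,
    pdl1_lymphocytes: int,
    pdl1_macrophages: int,
    total_tumor_cells: int,
) -> float:
    """PD-L1-stained cells divided by total tumor cells, multiplied by 100."""
    if total_tumor_cells <= 0:
        raise ValueError("total_tumor_cells must be positive")
    stained = pdl1_tumor_cells + pdl1_lymphocytes + pdl1_macrophages
    raw_score = 100.0 * stained / total_tumor_cells
    # Assumption: the score is capped at 100 by convention, because immune
    # cells in the numerator can push the raw value above 100.
    return min(raw_score, 100.0)


# Hypothetical counts: 30 tumor cells, 15 lymphocytes, and 5 macrophages stain
# for PD-L1 in a field containing 500 tumor cells.
cps = combined_positive_score(30, 15, 5, 500)
print(f"CPS = {cps:.0f}")
```

Because the numerator mixes three cell types while the denominator counts only tumor cells, an algorithm must classify cell type as well as staining, which is part of what makes automated PD-L1 scoring harder than ER or Ki-67 scoring.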
Although studies have demonstrated AI algorithms’ assessment of PD-L1 IHC showed excellent correlation with pathologists’ manual scoring, only a few PD-L1 AI algorithms have been validated for clinical practice.122,123 This may be due to the relatively short time of PD-L1 IHC in clinical practice and the complexity of PD-L1 testing (multiple antibodies and different scoring systems).
AI-BASED EVALUATION OF TUMOR-INFILTRATING LYMPHOCYTES IN BREAST CANCER
In breast cancer, especially triple-negative breast cancer and HER2+ breast carcinoma, the presence of tumor-infiltrating lymphocytes (TILs) in the tumor microenvironment is associated with better response to therapy and overall survival.124–128 TILs have been emerging as a biomarker in breast carcinoma.128,129 However, manual assessment of TILs is subjective, with significant interobserver variability and poor reproducibility.124,128
With the widespread implementation of digital pathology, AI algorithms have been developed to evaluate TILs aiming for more accurate and reproducible results.130–135 In 1 study, a computer-aided diagnosis scheme was developed to automatically detect and grade the extent of lymphocytic infiltrate in HER2+ breast carcinoma using WSIs, and the results showed the architectural feature set successfully distinguished samples of high and low lymphocytic infiltrate levels with a classification accuracy greater than 90%.131 In another study, an automated method based on grid subsampling of microscopy image analysis data was developed to extract the tumor-stroma interface and compute immunogradient indicators for TIL density profiles, which provided a strong and independent prognostic stratification in ER+ breast cancer patients.132
AI-BASED PREDICTION OF PROGNOSIS, SURVIVAL, AND THERAPY RESPONSE IN BREAST CANCER
Combined antiestrogen therapy and chemotherapy has significantly reduced recurrence frequency and improved survival rate in certain populations of ER+ breast cancer patients.136 However, these therapies, especially chemotherapy, have toxic side effects; therefore, identifying patients who are more likely to benefit from chemotherapy is important. An ASCO Clinical Practice Guideline states that clinicians may use Oncotype DX RS to guide decisions on adjuvant systemic chemotherapy if a patient has ER+, HER2− breast cancer, and the Oncotype DX assay has been widely accepted in clinical practice across the United States and Canada.31 However, Oncotype DX is a prohibitively expensive test. Multiple studies have suggested that standard histopathologic variables (including tumor grade, tubule formation, nuclear pleomorphism, and mitotic activity), together with breast cancer biomarkers (ER, PR, HER2), can provide information similar to that provided by the Oncotype DX RS.137–140 An AI-based algorithm using histopathologic WSIs, Image-based Risk Score (IbRiS), was developed to serve as a proxy for Oncotype DX and demonstrated a mean accuracy of 84% in distinguishing low-risk from high-risk specimens.141
Breast cancer patients’ prognosis and survival are closely related to histopathologic grade (Nottingham grade), as mentioned above. Studies have shown a considerable amount of interobserver variability in breast cancer histopathologic grade determined by pathologists.142 AI-based approaches have been developed to measure the morphologic components in the Nottingham grading system and correlate them with the patient’s prognosis and recurrence risk. A DL algorithm was developed to automatically measure tubule formation in breast carcinoma WSIs to generate a tubule formation indicator, which correlated with histologic grade and recurrence risk determined by a molecular test (Oncotype DX).28 One study evaluated the ability of computer-extracted nuclear morphology features from histologic images to train 4 ML classifiers (Random Forest, Neural Network, Support Vector Machine, and Linear Discriminant Analysis) for breast cancer risk categorization and demonstrated per-patient accuracies ranging from 75% to 86% by correlating with Oncotype DX risk categories.143 An additional study quantitated computer-extracted image features of nuclear shape and orientation on digitized breast carcinoma images and demonstrated that quantitative histomorphometric features of nuclear shape and orientation were strongly and independently predictive of patient survival.144 In another study, a deep neural network classifier was developed to quantify mitosis in breast cancer WSIs and demonstrated high accuracy (83%) and good correlation with Oncotype DX risk categories.145
ML algorithms were also developed to predict neoadjuvant chemotherapy response in breast cancer using clinical and pathologic features. In 1 study, a standard multivariable logistic regression (MLR) was developed using patient demographics, histologic characteristics (ER status, HER2 status, Nottingham grade, tumor size, and nodal status), molecular status, and staging information. MLR was compared with 5 ML models (k-nearest neighbor classifier, random forest classifier, naive Bayes algorithm, support vector machine, and multilayer perceptron model) for their performance in predicting neoadjuvant chemotherapy response in breast cancer. The AUC for the MLR was 0.64. Among the 5 ML models mentioned above, the random forest classifier performed best, with an AUC of 0.88.146
AI-BASED PREDICTION OF GENETIC ABNORMALITY IN BREAST CANCER
Multiple studies have sought to predict HER2 status (amplified or nonamplified) from histologic images in breast cancer. Farahmand et al13 developed a novel CNN classifier, trained on 188 WSIs manually annotated for tumor ROIs, that predicted slide-level HER2 status with an AUC of 0.90 in cross-validation and 0.81 on an independent TCGA test set. Additionally, the classifier was tested on pretreatment samples from 187 HER2+ patients who subsequently received trastuzumab therapy and predicted trastuzumab response with an AUC of 0.80. Another study trained a classification pipeline to determine HER2 overexpression status from H&E-stained WSIs, which achieved an AUC of 0.82 (CI, 0.65–0.98) on held-out cases and an AUC of 0.76 (CI, 0.61–0.89) on an independent data set from TCGA.147 Recently, a group of researchers presented the HER2 on hematoxylin and eosin (HEROHE) challenge, which aimed to predict HER2 status in breast cancer using a large, publicly available, annotated H&E whole slide imaging data set (n = 509).148 The challenge was held as a parallel event of the 16th European Congress on Digital Pathology; 21 teams worldwide submitted models, and the best-performing models were presented with details of their architectures and key parameters.
Besides predicting HER2 genetic status, an AI algorithm based on H&E images was also developed to predict BRCA gene mutation in breast cancer. Wang et al149 trained a ResNet-based deep CNN on WSIs and validated their model on an external data set that contained 17 BRCA-mutated and 47 wild-type cases, with AUCs (95% CI) of 0.766 (0.763–0.769), 0.763 (0.758–0.769), 0.750 (0.738–0.761), and 0.551 (0.526–0.575) using ×40, ×20, ×10, and ×5 magnification tiles, respectively.
Recently, an explainable ML approach was developed to investigate integrated profiling of morphologic, molecular, and clinical features from breast cancer histology. This approach was able to predict molecular features in breast cancer, including DNA methylation, gene expression, copy number variations, and somatic mutations, with balanced accuracy up to 78% and very high accuracy (>95%) in subgroups of patients for specific genes, such as p53 and PTEN.150 Another study used histologic images in a DL-based model to predict chromosomal instability status in breast cancer and achieved an AUC of 0.822 in classifying chromosomal instability status.151
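Balanced accuracy is reported above because mutated and wild-type cases are usually unevenly represented. The sketch below uses the standard binary definition, the mean of sensitivity and specificity; whether the cited study computed it in exactly this way is an assumption. The cohort is an invented toy example (10 mutated and 90 wild-type cases), not data from any study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = mutated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of mutated cases detected
    specificity = tn / (tn + fp)  # fraction of wild-type cases detected
    return (sensitivity + specificity) / 2


def plain_accuracy(y_true, y_pred):
    """Fraction of all cases classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy cohort (assumed): 10 mutated cases followed by 90 wild-type cases.
y_true = [1] * 10 + [0] * 90
# A trivial classifier that calls every case wild-type.
always_wild_type = [0] * 100

trivial_accuracy = plain_accuracy(y_true, always_wild_type)
trivial_balanced = balanced_accuracy(y_true, always_wild_type)
```

In this toy cohort the trivial classifier reaches 90% plain accuracy but only 50% balanced accuracy, which illustrates why balanced accuracy is the more informative figure when one class dominates.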
LIMITATIONS TO AI APPLICATIONS IN BREAST PATHOLOGY
Several key factors must be considered when adopting AI algorithms in breast pathology. The first is the quantity and quality of the training data from which the AI algorithm was developed. Significant variations exist in WSI file formats, scanner quality, and glass slide quality, including staining intensity, coverslip, tissue size, folded tissue, and air bubbles. WSIs that are free of artifacts and of adequate quality must be manually selected for training to develop comprehensive ML/DL models. In WSIs with low resolution or low quality, features are difficult to distinguish, which makes it hard for an ML/DL model to assess fine detail accurately. Therefore, standardization and normalization of the training data set are necessary for successful AI algorithm development and for the application of AI algorithms across different laboratories. Recently, an open-source quality control tool for digital pathology slides, HistoQC, was described that not only identifies and delineates artifacts but also discovers cohort-level outliers.152 A subsequent study demonstrated that HistoQC substantially improved overall concordance among pathologists in identifying WSIs unsuitable for computational analysis (from a moderate Gwet AC1 range of 0.43–0.59 to an excellent range of 0.79–0.93).153
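The concordance figures above are expressed as Gwet AC1, a chance-corrected agreement coefficient. As a minimal sketch, the function below implements the two-rater, two-category form: AC1 = (Pa − Pe) / (1 − Pe), where Pa is the observed agreement and Pe = 2π(1 − π), with π the mean proportion of slides rated unsuitable by the two raters. Restricting the calculation to two raters and a binary rating is a simplifying assumption, and the ratings are invented toy values, not data from the cited study.

```python
def gwet_ac1(rater_a, rater_b):
    """Gwet AC1 for two raters and binary ratings (1 = unsuitable, 0 = suitable)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of slides")
    # Observed agreement: fraction of slides given the same rating by both raters.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Mean proportion of slides rated 1, averaged over the two raters.
    pi = (sum(rater_a) + sum(rater_b)) / (2 * n)
    # Chance agreement under Gwet's model; at most 0.5 for two categories.
    chance = 2 * pi * (1 - pi)
    return (observed - chance) / (1 - chance)


# Toy ratings (assumed) for 10 slides from two hypothetical pathologists.
pathologist_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pathologist_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

ac1 = gwet_ac1(pathologist_1, pathologist_2)
```

Here the two raters agree on 8 of 10 slides (Pa = 0.80), π = 0.30, and Pe = 0.42, giving an AC1 of about 0.66.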
Second, AI algorithms must be validated before adoption for clinical use. Many of the AI algorithms mentioned above remain experimental, with the exception of some breast biomarker quantification algorithms. Recent studies have demonstrated successful validation of AI algorithms, and the institutions involved have begun to implement these algorithms in clinical practice.69,80
Third, a digital pathology workflow (digital sign-out) is preferred, if not necessary, for implementing AI algorithms in routine pathology practice and achieving an automated workflow. Whole slide imaging technology has been implemented in many institutions worldwide; however, a fully digital pathology workflow remains limited to very few pathology laboratories.67,154
Lastly, pathologists’ trust is a key limitation in adopting AI algorithms in routine pathology practice. Many AI algorithms are referred to as “black boxes” because it is unclear which features drive the output of an ML/DL model and how neural networks piece the information together. If pathologists do not understand what features AI algorithms use and how the algorithms make decisions, they will be reluctant to trust AI results and adopt AI algorithms.
CONCLUSIONS
With the widespread implementation of WSIs in breast pathology practice, the application of AI algorithms is becoming increasingly popular. Integrating AI algorithms into the digital workflow is becoming feasible and can significantly improve pathologists’ efficiency and diagnostic accuracy by reducing workload (repetitive tasks such as lymph node metastasis detection) and diagnostic errors. Some AI algorithms may provide pathologists with new tools to tackle emerging pathologic assessments, such as TIL quantification, prediction of therapy response, and recurrence risk stratification. Furthermore, AI algorithms may complement or replace some expensive molecular tests in breast pathology, such as Oncotype DX and genetic testing.
References
Author notes
The authors have no relevant financial interest in the products or companies described in this article.