Context.—

More people receive a diagnosis of skin cancer each year in the United States than all other cancers combined. Many patients around the globe lack access to highly trained dermatopathologists, and even where such access exists, biopsy diagnoses can result in disagreement between specialists. Mechanomind has developed software based on a deep-learning algorithm to classify 40 different diagnostic dermatopathology entities, with the goals of improving diagnostic accuracy and enabling improvements in turnaround times and effort allocation.

Objective.—

To assess the value of machine learning for microscopic tissue evaluation in dermatopathology.

Design.—

A retrospective study comparing diagnoses of hematoxylin and eosin–stained glass slides rendered by 2 senior board-certified pathologists not involved in algorithm creation with the machine learning algorithm’s classification was conducted. A total of 300 glass slides (1 slide per patient’s case) from 4 hospitals in the United States and Africa with common variations in tissue preparation, staining, and scanning methods were included in the study.

Results.—

The automated algorithm demonstrated sensitivity of 89 of 91 (97.8%), 107 of 107 (100%), and 101 of 102 (99%), as well as specificity of 204 of 209 (97.6%), 189 of 193 (97.9%), and 198 of 198 (100%) while identifying melanoma, nevi, and basal cell carcinoma in whole slide images, respectively.

Conclusions.—

Appropriately trained deep learning image analysis algorithms demonstrate high specificity and high sensitivity sufficient for use in screening, quality assurance, and workload distribution in anatomic pathology.

More people receive a diagnosis of skin cancer each year in the United States than all other cancers combined.1 More than 18 million skin lesions are biopsied2–4 in the United States every year. The number of suspected skin lesions is growing because of an aging population as well as environmental and lifestyle factors.

Diagnosis in dermatopathology requires specialized training because of the large number of skin tumor subtypes and the significant variability of visual presentation within every morphologic class. Misdiagnosis5 and late diagnosis are consequences of high workloads and the difficulty of differentiation, which lead to frequent disagreement among pathologists, for example in distinguishing melanoma from melanocytic nevi.6,7

The rise in the adoption of digital pathology8 provides an opportunity to use computer vision deep learning methods to address the error rate associated with human medical image interpretation and to capture efficiency gains in turnaround times and labor cost.9 Pathology laboratories can reap efficiency benefits not only from centralization10 but also from caseload triage according to diagnostic difficulty and from optimal distribution of pathologists' workloads. The latter can reduce the turnaround times and labor costs associated with the diagnostic process. For example, when a case of higher diagnostic difficulty or rarity is interpreted initially by a dermatopathologist with a standard level of training who then refers it to a more experienced subspecialist, delays and effort duplication may occur. With an automated triage system in place, straightforward cases can be directed to general pathologists, which helps unburden senior dermatopathologists, eliminate bottlenecks, increase laboratory capacity, prevent burnout, improve turnaround times, and reduce the cost of the diagnostic process.

Coupled with the increase in detected cancer overall, the growing shortage of pathologists11 creates a need for automated, artificial intelligence (AI)–based tools that support pathology workflows, improve diagnostic accuracy, and provide productivity benefits for dermatopathology laboratories.

Convolutional neural networks (CNNs) have proven capable of identifying diagnostically relevant patterns in pathology12–15 while dealing with unique challenges. Among these obstacles is the large size of microscopic images (up to 2 gigabytes per image with a single focal plane, depending on compression). Although recent progress has been made in compression algorithms, processing speeds, and scanning speeds, the high magnification and resolution of the images still require efficient analysis algorithms. Additionally, image quality varies because of significant variability in staining16 and tissue preparation, differences in scanning resolution across devices, the presence of artifacts such as folds or ink markings, and the wide variety of visual patterns within each pathology.17 Nevertheless, deep learning–based methods have recently shown promise in whole slide image (WSI) classification.18 Most applications, however, deal primarily with binary classifications19 or classifications into broad categories,20 with less focus on actual diagnosis via multiclass categorization of all morphologic variants of various diseases, which is a necessity for a diagnostic support tool, quality control, or workload distribution among pathologists of different levels of expertise and compensation.

In this work, we present the performance of a supervised convolutional neural network–based algorithm pretrained to classify 40 morphologic diagnoses of the tumors of the skin (Table 1) on WSIs containing hematoxylin-eosin (H&E)–stained skin biopsies.

The WSI classification system used is presented in Figure 1. The system receives a WSI as input and produces probability scores for each of the 40 classes on which it was trained. The training set consisted of punch, shave, and excisional skin biopsy cases from different body parts. Analysis of the WSI is performed in 2 stages, local and global. First, local image patches containing tissue are extracted from the WSI at 2 magnifications, and each patch is classified by a convolutional neural network (a dedicated CNN for each magnification level). This produces a local semantic probabilistic description of the WSI, in which each 32 × 32 pixel square of the input WSI is represented by 2 vectors (1 per magnification) of local probability scores for each tissue type class. Because of the multiple-magnification approach, this feature map provides a good representation of both fine and coarse image features.
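As an illustrative sketch (not the production implementation), the first-stage tiling can be expressed as follows. The `classify_patch` callable is hypothetical, standing in for one of the trained CNNs, and only a single magnification is shown; the actual system produces one such map per magnification level:

```python
import numpy as np

STRIDE = 32      # each 32 x 32 pixel square of the WSI gets one probability vector
N_CLASSES = 40   # diagnostic classes the first-stage networks were trained on

def local_feature_map(wsi, classify_patch, patch_size=256):
    """Tile the WSI, classify each patch, and store the resulting
    class-probability vector at the patch's position on the 32 x 32 grid."""
    h, w, _ = wsi.shape
    grid_h, grid_w = h // STRIDE, w // STRIDE
    fmap = np.zeros((grid_h, grid_w, N_CLASSES))
    for gy in range(grid_h):
        for gx in range(grid_w):
            y, x = gy * STRIDE, gx * STRIDE
            patch = wsi[y:y + patch_size, x:x + patch_size]
            fmap[gy, gx] = classify_patch(patch)  # probabilities summing to 1
    return fmap
```

In the actual system this computation runs per magnification level, so each grid cell carries 2 probability vectors rather than 1.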

At the second stage of classification, we used high probability values (local maxima) of the feature map described above as cues for possible tumor locations. At each such location we extracted a 200 × 200 pixel patch from the feature map, which was then classified by a third convolutional neural network. This step allowed us to identify global WSI features, because each 200 × 200 pixel patch in the feature map represents a 6400 × 6400 pixel region of the original image.

Each local maximum thus produces a vector of class probabilities, and the classification of the entire WSI is based on the average of these vectors.
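The aggregation step can be sketched in a few lines. The cue vectors and the 3-class example below are illustrative, not taken from the study data:

```python
import numpy as np

def aggregate_slide_prediction(cue_vectors):
    """Average the class-probability vectors obtained at the local-maximum
    cue locations to produce one probability vector for the whole slide."""
    return np.stack(cue_vectors).mean(axis=0)

# Hypothetical example: 3 cue locations, 3 classes (BCC, melanoma, nevus)
cues = [np.array([0.1, 0.8, 0.1]),
        np.array([0.2, 0.7, 0.1]),
        np.array([0.0, 0.9, 0.1])]
slide_probs = aggregate_slide_prediction(cues)   # close to [0.1, 0.8, 0.1]
predicted_class = int(np.argmax(slide_probs))    # index 1, i.e., melanoma
```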

Employing transfer learning for both stages of classification, we used the ResNet architecture21 (ResNet34 for the first classification stage and ResNet18 for the second) pretrained on the ImageNet data set. For the first-stage classifiers, a fully convolutional version of ResNet was used: the last, fully connected layer was replaced by a convolution, producing local class probabilities for each 32 × 32 pixel patch of the image.
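The equivalence behind this replacement, namely that a 1 × 1 convolution applies the fully connected weights independently at every spatial location, can be verified with a small NumPy sketch; the sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
channels, n_classes, h, w = 512, 40, 7, 7          # illustrative sizes
features = rng.standard_normal((h, w, channels))   # spatial feature map
fc_weight = rng.standard_normal((n_classes, channels))
fc_bias = rng.standard_normal(n_classes)

# 1 x 1 "convolution": apply the fully connected weights at every location
conv_out = features @ fc_weight.T + fc_bias        # shape (h, w, n_classes)

# Identical to running the fully connected layer per spatial position
fc_out = np.array([[fc_weight @ features[i, j] + fc_bias
                    for j in range(w)] for i in range(h)])
assert np.allclose(conv_out, fc_out)
```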

During training of the first-stage classifiers, tissue patches at 2 magnifications were randomly selected. Batches of 256 images with a constant 1:4 ratio of tumor to normal patches were used. To account for color and cell size variations in tissue, noise was added to the extracted patches with a randomness factor varying between 0.9 and 1.1. Cross-entropy loss with label smoothing11 was used as the cost function.
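Cross-entropy with label smoothing11 can be written compactly. This NumPy version follows the common formulation in which a smoothing mass eps is spread uniformly over all classes; the study does not report its exact eps, so the default of 0.1 here is an assumption:

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()        # subtract max for numerical stability
    return z - np.log(np.exp(z).sum())

def smoothed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy against a smoothed label distribution that puts
    1 - eps + eps/n on the true class and eps/n on every other class."""
    n = logits.shape[0]
    soft_labels = np.full(n, eps / n)
    soft_labels[target] += 1.0 - eps
    return float(-(soft_labels * log_softmax(logits)).sum())
```

Compared with hard one-hot labels, the smoothed target penalizes overconfident predictions, which acts as a regularizer during training.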

The second-stage classifier was trained on additional annotated WSIs, each analyzed by the first-stage classifiers to produce feature maps. These feature maps were augmented by adding small zero-mean Gaussian noise.
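This augmentation amounts to a one-liner. The noise scale `sigma` below is an assumption, since the text specifies only "small" zero-mean Gaussian noise:

```python
import numpy as np

def augment_feature_map(fmap, sigma=0.01, rng=None):
    """Perturb a first-stage feature map with small zero-mean Gaussian noise
    so the second-stage classifier sees a slightly different input each pass."""
    if rng is None:
        rng = np.random.default_rng()
    return fmap + rng.normal(loc=0.0, scale=sigma, size=fmap.shape)
```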

H&E-stained glass slides from 300 punch, shave, and excisional biopsy cases of the face, neck, back, and arms were selected for the study. Each case consisted of a single slide. Of these, 261 (87%) were received from Associated Laboratory Physicians (Harvey, Illinois) and belonged to White patients, 27 (9%) were received from Muhimbili National Hospital (Dar es Salaam, Tanzania), and 12 (4%) were received from Butaro District Cancer Hospital (Butaro, Burera District, Rwanda), for a total of 39 (13%) belonging to Black patients, introducing skin color variability (Table 2). All cases had been diagnosed previously by light microscopy. Two senior board-certified pathologists uninvolved in algorithm development served as the primary evaluators; both are medical directors of hospital pathology laboratories in the Chicago, Illinois, area. The 300 cases were randomly divided into 2 sets, and each pathologist-evaluator blindly reassessed the scanned images of 150 cases.

The slides showed notable variability in staining quality and histology and contained artifacts, such as folds and ink markings, representative of real-world pathology slides (Figure 2). The cases in the validation set had the following diagnoses confirmed by board-certified pathologists via light microscopy: 102 with basal cell carcinoma (BCC; 34% of 300), all from White patients; 59 with nodular melanoma (20% of 300), including 33 cases (11% of 300) from Black patients and 26 cases (9% of 300) from White patients; 17 with lentigo maligna melanoma (6% of 300), including 6 cases from Black patients (2% of 300) and 11 from White patients (4% of 300); 15 with superficial spreading melanoma (5% of 300) from White patients; 10 with dysplastic nevus (3% of 300) from White patients; 79 with intradermal nevus (26% of 300) from White patients; and 18 with compound nevus (6% of 300) from White patients.

H&E-stained glass slides were scanned using a high-resolution Motic Digital Pathology EasyScan Pro6 scanner at 0.26 μm per pixel and the Ventana iScan Coreo scanner, both set at ×40 magnification. The scanned WSIs were then independently interpreted by the Mechanomind algorithm at the University of Chicago (Chicago, Illinois), Ingalls Memorial Hospital medical campus (Harvey, Illinois), and classified into 1 of 3 diagnostic classes: BCC, melanoma, or nevus (Figure 3). BCC was selected for its high incidence among malignant cases and to address the recognition of keratinocytic versus melanocytic tumors. Melanoma was chosen as the most severe common skin cancer type. The nevus category was selected for its high incidence overall and for the need to address the recognition of melanoma versus melanocytic nevi, a topic that can lead to disagreement among pathologists. No patient history and no clinical or gross examination information were used in the process; only the de-identified WSI was presented to the software algorithm. Batches of 10 images were processed in up to 2 minutes per batch. One of the evaluators compared the algorithm's results with the confirmed diagnoses, classifying each case as a true positive, true negative, false positive, or false negative for each of the 3 classes.

Compared with human-rendered diagnoses, the image recognition algorithm identified 89 of the 91 melanoma cases (97.8% sensitivity), 107 of the 107 nevus cases (100% sensitivity), and 101 of the 102 BCC cases (99% sensitivity). The algorithm also identified 204 of the 209 cases as not containing melanoma (97.6% specificity), 189 of 193 cases as not containing a nevus (97.9% specificity), and 198 of 198 cases as not containing BCC (100% specificity), as seen in Table 3. One of the 2 melanoma cases that the algorithm did not recognize contained desmoplastic melanoma (Figure 4, A), which was absent from the training set; the other contained superficial spreading melanoma (Figure 4, B) with an atypical presentation of small standalone clusters of cancer. The 1 case of BCC not recognized by the algorithm is shown in Figure 4, C. One case initially misdiagnosed by the primary pathologist as melanoma, and overlooked again by the evaluator during validation, was correctly recognized by the algorithm as a nevus, as later confirmed by both evaluators during the final review of results.
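The reported figures follow directly from the standard definitions of sensitivity and specificity; for example, using the melanoma counts above:

```python
def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)

# Melanoma: 89 of 91 positives detected; 204 of 209 negatives ruled out
melanoma_sens = sensitivity(tp=89, fn=2)    # ~0.978 (97.8%)
melanoma_spec = specificity(tn=204, fp=5)   # ~0.976 (97.6%)
```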

Although additional studies are needed to demonstrate diagnostic accuracy for rare entities, deep learning techniques applied to microscopic imaging have the potential to help mitigate the worldwide shortage of pathologists, improve diagnostic accuracy and turnaround times, raise standards of care, reduce health care costs, bring diagnostic expertise to underserved locations, and improve access to quality care.

1. American Cancer Society. Cancer Facts & Figures 2020. Atlanta, GA: American Cancer Society; 2020.
2. Rogers HW, Weinstock MA, Feldman SR, Coldiron BM. Incidence estimate of nonmelanoma skin cancer (keratinocyte carcinomas) in the US population, 2012. JAMA Dermatol. 2015;151(10):1081–1086.
3. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7–34.
4. Lott JP, Boudreau DM, Barnhill RL, et al. Population-based analysis of histologically confirmed melanocytic proliferations using natural language processing. JAMA Dermatol. 2018;154(1):24–29.
5. Olhoffer IH, Lazova R, Leffell DJ. Histopathologic misdiagnoses and their clinical consequences. Arch Dermatol. 2002;138(10):1381–1383.
6. Elmore JG, Barnhill RL, Elder DE, et al. Pathologists' diagnosis of invasive melanoma and melanocytic proliferations: observer accuracy and reproducibility study. BMJ. 2017;357:j2813.
7. Lodha S, Saggar S, Celebi JT, Silvers DN. Discordance in the histopathologic diagnosis of difficult melanocytic neoplasms in the clinical setting. J Cutan Pathol. 2008;35(4):349–352.
8. Al-Janabi S, Huisman A, Van Diest PJ. Digital pathology: current status and future perspectives. Histopathology. 2012;61(1):1–9.
9. Litjens G, Sánchez CI, Timofeeva N, et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep. 2016;6:26286.
10. Ho J, Ahlers SM, Stratman C, et al. Can digital pathology result in cost savings?: a financial projection for digital pathology implementation at a large integrated health care organization. J Pathol Inform. 2014;5(1):33.
11. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2818–2826.
12. Olsen TG, Jackson BH, Feeser TA, et al. Diagnostic performance of deep learning algorithms applied to three common diagnoses in dermatopathology. J Pathol Inform. 2018;9:32.
13. Campanella G, Hanna MG, Geneslaw L, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019;25(8):1301–1309.
14. Li J, Li W, Gertych A, Knudsen BS, Speier W, Arnold CW. An attention-based multi-resolution model for prostate whole slide image classification and localization. Preprint. Posted online May 30, 2019. arXiv. https://doi.org/10.48550/arXiv.1905.13208
15. Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med. 2019;25(6):954–961.
16. Tellez D, Litjens G, Bándi P, et al. Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med Image Anal. 2019;58:101544.
17. Ali S, May CV, Verrill C, Rittscher J. Ink removal from histopathology whole slide images by combining classification, detection and image generation models. In: 2019 IEEE International Symposium on Biomedical Imaging (ISBI). 2019:928–932.
18. Ing N, Tomczak JM, Miller E, et al. A deep multiple instance model to predict prostate cancer metastasis from nuclear morphology. In: Conference on Medical Imaging with Deep Learning. Amsterdam, The Netherlands; 2018.
19. Bulten W, Pinckaers H, van Boven H, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 2020;21(2):233–241.
20. Ianni JD, Soans RE, Sankarapandian S, et al. Tailored for real-world: a whole slide image classification system validated on uncurated multi-site data emulating the prospective pathology workload. Sci Rep. 2020;10(1):3217.
21. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 27–30, 2016; Las Vegas. 2016:770–778.

Competing Interests

Brodsky is on the advisory board and has stock options at Mechanomind Inc. Levine, Polak, and Chervony are employees of and own shares of Mechanomind Inc. The other authors have no relevant financial interest in the products or companies described in this article.