Pathologists may encounter extraneous pieces of tissue (tissue floaters) on glass slides because of specimen cross-contamination. Troubleshooting this issue, including performing molecular tests for tissue identification if available, is time consuming and often does not satisfactorily resolve the problem.
The objective of this study was to demonstrate the feasibility of using an image search tool to resolve the tissue floater conundrum.
A glass slide was produced containing 2 separate hematoxylin and eosin (H&E)-stained tissue floaters. This fabricated slide was digitized along with the 2 slides containing the original tumors used to create these floaters. These slides were then embedded into a dataset of 2325 whole slide images representing a wide variety of H&E-stained diagnostic entities. Digital slides were broken into patches, and the patch features were converted into barcodes for indexing and rapid retrieval. A deep learning-based image search tool extracted features from the patches and used the corresponding barcodes to match archived images to each tissue floater.
There was a very high likelihood of finding a correct tumor match for the queried tissue floater when searching the digital database. Search results repeatedly yielded a correct match within the top 3 retrieved images. The retrieval accuracy improved when greater proportions of the floater were selected. Each search was completed within several milliseconds.
Using an image search tool offers pathologists an additional method to rapidly resolve the tissue floater conundrum, especially in laboratories that have transitioned to a fully digital workflow for primary diagnosis.
In routine clinical practice, pathologists may encounter extraneous pieces of tissue on glass slides that may result from contamination by other specimens. These are typically called “tissue floaters.” A prior review analyzing this quality issue in 275 laboratories reported that approximately 3% of pathology slides could potentially contain such tissue contaminants.1 Tissue contamination can occur at various steps in anatomic pathology, including grossing or downstream in the histology laboratory during tissue processing, slide preparation, or staining. Common causes appear to be pickup of floaters in the water bath during sectioning,2 or fragmentation of deparaffinized tissue during staining of slides.3 The dilemma pathologists often face is whether such a tissue floater truly belongs to the case in question, or whether it instead represents a true contaminant from another patient's sample, in which case it should be ignored. This poses a major conundrum when the tissue floater contains malignant cells and is not obviously derived from a different organ or tumor.
There are currently several measures a pathologist can employ to troubleshoot the tissue floater problem. They could ask their colleagues on clinical service if they have recently encountered a case that morphologically resembles the tissue floater (eg, by emailing them a photo of the tissue floater). They could also physically inspect the tissue block (if available) and/or order a recut section to see if this subsequent slide harbors the floater in the same position on the slide, which may help pinpoint where potential cross-contamination occurred. Akin to forensic analysis, some laboratories have implemented molecular techniques (eg, DNA fingerprinting for tissue identity testing) to try to resolve this problem by dissecting, testing, and then comparing the molecular results of the tissue floater to the adjacent patient sample on the glass slide.4–8
With increasing global adoption of whole slide imaging (WSI) in pathology departments, it is now feasible to exploit digital pathology to resolve the tissue floater conundrum. WSI refers to the digitization of glass slides, using a slide scanner, in order to generate corresponding digital slides that can be remotely viewed on a computer monitor.9,10 Several pathology laboratories have already transitioned to prospectively digitizing most of their slides so that pathologists can render primary diagnoses without having to examine the original glass slides.11,12 In such a digital environment, the proposed solution when a tissue floater is encountered would be to annotate (ie, select) the suspected tissue floater region of interest in the image, and then to search the laboratory's archived database of recently scanned slides for a matching digital slide that is the probable source of this specimen processing error. Indeed, several publications have shown that researchers have developed sophisticated machine learning and/or deep learning algorithms that can suitably classify histopathological images and perform content-based image retrieval (CBIR).13–20 CBIR is the application of computer vision methods to search for digital images in a large database.21,22
The aim of this study was to demonstrate the feasibility of using a dedicated histopathology image search tool to resolve the tissue floater conundrum. This article reports the success of such an image search algorithm, developed to match and retrieve similar digital slides or regions of interest using pixel data alone.
MATERIALS AND METHODS
Institutional review board approval was obtained for this study (University of Pittsburgh, Pennsylvania, STUDY18100084, PRO17120392; University of Witwatersrand, Johannesburg, South Africa clearance certificate M191003).
Tissue Floater Slide Creation
For the purposes of this experiment, 3 glass slides were prepared using freshly discarded adult human tumor tissue (Figure 1). A fabricated tissue floater slide was created that contained (1) a large central portion of tumor obtained from a type 1 (basophilic) papillary renal cell carcinoma, and (2) 2 separate, smaller portions of tissue (“tissue floaters”) placed toward the edge of the slide, obtained from a moderately differentiated colon adenocarcinoma and a high-grade papillary urothelial carcinoma of the urinary bladder. To avoid having these slides stand out as extraordinary in the study, the pathology cases selected were relatively easy to diagnose and typical of what would be encountered in routine practice. The prepared slides were stained with hematoxylin and eosin (H&E) according to routine staining protocols. All 3 slides were then entirely digitized at ×40 magnification using an Aperio AT2 whole slide scanner (Leica Biosystems). The quality of these digital slides was checked to avoid inclusion of unique identifiers and/or artifacts.
Fabricated slide containing a section of renal cell carcinoma (A) and 2 adjacent separate colon cancer (B) and bladder cancer (C) tissue floaters (hematoxylin and eosin stain, insets shown at ×20 magnification).
Pathology Digital Slide Datasets
The aforementioned WSIs were embedded into 2 datasets of digital slides. The first dataset was established by randomly selecting 300 de-identified WSIs (.svs file format) of H&E-stained surgical pathology cases from the teaching files at the University of Pittsburgh Medical Center, Pittsburgh, PA (“UPMC dataset”). These archival slides were scanned at ×40 magnification using an Aperio ScanScope XT instrument (Leica Biosystems). These WSIs included cases from a wide variety of anatomic sites (eg, colon, brain, thyroid, prostate, breast, kidney, salivary gland, skin, soft tissue, etc) exhibiting varied diagnostic pathologic entities (ie, reactive, inflammatory, benign neoplasms, and malignancies). The other dataset employed in this study was obtained from the publicly available digital pathology slide archive offered by The Cancer Genome Atlas (TCGA) program (https://portal.gdc.cancer.gov). A total of 2025 WSIs were randomly selected and downloaded from the TCGA (“TCGA dataset”). An average WSI was approximately 45 000 × 45 000 pixels. Digital slides of low quality (eg, very poor staining, low resolution, large regions out of focus) were eliminated, because magnifications less than ×20 and blurry patches prevent the image search from performing well, and poor staining negatively affects extraction of deep features. The TCGA dataset incorporated at least 33 different diagnostic entities from 25 anatomic locations. TCGA slides of frozen sections and slides with manual annotations (pen markings) were included. All digital slides were labeled with both the type of malignancy (primary diagnosis) and the affected organ (primary site). This label was assigned to the entire WSI, and no individual region was delineated. Table 1 shows the top 20 primary sites with the highest number of WSIs in the combined dataset (ie, UPMC + TCGA datasets).
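As an illustration of how such quality screening might be automated, the following Python sketch reads the reported objective power from a WSI's metadata with openslide-python and flags out-of-focus patches with a simple variance-of-the-Laplacian score from OpenCV. This is an assumption for illustration only; the article does not describe the actual exclusion procedure or any thresholds.

```python
import numpy as np
import cv2          # OpenCV, used here only for a simple blur score
import openslide    # openslide-python, for reading .svs whole slide images

MIN_OBJECTIVE_POWER = 20   # slides scanned below x20 were excluded per the text
BLUR_THRESHOLD = 100.0     # illustrative cutoff for the variance-of-Laplacian score

def slide_magnification_ok(path: str) -> bool:
    """Keep only slides whose reported objective power is at least x20."""
    slide = openslide.OpenSlide(path)
    power = slide.properties.get(openslide.PROPERTY_NAME_OBJECTIVE_POWER)
    return power is not None and float(power) >= MIN_OBJECTIVE_POWER

def patch_is_blurry(patch_rgb: np.ndarray) -> bool:
    """Flag an out-of-focus patch using the variance of its Laplacian."""
    gray = cv2.cvtColor(patch_rgb, cv2.COLOR_RGB2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD
```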
Computing Platform
All experiments were performed on a Dell EdgeServer Ra with 2x Intel(R) Xeon(R) Gold 5118 (12 cores, 2.30 GHz), 4x Tesla V100 (v-RAM 32 GB each; only 2 graphics processing units [GPUs] were used), and 394-GB random access memory. The code for indexing was written in C/C++. The user interface components were written in multiple languages, but mostly in Python and JavaScript. The high-end GPU power was necessary for indexing the large archives, which was a 1-time task for existing repositories. For daily usage of image search, ordinary (low-cost) central processing unit/GPU power will suffice, as the barcodes (see below) enable efficient search in large archives.
Image Search Tool
Through an ensemble approach (using a cohort of different algorithms), a reliable search engine prototype was developed that exploited the strengths of both supervised (trained deep networks) and unsupervised (clustering and search) computational methods for image processing.23 This image search tool thus included segmentation and clustering algorithms, deep networks, and distance metrics for search and retrieval. Whereas deep networks are supervised methods that require extensive training with labeled data, the search itself is unsupervised, with no prior training. Using a pretrained deep network without fine-tuning does not constitute direct supervision, because the network is used only for feature extraction without any adjustment of its weights. This allows the approach to be independent of manually delineated WSIs. We used DenseNet-121, which is publicly available. Image segmentation was used to distinguish tissue from white background. WSIs were broken into patches or tiles of fixed size (eg, 500 × 500 μm2 at ×20) with no overlap. The patches were grouped into categories via clustering methods (eg, the k-means algorithm) and passed through pretrained artificial neural networks for feature extraction. Each feature vector was converted into a linear barcode (Figure 2), and this “bunch of barcodes” indexing process was used to accelerate retrieval.24 Barcode generation consisted only of binarizing the gradient changes of the deep features. Multiple similarity measures were examined to further increase the matching rate when comparing images. Retrieved image patches were ranked from most to least likely to be similar to the queried image (Figure 3), and these results were displayed in a gallery format for the end user to review and interpret. Rank (defined as the rating of a suitable match) was determined by the similarity of the suspected patch to all other patches in the archive, measured as the distance between their corresponding barcodes. The more similar a patch is (ie, the better ranked the result), the smaller the difference between barcodes. All matched patches were sorted by this difference (least different ranked first, and so on).
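A minimal Python sketch of this pipeline is shown below: deep features are extracted from a patch with a pretrained DenseNet-121 (here via torchvision, an assumption), binarized into a barcode by taking the sign of successive feature differences (one plausible reading of "binarization of gradient change"; the published MinMax barcoding may differ in detail), and ranked against an archive by Hamming distance. All function names and preprocessing choices are illustrative, not the authors' implementation.

```python
import numpy as np
import torch
from torchvision import models, transforms

# Pretrained DenseNet-121 used purely as a fixed feature extractor (no fine-tuning),
# matching the unsupervised character of the search described above.
densenet = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
densenet.classifier = torch.nn.Identity()  # expose the 1024-dimensional pooled features
densenet.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(patch_rgb) -> np.ndarray:
    """Extract a deep feature vector for one H&E patch (PIL image or uint8 array)."""
    with torch.no_grad():
        x = preprocess(patch_rgb).unsqueeze(0)
        return densenet(x).squeeze(0).numpy()

def barcode(features: np.ndarray) -> np.ndarray:
    """Binarize the gradient of the features: bit i is 1 if feature i+1 exceeds feature i."""
    return (np.diff(features) > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Distance between two barcodes; smaller values indicate more similar patches."""
    return int(np.count_nonzero(a != b))

def rank_matches(query_barcode, indexed_archive):
    """Sort (slide_id, barcode) entries from most to least similar to the query barcode."""
    return sorted(indexed_archive, key=lambda entry: hamming(query_barcode, entry[1]))
```

In this sketch, the gallery of retrieved patches described above corresponds to the head of the list returned by rank_matches.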
Schematic illustration of the general idea of using barcodes for image representation: whole slide image indexed by converting separate patches into barcodes.
Schematic diagram showing how the origin of a suspected floater gets detected. The process starts with locating the suspicious tissue fragment. A selected patch from the fragment is then fed into a pretrained deep network to extract features. The search engine then receives a generated barcode to search within the “Yottixel Index” that contains barcodes of many patches of many whole slide images (WSIs). Finally, the origin of the floater is recognized by investigating the top ranked patches.
Any tissue fragment can potentially be selected by the end user (pathologist) for searching the archive. As such, a search is manually triggered by the pathologist. The smallest patch size that can be indexed and searched is 500 × 500 μm (∼1000 × 1000 pixels at ×20); any floater smaller than that may not be detectable.
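As a worked example of this minimum patch size, assuming a typical ×20 scanning resolution of about 0.5 μm per pixel (an assumed value, consistent with the ∼1000 × 1000-pixel figure quoted above):

```python
def patch_edge_in_pixels(patch_um: float = 500.0, microns_per_pixel: float = 0.5) -> int:
    """Convert a physical patch edge length (in microns) to pixels.

    The 0.5 um/pixel value is an assumed, typical resolution for a x20 scan;
    it reproduces the ~1000 x 1000-pixel minimum patch quoted above (500 / 0.5 = 1000).
    """
    return round(patch_um / microns_per_pixel)

print(patch_edge_in_pixels())  # -> 1000
```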
Search Tool Evaluation
After the aforementioned 3 prepared slides were scanned, indexed, and mixed among millions of image patches from the 2 datasets, the number of patches was reduced to approximately 16 000 through clustering (empirically set to 9 groups), with only 5% of each cluster selected to represent a WSI. The search tool was then used to identify the matching slide for each tissue floater (Figure 4). The search was conducted using variable percentages of floater sampling (ie, 5%–100% of the tissue floater region selected). Detection accuracy was measured by running each sample 100 times (manually and by automation) and calculating the median rank of a correct detection among the search results, as well as the best and worst rank of the detected floater, with a 95% CI.
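The following hedged Python sketch shows how the mosaic construction and the rank-based evaluation described above could be implemented. The cluster count (9) and sampling fraction (5%) follow the text, whereas the within-cluster selection rule (patches closest to the centroid) and all function names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_mosaic(patch_features: np.ndarray, n_clusters: int = 9, fraction: float = 0.05):
    """Cluster one slide's patch features and keep ~5% of each cluster as its mosaic.

    n_clusters = 9 and fraction = 0.05 follow the values reported in the text;
    selecting the patches closest to each cluster centroid is an assumption.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(patch_features)
    selected = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        if members.size == 0:
            continue
        n_keep = max(1, round(fraction * members.size))
        dist = np.linalg.norm(patch_features[members] - km.cluster_centers_[c], axis=1)
        selected.extend(members[np.argsort(dist)[:n_keep]].tolist())
    return selected  # indices of the patches that represent this WSI

def summarize_ranks(ranks):
    """Median, best, and worst rank of the correct slide over repeated searches."""
    r = np.asarray(ranks)
    return {"median": float(np.median(r)), "best": int(r.min()), "worst": int(r.max())}
```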
(A) Indexing of a sample whole slide imaging (scan with bladder tumor) (B) yielding 33 patches to build a mosaic. (C) Corresponding barcodes of the mosaic can be generated using a MinMax algorithm. The 3 barcodes that match the highlighted patches are shown.
RESULTS
Initial Algorithm Validation
The image search platform was qualitatively tested using the 300 UPMC WSIs. For this pilot experiment, 100 sample regions were randomly selected in order to demonstrate the search tool's capability. The retrieval results were evaluated by a pathologist (LP) and converted into an accuracy value. Table 2 shows the accuracy in predicting the correct query image using only the top 3 search results (ie, the 3 best matches).
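For clarity, the top-3 accuracy reported in Table 2 can be computed as the fraction of queries whose correct source slide appears among the 3 best-ranked retrievals; a minimal sketch with a hypothetical data structure follows.

```python
def top3_accuracy(search_results):
    """Fraction of queries whose correct source slide appears among the 3 best matches.

    `search_results` is a hypothetical list of (ranked_slide_ids, correct_slide_id)
    pairs, with ranked_slide_ids already sorted from best to worst match.
    """
    hits = sum(correct in ranked[:3] for ranked, correct in search_results)
    return hits / len(search_results)
```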
Image Search Results
The image search results for matching tissue floaters when using the UPMC 300 WSI pilot dataset showed that the median best-result rank for both the bladder and colon tumors was 1 (95% CI = 1) when selecting 5% up to 100% of the floater region. Table 3 shows search results when trying to match images to tissue floaters among all 2325 digital slides (ie, 300 UPMC + 2025 National Cancer Institute). Table 4 shows the retrieval accuracy for matching image patches to both tissue floaters when a manual search was performed. The retrieval accuracy (ie, successful rank) of finding a correct match greatly increased when more of the tissue floater region was selected. Searches were computationally most expensive when 100% of the tissue floater was selected. The measured time to perform a search was very short, never more than several milliseconds.
Image Search Results That Match Tissue Floaters (UPMC + NCI 2325 Whole Slide Imaging Dataset)

DISCUSSION
Tissue floaters arising from cross-contamination between specimens are a noteworthy cause of diagnostic error in anatomic pathology.2 Such mishaps can potentially lead to a serious misdiagnosis. Therefore, apart from implementing a quality assurance program with process improvement in order to mitigate this diagnostic error, it is also important that pathologists can adequately resolve this dilemma when it is encountered in clinical practice. If a tissue block contains the misidentified piece, deeper levels may help verify that tissue contamination occurred at the grossing bench or when embedding the paraffin block. However, performing a root cause analysis that may involve ordering recuts is time consuming and often does not satisfactorily resolve the problem. Moreover, molecular testing for tissue identification may not be readily available to many pathology laboratories, and the results could be compromised because DNA may be altered by staining or other tissue processing steps. The image search tool demonstrated in our study offers pathologists an additional method to resolve the tissue floater conundrum, especially for those laboratories that have transitioned to using WSI for routine diagnostic work.
The accuracy of the search tool to retrieve matching image patches in our study improved with each iteration of the algorithm. When performing manual searches, the accuracy of matching the bladder tumor floater improved from 80% to 93% and ultimately to 95%, and for the colon tissue floater it improved from 66% to 73% and then to 90%. After the algorithm was fully developed, we witnessed a near perfect match for top-ranked retrieved image patches. These data further show that when a greater proportion of the tissue floater is selected, the search almost always yields a useful match with just the first-ranked retrieved image. With adequate computing capability the time to run a search was only several milliseconds, which is far quicker than the alternative methods (eg, DNA fingerprinting) proposed for troubleshooting a slide that includes a tissue floater. Once a digital pathology system with this proposed artificial intelligence tool is established, it may also be cheaper to work up tissue floaters this way rather than with molecular testing. It is unclear whether the superior retrieval results noted with the bladder tumor search compared with the colon tumor floater were because the database contained fewer bladder cases (n = 43) than colorectal digital slides (n = 123). To the best of our knowledge, no similar study using image search to resolve tissue floaters on pathology slides has been published.
The digital pathology solution used in this study applied image classification and CBIR (or query by image content) technology. Such technology relies on a variety of techniques, including statistics, pattern recognition, traditional computer vision, and deep learning algorithms.21,25–27 With CBIR, the search tool uses the image content itself (eg, features such as color, shape, texture, and spatial relationships derived from the pixels) rather than annotations or metadata (eg, tags or diagnostic labels associated with the images). This allows relevant images that match the content of the queried image to be automatically retrieved from a large database. For very large databases it is impractical to have humans manually annotate each image. To address this challenge, we employed unsupervised learning and deep neural networks when developing the image search tool. We also minimized the demand for expensive computer equipment when running a search query on large WSI files by converting individual image patches into barcodes. The barcodes allowed the WSIs in this study to be completely indexed, precisely recognized, and easily retrieved.
Google's well-known image search tool, called Reverse Image Search (https://images.google.com), allows users to search websites with a picture for similar images. This search tool can also be used to decipher medical images.28 Hanna and Pantanowitz29 previously tested Google's freely available Reverse Image Search tool for diagnostic surgical pathology. They found that when histopathology images alone were used as input to find related images on the web, the search results were too unreliable to make a histopathology diagnosis. However, after adding the organ and the term “cancer” to their image search, they greatly increased the diagnostic specificity of the search results (eg, up to 83% for certain neoplasms). Researchers from Google have since refined their image search capability and developed a deep learning-based reverse image search tool specifically for histopathology images called Similar Image Search for Histopathology (SMILY).30 The authors of that study demonstrated that SMILY was able to retrieve images with similar organ site, histologic features, and cancer grade compared with the original query image. SMILY has great potential for diagnosis, research, and education. If it becomes freely available, perhaps it can also be validated for working up cases with tissue floaters.
As pathology departments continually scan their glass slides, they will begin to amass large digital databases. If these WSIs are appropriately stored, indexed, and linked with corresponding metadata (eg, pathology reports, annotations, patient outcomes) these archives will become increasingly valuable in the emerging era of artificial intelligence.31,32 This will allow pathology laboratories to not only exploit artificial intelligence–based image search, but additionally leverage their “Big Data” for other computational pathology applications.31,33 Further work is underway to clinically validate our image search tool to deal with actual tissue floaters in a clinical work environment and to enhance the user interface to make it easy for pathologists to employ in routine practice.
The results in this study are partly based upon WSI offered by the TCGA Research Network (https://www.cancer.gov/tcga). We thank the company Huron Digital Pathology for their financial support of this project, the Ontario government for ORF-RE funds, and the federal Canadian government for an NSERC-CRD Grant to support this project.
References
Author notes
Huron Digital Pathology provided financial support for this project, the Ontario government provided ORF-RE funds, and the federal Canadian government provided an NSERC-CRD Grant to support this project.
Competing Interests
Pantanowitz is on the medical advisory board for Leica, Ibex, and Hologic and consults for Hamamatsu. Tizhoosh is on the advisory board for Huron Digital Pathology. Kalra has an industrial internship at Huron Digital Pathology. The other authors have no relevant financial interest in the products or companies described in this article.