Context: Artificial intelligence is a transformative technology for anatomic pathology. Involving the workforce will foster support for algorithm development and implementation.
Objective: To develop a supportive ecosystem that enables pathologists with variable expertise in artificial intelligence to create algorithms in a development environment with seamless transition to a production environment.
Design: The ecosystem needed to provide (1) an approachable and intuitive user interface, (2) diverse algorithmic modeling options, (3) support for internal and external collaborations, (4) a seamless mechanism for transition from discovery to clinical deployment, (5) the ability to meet minimum institutional requirements for information technology (IT) review, and (6) the ability to scale over time. The ecosystem also required platform education, data science guidance, a project management structure, and ongoing leadership.
Results: The development team considered internal development and vended solutions. Because of the extended timeline and resource requirements of internal development, a vended solution was chosen. Vendor proposals were solicited and reviewed by pathologists, IT, and security groups. A vendor was selected, and pipelines for development and production were established. Proposals for development were then solicited from the pathology department. Eighty-four investigators were selected for the initial cohort, receiving training and access to dedicated subject matter experts. A total of 30 of 31 projects progressed through the model development process of annotating, training, and validation. Based on these projects, 15 abstracts were submitted to national meetings.
Conclusions: Democratizing artificial intelligence by creating an ecosystem that supports pathologists with varying levels of expertise can break down entry barriers, reduce the overall cost of algorithm development, improve algorithm quality, and enhance the speed of adoption.
Anatomic pathologists integrate their observations of tissues with available data to diagnose diseases, predict outcomes, and suggest therapeutic options. To improve their ability to perform these functions, pathologists continuously incorporate evolving technology into their workflows. Since the beginning of the review of gross specimens, there have been 5 major technologic developments: light microscopy, special stains, electron microscopy, immunohistochemistry, and molecular diagnostics. Pathologists are now beginning to integrate the sixth major development, artificial intelligence (AI). Whole slide scanning is an enabling technology but is not independently revolutionary: digital pathology dramatically changes the workflow of the laboratory, but it is the integration of AI into anatomic pathology that will enhance the ability of pathologists to diagnose, predict outcomes, and suggest therapeutic options.1–4
Machine learning and neural networks have been investigated for many decades. The breakthrough in image recognition came when Krizhevsky and colleagues5 demonstrated the convolutional neural network AlexNet in the 2012 ImageNet Large Scale Visual Recognition Challenge. Progress in image recognition had been slow, but AlexNet decreased the error rate by more than 50%. The pathology and broader medical communities took little notice of the revolution beginning in the imaging community until LeCun and colleagues6 published their review of deep learning in Nature in 2015.
Pathology departments will want to implement AI in their laboratories in the coming years. There are 3 major approaches to engagement: developing algorithms within the institution, collaborating with partners to develop algorithms, and purchasing best-in-class vended products. These options are not mutually exclusive, and many institutions will want to pursue all 3. This paper focuses on the first, internal development. There will always be investigators who are interested in technology and will be early adopters. Many of our pathologists are subject matter experts who are interested in developing AI algorithms but not in learning the fine details of the linear algebra underlying many machine learning and AI algorithms.7 We aimed to develop an ecosystem that permits everyone to become involved in developing AI algorithms in anatomic pathology.
MATERIALS AND METHODS
Platform Choice and Infrastructure Development
Is There Departmental Interest in Doing AI Projects?
The effort to develop an ecosystem for AI development is both expensive and time-consuming. Before embarking, it seemed prudent to query the staff of the Department of Laboratory Medicine and Pathology (DLMP) to gauge their interest in participating. A single email was sent to the attending staff requesting project ideas. The email also communicated that project ideas would not be disseminated to leadership or other faculty members; this assurance gave pathologists a secure pathway to relay ideas they were personally interested or invested in pursuing. Slightly more than 200 project ideas were proposed, representing substantial interest in participating in AI research.
How to Create a Platform for AI Development
To determine whether the platform should be a vended product or developed in-house, a small committee consisting of DLMP project management, pathologists with AI development experience, and IT professionals was convened. The committee conducted a brief survey of vendors and defined high-level requirements for in-house development. This investigation determined that in-house development would exceed the scheduled project timeline, require personnel resources that were unavailable, and replicate capabilities already available in vended products.
Development of a Request for Proposals for Vended Products
A formal request for proposal (RFP) was developed for dissemination to potential vendors. Six key criteria were defined in the RFP. (1) Approachable and intuitive: the platform would be used by investigators with a range of technical expertise and interest, so it was important that entry-level investigators could easily use the product. (2) Variety of modeling options: the program was going to be used for a large variety of experiments and therefore needed to be a flexible system that could accommodate many different approaches. The modeling capabilities needed to include rare event detection, segmentation, individual feature and spatial quantitation, classification, and prognostication. Vendors had to demonstrate evidence of prior successful model development by many different researchers with different approaches and end points. (3) Internal and external collaboration: investigators might want or need to collaborate with colleagues within or outside the institution, so the platform needed to permit validated users outside the firewall to access the program. (4) IT and security requirements: the platform had to meet the minimum requirements of the IT and security groups. (5) Seamless transition from discovery to clinical deployment: there needed to be a pathway for successful algorithms to be implemented in the clinical environment. It would be very frustrating for an investigator to develop a great algorithm and not have an opportunity to use it in practice. (6) Platform scalability over time: although the pilot would start with a limited number of users, it was important that the platform be able to scale to a significantly larger number of users if the project was successful. We wanted to plan for success, anticipating that creating a community of users within the department would create demand for more resources, and the platform should accommodate this expansion.
Evaluation of Vended Products
The responses to the RFP were evaluated by 4 teams. A key criterion for acceptance was that the platform support both discovery, that is, the development of algorithms, and deployment in the clinical workflow. Because these 2 functions have different concerns, separate teams evaluated the user experiences for discovery and for deployment. For the discovery evaluations, the vendors established “sandboxes” where users from the department could evaluate the programs using their own images. The user group was chosen to include people from different practice areas with varying levels of prior experience in AI development, from novice to expert. The IT and information security teams assessed how easily a vendor’s solution integrated into our cloud platform and what vulnerabilities existed. After a review of the evaluations, a vendor, Aiforia Technologies Plc (Helsinki, Finland), was selected.
Infrastructure
The vendor’s and the institution’s IT teams jointly installed the platform in the institutional cloud environment. Two environments were created: Aiforia Create, a discovery environment used for algorithm development, and Aiforia Clinical Suite, the production environment for clinical applications. The initial clinical application deployed was the vendor’s Ki-67 algorithm for breast cancer.
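The value of separating the two environments is that algorithms can be gated before they reach clinical use. The minimal Python sketch below illustrates that gating concept only; the environment names mirror the two instances described above, but the classes, the promote() function, and the 0.90 validation threshold are hypothetical illustrations, not part of the Aiforia product or API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a discovery-to-production promotion gate.
# Environment names mirror the two instances described in the text;
# all classes, fields, and the threshold are illustrative assumptions.

@dataclass
class Model:
    name: str
    validation_score: float          # eg, accuracy on a held-out validation set
    reviewed_by_pathologist: bool = False

@dataclass
class Environment:
    name: str
    models: list = field(default_factory=list)

create_env = Environment("Aiforia Create")            # discovery environment
clinical_env = Environment("Aiforia Clinical Suite")  # production environment

def promote(model: Model, threshold: float = 0.90) -> bool:
    """Admit a model to production only if it clears a validation
    threshold and has documented pathologist sign-off."""
    if model.validation_score >= threshold and model.reviewed_by_pathologist:
        clinical_env.models.append(model)
        return True
    return False

ki67 = Model("Ki-67 breast", validation_score=0.95, reviewed_by_pathologist=True)
create_env.models.append(ki67)
print(promote(ki67))  # True: the model enters the production environment
```

In practice the gate would be an institutional review process rather than a threshold check, but the design intent is the same: nothing moves from Create to Clinical Suite without explicit validation.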
Implementation and Education
Selecting Projects for the Initial Wave
An evaluation process was developed for selecting the first wave of projects, and 3 primary criteria were used to assess the submissions. The first was alignment with internal organizational values: the algorithm’s practical operational utility, its potential to be developed and deployed into practice quickly, its repeatability and applicability to other areas that facilitate workflow prioritization, and its positive impact on patient care or society. The second was the quality of the proposal: whether the investigator had the expertise, knowledge, and resources to pursue the effort and whether the plan was well reasoned, organized, feasible, and based on a sound rationale, making it likely to succeed. The last criterion, alignment with internal strategy, considered the proposal’s fit with institutional and departmental strategy, whether it stratified AI portfolio risk, whether it provided a competitive advantage, whether it addressed a need not met by commercially available products, and its degree of novelty. A request for proposals was distributed to the DLMP. Proposals were evaluated and awarded by a panel composed of members with expertise in innovation, pathology, and AI.
Accepted Projects
After awards were completed, the principal investigator for each proposal met with the project manager, an AI platform expert, and a data scientist. The research plan was reviewed, and suggestions were made for optimizing the project. Milestones were discussed and agreed upon. The principal investigator selected slides for their data set and provided them to the program for scanning on a Leica GT450.
Education for Investigators
An education committee was formed with representation from Aiforia, the DLMP, and the institutional education group. This committee developed and approved a training plan for the initial onboarding of investigators as well as ongoing training support. Aiforia developed an onboarding curriculum consisting of a 3-session training series and delivered each lecture multiple times on different days to accommodate different schedules. The program provided a training set of whole slide images for hands-on learning assignments given after each session. The lectures were recorded and archived for asynchronous viewing.
Aiforia Resources for Investigators
Aiforia provided dedicated training instructors who held regularly scheduled virtual office hours. These open office hours provided a relaxed atmosphere for investigators to ask questions, receive guidance in developing their models, discuss and resolve barriers, and learn from the work being done by other investigators. The training instructors were also available for one-on-one tutorial sessions with investigators upon their request. In addition to the full-length lecture recordings, Aiforia created a series of short videos to illustrate specific common tasks for quick and easy consumption. An institutional community website was created, exclusive to program investigators, providing a central location to house educational videos, frequently asked questions and answers, a forum for exchanging ideas, and a ticketing system for investigators to submit questions and receive timely responses by email.
Institutional Resources for Investigators
The institution provided a cytotechnologist with extensive experience in AI to serve as an Aiforia Create superuser. This superuser was available to guide and support investigators in model development activities, such as how to effectively annotate images. The institution also provided a data scientist to assist investigators with study design and data interpretation. A DLMP project manager provided logistic support, communication management, risk and problem resolution, and metric tracking. All incoming community tickets regarding the institutional cloud were routed to the information technology group for resolution.
Institutional Review Board Approval
All projects were approved by the enterprise Mayo Clinic Institutional Review Board (Rochester, Minnesota).
RESULTS
First Cohort (2022–2023)
A total of 45 applications were received from teams comprising 107 DLMP members, and 31 proposals were accepted, representing 84 unique institutional users. Table 1 summarizes details of the first cohort. The enterprise is spread across 3 states; there were strong advocates in 2 of the states, which resulted in only 1 accepted proposal from the third state. The gender distribution of principal investigators of accepted projects was 42% (13 of 31) female and 58% (18 of 31) male. Table 2 illustrates that the projects covered many different areas, representing 11 different subspecialties; the largest numbers of projects were in hematopathology, pulmonary, breast, and gastrointestinal pathology. Table 3 shows the distribution of machine learning algorithms; the most common approaches were segmentation, classification, and rare event detection.
All users completed training, and every proposal team had its data set digitized. A total of 30 of 31 projects progressed through the model development process of annotating, training, and validation. The training sessions, office hours, one-on-one consultations, and project management support were praised by participants as contributing to their success. The time from the start of the program to the end of the first cohort was 1 year, and the vast majority of participants had never before participated in an AI project. One metric of success is that 15 abstracts based on these projects were submitted, 13 to the United States and Canadian Academy of Pathology 2024 annual meeting and 2 to specialty societies.
Second Cohort (2023–2024)
Table 1 shows demographic details of the second cohort, which is just beginning. Approximately the same number of applications was received, although the number of accepted proposals was smaller. Because of the success of and interest generated by the first cohort, awarded projects were more evenly distributed among the 3 sites.
Example Pilot Project
The following is a summary of the initial findings from 1 project that developed a deep learning algorithm to predict the risk of endometrioid adenocarcinoma after an atypical endometrial biopsy. There are approximately 60 000 new cases of endometrial carcinoma each year in the United States. Endometrioid adenocarcinoma is almost always associated with a precursor lesion, atypical endometrial hyperplasia, and there is significant interobserver variation in the interpretation of endometrial biopsies, with diagnoses ranging from benign to malignant. The goal of the project was to develop an algorithm to predict the risk of concurrent and future endometrial endometrioid adenocarcinoma from biopsies showing atypical endometrial hyperplasia. The case selection included 122 patients with both an endometrial biopsy and a subsequent hysterectomy; the initial biopsies were used in this study. A total of 14 cases were used for training and 82 cases for validation. The algorithmic approach contained 3 layers. The first layer identified endometrium versus other tissue, and the second layer identified crowded glands in the endometrium (Figure). Finally, the algorithm was trained to differentiate crowded glands that were associated with carcinoma in the subsequent hysterectomy from crowded glands that were associated with atypical endometrial hyperplasia alone. Thus far, with TP, FP, FN, and TN denoting true positives, false positives, false negatives, and true negatives, the algorithm has a positive predictive value of 91% (TP/[TP + FP] = 32/35), a negative predictive value of 57% (TN/[FN + TN] = 27/47), a sensitivity of 62% (TP/[TP + FN] = 32/52), and a specificity of 90% (TN/[FP + TN] = 27/30). Future directions include a larger training set and the addition of quantitative analysis of certain features, which will likely increase the accuracy of the model.
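As a check on the arithmetic, the reported figures follow directly from the validation confusion matrix stated above (TP = 32, FP = 3, FN = 20, TN = 27). The short Python sketch below recomputes them; it is illustrative only and not part of the study’s software.

```python
# Recompute the reported validation metrics from the confusion matrix
# given in the text: TP = 32, FP = 3, FN = 20, TN = 27 (82 validation cases).
TP, FP, FN, TN = 32, 3, 20, 27

ppv = TP / (TP + FP)          # positive predictive value: 32/35
npv = TN / (FN + TN)          # negative predictive value: 27/47
sensitivity = TP / (TP + FN)  # 32/52
specificity = TN / (FP + TN)  # 27/30

print(f"PPV {ppv:.0%}, NPV {npv:.0%}, "
      f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
# PPV 91%, NPV 57%, sensitivity 62%, specificity 90%
```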
Figure. Visual verification and analysis of training model accuracy for annotations of crowded glands. A and B, Annotations for training and verification of the training data. The areas surrounded by the black annotations are the only areas used for training and for verification of the model. The areas surrounded by the blue line annotations are the areas of crowded glands. B, The area shaded in blue is the area that the trained model identified as crowded glands; it perfectly matches the ground truth annotation within the black annotation. The verification uses the same data that were used to train the model. C through E, Analysis of the trained model for identifying crowded glands, using areas that were not used in the training set. The thin gray ovals are the areas being evaluated. C, Normal endometrium; the model did not identify any of the normal endometrium as crowded glands. D and E, The area of crowded glands is seen in D. The blue shaded area in E shows that the model identified the entire area as crowded glands (hematoxylin-eosin, original magnification ×200).
DISCUSSION
Although digital pathology may change how anatomic pathologists review the microscopic features of tissue specimens, it is AI technology that will transform pathology practice.1–4 The vast majority of practicing pathologists and trainees do not have expertise in algorithm development. They do, however, have subject matter expertise and a great interest in learning about the technology and applying AI to solve problems in their practices. This project identified and installed a vended product that investigators at all levels of AI experience find intuitive to use and that offers the flexibility to create algorithms ranging from simple to complex.
The program has successfully engaged a large segment of the anatomic pathology community, with 107 members included on proposal submissions and funding extended to 84 investigators. Among investigators, a distinction was observed between users and power users, which required different training approaches to best support a positive experience and successful algorithm development. We found that working with other entities was important; we would not have been nearly as successful without the full support of the Aiforia team. Good ideas came from the collaboration of many different people, and, as this was a new endeavor, fostering an environment that empowered everyone to bring forward problems, suggestions, and solutions was essential. Another critical component of the program’s success has been administrative management, which established accountability for each investigator’s learning goals, aligned those goals with time-based outcomes, and identified opportunities to enhance the learning curriculum as needs arose. The program draws on the invaluable knowledge that members of the anatomic pathology community have accumulated through data collection and honed subject matter expertise. Developing a mechanism through which many people can apply their clinical knowledge to create algorithms, potentially advancing next-generation diagnostics, building multidisciplinary expertise, and transforming health care delivery for themselves, their colleagues, and patients, has been rewarding.
One downside of the approach of using vended products with intuitive interfaces is the inability to export the models to other platforms. The Aiforia algorithms can be exported to other sites that have Aiforia installations, but these algorithms are proprietary and cannot be exported to other programs. Aiforia is developing a marketplace where developers can market their algorithms and receive compensation.
We believe that democratizing AI will allow practicing pathologists to participate in developing AI, reduce the overall costs of developing AI at scale, enhance and improve the algorithms being developed, and increase the speed at which algorithms are adopted.
References
Competing Interests
Stetzik, Lee, Samiei, Beamer, and Westerling-Bui are employees of Aiforia Technologies, and Westerling-Bui has options in Aiforia Technologies. The other authors have no relevant financial interest in the products or companies described in this article.