Generative artificial intelligence (AI) technologies are rapidly transforming numerous fields, including pathology, and hold significant potential to revolutionize educational approaches.
This review explores the application of generative AI, particularly large language models and multimodal tools, for enhancing pathology education. We describe their potential to create personalized learning experiences, streamline content development, expand access to educational resources, and support both learners and educators throughout the training and practice continuum.
We draw on insights from existing literature on AI in education and the collective expertise of the coauthors within this rapidly evolving field. Case studies highlight practical applications of large language models, demonstrating both the potential benefits and unique challenges associated with implementing these technologies in pathology education.
Generative AI presents a powerful tool kit for enriching pathology education, offering opportunities for greater engagement, accessibility, and personalization. Careful consideration of ethical implications, potential risks, and appropriate mitigation strategies is essential for the responsible and effective integration of these technologies. Future success lies in fostering collaborative development between AI experts and medical educators, prioritizing ongoing human oversight and transparency to ensure that generative AI augments, rather than supplants, the vital role of educators in pathology training and practice.
The emergence of generative artificial intelligence (AI), primarily driven by advancements in transformer architectures, has marked a pivotal moment, with transformative impacts across all aspects of society, including medicine.1 These technologies hold the potential to fundamentally reshape educational approaches, particularly within pathology.2 Representing a significant leap beyond previous generations of machine learning and AI tools, generative AI offers novel opportunities for educational advancement.
These terms may be new and unfamiliar to many pathologists. Readers are encouraged to review the introductory paper in this special section, which outlines the fundamentals of these emerging technologies. Briefly, the transformer technologies underlying large language models (LLMs) were first described in 2017 and use an attention mechanism that allows for a sophisticated understanding of context.3 These tools can generate humanlike text, offering an unparalleled capacity to produce novel applications, ideas, and concepts. When applied to pathology education, these tools can tailor educational content, provide dynamic feedback, and simulate complex diagnostic scenarios.4,5
Although generative AI presents promising opportunities, there exists a tension between its potential advantages and disadvantages. Potential disadvantages include a lack of control over content, reduced supervision of trainee progress, and a loss of social context within the learning environment. This applies to all learner audiences, from preclinical medical students and advanced fellows to practicing pathologists. However, the potential advantages are significant, promising a deeper, more interactive learning experience that is tailored to individual skills and interests. This review aims to navigate this complex landscape by providing a nuanced examination of generative AI’s role in pathology education. We offer insights into how it can be leveraged to support, rather than replace, the foundational aspects of pathology education.
EDUCATIONAL THEORY
Several widely accepted educational theories hold particular relevance when exploring generative AI and LLMs within medical education. One such model is adult learning theory, pioneered by Malcolm S. Knowles, PhD, which posits that adults learn best when self-motivated and engaged with topics that have practical and immediate implications.6 In this theory, the instructor’s role is to support the student’s intrinsic interest and facilitate independent exploration. Another key observation comes from the seminal paper by Benjamin S. Bloom, PhD, “The 2 Sigma Problem,”7 which demonstrated that students receiving individualized tutoring significantly outperformed those in traditional classroom settings. This highlights the effectiveness of personalized instruction tailored to each learner’s pace and specific needs. More recently, competency-based medical education has emerged as a potentially superior framework for training physicians. Competency-based medical education emphasizes student progression based on achieving defined competencies rather than adhering to a time-focused curriculum.8
Below, we describe how LLMs can potentially address the implications of these models by creating customized, outcome-based education aligned with individual learning objectives and competencies. Through analysis of individual performance, LLMs can potentially offer tailored readings, case studies, and quizzes, while also assessing growth by presenting learners with increasingly challenging problems. We note that accrediting bodies already mandate such educational approaches. For example, the Liaison Committee on Medical Education requires self-directed learning as a crucial element of medical school curricula.9 Similarly, the Accreditation Council for Graduate Medical Education requires pathology training programs to provide graduated responsibilities as trainees advance.8 LLMs can create realistic clinical scenarios and diagnostic challenges that mirror real-world cases, including simulated sign-out assessments. This allows trainees to apply their knowledge in practical situations, fostering critical thinking and decision-making skills essential for achieving milestones.7
MATERIALS AND METHODS
In preparation for this manuscript, a comprehensive literature search was conducted to identify potential use cases and applications of emerging generative AI tools in educational settings. Given the rapid evolution of this field, we also drew upon the collective expertise of the coauthors. Case studies and examples presented in this manuscript were developed using various generative AI tools, including ChatGPT 4.0 (OpenAI, San Francisco, California), Gemini Advanced 1.0 and Pro 1.5 (Google, Mountain View, California), Claude-3 Opus (Anthropic, San Francisco, California), and OpenEvidence (Cambridge, Massachusetts).
The paper was written collaboratively and asynchronously, facilitated by numerous virtual meetings to discuss key points and the overall vision. Read AI (Seattle, Washington) was used to generate transcripts and action items from these meetings. One challenge inherent to multiauthor papers such as this is the integration of diverse writing styles, approaches, and vocabulary. To address this and enhance the overall writing quality, an LLM-based approach was used. Although many writers use applications like Grammarly (San Francisco, California) for grammar and syntax assistance, the authors opted to leverage a variety of LLMs. A rough draft of the entire manuscript was uploaded to Gemini 1.5 Pro to provide full context, and, to avoid potential condensation due to output limitations, the system optimized the manuscript one paragraph at a time. Examples of optimized paragraphs are available in the supplemental digital content, available at https://meridian.allenpress.com/aplm in the February 2025 table of contents. The LLM was instructed to optimize the writing for consistent format, flow, and structure while preserving all content, ideas, and concepts. All authors reviewed and approved both the rough draft and the LLM-optimized version of the paper, with a few minor edits to the final version.
USE CASES AND EXAMPLE OVERVIEW
LLMs have significant potential to enhance the “see one, do one, teach one” approach long common in medicine. This section is dedicated to outlining various use cases for these tools, broadening readers’ understanding of their extensive potential. We showcase practical applications and offer actionable advice for integrating these tools into educational practices. Furthermore, we emphasize that through iterative use and refinement of prompts, users can develop a deeper understanding of how to effectively leverage LLMs. Our aim is to inspire readers not only to conceptualize innovative uses of LLMs but also to engage actively with these tools, gaining firsthand experience of their capabilities. Through use and experience, educators can create personalized learning experiences for students by providing blueprints and customizable prompts for various AI-based exercises, such as simulations, mentoring, coaching, cocreation, and tutoring. Additionally, we advocate for the dissemination of knowledge and experiences related to these tools among peers. This collaborative approach is designed to enhance proficiency within the pathology community in harnessing these advanced technologies for educational enrichment and professional development. Detailed examples with sample prompts and results are provided in the supplemental digital content.
Generating Pathology-Specific Questions
LLMs have exhibited remarkable proficiency in generating realistic and targeted practice questions across various domains, including pathology. These systems can ingest content from diverse sources, such as the internet, research articles, review articles, lecture notes, and other texts, to generate a virtually endless stream of questions in any format, including multiple-choice, extended-matching, short-answer, and even essay-style questions. Students can use these tools for extensive practice and engage with LLMs to clarify challenging questions or obtain feedback on incorrect answers, thus enhancing their learning efficiency. LLMs can also be used to generate plausible incorrect answers (distractors) by prompting them with items automatically retrieved from a question bank.10
Practically, this can be achieved by uploading content or directing models to specific online material. A sample prompt for generating questions is “Please create 50 multiple-choice questions with 5 options based on the provided lecture notes, targeting the level of a junior pathology resident.” Educators can also use this approach to expand question banks and develop robust, targeted questions for students. It is important to note that the quality of generated questions is significantly influenced by the nature of the prompt and may vary among different LLMs. Output quality can be further improved by chaining prompts together, as illustrated in Table 1. The quality of the answers and distractors also varies; in some cases, for example, the correct answer was longer than the distractors. Such issues can be corrected manually or by prompting the LLM to adjust the length of the responses.
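For readers who wish to experiment, the following is a minimal sketch of this workflow in Python, chaining a drafting prompt to a refinement prompt. It assumes the OpenAI Python SDK; the model name and prompt wording are illustrative assumptions, not the exact tools or prompts used in this review, and the same pattern applies to other LLM providers.

```python
# Minimal sketch: question generation with prompt chaining.
# Assumes the OpenAI Python SDK; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; substitute any chat-capable LLM
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def generate_mcqs(lecture_notes: str, n: int = 10) -> str:
    # Step 1: draft questions from the uploaded source material.
    draft = ask(
        f"Please create {n} multiple-choice questions with 5 options, "
        f"targeting the level of a junior pathology resident, based on "
        f"these lecture notes:\n\n{lecture_notes}"
    )
    # Step 2: chain a refinement prompt to address known weaknesses,
    # such as implausible distractors or answer-length cues.
    return ask(
        "Revise these questions so that all options are of similar length "
        "and the distractors are plausible, and add a brief explanation "
        "for the correct and incorrect answers:\n\n" + draft
    )
```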
It is noteworthy that even without specific prompting regarding format, the same initial prompt entered in Gemini Advanced 1.0 generated a question structured appropriately for the United States Medical Licensing Examination. However, the question was relatively simple, requiring additional prompting to create a more challenging question. Both models, without further prompting, provided explanations for both the correct and incorrect responses.
This capability allows these systems to function as on-demand tutors, offering explanations through various approaches to ensure trainees grasp difficult concepts. Although AI can provide detailed explanations and critiques, the depth and accuracy of these explanations may vary. Because of the potential for AI to “hallucinate” or generate fabricated information, reliance on AI for understanding complex medical concepts necessitates careful oversight to ensure that explanations are medically accurate and pedagogically sound (see section on ethics and risk mitigation).
Although the interaction described above used a text interface, smartphone cameras can also serve as valuable tools for capturing educational materials. LLMs can process both smartphone-captured images and internet-sourced images to generate personalized questions and explanations, creating a more engaging and tailored learning experience.11,12 Recent advancements in LLMs have enhanced their ability to effectively process visual information, breaking down complex visuals into simpler components for improved comprehension and assessment.
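As a concrete, hedged illustration of this multimodal pattern, the sketch below submits a smartphone photo of study material to a vision-capable model and requests a personalized question; it assumes the OpenAI Python SDK, and the model name and prompt are placeholders rather than a tested implementation.

```python
# Sketch: generating a question from a smartphone photo of study material.
# Assumes the OpenAI Python SDK and a vision-capable model.
import base64
from openai import OpenAI

client = OpenAI()

def question_from_photo(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write one board-style multiple-choice question, "
                         "with an explanation, based on this image of my "
                         "study notes."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```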
Customizing LLM Response to Learner Level
Often, the key to understanding a challenging concept lies in the ability to approach it from different perspectives or viewpoints. This is where exceptional teachers excel, guiding learners through complex topics and offering clear and accessible explanations. Unfortunately, not all learners have access to such skilled educators. LLMs can bridge this gap by providing explanations for complex concepts tailored to various comprehension levels. As an example, consider the explanations of lepidic lung cancer provided in Table 2. These systems can act as on-demand tutors, offering explanations for difficult concepts in a way that caters to individual needs. Pushing these systems to their limits, such as requesting an explanation of lepidic lung cancer suitable for a kindergarten audience, can yield surprisingly insightful outputs that distill core aspects into easily understandable terms (Table 2). However, it is crucial to remember that the depth and accuracy of these explanations may vary and should always be reviewed to ensure medical accuracy.
Table 2. ChatGPT 4.0 Explaining Challenging Concepts: Using Large Language Models to Generate Novel Analogies and Explanations for Different Learners
This concept of tailoring explanations to specific audiences extends to pathology reports as well. LLMs can “translate” the content of pathology reports into a form that is easily understood by various audiences, such as oncologists or patients. The level of detail can be adjusted to align with the specific educational background or language capabilities of the intended audience, as demonstrated in Table 2 with the example tailored for a patient with a specific level of English comprehension and education.
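One simple way to operationalize this tailoring is a single prompt template parameterized by audience, as sketched below. The audience descriptions, model name, and prompt wording are illustrative assumptions; any chat-capable LLM could stand in.

```python
# Sketch: one prompt template, parameterized by audience.
from openai import OpenAI

client = OpenAI()

AUDIENCES = {  # illustrative audience descriptions
    "junior_resident": "a first-year pathology resident",
    "oncologist": "a medical oncologist planning treatment",
    "patient": "a patient reading at roughly a sixth-grade level",
}

def explain_for(text: str, audience: str) -> str:
    prompt = (
        f"Explain the following for {AUDIENCES[audience]}. Adjust the "
        f"vocabulary, detail, and analogies accordingly, without omitting "
        f"clinically important findings:\n\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: "translate" a pathology report for a patient.
# print(explain_for(report_text, "patient"))
```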
Streamlining Literature Review With AI “Journal Club”
The overwhelming amount of information available in scientific literature presents a significant challenge for trainees and practicing pathologists striving to stay abreast of the latest advancements. Many groups participate in journal clubs to discuss and dissect new and important papers in the field. However, preparing summaries of these papers can be a time-consuming process. LLMs excel at document summarization; they are capable of accurately and succinctly summarizing even extensive studies within seconds. Furthermore, they allow users to explore and interact with the paper through queries such as “Can you explain the methods of this study?” or “Can you outline how these findings might influence my practice as a pathologist?” It is also possible to provide multiple papers on a similar topic to these tools and request that the system combine and integrate findings from across these studies. This can be an excellent method for rapidly and efficiently expanding knowledge in a new area.
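A journal club assistant along these lines can be sketched in a few lines of Python, as below; it assumes the pypdf and OpenAI packages, and the prompt is illustrative. Very long papers may exceed model context limits and require chunking or a long-context model.

```python
# Sketch: AI "journal club" -- summarize and integrate one or more papers.
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()

def pdf_text(path: str) -> str:
    # Extract plain text from every page of a PDF.
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def journal_club_summary(paths: list[str]) -> str:
    papers = "\n\n---\n\n".join(pdf_text(p) for p in paths)
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user",
                   "content": "Summarize and integrate the findings of the "
                              "following papers for a pathology journal club, "
                              "noting methods, limitations, and implications "
                              "for practice:\n\n" + papers}],
    )
    return response.choices[0].message.content
```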
It is important to acknowledge that most LLMs used currently are optimized for text and are limited in the degree to which they can interpret figures within papers. However, future multimodal systems that seamlessly integrate text and visual models hold the potential to understand not only the textual content but also the visual elements of scientific papers. These systems could further streamline and accelerate the review process. Nonetheless, caution is always advised to ensure that all elements of the paper are cross-referenced, and key findings and conclusions are carefully assessed and confirmed by the human reviewer.
Overcoming Language Barriers in Pathology Education
A significant portion of pathology literature and educational resources is primarily available in English, posing a challenge for learners whose first language is not English.13 This language barrier can hinder their ability to stay current with the latest advancements and understanding within the field. Although traditional translation tools may struggle to maintain accuracy for complex scientific texts and concepts, LLMs, with their capacity to understand and maintain context, offer a potential solution. They can often provide more accurate translations that preserve the context, ideas, and nuances of the original language. Additionally, LLMs can generate highly accurate translations of patient reports into a wide range of languages.14,15
AI tools also prove valuable for learners whose first language is not English but who are attempting to engage with content delivered in English, such as live medical meetings or streaming sessions. By providing high-fidelity transcriptions, visual capture of presented slides, and potentially summarized or synthesized content, these tools can mitigate the challenges of learning in a second language. Emerging models such as Gemini 1.5 possess enormous input token limits, enabling them to process up to 11 hours of audio or 1 hour of video, and this capacity is expected to grow substantially. The supplemental digital content documents an example of this capability, showcasing the upload of an hour-long video on the basics of lung cancer and the subsequent generation of answers to specific queries based on the video content.
Despite these advancements, human oversight and verification remain crucial, as with other AI applications. Current machine translation quality is not perfect, particularly for specialized scientific terminology and less common languages. Existing LLM tools typically offer a single translation output, limiting learners’ ability to critically engage with and negotiate meaning, a process central to second-language acquisition. Additionally, copyright restrictions may impede the sharing of full articles, and the potential exists for the proliferation of lower-quality translations if used without proper human oversight.
Generating Rubrics for Evaluating Open-Ended Questions
Open-ended questions hold significant value in both formative and summative assessments. Compared with multiple-choice questions, they offer a more comprehensive evaluation of a student’s ability to integrate information, demonstrate a deep understanding of concepts, and formulate coherent responses. However, creating such questions and developing corresponding rubrics for assessment can be extremely time-consuming for educators. Given the various pressures faced by faculty in academic medicine, few institutions possess the resources to fully use this modality.
LLMs can facilitate the process of generating open-ended questions and creating rubrics with varying levels of specificity. These rubrics can then be used to grade student responses and provide targeted feedback for improvement. Although even a simple prompt like “Please provide a rubric for grading an open-ended question on the subject of X” can generate a detailed response from an LLM, rubrics are perhaps best used for evaluating responses to very specific questions.
The following example illustrates this process. First, the LLM was provided with a PDF of a Microsoft PowerPoint lecture (not shown) and prompted to generate the following open-ended question based on its content: “Acute inflammation is a critical response to tissue injury and infection, with the neutrophil playing a central role as the ‘foot soldier’ of the immune system. Describe the sequence of events, using appropriate terminology for each step, that lead to the migration of neutrophils from the bloodstream to the site of inflammation. Conclude with a brief explanation of how these steps collectively contribute to the body’s defense against infection. Your response should not exceed 200 words.” The LLM was then instructed to provide a rubric for grading the response and offering feedback (see Table 3).
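This two-step pattern translates readily to code, as in the hedged sketch below: one call derives a rubric from the question, and a second call grades a response strictly against that rubric. The model name and prompts are assumptions, and in practice faculty should review the rubric before it is used for grading.

```python
# Sketch: chained rubric generation and rubric-based grading.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def grade_open_ended(question: str, student_answer: str) -> str:
    # Step 1: derive a rubric (with point values) from the question.
    rubric = ask("Create a detailed grading rubric, with point values, for "
                 "this open-ended question:\n\n" + question)
    # Step 2: grade strictly against the rubric and give targeted feedback.
    return ask(f"Using this rubric:\n\n{rubric}\n\nGrade the following "
               f"response and provide specific feedback for improvement:"
               f"\n\n{student_answer}")
```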
Leveraging Integration of Custom Information With LLMs
The propensity of LLMs to hallucinate presents a significant risk, particularly in critical domains such as medicine. To mitigate this risk and improve the accuracy of LLM-generated outputs, custom information can be integrated into these systems. Several approaches can achieve this, with the simplest being the creation of a custom generative pretrained transformer (GPT) appended with specific information. The supplemental digital content provides an example of such a custom GPT augmented with grossing protocols and publicly available research papers.
Another approach involves retrieval-augmented generation, in which reference documents are embedded into a vector database; prompts are then augmented with up-to-date or specific information retrieved from this database, thereby refining responses to particular queries.16 This setup allows users to submit questions in natural language, which are matched against the retrieval database to provide precise and relevant answers. For instance, a user might ask, “How many sections should I take in a lung cancer case treated with neoadjuvant chemotherapy?” The system would then identify the appropriate lung protocol and highlight the International Association for the Study of Lung Cancer sampling guidelines, integrating this information into a natural language response outlining the recommended number of sections.
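The sketch below shows the retrieval-augmented generation pattern in miniature: snippets are embedded, the best match for a query is retrieved by cosine similarity, and the answer is grounded in that snippet. The model names are assumptions, the protocol snippets are placeholders rather than actual guidelines, and a production system would use a true vector database.

```python
# Minimal retrieval-augmented generation sketch.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(
        model="text-embedding-3-small", input=text)  # assumed model
    return np.array(result.data[0].embedding)

# Toy knowledge base; placeholders standing in for institutional protocols.
SNIPPETS = [
    "Lung protocol (placeholder): sampling guidance for resections after "
    "neoadjuvant therapy ...",
    "Colon protocol (placeholder): lymph node sampling guidance ...",
]
VECTORS = np.stack([embed(s) for s in SNIPPETS])

def answer(query: str) -> str:
    q = embed(query)
    # Cosine similarity between the query and every stored snippet.
    sims = VECTORS @ q / (np.linalg.norm(VECTORS, axis=1) * np.linalg.norm(q))
    context = SNIPPETS[int(np.argmax(sims))]
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user",
                   "content": f"Answer using only this protocol excerpt:\n"
                              f"{context}\n\nQuestion: {query}"}],
    )
    return response.choices[0].message.content
```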
Taking this concept further, specialized systems trained primarily on research articles, such as OpenEvidence, have been developed. These models can not only respond to queries like the one above but also provide accurate references (see supplemental digital content for an example). Such systems, along with similarly trained models, can guide pathologists with inquiries such as “I have limited tissue in this lung biopsy that appears to be a poorly differentiated carcinoma; what stains should I start with?” These systems prove invaluable for trainees who are in the early stages of learning these concepts and ideas. Although existing resources cover much of this information, LLMs offer easy accessibility and facilitate follow-up queries. For instance, a trainee could use these tools to explore additional tests for refining a differential diagnosis or investigate indications for molecular testing.
Assessing Pathologist and Trainee Performance
Pathology training largely follows an apprenticeship model, where trainees actively participate in the daily workflow of the pathology laboratory. In many training programs, trainees draft preliminary versions of pathology reports, which are subsequently reviewed and refined by staff pathologists prior to final sign-out. However, increasing case volumes and complexity often limit the time available for attending physicians to provide detailed and insightful feedback to trainees regarding the rationale behind report modifications. This often necessitates trainees independently reviewing cases and cross-referencing edits to understand the changes made by staff pathologists.
LLMs offer a potential solution to this challenge. By summarizing the changes between the original and final reports, LLMs can provide trainees with a concise overview of the types and content of modifications made. This allows for the identification of cases with changes in diagnostic content (eg, lymph node status, margin status, classification, or grading) and differentiation from those with stylistic changes only. These systems hold the potential to generate robust metrics of trainee performance, offering valuable feedback on diagnostic skills and supporting trainees’ progress toward generating accurate and comprehensive pathology reports.
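A rough sketch of this idea appears below: a conventional text diff (Python’s standard difflib) is computed between the draft and final reports, and an LLM is asked to classify the edits as diagnostic or stylistic. The model and prompt are assumptions, and, as discussed next, reports would require de-identification or a compliant deployment before upload.

```python
# Sketch: summarizing attending edits to a trainee's draft report.
import difflib
from openai import OpenAI

client = OpenAI()

def summarize_edits(draft: str, final: str) -> str:
    # Standard-library diff between the two report versions.
    diff = "\n".join(difflib.unified_diff(
        draft.splitlines(), final.splitlines(),
        fromfile="trainee_draft", tofile="final_report", lineterm=""))
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user",
                   "content": "Summarize these pathology report edits for "
                              "the trainee, separating diagnostic changes "
                              "(eg, margins, lymph nodes, grade, "
                              "classification) from purely stylistic ones:"
                              "\n\n" + diff}],
    )
    return response.choices[0].message.content
```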
However, practical implementation of such tools presents challenges. Automated systems may require integration of LLMs within the laboratory information system. Although manual upload of reports into LLM models is possible, this approach is time-consuming, and uploading reports containing patient identifiers poses privacy concerns unless the institution establishes a Health Insurance Portability and Accountability Act (HIPAA)–compliant, isolated LLM instance.17 As LLM tools become more sophisticated and integrated into laboratory reporting systems, their deployment in this context is likely to become more feasible. Pathologists should consider advocating for such integration with laboratory information system vendors.
LLMs can also help practicing pathologists maintain their skills throughout their careers. Specifically, LLMs could be developed to interact with the American Board of Pathology’s continuing certification program, ABPath CertLink, analyzing a diplomate’s dashboard, pinpointing challenging topics, and generating a curated list of educational resources (eg, continuing medical education courses, seminars, books, YouTube videos) to address these knowledge gaps. Similarly, LLMs could facilitate the tracking of a trainee’s progress over time, highlighting areas of strength and those requiring improvement. This targeted approach would enable both trainees and educators to focus on specific competencies, ensuring a comprehensive, personalized, and well-balanced training experience.

The use of LLMs during timed assessments like ABPath CertLink mirrors the reality that practitioners often rely on internet resources for information, which is acceptable when the information is accurate. However, LLMs present a unique challenge: they do not directly retrieve data but use predictive models to generate responses, raising the risk of hallucinations and incorrect answers that could compromise an assessment’s validity. Importantly, incorrect answers in CertLink serve a crucial role in formative feedback; these questions often reappear in future examinations, which reinforces learning and limits the impact of using external sources, whether general internet resources or foundational LLMs.

Another area of active exploration is the use of LLMs for oral examinations. LLMs can effectively simulate oral examinations, provide personalized feedback, and alleviate some of the workload associated with conducting these assessments.18 Addressing implicit bias in LLM-based oral examinations for continuing certification would be challenging but crucial. Potential strategies include rigorous data auditing and debiasing, assembling diverse development teams, standardizing prompts, pairing LLM assessments with human evaluators (human in the loop), implementing bias detection algorithms, and maintaining ongoing monitoring and adjustment processes. Although these steps might significantly mitigate bias, eliminating it entirely may not be possible because of the complex nature of language models and subtle human biases. Therefore, LLM-based assessments, particularly oral examinations, must form part of a broad, multifaceted assessment strategy.
Improving Academic Writing
Writing can pose a significant challenge for trainees and often hinders research productivity. One of the most daunting steps is initiating projects in new and unfamiliar areas. Generative AI tools can prove invaluable in this scenario by providing a platform for exploring and discussing novel concepts and ideas. Some systems now enable voice communication with LLMs, offering a natural and intuitive way to develop new research ideas or explore unfamiliar topics. This interactive approach allows learners to exchange ideas with LLMs using spoken language, fostering a constructive process for refining, developing, and cross-referencing ideas with existing information.
These tools also offer significant benefits for individuals whose first language is not English. As previously mentioned, LLMs can generate content in other languages and subsequently translate it into English while preserving the original context and ideas. Additionally, they can provide robust editing capabilities to improve grammar, format, and overall text structure.
In the present review, LLMs were used to enhance the writing process and facilitate collaboration among the international author group, as described in the Materials and Methods section. Initially, a general outline of the manuscript was generated. Authors then contributed content to a draft version, which was subsequently processed by an LLM to improve the flow and cohesion between sections written by different authors. This final step is particularly valuable in the increasingly collaborative landscape of research and publication, where integrating contributions from multiple authors can be a challenging and laborious task.
Working With Synthetic Data
Synthetic data, which refers to artificially generated data that mimic real-world data, holds the potential to revolutionize pathology education. By providing diverse, accessible, and privacy-compliant learning materials, synthetic data can play a crucial role in preparing future generations of pathologists. As technology advances, the integration of synthetic data into educational curricula will become increasingly valuable. It also offers a unique advantage in testing various models before committing time and resources to more extensive real-world studies.
Within the realm of pathology education, synthetic data present several promising applications. Generative AI models, trained on large data sets of digitized pathology slides, can create synthetic virtual slides for educational purposes.19 These slides enable students to practice diagnosis and interpretation skills on a wide range of cases without relying on physical slide collections, thus mitigating concerns regarding patient privacy and HIPAA compliance. Various activities, including feature identification, feature localization, and measurement, can be readily practiced on synthetic data, helping to enhance trainees’ critical skills in diagnosis and staging.
Synthetically generated digitized slides also offer the ability to exclude artifacts, allowing trainees to focus on and better understand key findings. As trainees gain proficiency, artifacts can be reintroduced to create real-world challenges that replicate the complexities of diagnosis. Furthermore, synthetic data can be used to generate examples of rare diseases or atypical presentations of common diseases, expanding students’ exposure to a broader spectrum of pathologic conditions, even when real-world examples are scarce. Evidence from other medical disciplines suggests that such exposure can reduce the risk of diagnostic errors.20–22
Access to well-preserved gross surgical specimens often presents a challenge in many training programs. Synthetic 3D images offer a potential solution, providing virtual gross specimens for educational purposes. Additionally, generative AI can be used to create interactive virtual patient cases that simulate the entire diagnostic process, from clinical presentation to histopathologic findings. These synthetic cases, encompassing both anatomic and clinical pathology data, can aid students in developing clinical reasoning skills and understanding the vital role of pathology in patient care.
In clinical pathology education, large synthetic data sets exist that accurately reflect local patient populations.23 These data sets are meticulously constructed to maintain the statistical properties of real patient populations while eliminating the risk of patient reidentification. Outliers and extreme data values may be censored to achieve this goal. Learners can use LLMs to analyze individual “patient” records or query the entire database to answer clinical questions that would otherwise be restricted by HIPAA regulations, such as identifying laboratory result patterns associated with complex disease states like sepsis.
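As a toy illustration of the underlying idea, under strong simplifying assumptions, the sketch below draws synthetic laboratory values from a multivariate normal distribution parameterized by illustrative summary statistics and then censors extremes to reduce reidentification risk, loosely mirroring the construction described above; real synthetic data generators are considerably more sophisticated.

```python
# Toy sketch: synthetic laboratory data preserving means and correlations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Illustrative summary statistics (not derived from real patients):
# WBC (x10^9/L), lactate (mmol/L), creatinine (mg/dL)
means = np.array([8.5, 1.4, 1.0])
cov = np.array([[9.0, 1.2, 0.3],
                [1.2, 0.8, 0.2],
                [0.3, 0.2, 0.4]])

synthetic = pd.DataFrame(
    rng.multivariate_normal(means, cov, size=10_000).clip(min=0.1),
    columns=["wbc", "lactate", "creatinine"],
)

# Censor extreme values to reduce reidentification risk, as in the text.
synthetic = synthetic.clip(upper=synthetic.quantile(0.995), axis=1)

# A learner query: how common is a pattern suggestive of early sepsis?
flagged = synthetic.query("wbc > 12 and lactate > 2")
print(f"{len(flagged)} of {len(synthetic)} synthetic records flagged")
```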
Using Multimodal Generative AI Tools
Although this review has primarily focused on text-based LLMs, it is important to acknowledge that many of these tools are multimodal. This means they can process and integrate not only text but also audio and visual input, and produce various types of output, including video. The supplemental digital content for this paper includes a video that highlights this process. In the video, an image of a screen displaying a transformer model juxtaposed with the Bloom taxonomy is uploaded to the LLM, which correctly identifies both elements. The user then engages in a brainstorming session with the AI tool using voice commands, exploring how transformer-based tools can potentially facilitate students’ progression toward higher-order thinking skills within the Bloom taxonomy.
Result of multimodal chain of audio and image data to produce a representation of synergistic learning between human and machine learners, with 2 distinct outputs shown in (A) and (B).
Other Examples of Using Generative AI
The potential applications of LLMs in education are limited only by the imagination and creativity of the user. To spark inspiration and foster further exploration, Table 4 outlines additional use cases and provides suggestions for implementation.
ETHICS ISSUES OF GENERATIVE AI TECHNOLOGY USE IN MEDICAL EDUCATION
Generative AI technologies, exemplified by tools like ChatGPT, represent a paradigm shift in the way humans acquire, process, and use knowledge. Although query-based search engines (eg, Google, PubMed) are often considered analogous to traditional methods of learning and inquiry (eg, libraries), LLMs are intuitively perceived as a radically different approach to human learning and discourse. This is likely because of the interactive and seemingly anthropomorphic nature of many AI/LLM interfaces. Consequently, there are valid concerns regarding the ethical implications of powerful technologies like LLMs and generative AI within the context of medical education.24 Many contemporary discussions surrounding the use of LLMs raise concerns about potential academic dishonesty.25 However, this is a complex and evolving issue that requires nuanced consideration.
Establishing ethical AI paradigms in health care relies on a few key foundational principles: transparency, accountability, and governance.26 We propose that these same principles can serve as a framework for guiding the appropriate use and implementation of LLMs in pathology education.
Transparency
We suggest that the use of LLMs in education should be transparent at all levels, including personal, departmental, institutional, and societal. Clear guidelines regarding the use of such tools are essential, along with an open and accessible review process. In this paper, we have transparently disclosed our use of LLMs to augment and integrate the writing and ideas of multiple authors (see Materials and Methods). Algorithmic source transparency is crucial for enabling users to assess the validity of information generated by these systems. Additionally, transparency fosters iterative improvement, leading to the development of more effective and reliable educational content over time.
Accountability
Given the rapid evolution of AI and LLM capabilities, establishing accountable paradigms for their use in educational contexts is critical. Accountability represents a shared responsibility to ensure equitable and trustworthy learning outcomes for both trainees and educators. Robust documentation protocols for measuring the impact of educational technologies are essential for maintaining accountability. Such protocols should be implemented with institutional guidance and allow for dynamic updates as needed throughout the implementation lifecycle of an educational AI/LLM method.
Governance/Rules/Policies
When integrating powerful technologies like generative AI/LLMs into health care education, establishing clear governance rules and policies is paramount. These guidelines are necessary at the educator, departmental, and institutional levels. Equally important is ensuring that trainees are aware of educator expectations regarding the appropriate use of AI for educational activities. It is highly likely that tech-savvy trainees are already using generative AI/LLMs in unsupervised settings. By adopting open and transparent policies governing the use of AI/LLM systems, both educators and trainees can harness the immense potential of AI-enabled education while mitigating potential risks.
RISKS OF GENERATIVE AI IN PATHOLOGY EDUCATION AND POTENTIAL SOLUTIONS
Although undeniably powerful, generative AI and LLMs are also susceptible to misuse and present potential risks if not used responsibly.25 Educators and learners must be cognizant of the pitfalls associated with improper use of these technologies.
One significant risk stems from the inherent biases present within the vast data sets used to train generative AI tools. These data sets often encompass multiple domains, both within and outside of health care, and may contain overrepresentation or underrepresentation of certain groups or conditions. Consequently, responses generated by LLMs can reflect these biases, leading to inaccurate or misleading information.
Furthermore, generative LLMs function as complex sentence constructors that may hallucinate when faced with gaps in their knowledge. Data hallucination poses a significant challenge in educational settings, particularly when LLMs are used to generate answers to questions. Trainees, often lacking extensive domain-specific knowledge, may struggle to discern subtle inaccuracies in LLM-generated content, potentially jeopardizing their learning process. This underscores the crucial role of expert oversight in carefully vetting generated content to ensure accuracy, reliability, and freedom from bias. Educators should encourage trainees to use traditional sources to verify suspicious responses, emphasizing the importance of critical evaluation and validation of information. Additionally, automated strategies such as cross-verification with alternative LLMs can enhance the scrutiny of generated content and provide an additional safeguard against data hallucination. Concerns regarding hallucination and potential copyright infringement of examination questions have prompted a position statement from the American Board of Pathology: members of test development committees are currently advised against using AI to generate test items for board examinations (written communication, Gary W. Procop, MD, MS, CEO, American Board of Pathology, May 4, 2024).
The emergence of powerful AI tools like LLMs also raises concerns about potential disruption of learners’ ability to acquire fundamental knowledge and skills.27 Overreliance on AI tools without developing critical thinking skills can paradoxically hinder long-term learning and skill development.28 Of particular concern is the potential for “automation bias,” where individuals uncritically accept information provided by LLMs. Learners and educators must recognize that generative AI/LLMs are powerful tools but do not replace the need for critical thinking, independent learning, and human oversight.29
Moreover, there is a critical need to educate educators and administrators regarding the capabilities and limitations of AI tools and establish clear guidelines for their use in pathology education. Appropriate and responsible use of AI/LLMs is crucial for effectively training future generations of pathologists.29 Given the rapid pace of technologic advancements, incorporating these topics into existing pathology curricula may prove challenging. Therefore, pathology leaders must proactively provide educators and trainees with the necessary educational resources to understand and maximize the benefits of these tools while ensuring compliance, safety, and transparency.
Finally, it is essential to address the potential for unequal access to these powerful AI tools due to financial constraints. The cost associated with access to generative AI tools may be prohibitive for many pathology learners worldwide. Therefore, concerted efforts at all levels (educator, departmental, professional, and societal) are required to ensure equitable and meaningful access to these transformative educational resources for pathology learners across the globe.
THE FUTURE
The field of generative AI is evolving at an exponential pace, with advancements rapidly surpassing what was once deemed impossible. A growing community of educational technology professionals maintains a database of AI tools specifically designed for educational purposes, currently encompassing over 300 entries and expanding weekly (https://aitoolsdirectory.notion.site/The-AI-Tools-in-Education-Database-759da4b9aca649a7add36fd7b2833c0b, accessed August 29, 2024).
One area of particular interest, which extends beyond the scope of this review, is the application of vision transformers. Analogous to the transformer technologies underlying LLMs, vision transformers possess the ability to understand and interpret image data. These systems hold immense potential, particularly in the classification and analysis of pathology images. Their capabilities can be further augmented through integration with LLMs and other input/output modalities within multimodal systems. Numerous research groups are actively developing robust systems tailored for the pathology domain, demonstrating impressive proficiency in integrating detailed insights from pathology images. Although still in their nascent stages, these systems are rapidly evolving and promise to revolutionize pathology practice. Future applications may include guiding students in recognizing normal and abnormal structures on pathology slides or developing advanced sign-out simulation tools. With the development of larger pathology-focused training data sets, the creation of robust and easily shareable synthetic training data will become increasingly feasible, facilitating pathology education without compromising patient privacy. Furthermore, the training data sets used for developing AI tools may themselves prove valuable for training human pathologists, potentially accelerating competency development and replicating the diagnostic acumen acquired through years of experience.
The concept of artificial general intelligence (AGI) remains a topic of debate within the AI field. AGI is often conceptualized as a form of AI capable of understanding, learning, and applying knowledge across a wide range of tasks at a humanlike level of competence. Although many experts consider the eventual achievement of AGI likely, predictions regarding the timeline vary widely. Nevertheless, it is important to contemplate how future AGI learners might be integrated into the field of pathology. Although the specifics remain unclear, initiating these discussions at this early stage of technologic development is crucial. By fostering dialogue and collaboration, we can strive to develop shared and integrated learning tools and resources that promote synergy and alignment between human and machine learners, ultimately shaping the future of AI-augmented pathology workflows.
SUMMARY
Generative AI has the potential to revolutionize numerous aspects of our lives, with education likely at the forefront of its large-scale implementation and adoption. The use of de-identified materials and content in educational settings circumvents the challenges and regulatory concerns associated with applying these tools in clinical workflows. Although generative AI presents immense opportunities for enhancing pathology education, it is crucial to acknowledge and mitigate potential risks. Active engagement from pathologists is imperative to ensure the responsible and appropriate integration of LLMs into pathology education.
References
Author notes
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the February 2025 table of contents.
Competing Interests
The authors have no relevant financial interest in the products or companies described in this article.