Context.—

Generative artificial intelligence (GAI) technologies are likely to dramatically impact health care workflows in clinical pathology (CP). Applications in CP include education, data mining, decision support, result summaries, and patient trend assessments.

Objective.—

To review use cases of GAI in CP, with a particular focus on large language models. Specific examples are provided for the applications of GAI in the subspecialties of clinical chemistry, microbiology, hematopathology, and molecular diagnostics. Additionally, the review addresses potential pitfalls of GAI paradigms.

Data Sources.—

Current literature on GAI in health care was reviewed broadly. The use case scenarios review common data sources generated within each CP subspecialty. The potential for utilization of CP data in the GAI context was subsequently assessed, focusing on issues such as future reporting paradigms, impact on quality metrics, and potential for translational research activities.

Conclusions.—

GAI is a powerful tool with the potential to revolutionize health care for patients and practitioners alike. However, GAI must be implemented with caution, considering shortcomings of the technology such as biases, hallucinations, the practical challenges of integrating GAI into existing CP workflows, and end-user acceptance. Human-in-the-loop models of GAI implementation have the potential to revolutionize CP by delivering deeper, meaningful insights into patient outcomes both at an individual and a population level.

The progress and evolution of artificial intelligence (AI) in the past decade have been nothing short of spectacular. This has been enabled through the convergence of 3 key technologies: (1) computational hardware (primarily through enhancements in graphical processing units and central processing units, memory, and data storage); (2) machine learning/AI algorithms; and (3) marked enhancements in the availability and accessibility of data. Heretofore, each of these was a major impediment to the advancement of AI. Yet, during the past 15 years, we have witnessed successive and consistent breakthroughs in AI’s scale, scope, and capability. Specifically, this has been demarcated by the practical application of neural networks and especially “deep learning,” which represents an architectural variant of neural networks characterized by complex, many-layered models. These deep learning tools were quick to show promise in image recognition, with adjacent applications emerging in voice recognition, text, and time series modeling.1  Generative AI (GAI) also has roots that predate the advent of OpenAI ChatGPT, or generative pretrained transformers (GPTs) in general, in the form of generative adversarial networks, long short-term memory networks, and related technologies.2  Importantly, 2017 saw the advent of a specific way of building AI models known as a “transformer,” which expanded the breadth and nuance that generative models could achieve.3  Since that time, upscaling in the training, input, and parameter size of transformer-based models has given rise to the modern wave of language and multimodal AI that we see today, such as the large language models (LLMs) based on the GPT technology. LLMs have captivated popular attention with the launch of ChatGPT from OpenAI. 
For the sake of definition, GAI tools occupy a spectrum, with diversity in model size (eg, LLMs versus small language models) and modality (eg, language models versus image models versus multimodal models), among other things. A technical comparison across these dimensions is beyond the scope of this article, and we will use the term language model as a more generic term for language-generating AI. This article will focus on the generative nature of these tools and their ability to create outputs comparable to some human benchmarks in the specific context of clinical pathology (CP). It is also important to note that contemporary AI models demonstrate an applicability that was not previously possible and are finding their way into more and more real-world workflows, including pathology. Paradoxically, many of these workflows are creative and are not conventionally “robotic.” This is counter to the predictions from some years ago suggesting the arrival of autonomously driving trucks (still unavailable) before the arrival of linguistic or “creative” AI. Instead, in 2024, we can use AI to create paintings, music, and novels, using LLMs such as ChatGPT.

Within the field of pathology, the use of AI/machine learning has been confined to nongenerative settings, and much attention has rightfully been paid to whole slide imaging classification and image segmentation tasks.4  This has led to more widespread digitization, which, in turn, has enabled image-based AI to access pathology workflows much in the same way that AI has advanced through the specialty of radiology. In contrast to anatomic pathology, CP has received much less attention. This is not surprising since CP represents a diverse, if not orthogonal, collection of subspecialties with different (and specialized) data requirements, workflows, and systems. These inputs range from quantitative and qualitative assay results to various textual details throughout the patient medical record. Moreover, the scope of potentially relevant inputs is nonuniform and—in cases such as genomics—exceptionally large. CP also presides over the clinical laboratory, which has day-to-day operational responsibilities that encompass nonmedical and medical data alike. This introduces unique workflow challenges in the domains of quality control and assurance, inventory management, instrument selection, and procurement. These key observations regarding the nature of work in CP encapsulate the motivation for this article in this series on GAI in pathology.

Currently, the predominant GAI models spanning both closed-source subscription products (eg, OpenAI GPT-4/4o, Anthropic Claude 3, and Google Gemini) and open-source products (eg, Meta Llama 3/3.1, Mistral AI Mistral/Mixtral models) have robust linguistic and sometimes multimodal (eg, text and image) capabilities that can be powerfully and flexibly prompted to perform a diverse set of tasks. Additionally, they retain the ability to generate text and image outputs directly through a user interface and to interact with other technical resources such as application programming interfaces (APIs), allowing these tools to effectively use other software. There has never been a manifestation of AI that was as accessible and as versatile in its implementation as contemporary large language and multimodal AI models. Therefore, the conversation around AI has noticeably shifted from focusing on what can be invented in research and development to what can be implemented into health care practices, including within CP. This effectively relocates the “bleeding edge” of innovation into the hands of practitioners in the workforce, such as clinical pathologists and other laboratory professionals. The ability of health care workers (including physicians) to come up with new and creative ways of applying these AI resources (such as LLMs) to real-world tasks in health care is thus of much interest.

In the next decade, we will likely see widespread implementation of various AI-enabled health care workflows across all specialties (including CP) using GAI approaches. Health care is rife with repetitive tasks both in clinical and administrative contexts, and pathology is certainly no exception to this. These are poised to be the first areas of ingress for GAI in established pathology workflows. Beyond this, we are also likely to witness the use of GAI to demonstrate value as a copilot, helping to supplement a pathologist’s ability to assemble data from the medical record, the internet, textbooks, and custom knowledge databases. Copilots will likely see increasingly consistent usage, perhaps to a point where their use is considered standard of care (and routine) in health care, similar to the use of tools such as Microsoft Word. The scalability of AI models and their ability to behave like cognitive utilities rather than as individual workers likely means that their increased use and exposure to both input data and user feedback will continue to develop over time. However, this is also likely to deepen the gap between copilot-aided tasks and those done without. Just as one cannot imagine CP without computers, email, and spreadsheet software, it will similarly be impossible to imagine the practice of CP without the assistance or participation of AI in some form or other in the near future. Thus, clinical pathologists (and pathology in general) need to prepare for such a future by anticipating the key areas of AI-enablement and leveraging them to provide enhanced outcomes for patients.

In the current article, we focus on providing an overview of scenarios where GAI, in particular LLMs, could play an important role in improving CP practice in the near future. Specifically, we have focused on how LLMs can be used to support and advance data mining, decision support, translational research, and administrative activities within clinical chemistry, clinical microbiology, hematopathology, and molecular pathology. We chose to group the former 2 (chemistry and microbiology) and the latter 2 (molecular pathology and hematopathology) subspecialties together owing to their relative similarities in workflows, challenges, and needs.

While many aspects of chemistry and microbiology practice are highly automated, hematopathology and molecular pathology practice continue to be relatively manual/semiautomated in nature. However, it is important to remember that specific subareas within a subspecialty may still rely heavily on manual activity (eg, clinical bacteriology—Gram staining, fungal smears and culture, and stool parasite examinations), illustrating the overarching complexity that exists in the practice of CP. And yet, a common and consistent theme seen across all CP subspecialties is the desperate need to improve existing workflow protocols. This is an area wherein GAI can potentially excel. The thoughtful use of AI/GAI is perhaps the single biggest opportunity to effect meaningful change in CP practice in the near future.

Finally, this article does not focus on additional areas of CP such as blood banking, clinical immunology, HLA typing, and clinical informatics. These areas have unique applications, challenges, and uses of GAI in each individual context. Yet, the broad GAI principles expounded in this article will be commonly applicable to these subspecialties as well. We also do not cover GAI as it applies to image creation in these specialties. For a review of the history of GAI; glossary of terms; and the evaluation, validation, and implementation of GAI algorithms, the reader is referred to the article by Singh et al in this special section on GAI. Figure 1 illustrates a variety of such common and thematic tasks pertinent to CP, which could be impacted by GAI in the future. We review many such applications of GAI in the following sections.

Figure 1.

Future applications of GAI in clinical pathology. A schematic illustrating various tasks amenable to GAI-assisted usage in clinical pathology. While there are multiple subspecialties in clinical pathology (eg, clinical chemistry, microbiology, hematopathology, and molecular diagnostics, among others), different tasks amenable to GAI automation are similar across various subspecialties. Abbreviations: AI, artificial intelligence; GAI, generative artificial intelligence; LLM, large language model.


Data Mining Applications in Clinical Chemistry and Microbiology

Generative AI, and specifically LLMs, have several potential applications in core laboratory domains such as chemistry and microbiology. Perhaps the most compelling and broadly relevant category of application is data mining. Data mining is a general term, but in this context it describes the retrieval of specific pieces of clinical and laboratory data from larger repositories or warehouses of information. In health care, and in CP specifically, data mining is a prerequisite for many common functions performed in routine practice. In addition to their ability to generate language output, language models have a very flexible ability to evaluate textual data. More specifically, language model design gives them powerful capabilities to compare text in nuanced ways. LLMs are capable (in a very human-like manner) of comparing not just words directly, but also their meaning, implications, relevance, and many other conceptual approximations. Owing to these abilities, language models can implement search criteria in surprisingly intelligent ways.5  For example, language models can be used to query research articles, standard operating procedures (SOPs) in a laboratory, and even detailed patient medical records in response to natural language queries such as, “how do I determine if an organism is resistant to erythromycin, what diameter should I expect to see in the inhibitory zone, and has this changed compared to last year’s guidelines?” or “what should I do if I get an error code 250 on a Vitros 5600 analyzer?” Both examples above are instances of complex, chained queries that traditional search engines such as Google could not previously answer.
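The nuance of this comparison comes from embedding text as vectors and scoring their similarity rather than matching words. The following is a minimal sketch of that mechanism; the hand-assigned toy vectors stand in for a learned embedding model, and the terms are illustrative only:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hand-assigned 3-dimensional vectors standing in for learned embeddings;
# a real system would obtain these from an embedding model.
embeddings = {
    "hyperglycemia":                 [0.8, 0.2, 0.1],  # shares no words with the query
    "gram-positive cocci in chains": [0.0, 0.1, 0.9],  # unrelated concept
}

query_vec = [0.9, 0.1, 0.0]  # embedding for "elevated blood glucose"
scores = {term: cosine(query_vec, vec) for term, vec in embeddings.items()}
best = max(scores, key=scores.get)
print(best)  # the semantically related term wins despite zero word overlap
```

Because similarity is computed in the embedding space, the conceptually related term outscores the lexically unrelated one, which is the property that makes the natural language queries above tractable.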

This form of real-time data mining is enabled by 2 critical LLM design principles: vectorization and retrieval-augmented generation (RAG).6  As indicated by its name, a typical RAG platform consists of several key components, which can be categorized into 3 main layers: Retrieval, Augmentation, and Generation. The Retrieval layer typically consists of an “indexing” system that creates, stores, and organizes (ie, in a vector database) the vector embeddings that represent the data of interest, along with a “query processing” module (eg, semantic search) that is responsible for processing the user’s queries and identifying the relevant search terms and context. The Augmentation layer incorporates the data retrieved from the indexed vector database into the LLM prompt. Lastly, the Generation component uses an LLM to generate outputs from the augmented dialog (ie, dialog that contains information retrieved by searching). However, as one can imagine, these RAG platforms also have their own set of limitations and hurdles to overcome, which include but are not limited to data preparation, evaluation, and user interface needs.
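The 3 layers can be sketched end to end in a few lines. This is a toy illustration, not a production design: the bag-of-words “embedding,” the invented SOP snippets, and the stubbed generation step stand in for a real embedding model, vector database, and LLM call:

```python
import math
import re

# --- Retrieval layer: toy bag-of-words "embeddings" and a tiny in-memory index.
def embed(text):
    vec = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b)) if dot else 0.0

index = [
    "SOP-101: error code 250 on the chemistry analyzer indicates a reagent probe obstruction; rerun after probe maintenance.",
    "SOP-212: positive blood culture bottles are subcultured to agar within 4 hours of the instrument signal.",
]
vectors = [embed(doc) for doc in index]

# --- Augmentation layer: retrieve the best-matching snippet and splice it into the prompt.
query = "what should I do if I get an error code 250 on the analyzer?"
qvec = embed(query)
best_doc = index[max(range(len(index)), key=lambda i: cosine(qvec, vectors[i]))]
prompt = f"Context:\n{best_doc}\n\nQuestion: {query}\nAnswer using only the context above."

# --- Generation layer: hand the augmented prompt to an LLM (stubbed out here).
def generate(augmented_prompt):
    return "LLM answer grounded in: " + augmented_prompt.splitlines()[1]

print(generate(prompt))
```

The key design point is that the model is asked to answer only from the retrieved context, which is how RAG constrains topical scope and ties outputs to ground truth references.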

RAG can be an effective technique to constrain both the topical scope and the quality of a language model output, especially in the clinical setting where many concepts and associations are highly specialized and are closely tied to ground truth references.7  Because of this, generative tools—and especially language models—can serve as an intermediary between the data sources such as PDF documents, web sites, and warehouses and distal use cases such as the production of guidelines or test interpretations in an area like CP.

Naturally, the medical record is also a potential data source that can be mined by language models, and one can envision the implementation of many such use cases in high-volume clinical chemistry and microbiology practices. These specialties rely heavily (often in a bidirectional manner) on the need to incorporate patient-specific information from (and to) the medical record. If the needed elements from the medical record are standard, then extracting them from a pathology data warehouse or directly from the medical record via an API like Fast Healthcare Interoperability Resources (FHIR) could be accomplished with a boilerplate query executed before prompting a language model.

However, to enhance these functionalities in CP, future tasks may involve more than a standard query or even a simple vector search. Enhanced CP data retrieval and analysis may require design patterns unique to generative tools, such as “agents” or “plugins.” In such a case, generative tools like language models may be given instructions about what to do when specific questions or themes arise. For example, a language model can be programmed to determine whether a prompt requires knowledge of a patient’s complete blood cell count results and, if so, to formulate a FHIR API query to retrieve that information from the medical record and then use it to compose its response, enabling a more “human-like” experience for the end-user.
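A minimal sketch of this agent pattern follows; the server URL and patient ID are hypothetical, 58410-2 is the LOINC code for an automated CBC panel, and a deployed agent would usually let the language model itself make the routing decision rather than the simple keyword rule used here:

```python
# Agent-style routing: decide whether a prompt needs CBC results and, if so,
# build the FHIR Observation query to fetch them before composing the LLM prompt.
CBC_TRIGGERS = ("platelet", "hemoglobin", "white count", "cbc", "neutrophil")

def needs_cbc(prompt: str) -> bool:
    text = prompt.lower()
    return any(trigger in text for trigger in CBC_TRIGGERS)

def fhir_cbc_query(base_url: str, patient_id: str) -> str:
    # FHIR search for the most recent Observation coded as a CBC panel.
    return (f"{base_url}/Observation?patient={patient_id}"
            f"&code=http://loinc.org|58410-2&_sort=-date&_count=1")

prompt = "Should we adjust patient X's furosemide given his most recent platelet count?"
if needs_cbc(prompt):
    url = fhir_cbc_query("https://fhir.example.org", "12345")
    # The agent would fetch `url`, append the returned values to the prompt,
    # and only then call the language model.
    print(url)
```

The routing step is what distinguishes an agent from plain RAG: the tool call is made conditionally, in response to the content of the user’s question.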

In this way, a complex medical question such as “should we consider adjusting patient X’s current furosemide prescription given his most recent platelet count and potassium level?” can be answered with the pertinent information across PDF guidelines, FHIR APIs in the medical record, and expert interpretive text brought together by the integrative capabilities of generative language models. Although data mining is not without its issues (eg, confabulation/hallucination, which we will discuss later), the breadth of prompts, agents, and plugin possibilities and the uniquely tailored composability of generative tools make powerful data mining approaches more feasible than ever before within CP. One exciting potential example of this within the core laboratory is using AI to triage incoming specimens. Current triage systems are usually limited to separating STAT from routine specimens, but it can be anticipated that an AI-enabled copilot would be able to query the electronic medical record (EMR) data for each incoming specimen and dynamically adjust prioritization in light of nuanced medical information, and could propagate this status not only to pathologists but also to the robotic line or the instrument, or both. Similarly, one can envision a GAI copilot that monitors the patient’s chart and can more efficiently use a sample with insufficient volume for all of the requested laboratory tests simply by prioritizing those tests that are more important for the patient’s clinical picture.

Decision Support Processes in Clinical Chemistry and Microbiology

Enhancements to AI-enabled data mining offer a natural segue into an area where GAI offers considerable promise within CP: decision support. Both microbiology and chemistry serve a critical, real-time role in guiding patient care. Tasks such as communicating the presence or absence of pathogenic organisms (microbiology) or identifying biochemical and/or hematologic anomalies (clinical chemistry) are among the primary responsibilities of the laboratory and those of clinical pathologists. These tasks currently use static data sources such as taxonomies and guidelines for species identification that are periodically updated with data such as local or regional antibiograms and laboratory guidelines in microbiology. In clinical chemistry, tracking and monitoring the evolving real-time status of data such as a patient’s immune status, renal function, electrolytes, active medications, or even progress notes is very useful. Traditionally, creating tracking tools that could automatically incorporate these sources in CP was a nontrivial task owing to the relative isolation of laboratory information systems (LISs) from electronic health records. With GAI, dependencies between decision support output and patients’ real-time laboratory data can be enabled by creating conditional behaviors within the LIS programs to compose meaningful decision support content. Many such attempts were made previously8; however, these programs were also fragile to variations in source data such as spelling errors, syntax changes, or peculiarities in style and formatting. With language models, the content and meaning of source data can be regularized to overcome this fragility, allowing models to extract relevant details from these sources and incorporate those details into decision support guidelines in a much more robust manner. Much like a human interpreter, language models can produce patient-specific adaptations of microbiologic (or clinical chemistry) test result data on the fly and in the appropriate context as necessary.

Perhaps the most exciting implication of this for the paradigm of decision support in CP is that language model integration offers the ability to provide patient-specific guidance in a ubiquitous manner. This is a departure from the current approach where focused time and effort are required for a pathologist to craft decision support content (if it is possible to do so at all). If one is able to generate such personalized reporting content for all patients (or at least generate a draft requiring only a pathologist’s high-level review), this would likely impact both the billing codes and effort required for CP interpretation. It is worth questioning whether many CP interpretations need to be requested by an ordering provider specifically, if they can be provided at scale anyway by using GAI. Additionally, GAI-based approaches to billing are likely to provide opportunities to harmonize the billing process and enable consistent collection practices for small-scale laboratories, large-scale laboratories, and insurers alike.

Translational Research Activities in Clinical Chemistry and Microbiology

Clinical chemistry and microbiology practices encompass various translational aspects of clinical science, basic chemistry research, and molecular biology, which underpin most, if not all, translational research activity in these subspecialties. For example, investigating microbial resistance in septic arthritis may require significant medical record abstraction to determine patient-specific factors. However, it may also require simultaneous genomic or transcriptomic profiling of microorganisms and determination of their mutation status, differential expression, and mechanisms of evolution and resistance in addition to routine morphologic assessment of cultures and colonies. Language models can deliver unique value within all these domains. Starting with medical record abstraction, language models are adept at regularizing the variance in note structure and syntax and thus are well suited to accessing medical record data, including free text, synthesizing this information, and creating derivative assets such as summaries or labels that might be used for identifying specific translational research cases. For example, a translational research project supported by GAI may query a data warehouse to extract multiple channels such as laboratory values, vitals, diagnosis and procedure codes, and progress notes. Language models can ingest and combine all these streams (admittedly with varying performance) and extract, synthesize, and summarize concepts like clinical course, deterioration, antibiotic regimen, or infection, and can produce corresponding labels and summaries. Manual retrieval, annotation, and synthesis of such data by a human being would be tedious, creating barriers to translational research activities. Similar strengths could be leveraged in the context of clinical chemistry to advance translational research. The availability of such GAI tools can lower the barrier for CP residents and new faculty to become involved in translational research.

Administrative Support for Clinical Chemistry and Microbiology

GAI is adept at transforming various input data streams into flexible, customizable output/reporting formats, and this has valuable applications in laboratory management. For example, the creation of regular quality control reports and summaries, which require human effort in data collection, curation, combination, report drafting, and review, may be performed by GAI tools. These tools can work directly from inputs such as maintenance records, temperature logs, and assay result values generated by an automated instrument. However, GAI remains imperfect and, at least at present, cannot be relied upon to create such reports autonomously. It can, however, automate many of the preprocessing steps relatively robustly. This allows human expertise to focus on higher-level report review, leading to the dual benefit of intensely reviewed and higher-quality reports (the “human-in-the-loop” model). Quality control is only one aspect of laboratory management. There are numerous other CP report streams (eg, volume and utilization, instrument maintenance and downtimes, and employee performance) that can leverage GAI models to accelerate data collection, transformation, and report generation processes within CP.
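As an example of such preprocessing, a deterministic script can reduce a raw temperature log to a structured summary that a language model could then draft into report prose for human review. The log format and acceptable range here are hypothetical:

```python
# Preprocessing step for a GAI-drafted quality report: parse a refrigerator
# temperature log, flag out-of-range readings, and emit a structured summary.
ACCEPTABLE_C = (2.0, 8.0)  # assumed acceptable range for a reagent refrigerator

log_lines = [
    "2024-05-01 06:00 fridge-A 4.1",
    "2024-05-01 12:00 fridge-A 8.9",  # excursion
    "2024-05-01 18:00 fridge-A 4.4",
]

def summarize(lines, lo=ACCEPTABLE_C[0], hi=ACCEPTABLE_C[1]):
    readings, excursions = [], []
    for line in lines:
        date, time, unit, temp = line.split()
        value = float(temp)
        readings.append(value)
        if not lo <= value <= hi:
            excursions.append({"when": f"{date} {time}", "unit": unit, "temp_c": value})
    return {
        "n_readings": len(readings),
        "mean_c": round(sum(readings) / len(readings), 2),
        "excursions": excursions,
    }

summary = summarize(log_lines)
print(summary)
```

Keeping the numeric flagging deterministic and reserving the language model for narrative drafting is one way to hold the human-in-the-loop model accountable: the reviewer checks prose against a machine-verifiable summary rather than against raw logs.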

We propose that GAI also has another key role to play in the laboratory: operational management. Specifically, generative tools may be potentially used to edit and update SOPs, a key activity within a CP laboratory. GAI models can suggest and integrate changes suggested by human experts with prompting techniques. This has the potential to save clinical pathologists and staff from the burden of comprehensively recomposing SOP documents. Language models can also be used to review SOPs and highlight shortcomings like areas that require regulation-appropriate clarifications, potential areas of contradiction or redundancy, or key questions that remain unaddressed by current SOPs.

Furthermore, generative tools can use SOP documents as an in-house knowledge base upon which to perform RAG, allowing laboratory personnel to ask questions about procedures on demand without having to search for answers manually. For example, a user may be able to simply ask, “What volume needs to be plated from a blood culture bottle on an agar plate per lab procedures?” and receive a direct, LLM-generated answer. Adjacent tools such as voice-to-text and text-to-voice also allow these interactions to take the form of vocal query-and-response flows akin to using Siri (Apple Inc, Cupertino, California), Alexa (Amazon Inc, Seattle, Washington), or Google Assistant (Alphabet Inc, Mountain View, California). Making SOPs easier to query and more adaptable to workflow will likely result in their more frequent use and may enhance overall adherence to the protocols established within the laboratory. Additionally, such platforms can become invaluable in inspection preparedness, wherein LLMs can assess whether policies and procedures meet College of American Pathologists (CAP) or other inspection agency requirements. Lastly, LLM tools can also be used to streamline the human resources functions of the laboratory, including the creation of job descriptions and the screening of applications for positions. These operational enhancements are critical as we presently face a worsening shortage of well-trained laboratory technicians and mounting workload pressures that can lead to increased human errors and burnout.9  AI may help to decrease this pressure and enable less extensively trained technicians to function at a higher level, enhancing the resilience and capability of laboratories everywhere.

Data Mining Applications in Molecular Pathology and Hematopathology

Molecular pathology is among the main forces moving health care toward an individualized “precision medicine” paradigm, and there is great interest in enhancing access to molecular diagnostics and their associated interpretations. However, this is bottlenecked by the fact that the primary task of molecular pathology is to decide the clinical significance of molecular variants. This requires both traversing large databases of variant annotations and being responsive to a continuous stream of evolving research that may alter those annotations. This knowledge explosion is challenging for even experienced and focused molecular pathologists and is untenable for nonspecialists to keep abreast of. Generative language models can offer specific value here in their ability to synthesize data from various sources. Many popular variant annotation databases such as ClinVar have documented APIs that a language model could use to retrieve relevant annotations. Equally important, most research is still published as PDF documents written as narrative journal articles with various journal-specific stylistic and formatting idiosyncrasies. Being able to process and retrieve relevant information from heterogeneous groups of PDF documents as well as APIs allows language prompts for GAI to serve as a single abstraction over these various data sources, making it possible to mine these data at scale. AI models trained on large corpora of clinical text can identify and extract salient findings, such as symptoms, clinical findings, laboratory results, radiology findings, and treatment responses.10  By generating concise summaries of the clinical records, GAI can help molecular pathologists and hematopathologists quickly grasp essential information, saving time and reducing the risk of overlooking crucial details. These summaries can also populate structured reports or databases, facilitating data analysis and research.11 
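For instance, ClinVar can be queried through the NCBI E-utilities endpoints. The sketch below only constructs the search URL a language model “tool” might call; the gene and variant term are illustrative, and parsing of the returned record IDs (and the follow-up summary request) is omitted:

```python
from urllib.parse import urlencode

# Build a ClinVar search URL via the NCBI E-utilities esearch endpoint.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def clinvar_search_url(term: str, retmax: int = 5) -> str:
    # retmax caps the number of record IDs returned.
    params = {"db": "clinvar", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}?{urlencode(params)}"

url = clinvar_search_url("JAK2[gene] AND V617F")
print(url)
```

Wrapping such calls as tools lets the language model retrieve current annotations at answer time rather than relying on whatever was present in its training data.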

Specific to hematopathology, more so than elsewhere in CP, is the criticality of imaging such as blood smears, bone marrow biopsies (which may be in the form of whole slide images), and cytogenetics, along with the concurrent molecular results. These can be incorporated in a few ways. First, the principles of vector representation and search can be applied to image content in much the same way as they are applied to language content. Although such approaches are used in general LLMs (eg, GPT-4V [GPT-4 with vision] and GPT-4o), this is far less common at this time in the health care domain. Such use cases will undoubtedly grow in the future, allowing language models to retrieve cases, articles, guidelines, or other documents by the similarity of imaging data (eg, finding articles with a slide that looks like a given patient’s biopsy specimen). Perhaps more practically, we see language models becoming increasingly multimodal in that they can receive image inputs and describe them linguistically. This would allow a multimodal model to receive both textual medical record data and image data and to extract descriptive terms, pathologic entities, and related concepts from the imaging data. As we will discuss shortly, this also allows multimodal LLM/GAI models to compare image and text data, such as a marrow biopsy image and its corresponding report. By leveraging language-aligned multimodal models, current and future AI systems can understand and process natural language queries and retrieve relevant images or entities based on the search criteria.12  For example, a hematopathologist can input a description of a particular cell morphology or a specific diagnostic finding, and the AI model can search through vast image databases or image atlases to identify similar cases for comparison. This functionality can greatly assist in diagnostic decision-making, allowing hematopathologists to quickly access relevant examples and references. Additionally, this search capability can facilitate research by identifying cases with specific characteristics or rare entities, aiding in the discovery of new insights and patterns.13 

Decision Support Processes in Molecular Pathology and Hematopathology

Much as GAI can aggregate various aspects of the medical record and ancillary resources into a unified microbiology report, it is likely to prove equally valuable for molecular pathology. The sheer number of potentially relevant mutational variants and annotations makes generative-AI–enabled decision support for molecular pathology reports, with final pathologist review, seem inevitable in the near future. The content of genetic variant annotations is a moving target and subject to change with emerging research. Generative tools offer an efficient way to accommodate the fast-moving nature of this subspecialty: instead of having to produce and migrate to new annotation databases regularly, we may be able to retrieve the present and past annotations of each molecular pathology variant (and supporting research data) and have a GAI model adjudicate the impact of the research and amend variant annotations appropriately. Relevant to hematopathology, studies have demonstrated the ability of GAI to compose medical imaging reports in the domain of radiology, and similar work is likely underway in pathology.14  Generative-AI–supported hematopathology reports could incorporate key morphologic findings, patient medical history, genomics, and relevant guidelines into the generation of a final integrated report.15  Decision support in hematopathology often involves primary analysis of various complex patient-specific inputs, such as genomic sequencing data, flow cytometric parameters, fluorescence in situ hybridization assays, transcriptional pathway analyses, and histomorphology assessments. Separate model interactions can be composed into pipeline workflows to automate and standardize many of these steps.

AI models trained on vast amounts of historical report data can learn to generate coherent and accurate reports from input parameters derived from other laboratory data as well. For example, in flow cytometry, AI models can analyze the raw data, identify cell populations, and generate a comprehensive report describing the findings.16  Similar applications can be envisioned for esoteric coagulation profiles, hemoglobinopathy and red cell disorder testing, and molecular testing, to name a few. This kind of automation can save hematopathologists considerable time and effort, allowing them to focus on more complex cases and interpretation.
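As a minimal illustration of turning structured laboratory output into report-drafting input, the sketch below converts hypothetical flow cytometry population percentages into a prompt for a language model. The field names and wording are illustrative, not a real instrument schema.

```python
# Illustrative only: build a drafting prompt from identified cell populations.
def compose_flow_report_prompt(populations: dict) -> str:
    lines = [f"- {name}: {pct:.1f}% of events" for name, pct in populations.items()]
    return (
        "You are drafting a flow cytometry report for pathologist review.\n"
        "Identified cell populations:\n" + "\n".join(lines) +
        "\nDescribe the findings and flag any aberrant populations."
    )

prompt = compose_flow_report_prompt({"CD19+ B cells": 42.0, "CD3+ T cells": 51.5})
```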

More promising is the opportunity for GAI, through fine-tuning and retrieval-augmented generation (RAG)–based approaches, to map the classification criteria for hematologic malignancies across different classification systems, such as the World Health Organization (WHO) revised 4th edition,17  WHO 5th edition,18  and the International Consensus Classification of myeloid and lymphoid neoplasms.19  By leveraging natural language processing and machine learning algorithms, AI models can analyze and interpret complex diagnostic findings and align them to entities described in various classification systems. This mapping process can help to harmonize various classification schemes, making it easier for hematopathologists to navigate and apply the diagnostic criteria consistently. AI-assisted mapping can also highlight discrepancies or updates among the different classifications, ensuring the most current and accurate classification guidelines are followed.20 

GAI can play a crucial role in analyzing and interpreting genomic data in hematopathology. With the increasing use of next-generation sequencing techniques, molecular pathologists face the challenge of accurately calling and curating genomic variants.21  AI models can be trained on large, annotated data sets to automatically identify and classify variants by their pathogenicity and clinical significance. These models can consider factors such as variant frequency, functional impact, and previously reported associations with hematologic disorders. This can lead to more precise diagnostic and prognostic assessments and the identification of potential therapeutic targets.22  Finally, it is critical to mention that AI—and GAI in particular—also continues to make significant advancements in basic science and touches areas such as DNA and protein design, structural determination, and molecular docking. Many of these tools will have implications for research (discussed below). It is quite likely that GAI/LLM-supported algorithms will be implemented in clinical molecular pathology workflows, since they can be used efficiently to generate patient-specific insights about the presence and consequences of a molecular lesion within each case.
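The factor weighing described above can be caricatured as a scoring heuristic. The sketch below is purely illustrative: its thresholds and weights are invented and are not clinical classification criteria (real variant classification follows published guidelines and expert curation).

```python
def classify_variant(pop_freq: float, predicted_damaging: bool,
                     reported_assoc: bool) -> str:
    """Toy pathogenicity triage; thresholds and weights are illustrative only."""
    score = 0
    if pop_freq < 0.0001:      # rare in the general population
        score += 1
    if predicted_damaging:     # in silico functional impact
        score += 1
    if reported_assoc:         # prior reported association with disease
        score += 2
    if score >= 3:
        return "likely pathogenic"
    if score == 2:
        return "uncertain significance"
    return "likely benign"
```

A trained model would replace the hand-set weights, but the structure — combining frequency, functional impact, and prior associations into one call — mirrors the reasoning described above.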

Translational Research Activities in Molecular Pathology and Hematopathology

GAI and its core components can offer value to molecular research in many ways. Among them, 2 categories are exceptionally promising and proximal: database search and molecular modeling. Regarding the former, we have previously discussed the concepts of vector-based search. These same concepts can be applied to improve the usability and yield of all databases, including key research repositories such as The Cancer Genome Atlas. This would enable a more flexible search by using more nuanced predicates such as morphologic terms, outcomes, or even whole clinical vignettes accompanied by a prompt to “identify cases that are similar to the current molecular profile identified in patient X currently under evaluation.” While text is the most feasible modality to conduct this search, one could also use imaging, allowing imaging data searches to be either primary or a complementary avenue to identify cases from research repositories (public and local patient databases).
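The vector-based search described above can be sketched as follows. The bigram-hashing `embed` function is a deterministic stand-in for a real embedding model, and the case texts are invented; only the ranking-by-cosine-similarity structure is the point.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hash character bigrams into a fixed-size vector."""
    vec = np.zeros(64)
    for a, b in zip(text.lower(), text.lower()[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def search(query: str, cases: list, top_k: int = 3) -> list:
    """Rank repository cases by cosine similarity to the query embedding."""
    q = embed(query)
    scored = sorted(cases, key=lambda c: float(np.dot(q, embed(c))), reverse=True)
    return scored[:top_k]
```

In a production system, the query could be a whole clinical vignette and the repository a vector index over The Cancer Genome Atlas or a local case database.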

Meanwhile, genomics has experienced significant growth in its use of GAI, especially in language models adapted to read and write DNA, RNA, and proteins. While a full review of these tools is beyond the scope of this article, it is worth noting that the advancement of these tools also entails how DNA, RNA, and proteins are represented. Specifically, these tools create vector representations of biological entities—known as “embeddings”—much in the same way that text is converted into vector representations (as described previously in this article). This also means that comparison and clustering approaches that leverage embeddings become more robust. This would enhance the ability to detect genomic, transcriptomic, and proteomic signatures related to clinical outcomes, especially when those signatures are subtle or multifactorial.
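Because embeddings are plain numeric vectors, standard clustering applies to them directly. The sketch below runs plain k-means over synthetic vectors standing in for sequence embeddings; in practice the vectors would come from a DNA or protein language model, and the clusters might correspond to molecular signatures.

```python
import numpy as np

def kmeans(embeddings: np.ndarray, k: int = 2, iters: int = 10, seed: int = 0):
    """Plain k-means over embedding vectors (a stand-in for richer clustering)."""
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = np.argmin(
            np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2),
            axis=1,
        )
        # Move each centroid to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = embeddings[labels == j].mean(axis=0)
    return labels

# Two well-separated synthetic "signatures" for demonstration.
data = np.vstack([np.random.default_rng(1).normal(0, 0.1, (5, 8)),
                  np.random.default_rng(2).normal(3, 0.1, (5, 8))])
labels = kmeans(data, k=2)
```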

Molecular modeling can take several forms, many of which are currently being impacted by GAI. Vectorization has played a key role in the success of modern structural determination tools such as AlphaFold and ESMFold, and the ability to derive 3-dimensional structures for proteins is a common prerequisite for other types of mechanistic investigation. AlphaFold is estimated to contribute between 3% and 13% of human disease-relevant structures and almost 30% of human protein structures overall.23  Beyond structural determination, protein generation itself has been repeatedly demonstrated, including the design of novel antibodies and even CRISPR systems.24  Diffusion models, another architectural variant commonly used for image generation and exemplified by products such as MidJourney and Stable Diffusion DreamBooth, are also used to power both protein generation, in the case of Microsoft Research’s EvoDiff, and protein-ligand docking, in the case of DiffDock and more recently AlphaFold 3.25  We also see broadening support of these tools by large companies such as Nvidia, which has incorporated folding, docking, and design into its BioNeMo AI cloud service. Both the increased accessibility and the power of these tools make it clear that they will play an increasingly prominent role in molecular hypothesis generation, in our understanding of pathogenesis, and in more clinically adjacent tasks such as drug selection and clinical trial matching.

Administrative Support for Molecular Pathology and Hematopathology

Molecular pathology and hematopathology have unique administrative challenges that could be alleviated by using GAI. Routine curation of variant annotations, and cross-referencing them with evolving guidelines to screen for ambiguity, contradiction, or content gaps, were exclusively human tasks until very recently. It is now possible to use GAI to perform this work at scale, saving potentially hundreds of hours of pathologist and variant curation scientist time. In hematopathology, the workflow commonly involves composing complex, integrated reports based upon a variety of patient data points. As much of the underlying image content is already digitized—and much more will be in the future—there is a valuable opportunity in hematopathology to scale tasks like case audits and reviews using GAI. One could envision the use of GAI to compare image content, such as a bone marrow biopsy, to its corresponding textual report and to screen for disagreements between the 2 data sources. This would very likely benefit both patient safety and clinician performance, as feedback about report accuracy could be generated more or less immediately on every case with the help of GAI-supported approaches.

The potential of GAI is immense, but this same technology introduces novel failure modes that must be understood and closely monitored.26  It is perhaps fair to say that we are only beginning to understand the various pitfalls associated with this technology, and we will continue to discover new potential issues and devise new safeguards and practices for the foreseeable future. GAI-driven failures are of particular concern within the practice of health care. One common failure mode is that of confabulations (colloquially known as “hallucinations”), in which factually inaccurate elements are present within generated content (including health care data).

Generative models are so named because their job is to “generate” data as requested; at root, they are stochastic sequence constructors. In the absence of adequately trained networks, or adequately constrained generation, these tools tend to fill gaps by “hallucinating” new sequences of words. This problem is not unique to CP or even to health care. However, the consequences of such hallucinations are especially serious in the health care setting, since false content could adversely impact clinical decision making and patient care. For example, test results may be falsely reported, disease associations may be falsely claimed in a generated test interpretation, or references may be cited that do not, in fact, exist.27  Because of such issues, a common solution is to incorporate human verification steps (“human-in-the-loop”) into generative workflows, wherein generated outputs are periodically evaluated by humans with expert domain knowledge.28  Such steps also help to mitigate “automation bias”: overreliance on automated recommendations and generated content, with a corresponding absence of critical thinking and evaluation.

Thus, careful and systematic design of prompts and GAI use cases is essential in a health care setting (including CP). Although cautious planning with adequate safeguards has been shown to enable foundation models to achieve domain-specific medical performance on par with other medically tuned models,29  evaluating medical domain performance often relies on medical Q&A data sets, which may fail to mimic the complexity of real-world scenarios.30  As a general observation, it is helpful to separate the linguistic capabilities of language models from their knowledge or content expertise. This is one of the reasons why context augmentation approaches such as RAG are so predominant in real-world use cases. For example, the primary benefit of foundation models like GPT-4 and Claude 3 having been trained on so much information is that they can readily understand the conceptual nuances of both instructions and supplied content. However, it does not currently seem feasible to have these models deliver precise, specialized outputs such as medical interpretation without context augmentation, and this is especially true for patient-specific workflows that rely heavily on private medical record data. Instead, the most productive areas of development effort—outside of core model design questions such as parameter efficiency and reasoning—are in new approaches to context augmentation, whether via retrieval or fine-tuning. A particularly promising direction is demonstrated in projects like Microsoft GraphRAG, which aim to allow both content retrieval from graph databases and exploration across relationships within those databases. This allows generative models to understand highly interrelated content such as medical record data.31 
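A bare-bones illustration of the retrieval-augmented pattern: relevant snippets are selected (here by naive keyword overlap, standing in for vector retrieval or a graph store) and injected into the prompt as the model's only sanctioned knowledge source. The document texts are invented.

```python
# Sketch of retrieval-augmented generation (RAG) prompt assembly.
def retrieve(question: str, documents: list, top_k: int = 2) -> list:
    """Rank documents by naive keyword overlap with the question."""
    terms = set(question.lower().split())
    return sorted(documents,
                  key=lambda d: len(terms & set(d.lower().split())),
                  reverse=True)[:top_k]

def build_rag_prompt(question: str, documents: list) -> str:
    """Supply retrieved snippets as the model's only sanctioned context."""
    context = "\n".join(f"[{i + 1}] {doc}"
                        for i, doc in enumerate(retrieve(question, documents)))
    return (f"Answer using ONLY the context below; say 'unknown' if it is not covered.\n"
            f"CONTEXT:\n{context}\n\nQUESTION: {question}")
```

Constraining generation to retrieved context is one of the practical hedges against the hallucination problem discussed above.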

An adjacent issue with generative models is the familiar challenge of bias, wherein model outputs are shaped by input attributes that should not directly influence the generated response. For example, generative language models have been shown to exhibit racial and gender bias in numerous capacities, such as evaluating applicant CVs in job applications.32  In addition, there are no well-defined best practices for monitoring and remediating biases, which makes the problem insidious and a significant source of liability for the use of GAI. This is also true of visual generative workflows, as we have recently seen with Google’s launch of Gemini. It is important to note that bias is almost never intentionally engineered into models. Still, the lesson from Gemini and other such examples is that despite the best intentions to tune models that are fair and unbiased, this very effort can itself induce biases. Because of the complex nature of bias and the fact that generative workflows and their manifest biases are tremendously diverse, the only practical approach to mitigation seems to be implementing evaluation pipelines that systematically probe for bias across models and prompts, including within CP. It is likely that as GAI becomes more capable, such pipelines will also have to become increasingly adept. In the abstract, evaluating an intelligent model for bias is not unlike evaluating an intelligent human for bias, commonly done via interview and interaction, coupled with an accounting of observed behavior.
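One concrete shape for such an evaluation pipeline is a paired-prompt probe: issue prompts that differ only in a single demographic attribute and flag divergent outputs for review. The model below is a deterministic stub, and the prompt template is illustrative.

```python
# Illustrative bias probe over paired prompts differing in one attribute.
from typing import Callable

def probe_pairs(model: Callable[[str], str], template: str,
                attribute_pairs: list) -> list:
    """Run each attribute pair through the model and record divergence."""
    findings = []
    for a, b in attribute_pairs:
        out_a = model(template.format(attr=a))
        out_b = model(template.format(attr=b))
        findings.append({"pair": (a, b),
                         "diverged": out_a != out_b,
                         "outputs": (out_a, out_b)})
    return findings

# A toy "model" that ignores the attribute, as an unbiased system should here.
stub = lambda prompt: "Recommend standard follow-up testing."
report = probe_pairs(stub,
                     "A {attr} patient with elevated ferritin. Next step?",
                     [("male", "female"), ("65-year-old", "25-year-old")])
```

Real probes would also score semantic (not just literal) divergence, since paraphrased outputs can encode the same disparity.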

Speaking more specifically to CP, the topic of hallucinations and bias has not been as comprehensively explored as in other areas of health care. However, we can anticipate a few specific pitfalls that are likely to be important in CP. First, generative language models often struggle with numerical operations and comparisons, which may impair the reliability of assessing trends and shifts in laboratory values. Second, generative language models may hallucinate details in a clinical interpretation, such as certain patient comorbidities. This is also an area where model bias may be responsible for some groups having disproportionately high frequencies of hallucination. This would of course negatively impact the integrity of a clinical interpretation, but it also raises the question of how best to combat it, as forcing pathologists to double-check every mentioned comorbidity would constitute a significant imposition on time and effort. Importantly, there are broad efforts within health care to establish much-needed practices around AI monitoring and assurance. Examples include Microsoft’s Trustworthy and Responsible AI Network (TRAIN); the Coalition for Health AI (CHAI), which has proposed a national collection of AI “assurance labs”33; and Stanford’s Holistic Evaluation of Language Models (HELM)34  framework. Some practical measures for monitoring GAI bias include comparison of generated content to human-generated ground truth (where available) or using GAI tools to automatically assess the outputs of other GAI tools (eg, prompting one tool to evaluate biases and other issues in the output of another tool).

Another popular approach is “adversarial testing” or “red-teaming,” wherein users or other GAI tools create various inputs—including harmful, biased, or problematic inputs—to evaluate model responses.35  There remains a need to define more clearly how health care at large, and CP specifically, should conduct monitoring and measurement of GAI and what tools must be available to do this effectively. In any case, it does seem clear that any useful approach to monitoring GAI bias and hallucination will require that both inputs to and outputs from these models be recorded and that quantitative measures of content, semantics, and disagreement with ground truth content be routinely computed. As a final point, monitoring AI bias will require participation from human validators and auditors, and, as such, we must also be wary of “automation bias,” the tendency of humans to unquestioningly accept computer-generated information and content. In a world where GAI outputs are increasingly present, clinical pathologists will have to be increasingly vigilant when asked to adjudicate generated content.
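As a sketch of what recording inputs and outputs with quantitative disagreement measures could look like, the snippet below logs each exchange and flags high token-level Jaccard distance from ground truth. The threshold and metric are illustrative stand-ins for richer semantic comparisons.

```python
# Minimal audit-trail sketch: log every exchange and flag high disagreement.
audit_log = []

def jaccard_distance(a: str, b: str) -> float:
    """1 minus token-set overlap; 0.0 means identical token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not (sa or sb):
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def record(prompt: str, generated: str, ground_truth: str,
           threshold: float = 0.5) -> bool:
    """Log the exchange; return True when disagreement warrants human review."""
    dist = jaccard_distance(generated, ground_truth)
    flagged = dist > threshold
    audit_log.append({"prompt": prompt, "generated": generated,
                      "ground_truth": ground_truth,
                      "distance": round(dist, 3), "flagged": flagged})
    return flagged
```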

Regardless of whether hallucinations and bias remain critical long-term issues, significant infrastructural requirements must be met to enable GAI to bring value to CP workflows. These requirements center on interoperability with electronic medical records (EMRs), as rendering a clinical test interpretation often requires abstracting many pieces of historical and diagnostic information from the EMR. Human clinicians are efficient at this extraction task and can obtain much relevant data by viewing the graphical user interface of the EMR itself. However, the current state of generative tools would require a programmatic connection to EMR data, whether in a data warehouse (likely via structured query language) or in real time (likely via Fast Healthcare Interoperability Resources [FHIR] application programming interfaces [APIs]). This is not a trivial task and further requires the development of plugin-like components that would inform a generative model as to which additional data points are needed and how to obtain them.
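To ground the programmatic-connection point, the sketch below composes a standard FHIR REST search for one patient's laboratory observations. The base URL is hypothetical; `patient`, `code`, `_sort`, and `_count` are standard FHIR search parameters, and actual retrieval would require an authenticated HTTP client.

```python
# Sketch: compose a FHIR Observation search URL (no network call made here).
from urllib.parse import urlencode

def observation_query(base_url: str, patient_id: str,
                      loinc_code: str, count: int = 10) -> str:
    """Build a FHIR search for a patient's observations by LOINC code."""
    params = {
        "patient": patient_id,
        "code": f"http://loinc.org|{loinc_code}",
        "_sort": "-date",        # newest results first
        "_count": str(count),
    }
    return f"{base_url}/Observation?{urlencode(params)}"

# Hypothetical server; LOINC 718-7 is hemoglobin.
url = observation_query("https://fhir.example.org/r4", "12345", "718-7")
```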

There are important questions about the broader impact of using GAI in CP regarding reimbursement. These questions are especially acute in the domain of clinical test interpretation, as these activities are traditionally valuated in relative value units, which embed an assumption about the amount of time and effort clinical activities require. It seems very likely that increased adoption of AI in the CP workflow—even if such adoption does not result in the end-to-end automation of any task—will reduce the time and effort required for clinical tasks overall and thus diminish the allocated relative value units. This may induce a trend of reduced compensation for clinical tasks, which would need to be met with an increase in case volume to maintain pathologist compensation at current levels. It is also very possible that, if GAI does not quickly achieve a level of performance that allows it to operate largely unsupervised, clinical pathologist activities may simply shift, with pathologists responsible for many more model supervision and output verification tasks in exchange for any decrease in interpretive work. This is an evolving area that requires close attention by pathologists and professional pathology societies.

Among the myriad use cases for GAI in pathology are those of trainee education. This topic is covered in depth in the accompanying GAI article on education in this series by Cecchini et al. Herein, we comment on applications specific to CP. Clinical pathology differs from anatomic pathology in the breadth of information and nuance of a CP vignette, and test materials and practice clinical scenarios in CP are typically limited. GAI is capable of building on existing clinical and medical knowledge to create novel questions and clinical scenarios at scale.36  GAI-enabled test content can be dynamic, as it can rewrite and recreate testing content to make it harder or easier or to incorporate new details as necessary. Thus, engaging educational material generated by GAI approaches may garner more attention and engagement from residents.

Another use case for GAI in CP education is in explaining subject matter to learners by simplifying and summarizing key CP concepts. GAI offers synergies with CP question banks and vignettes, enabling dynamic educational experiences. GAI can behave like a tutor, personalizing content and expounding on material in a way that is directly responsive to user needs. This is especially helpful in CP, where the background knowledge needed is vast, spanning genomics to economics. Thus, GAI is well suited to create pathology educational modules in which learners can chart their own personalized paths, interrogating specific topics more deeply and focusing on personal knowledge gaps.

Professional certification in CP can also benefit from GAI capabilities. Traditional, long board re-examinations have given way to frequent, shorter question sets, as with CertLink offered by the American Board of Pathology. The next logical evolution would be the development of dynamic, real-time GAI chat agents focused on topical areas of CP. This could provide 2 key benefits: (1) it could make the recertification process more meaningful owing to its adaptability and scalability to new knowledge, and (2) its accessibility would help educate pathologists on the latest developments, improving overall knowledge and skill sets.

We propose that CP use cases in all subspecialties would benefit from dynamic educational content, including laboratory management. This domain focuses less on fact recall and more on real-life operational scenarios in a CP laboratory (eg, assay validation, quality control, clinical consultation, or budgeting). GAI can create role play–like experiences that allow learners to work through evolving scenarios in a CP laboratory, leading to an engaging laboratory management curriculum that would better equip clinical pathologists. As an example, the authors have created a custom GPT based on clinical laboratory reference texts that can produce interactive scenarios. One such example of a conversation with a GPT (using OpenAI’s ChatGPT) is included in the supplemental digital content, available at https://meridian.allenpress.com/aplm in the February 2025 table of contents.

There is little doubt that GAI will have an outsized impact in health care, including CP. However, these new capabilities bring new risks and liabilities. It is critically important for laboratories to chart a roadmap for the use of GAI, leveraging low-risk, low-complexity use cases initially. A key question central to many emerging technologies such as GAI is, “Where does one begin?” In the Table, we have consolidated various potential ideas/opportunities that exist within the specialty of CP. This article highlights both the dearth of existing AI use cases and the myriad opportunities for GAI in CP. Ideally, clinical laboratories should begin with GAI tasks that do not directly impact clinical decision making and are easily audited by human experts. Realistically, this means focusing on clerical workflows of CP, such as standard operating procedure (SOP) documentation, and leveraging GAI for tasks like writing or improving existing documentation. For example, one early project might use an LLM to flag language within an existing SOP that is unclear, contradictory, redundant, or inaccurate. Such simple use cases can often be done asynchronously, allowing users to focus first on learning GAI-specific techniques such as prompting and response analysis.
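As one way to frame such a starter project, the sketch below builds an SOP-review prompt. The wording is illustrative, and `build_sop_review_prompt` is a hypothetical helper rather than any established tool; the model's output would still require expert human review.

```python
# Illustrative prompt template for a low-risk SOP-review starter project.
SOP_REVIEW_PROMPT = """You are reviewing a clinical laboratory standard operating procedure (SOP).
List every passage that is unclear, contradictory, redundant, or inaccurate.
For each passage, quote it, name the problem, and suggest a revision.
Do not invent content that is not present in the SOP.

SOP TEXT:
{sop_text}
"""

def build_sop_review_prompt(sop_text: str) -> str:
    """Embed the SOP text into the review instructions."""
    return SOP_REVIEW_PROMPT.format(sop_text=sop_text)
```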

Such a methodical and incremental approach also allows users to learn about the field of GAI and its enabling capabilities. Early projects in a laboratory could focus specifically on language model interactions rather than application development. Clinical laboratories can be deliberate in acquiring the necessary initial AI skills: prompt engineering, development against language model APIs, vectorization, and retrieval are some examples. Critically, and as with any AI application, laboratories should have a robust plan to evaluate model outputs, design good evaluation studies, and define and monitor the operational and (eventually) clinical impact of a new AI tool a priori. Most laboratories should focus on acquiring these capabilities first by picking use cases that align with these goals rather than expending time, effort, and money on complex systems design or commercial solutions.

Workflow Enhancement Opportunities for Generative Artificial Intelligence (GAI) in Clinical Pathology by Specialty


Figure 2 depicts a rough ordering of potential CP use cases by difficulty and by the breadth of required technologies, which may help clinical laboratories planning for the use of GAI and LLMs. GAI—and prediction and automation more broadly—is enticing to institutional leadership mesmerized by the AI sales pitch. Laboratory leadership is often left out of the initial decisions yet saddled with an unworkable AI solution as an afterthought. Such an approach, leading to a series of failed AI projects, can have an unwanted impact on clinical laboratory members. Thus, engaging all stakeholders and garnering their support is key. One should lead with the value proposition that an AI solution can bring and not with the technology itself. Generally, it should be assumed that leaders want to invest in achieving a better state of the operation (eg, better outcomes, more revenue, less cost) and not in any technology for its own sake.

Figure 2.

Generative artificial intelligence use case difficulty level and enabling technologies. This figure depicts an ordering of use cases according to their difficulty and complexity/scope of the use case’s intended solution. The relevant technical areas necessary for implementation are also shown. Green boxes indicate that the intersection of a use case and technical capability is likely required for most use cases. Abbreviations: AI, artificial intelligence; API, application programming interface; SOP, standard operating procedure.


Much of the discussion around GAI in health care—including the discussion within this article—is aimed at empowering laboratories to embark on their own unique journey to AI capabilities. However, there is tremendous overlap in what we do in CP collectively, and there is an urgent need for a broader, national discussion benefiting all laboratories when it comes to technologies such as GAI. Such discussions (similar to those highlighted in this article) must focus on common use cases, best practices, principles of application development and validation, approaches to risk and liability management, and compensation in the development of AI.

There is also a significant need for advocacy in this area at the pathologist, department, institutional, and professional levels. Pathologist-driven advocacy efforts help institutions understand the need for a collaborative effort in the utilization of AI-enabled methods and data. Professional society–level advocacy (eg, through the College of American Pathologists [CAP]) is needed to impress upon payers that the efficiency gains produced by technologies such as GAI are not free and require significant time and effort by experts. The role of clinical pathologists will grow to encompass AI system oversight alongside laboratory management and individual patient case sign-out. Future advocacy must ensure that these responsibilities (and skills) of clinical pathologists are valuated (and compensated) appropriately.

Looking to the future, CP occupies an incredibly strategic place in driving health care services and associated revenue. As a platform that enables diagnosis, laboratories often provide the initial, quantitative metrics of a patient’s disease condition. Moreover, laboratory data are exceptionally broad and unusually well structured compared with other modalities of health care data (eg, clinical notes). As GAI—and AI more broadly—continues to mature within health care and CP, we are likely to see interpretive guidance become more scalable and offered more frequently. As a result, we may see CP become a primary source of rich interpretive outputs. CP-generated data can form the primary basis for an AI-supported “copilot” for primary care and mid-level practitioners. This would place CP at the “top of the funnel,” identifying disease and even guiding or allocating patients into subsequent downstream care pathways. From a patient care standpoint, this would improve access to expertise and enhance the quality of the patient experience. From a hospital administration standpoint, this would mean more efficient identification of diagnosis codes, comprehensive tabulation of comorbidities, complexity-based revenues, and efficient deployment of clinical services by leveraging AI (and GAI) technologies using CP databases.

The future for GAI is exciting and incredibly bright, as we are witnessing the beginnings of scalable and programmable cognitive work. We are also seeing rapid advancement in AI technology, such as information retrieval, guardrails, context length, multimodal inputs, and the accuracy of generated responses. We may come to see health care infused with GAI guidance in progress notes, order sets, handoffs, test interpretations, and many other instances limited only by our imagination. In CP specifically, GAI has the potential to elevate laboratory testing beyond the à la carte, demand-driven menu of biological assays to a cognitive utility that permeates the medical record and directly impacts the entire spectrum of patient care. In short, CP is uniquely well positioned to become the core “copilot” within the health care workflow at large.

As a final reflection, technologic revolutions like the one we are witnessing with GAI follow a typical pattern of proceeding “gradually, then suddenly.” Thus, it is critical for laboratories and pathologists to make constant progress in this domain and keep pace with the field. Laboratories (and pathologists) must inventory key needs and use cases in the laboratory to apply GAI technologies in improving laboratory workflows. Regardless of initial outcomes, such efforts will prove educational in the long term, revealing key strengths and weaknesses in skill sets, technology, and processes inside a laboratory. Iterative efforts can then be aimed at remedying the identified shortcomings. Ignorance of and nonparticipation in AI technologies are not an option for laboratories (and pathologists) that wish to remain competitive in the marketplace. If we collectively maintain an open dialogue about challenges and successes while adopting this transformational technology in a collaborative manner, there will be ample room for everyone to be successful. In the famous words of Peter Drucker, “The best way to predict the future is to create it.”37 

We would like to thank the staff from the College of American Pathologists (CAP), Digital Pathology Association (DPA), and Association of Pathology Informatics (API) for their assistance in coordinating this cross-organizational, multi-authored effort. The authors take full responsibility for the content of this manuscript.

1. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.
2. Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA. Generative adversarial networks: an overview. IEEE Signal Process Mag. 2018;35(1):53-65.
3. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Preprint. Posted online August 1, 2023. arXiv.
4. Rodriguez JPM, Rodriguez R, Silva VWK, et al. Artificial intelligence as a tool for diagnosis in digital pathology whole slide images: a systematic review. J Pathol Inform. 2022;13:100138.
5. Patil R, Boit S, Gudivada V, Nandigam J. A survey of text representation and embedding techniques in NLP. IEEE Access. 2023;11:36120-36146.
6. Gao Y, Xiong Y, Gao X, et al. Retrieval-augmented generation for large language models: a survey. Preprint. Posted online March 27, 2024. arXiv.
7. Zakka C, Chaurasia A, Shad R, et al. Almanac: retrieval-augmented language models for clinical medicine. Preprint. Posted online May 2, 2023. Res Sq. rs.3.rs-2883198.
8. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. 2020;3:17.
9. Halstead DC, Sautter RL. Literature review on how we can address medical laboratory scientist staffing shortages. Lab Med. 2023;54:1. https://doi.org/10.1093/labmed/lmac090
10. Ge J, Li M, Delk MB, Lai JC. A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record. Preprint. Posted online September 4, 2023. medRxiv. 2023.08.31.23294924.
11. Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural language processing of clinical notes on chronic diseases: systematic review. JMIR Med Inform. 2019;7(2):e12239.
12. Tayebi RM, Mu Y, Dehkharghanian T, et al. Automated bone marrow cytology using deep learning to generate a histogram of cell types. Commun Med. 2022;2(1):1-14.
13. Tian D, Jiang S, Zhang L, Lu X, Xu Y. The role of large language models in medical image processing: a narrative review. Quant Imaging Med Surg. 2024;14(1):1108-1121.
14. Huang J, Neill L, Wittbrodt M, et al. Generative artificial intelligence for chest radiograph interpretation in the emergency department. JAMA Netw Open. 2023;6(10):e2336100.
15. Waqas A, Bui MM, Glassy EF, et al. Revolutionizing digital pathology with the power of generative artificial intelligence and foundation models. Lab Investig J Tech Methods Pathol. 2023;103(11):100255.
16. Seheult JN, Weybright MJ, Jevremovic D, Shi M, Olteanu H, Horna P. Computational flow cytometry accurately identifies Sezary cells based on simplified aberrancy and clonality features. J Invest Dermatol. 2024;144(7):1590-1599.e3.
17. Swerdlow SH, Campo E, Harris N, et al. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. Revised 4th ed. Lyon, France: IARC Press; 2017. World Health Organization Classification of Tumours; vol 2.
18. Li W. The 5th edition of the World Health Organization Classification of Hematolymphoid Tumors. In: Li W, ed. Leukemia. Exon Publications; 2022. http://www.ncbi.nlm.nih.gov/books/NBK586208/. Accessed April 28, 2024.
19. Gianelli U, Thiele J, Orazi A, et al. International Consensus Classification of myeloid and lymphoid neoplasms: myeloproliferative neoplasms. Virchows Arch Int J Pathol. 2023;482(1):53-68.
20. Arber DA, Campo E, Jaffe ES. Advances in the classification of myeloid and lymphoid neoplasms. Virchows Arch Int J Pathol. 2023;482(1):1-9.
21. Cho YU. The role of next-generation sequencing in hematologic malignancies. Blood Res. 2024;59(1):11.
22. Chen ZH, Lin L, Wu CF, Li CF, Xu RH, Sun Y. Artificial intelligence for assisting cancer diagnosis and treatment in the era of precision medicine. Cancer Commun (Lond). 2021;41(11):1100-1115.
23. Porta-Pardo E, Ruiz-Serra V, Valentini S, Valencia A. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput Biol. 2022;18(1):e1009818.
24. Ruffolo JA, Nayfach S, Gallagher J, et al. Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences. Preprint. Posted online April 22, 2024. bioRxiv. 2024.04.22.590591.
25. Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493-500.
26. Lin SC, Gao L, Oguz B, et al. FLAME: factuality-aware alignment for large language models. Preprint. Posted online May 2, 2024. arXiv.
27. Wu K, Wu E, Cassasola A, et al. How well do LLMs cite relevant medical references: an evaluation framework and analyses. Preprint. Posted online February 2, 2024. arXiv.
28. Sun X, Bosch JA, De Wit J, Krahmer E. Human-in-the-loop interaction for continuously improving generative model in conversational agent for behavioral intervention. In: IUI ’23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces. Association for Computing Machinery; 2023:99-101.
29. Nori H, Lee YT, Zhang S, et al. Can generalist foundation models outcompete special-purpose tuning: case study in medicine. Preprint. Posted online November 27, 2023. arXiv. Accessed April 29, 2024.
30. Chen H, Fang Z, Singla Y, Dredze M. Benchmarking large language models on answering and explaining challenging medical questions. Preprint. Posted online March 13, 2024. arXiv.
31. Edge D, Trinh H, Cheng N, et al. From local to global: a Graph RAG approach to query-focused summarization. Preprint. Posted online April 24, 2024. arXiv. http://arxiv.org/abs/2404.16130. Accessed April 29, 2024.
32. Glazko K, Mohammed Y, Kosa B, Potluri V, Mankoff J. Identifying and improving disability bias in GAI-based resume screening. Preprint. Posted online January 28, 2024. arXiv.
33. Shah NH, Halamka JD, Saria S, et al. A nationwide network of health AI assurance laboratories. JAMA. 2024;331(3):245-249.
34. Liang P, Bommasani R, Lee T, et al. Holistic evaluation of language models. Preprint. Posted online October 1, 2023. arXiv.
35. Radharapu B, Robinson K, Aroyo L, Lahoti P. AART: AI-Assisted Red-Teaming with diverse data generation for new LLM-powered applications. Preprint. Posted online November 29, 2023. arXiv. Accessed May 17, 2024.
36. Comstock K. Innovating education: creating custom ChatGPT solutions for enhanced teaching and learning experiences. In: Society for Information Technology & Teacher Education International Conference. Waynesville, NC: Association for the Advancement of Computing in Education; 2024:719-727. https://www.learntechlib.org/primary/p/224029/. Accessed May 19, 2024.
37. Cohen WA. Drucker on Leadership: New Lessons From the Father of Modern Management. New York: John Wiley and Sons; 2010.

Author notes

Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the February 2025 table of contents.

McCaffrey and Gullapalli contributed equally to the paper.

Gullapalli is funded and salary supported through the National Institutes of Health - National Institutes of General Medical Sciences (NIH-NIGMS) grant P20 GM130422 award mechanism.

Competing Interests

The authors have no relevant financial interest in the products or companies described in this article.
