ABSTRACT
Artificial intelligence (AI) is the development and application of computer algorithms that can perform tasks that usually require human intelligence. Machine learning (ML) refers to AI in which an algorithm, trained on input raw data, analyzes features in a separate dataset without being explicitly programmed and delivers a specified classification output. ML has been applied to image discrimination and classification, which has many applications within medicine, mainly where imaging is used. In this review, we discuss AI applications in gastrointestinal endoscopy and endoscopic image analysis, including detection and classification of esophageal pathologies, analysis of upper endoscopic images for the diagnosis of Helicobacter pylori infection, detection and depth assessment of early gastric cancer, and detection of various abnormalities in small-bowel capsule endoscopy images, endoscopic retrograde cholangiopancreatography, and endoscopic ultrasonography. The widespread application of AI technologies across multiple aspects of gastrointestinal endoscopy can potentially transform clinical endoscopic practice positively.
INTRODUCTION
Artificial intelligence (AI) refers to computer systems that enable machines to process and analyze data at or above the human level, and its application in healthcare has developed rapidly over the past few years.[1] Machine learning (ML) is an AI tool that uses algorithms to learn patterns and identify solutions by training on a dataset.[2] The traditional ML algorithms most frequently used in disease diagnosis include support vector machine (SVM), decision tree, k-nearest neighbor, naive Bayes, logistic regression, and AdaBoost. Deep learning (DL) is a subset of ML that uses multilayered neural networks to learn features from data.[3] A convolutional neural network (CNN) is a powerful DL model that uses convolution layers to process and interpret images more effectively.[3] With traditional ML algorithms, researchers must manually extract data features, such as lesion size and characteristics, based on their clinical experience. DL algorithms, in contrast, can automatically extract and recognize data features, making them more effective in reducing both the loss of feature information and the workload of doctors.
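To make this distinction concrete, below is a minimal, hypothetical sketch (in PyTorch) of the kind of CNN image classifier the review describes: convolution layers learn visual features directly from the pixels, with no manually engineered inputs. The class name, layer sizes, and binary lesion/no-lesion labels are illustrative assumptions, not any published model.

```python
import torch
import torch.nn as nn

class TinyEndoscopyCNN(nn.Module):
    """Illustrative toy CNN for binary lesion/no-lesion classification."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Convolution layers learn visual features (edges, textures, lesion
        # patterns) directly from pixels -- no manual feature engineering,
        # the key difference from traditional ML pipelines.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 224 -> 112
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 112 -> 56
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = TinyEndoscopyCNN()
dummy_frame = torch.randn(1, 3, 224, 224)  # one synthetic RGB frame
logits = model(dummy_frame)                # shape: (1, 2)
```

In practice, such a network would be trained end to end on labeled endoscopic images, which is what allows it to discover discriminative features automatically.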
AI has been used in various gastrointestinal (GI) endoscopic procedures, including esophagogastroduodenoscopy (EGD), gastroscopy, and colonoscopy. Additionally, it has been used in endoscopic imaging, including endoscopic retrograde cholangiopancreatography (ERCP), endoscopic ultrasonography (EUS), and small-bowel capsule endoscopy (CE) images. Most GI endoscopy research uses DL algorithms based on CNNs.[1] AI can assist endoscopists by improving the diagnosis of different GI lesions and decreasing the number of inaccurate diagnoses, making endoscopy more accurate and efficient. However, it is essential to note that while AI has great potential in this field, further research and integration into clinical practice are needed before it can be widely adopted. This review aims to explore AI’s screening and diagnostic advances in GI endoscopy. We did not discuss AI in colonoscopy because it has been well studied in the literature and implemented in clinical practice.
METHODS
We searched the PubMed and Google Scholar databases and selected articles that discussed diagnostic and screening features of AI in endoscopy. We focused on newer studies from 2016 onwards and included older studies only if they were considered milestones in AI development in endoscopy. Our search terms in the field of AI in endoscopy included gastroesophageal reflux disease, esophageal squamous cell carcinoma, esophageal adenocarcinoma, Barrett's esophagus, gastric cancer, capsule endoscopy, endoscopic retrograde cholangiopancreatography, and endoscopic ultrasound.
GASTROESOPHAGEAL REFLUX DISEASE
Gastroesophageal reflux disease (GERD) is a chronic gastrointestinal disorder characterized by regurgitation of stomach acid into the esophagus, causing distressing symptoms and potential complications.[4] It is classified as reflux esophagitis if mucosal breaks are observed during EGD, or as nonerosive reflux disease, which can be diagnosed from symptoms or an esophageal pH study.[5,6] The term reflux esophagitis is a relatively new classification; the entity was described as erosive GERD in prior guidelines.[7] EGD is the recommended initial diagnostic study to evaluate patients with persistent GERD symptoms lasting more than 2–4 weeks or with symptoms that relapse after discontinuation of proton pump inhibitors.[5] Traditional diagnostic methods for GERD have limitations, leading to the emergence of AI as a promising tool, particularly in endoscopy. AI algorithms can improve the diagnosis of GERD by identifying abnormalities that the human eye might miss and by improving the accuracy of disease classification.
One study conducted by Pace et al[8] in 2010 used artificial neural network (ANN) analysis to diagnose GERD based on EGD images and found it to be superior to endoscopists. However, classifying GERD into erosive or nonerosive forms proved challenging. Subsequently, in 2021, a study by Wang et al[9] showed significant progress. They developed a CNN model that interprets traditional endoscopic images and narrow band imaging (NBI) and classifies disease according to the Los Angeles classification. The model successfully classified GERD with substantially higher accuracy than trainee doctors (87.9% vs. 75% and 65.6%). Another model was developed by Ge et al,[10] introducing Explainable Artificial Intelligence (XAI). This model demonstrated accuracy similar to the previous one in GERD classification. Notably, their study compared the XAI model to expert and junior endoscopists and found that it delivered superior results: the classification accuracy of the AI model was 86.7%, contrasting with 71.5% by junior endoscopists and 77.4% by experts.
Several other CNN models have been developed and demonstrated comparable results in GERD classification, reinforcing the effectiveness of this approach. A more recent study conducted by Yen et al[11] in 2022 aimed to improve the accuracy of the CNN model described by Wang et al,[9] using the same database and a similar CNN model with modified automated diagnostic capabilities and an added ML feature. The researchers concluded that accuracy can reach 92.5 ± 2.1%.[11] This study shows the potential of AI and opens the door for further modifications and functional improvements, applying ML and DL to these models to improve diagnostic abilities for GERD and its classification.
BARRETT’S ESOPHAGUS AND ESOPHAGEAL ADENOCARCINOMA
Barrett’s esophagus (BE) is a condition in which the normal squamous epithelium of the esophagus is replaced with columnar epithelium.[12] BE is considered a precancerous lesion that can progress to low-grade dysplasia, high-grade dysplasia, and finally invasive adenocarcinoma.[13] According to the American College of Gastroenterology, the diagnosis of BE requires identification of columnar metaplasia during endoscopy and confirmation of intestinal metaplasia with goblet cells on biopsy. However, despite advances in endoscopic imaging and biopsy techniques, detection and characterization of dysplastic and neoplastic BE are challenging, even for experienced endoscopists.[14] Endoscopists could benefit from an automatic detection system that assists in identifying early neoplastic lesions, and AI has recently emerged as a promising tool for the detection and characterization of neoplastic lesions in patients with BE.
In 2016, van der Sommen et al[15] conducted the first reported study to explore the feasibility of a computer-based detection system for identifying early neoplastic lesions in BE. They developed an SVM model that showed reasonable accuracy, with a sensitivity of 86% and specificity of 87%. More recent studies have used CNNs instead of SVMs as the foundation of their AI models. Abdelrahim et al[16] used a CNN to detect Barrett neoplasia, which showed high sensitivity and reasonable specificity, with an average speed of 33 ms/image. Furthermore, de Groof et al[17] reported the development of the first externally validated DL computer-aided detection system for early BE neoplasia. The system achieved greater accuracy than a group of nonexpert endoscopists, indicating that a computer-aided detection system could enhance the accuracy of early BE neoplasia detection by general endoscopists. De Groof et al[18] also used a CNN model to detect Barrett neoplasia during live endoscopic procedures, with high accuracy. Another study from Japan developed a CNN model to detect esophageal adenocarcinomas that achieved higher sensitivity and acceptable specificity when compared to experts.[19] This model analyzed the data at a speed of 36 images per second.[19] Furthermore, Ebigbo et al[20] used a DL model to predict submucosal invasion in Barrett cancer that showed no statistically significant difference between its performance and that of expert endoscopists.
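For contrast with the CNN systems above, here is a hedged sketch of the earlier traditional-ML pipeline used in studies such as van der Sommen et al: hand-crafted features computed per image region are fed to a support vector machine. The feature values and labels are synthetic placeholders (assuming scikit-learn); this is a schematic of the approach, not a reproduction of the published system.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# e.g., 200 image patches x 5 hand-crafted features (color/texture statistics);
# synthetic stand-ins for features an expert-designed pipeline would extract.
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)  # 1 = neoplastic, 0 = non-neoplastic (dummy)

# An RBF-kernel SVM separates the two classes in feature space.
svm = SVC(kernel="rbf")
scores = cross_val_score(svm, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.2f}")  # ~0.5 on random data
```

The contrast with the CNN sketch earlier is the point: here the feature extraction step is fixed by hand before the classifier ever sees the data.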
In a 2023 study, Abdelrahim et al[21] reported the first external validation of AI algorithms for detecting BE neoplasia on real-time endoscopic videos. The model achieved high sensitivity, specificity, and accuracy and performed significantly better than general endoscopists. In addition, the model’s detection and localization processing speeds were 5 ms/image and 33 ms/image, respectively. In another 2023 study, Fockens et al[22] developed a DL system that outperformed general endoscopists in detecting BE neoplasia on images and videos and improved their detection rate without compromising specificity. The system classified an endoscopic image in 0.029 seconds, an average speed of 35 frames per second. Several other studies have also shown that DL models have high accuracy in detecting Barrett-related dysplasia and cancer, with an average sensitivity greater than 90% and specificity greater than 80%. All these studies indicate that AI is expected to play a significant role in diagnosing BE and esophageal adenocarcinoma in the near future (Table 1).
SQUAMOUS CELL CARCINOMA
The prognosis of esophageal squamous cell carcinoma (ESCC) is very poor. It is usually diagnosed at a late stage, creating a high demand for improved diagnostic tools to detect ESCC early.[28,29] Traditional diagnosis relies on endoscopists reading white-light endoscopy (WLE) images, usually combined with either NBI or iodine staining to increase sensitivity.[28,29] NBI highlights ESCC as a brownish area and is preferred over iodine staining, as iodine increases procedure time and causes patient inconvenience, with side effects such as chest pain or discomfort; its use is usually reserved for screening high-risk patients.[29] The detection rate using white light imaging and NBI is around 50% when images are read by nonexpert endoscopists.[29] Therefore, DL models using CNN algorithms were developed to improve the detection rate, and multiple studies have reported their effectiveness. Horie et al[29] studied the effect of a comprehensive AI model using CNN for esophageal cancer detection; the AI showed very high sensitivity, reaching 98%, and could differentiate between superficial and deep cancer. However, its positive predictive value (PPV) was only 40%, compared with 45% for expert and 35% for nonexpert endoscopists. Moreover, this study highlighted the ability of CNN models to differentiate between superficial and deep cancers, a starting point for further research on disease classification.
Several other studies further assessed classification of the invasion depth of pathology-proven ESCC.[30] Nakagawa et al[30] conducted a study on the same model, reviewing more than 14,000 images. They concluded that AI successfully differentiated between different stages of ESCC based on invasion depth, with sensitivity, specificity, accuracy, and positive predictive value superior to endoscopists’ measures. Endoscopists were superior in negative predictive value, and the most significant difference was in specificity, which was 7.4% higher in the AI group (Table 2).
Table 2. Comparison between the CNN model and endoscopists in reading endoscopy images to differentiate between stages of esophageal squamous cell carcinoma
In a subsequent study in 2020, Tokai et al[31] explored the effectiveness of CNNs in diagnosing and categorizing ESCC. The AI-driven diagnostic system identified 95.5% (279 of 291) of ESCC cases in test images within 10 seconds. It then analyzed the 279 images, estimating the invasion depth of ESCC with 84.1% sensitivity and 80.9% overall accuracy within 6 seconds. Significantly, the system’s accuracy outpaced that of 12 of 13 board-certified endoscopists, and its area under the curve (AUC) surpassed the AUCs of all participating endoscopists.
Fukuda et al[32] compared the performance of an AI CNN model with that of expert endoscopists in diagnosing ESCC, involving more than 23,000 real-time video images. The diagnostic process comprised two tasks: detection, which involved pinpointing suspicious lesions, and characterization, which aimed to differentiate cancer from noncancer. In detection, the AI system outperformed the experts with a higher sensitivity, but its specificity was lower, indicating a greater likelihood of false positives. In characterization, the AI system demonstrated superior performance to the experts, with higher sensitivity, specificity, and overall accuracy (Table 3).[32] The receiver operating characteristic curve indicated that the AI system has significantly better diagnostic capabilities.
Table 3. Diagnostic ability of the CNN model, including detection and characterization, compared to expert endoscopists
In another study, conducted by Waki et al[33] in 2021 to assess the effectiveness of AI in detecting ESCC by analyzing approximately 17,000 images, the AI program demonstrated a sensitivity of 85.7% and specificity of 40%. In comparison, endoscopists had a sensitivity of 75% and a higher specificity of 91.4%. In a subsequent evaluation, endoscopists read the images with the assistance of the AI program, which improved their sensitivity to 77.7% with a similar specificity of 91.6%. In a more recent multicenter study in China in 2022, Yuan et al[34] used a CNN model to analyze more than 50,000 images and videos for diagnosing ESCC confined to the epithelium; the model showed sensitivity and specificity comparable to expert endoscopists, with AI superior at reading white light imaging.
Overall, AI’s diagnostic abilities in ESCC are promising, especially in sensitivity and time efficiency, and AI could prove a beneficial aid to endoscopists. However, improvements are needed in specificity and in the accuracy of identifying a lesion as malignant. Furthermore, when classifying different stages of ESCC, AI was superior overall.
GASTRIC CANCER
Gastric cancer (GC) arises from the stomach lining and develops through a well-established precancerous cascade: normal gastric mucosa to nonatrophic gastritis to multifocal atrophic gastritis without intestinal metaplasia to gastric intestinal metaplasia (GIM) to low-grade dysplasia (LGD) to high-grade dysplasia (HGD) to invasive adenocarcinoma, with the likelihood of progression increasing as the cascade advances.[35] GC is usually diagnosed through endoscopy and biopsy for optimal treatment planning, which may include surgery, chemotherapy, or targeted therapy. The identification and characterization of early gastric cancer (EGC) and precancerous lesions through endoscopy are crucial for assessing cancer risk and guiding suitable treatment and surveillance measures. Although conventional endoscopic imaging modalities such as WLE and NBI are sufficient for detecting some features of the carcinomatous transformation of gastric lesions, they have lower sensitivity in detecting GIM, LGD or HGD, and EGC. Although chromoendoscopy has high sensitivity in detecting EGCs and precancerous gastric lesions, it requires local dye application, which limits its use in the broad field required for gastric lesion detection.[36] Hence, AI becomes essential to improve the sensitivity and specificity of endoscopic detection of precancerous lesions and early gastric cancers. The application of AI in gastroscopy is currently limited compared to colonoscopy. However, recent studies continue to rapidly reveal the potential of AI in advancing the diagnosis and treatment of GC.
In 2013, Miyaki et al[37] were among the first to use the SVM algorithm alongside magnifying endoscopy and flexible spectral imaging color enhancement to identify mucosal GC. This method demonstrated a detection accuracy of 85.9%, a sensitivity of 84.8%, and a specificity of 87.0% for GC diagnosis. Many subsequent studies used CNNs and other DL algorithms to develop AI systems that could detect GC with better performance. The sensitivity, specificity, and accuracy of these AI systems have consistently improved with each published study, signaling ongoing advancements.[38–51] For example, Ishioka et al[51] developed the AI-based diagnostic support tool “Tango” in 2023 to detect EGCs, and the tool’s performance was compared with that of endoscopists.[51] Tango achieved superior sensitivity and accuracy over the specialists (84.7% vs. 65.8% and 70.8% vs. 67.4%, respectively) and nonspecialists (84.7% vs. 51.0% and 70.8% vs. 58.4%, respectively).
In addition to its application in GC detection, AI has been increasingly used in more advanced capacities within GC diagnostics, including cancer classification, depth prediction, and identification of histopathologic features of GC. Table 4 summarizes key studies that developed CNN-based diagnostic systems addressing these advanced capacities. For instance, Jagric et al[52] used learning vector quantization neural networks to develop a system that predicts liver metastases after GC resection; the sensitivity and specificity for the test sample were 66.7% and 97.1%, respectively. Additionally, Zhang et al[53] created a CNN-based diagnostic system trained to differentiate between peptic ulcer (PU), EGC, high-grade intraepithelial neoplasia (HGIN), advanced gastric cancer (AGC), gastric submucosal tumors (SMTs), and normal gastric mucosa without lesions. The system showed higher diagnostic specificity and PPV than the endoscopists for images of EGC and HGIN, and its diagnostic accuracy was similar to that of the endoscopists for images without lesions and images of EGC, HGIN, PU, AGC, and SMTs.
In one of the most recent meta-analyses that included all the latest studies up to 2023, Klang et al[54] analyzed 42 studies that used a variety of DL techniques, demonstrating the utility of DL in GC classification, detection, tumor invasion depth assessment, cancer margin delineation, lesion segmentation, and detection of early-stage and premalignant lesions. The results highlighted the noteworthy performance of AI systems across all previously mentioned aspects of GC diagnostics, showcasing high levels of accuracy, sensitivity, and specificity.
Shifting the focus to another aspect of gastroscopy, AI has been used to detect Helicobacter pylori infection. The study by Parsonnet et al[74] in 1991 was a significant milestone in understanding the link between H. pylori infection and the development of gastric adenocarcinoma. Since then, various studies have confirmed this association and demonstrated the importance of early detection and treatment of H. pylori infection in reducing the incidence of gastric adenocarcinoma.[74] Huang et al[75] were among the first to use neural networks in diagnosing H. pylori; their work used refined feature selection with a neural network to predict H. pylori-related gastric histologic features. Nakashima et al[76] introduced a linked color imaging computer-aided diagnosis (LCI-CAD) system designed to categorize H. pylori infection status into three categories, with diagnostic accuracies of 84.2% for uninfected, 82.5% for currently infected, and 79.2% for posteradication status. The LCI-CAD system demonstrated diagnostic accuracy comparable to that of experienced endoscopists. A 2020 meta-analysis by Bang et al[77] evaluated the diagnostic test accuracy of AI in predicting H. pylori infection from endoscopic images. Eight studies were included in the final analysis, and the results showed sensitivity and specificity of 87% and 86%, respectively, in predicting H. pylori infection. Additionally, the accuracy of discrimination between noninfected and posteradication images was 82%.
Most recently, in 2023, Lin et al[78] created an AI system using CNN and Concurrent Spatial and Channel Squeeze and Excitation (scSE) networks that demonstrated 90% accuracy, 100% sensitivity, and 81% specificity, further evidence that the diagnostic capabilities of new systems will continue to improve. This also confirms that improving diagnostic approaches for H. pylori infection with AI and ML algorithms is crucial to reducing the mortality and morbidity associated with gastric adenocarcinoma.
CAPSULE ENDOSCOPY
CE is a noninvasive GI study in which a wireless camera in a pill is swallowed to capture images of the intestinal lumen.[79] The capsule transmits high-resolution images at 2 to 6 frames per second for 8 to 12 hours until the battery runs out.[79] Since its introduction in 2001, CE has played an important role in the diagnosis of small-bowel diseases. However, interpreting CE is prone to errors owing to its long duration and expertise requirements. AI has demonstrated great promise in interpreting CE images. In CE, AI typically uses DL algorithms, particularly CNNs, which are highly proficient at image recognition and can be trained on extensive datasets of CE images to identify patterns and characteristics associated with various GI abnormalities. Several studies evaluating the use of CNNs in CE are summarized in Table 5.
Two studies from China reported using CNN for automated detection of GI hemorrhage from CE images, which showed promising results.[80,81] Another study from Portugal used CNN to automatically detect luminal blood in CE images, with an AUC of 1 and an average reading rate of 184 images per second.[82] AI models have also been studied to detect GI angioectasia, using CE images. Leenhardt et al[83] proposed a CNN-based segmentation algorithm that showed excellent accuracy for detecting GI angioectasia from small-bowel CE images. In another study from Japan, Tsuboi et al[84] developed a CNN-based system for automatic detection of small-bowel angioectasia from CE images. This retrospective study suggested that this CNN-based system has adequate sensitivity and specificity, with an AUC of 0.998. In addition, the CNN system analyzed the CE images in 323 seconds at an average speed of 32.5 images per second. In a more recent study, Ribeiro et al[85] developed a CNN-based model to detect vascular lesions, including red spots, angioectasia, and varices. The model demonstrated high performance with an AUC of 0.97–0.98 and an average reading rate of 145 frames per second.
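The per-second reading rates quoted throughout this section are simple throughput measurements. The sketch below shows, under stated assumptions, how such a figure can be computed for a trained model over a batch of CE frames; the model here is a random dummy network in PyTorch, not any published system, and the frame count is arbitrary.

```python
import time

import torch
import torch.nn as nn

# Dummy stand-in for a trained CE lesion classifier (hypothetical layers).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()

frames = torch.randn(100, 3, 224, 224)  # 100 synthetic CE frames
start = time.perf_counter()
with torch.no_grad():  # inference only: no gradient bookkeeping
    for frame in frames:
        _ = model(frame.unsqueeze(0))  # one frame at a time, as in reading
elapsed = time.perf_counter() - start
print(f"average reading rate: {len(frames) / elapsed:.1f} images per second")
```

Batching frames or running on a GPU would raise the measured rate, which is one reason reported reading speeds vary so widely between studies.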
Another application of AI models in CE is detecting GI erosions and ulcers. Fan et al[86] reported the first study to detect small intestinal erosions and ulcers automatically using a CNN model. The study demonstrated an excellent performance of the CNN model, with an AUC of 0.98. Aoki et al[87] also trained a CNN system for detecting GI erosions and ulcers from CE images, which showed high accuracy with an AUC of 0.958. Additionally, the CNN model evaluated the CE images in 233 seconds at an average speed of 44.8 images per second. In a subsequent study, Aoki et al[88] found that CNN reduced physicians’ reading time without compromising the detection rate of erosions or ulcerations.
AI models have also been used for detecting GI polyps in CE images. In 2017, Yuan et al[89] proposed a novel DL method that achieved higher polyp recognition accuracy in CE images than other polyp recognition methods. In another study, Saito et al[90] trained a CNN to automatically detect various protruding lesions, including polyps, nodules, epithelial tumors, submucosal tumors, and venous structures, in CE images. The trained CNN achieved a sensitivity of 90.7% and specificity of 79.8%, with an AUC of 0.911 and a detection rate of 98.6%.[90] Moreover, the trained CNN analyzed the CE images in 530.462 seconds at an average speed of 0.3030 seconds per image.[90]
One study from China used a CNN to identify various small intestinal lesions in CE images.[91] The CNN model detected abnormalities with higher sensitivity and shorter reading times than gastroenterologists, with a mean reading time per patient of 5.9 ± 2.23 minutes for the CNN versus 96.6 ± 22.53 minutes for gastroenterologists. In addition, the CNN model increased the total detection rate by 16.33% compared to that of gastroenterologists. Furthermore, Klang et al[92] developed a CNN algorithm to detect Crohn disease ulcers in CE images, with an average ulcer-detection time of 204.7 ± 93.9 seconds and excellent accuracies ranging from 95.4% to 96.7%, with AUCs of 0.99. Additionally, Zhou et al[93] developed a CNN model to analyze the presence and extent of celiac disease, which achieved promising results with 100% sensitivity and specificity.
Most recently, in 2023, Ding et al[94] developed a well-trained AI model that automatically detects multiple abnormalities in small-bowel CE videos. Moreover, one recent retrospective study showed that AI-assisted CE reading saves significant time without reducing sensitivity.[95] With AI-assisted reading, the mean reading time fell from 29.7 to 2.3 minutes, an average saving of 27.4 minutes per study. In another recent report, Choi et al[96] examined whether a CNN model could detect meaningful findings when negative CE videos were reanalyzed. Of 103 videos initially read as negative by humans, 63 (61.2%) showed meaningful findings when reanalyzed by the CNN model, and the diagnosis was changed for 10.3% of patients who initially had negative CE results. All these studies show that AI in CE is evolving and holds great potential.
ENDOSCOPIC RETROGRADE CHOLANGIOPANCREATOGRAPHY
ERCP is a procedure that combines endoscopic and x-ray imaging to view the bile and pancreatic ducts for both diagnostic and therapeutic purposes in various pancreaticobiliary conditions. In general, ERCP is the best therapeutic approach for common bile duct stone removal. However, cannulation of the ampulla can be difficult and can be associated with adverse events such as post-ERCP pancreatitis (PEP). Appropriate measures are taken in every ERCP to minimize the risk of developing PEP, starting with risk stratification based on multiple factors, including age, sex, bilirubin level, and history of sphincter of Oddi dysfunction. If the procedure is deemed high risk, prophylactic measures can be taken, such as rectal nonsteroidal anti-inflammatory drugs, intravenous lactated Ringer solution, and placement of a pancreatic stent.[97] AI in ERCP is relatively new and has already shown many uses that could improve patient care and decrease the procedure’s adverse events, mainly by helping locate anatomical landmarks and assessing the difficulty of cannulation and of the procedure. A recent 2023 study by Archibugi et al[98] built and compared two types of ML models as a predictive tool for PEP development in 1150 patients: gradient boosting outperformed logistic regression, with an AUC of 0.7 versus 0.585 (p = 0.012).
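As a hedged illustration of the model comparison Archibugi et al describe, the sketch below fits a gradient boosting classifier and a logistic regression on synthetic tabular risk-factor data and compares them by AUC (assuming scikit-learn). The simulated features merely stand in for cohort variables such as age, sex, and bilirubin level; nothing here reproduces the study’s data or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic cohort: 1150 patients x 8 dummy risk factors, binary PEP label.
X, y = make_classification(n_samples=1150, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42
)

for name, clf in [
    ("gradient boosting", GradientBoostingClassifier()),
    ("logistic regression", LogisticRegression(max_iter=1000)),
]:
    clf.fit(X_tr, y_tr)
    # AUC on held-out patients, computed from predicted PEP probabilities.
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

Gradient boosting can capture nonlinear interactions between risk factors that a linear model misses, which is one plausible explanation for the AUC gap the study reports.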
In 2021, Kim et al[99] conducted the first study comparing multiple CNN models to assist endoscopists in ERCP with two functions: locating the ampulla of Vater and classifying the procedure as easy or difficult. The study showed promising results in locating the ampulla of Vater across different anatomical variants and shapes, which could assist endoscopists in locating anatomical sites and classifying procedural difficulty so that necessary precautions can be taken.[99]
The development of AI models in ERCP has revealed intriguing applications and potential benefits. Although early research has begun to shed light on their utility, further investigations are essential to fully comprehend their capabilities and limitations. The field also anticipates new applications and advancements in the near future, underlining the ongoing evolution and promising trajectory of AI in ERCP.
ENDOSCOPIC ULTRASONOGRAPHY
EUS is a minimally invasive procedure that uses an endoscope with a high-frequency ultrasound device to image the GI tract. EUS has an essential role in the diagnosis and management of various GI conditions, including pancreatic pathologies, by providing high-resolution, real-time imaging. In particular, EUS has been an essential tool for diagnosing pancreatic cancer; the diagnostic accuracy of EUS-guided fine-needle aspiration is between 85% and 91%.[100] However, chronic pancreatitis can make this diagnosis challenging owing to scarring, inflammation, and the difficulty of finding an appropriate biopsy site, which in turn can yield a high false-negative rate; hence the need to improve image quality.[101] AI has been studied in these patients in an attempt to enhance the detection of pancreatic cancer in patients with a prior diagnosis of chronic or autoimmune pancreatitis.
In 2001, Norton et al[102] were the first to study this subject, examining 21 patients with pancreatic cancer and 14 patients with chronic pancreatitis (CP), all biopsy-proven. An AI model built using SVM classification read the EUS images and detected malignancy with 100% sensitivity, higher than the endoscopists’ 73%, but with a lower specificity of 50% compared to 83%. When this tool was used to assist endoscopists, it increased their sensitivity to 89%. Although this study showed a positive outcome regarding improved detection, the sample size was small.
Further studies on the same topic were conducted with larger sample sizes. In 2008, Zhang et al[103] used an SVM model and Das et al[104] used an ANN model, and in 2013, Zhu et al[105] used an SVM model. All three studies included patients with both pancreatic cancer and CP, and all reported a sensitivity of 93–94% and a specificity of 93%, except Zhang et al,[103] who reported 99%. These studies marked a significant milestone in enhancing the detection of pancreatic cancer.
In a 2021 study, Marya et al[106] developed a CNN model that analyzed data from 583 individuals, encompassing patients diagnosed with autoimmune pancreatitis (AIP), pancreatic ductal adenocarcinoma (PDAC), CP, and normal pancreas (NP). The CNN model studied approximately 1,750,000 images and demonstrated robust diagnostic performance, with 99% sensitivity and 98% specificity in distinguishing AIP from NP. Additionally, it exhibited 90% sensitivity and 93% specificity in distinguishing AIP from PDAC, and 94% sensitivity and 71% specificity in discerning AIP from CP. The CNN program maintained a consistent 90% sensitivity and 85% specificity in distinguishing AIP from all the other mentioned conditions. Notably, the CNN model achieved a processing rate of 955 EUS frames per second.
In another 2021 study, Udriștoiu et al[107] developed an AI model with a hybrid learning approach, combining a CNN and a Long Short-Term Memory neural network, to analyze EUS images and distinguish various pancreatic pathologies. Around 75% of the images were designated for training, with the remaining 25% reserved for testing the DL models. The findings revealed an outstanding predictive performance, with an overall accuracy of 98.26% and an AUC of 0.98 in predicting clinical diagnoses. The reported negative predictive value (NPV) and PPV were 96.7% and 98.1% for PDAC, 96.5% and 99.7% for CP, and 98.9% and 98.3% for pancreatic neuroendocrine tumors.
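One plausible arrangement of the hybrid architecture just described is sketched below: a CNN encodes each EUS frame, and a Long Short-Term Memory network aggregates the frame sequence before classification. The layer sizes, three-class output, and clip length are illustrative assumptions in PyTorch, not the published model.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Toy CNN+LSTM hybrid: per-frame features, then temporal aggregation."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.cnn = nn.Sequential(  # per-frame encoder
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> 16-dim feature
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, num_classes)  # e.g., PDAC / CP / NET (dummy)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clips.shape  # (batch, time, channels, height, width)
        # Encode every frame independently, then restore the time dimension.
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (hidden, _) = self.lstm(feats)  # final hidden state summarizes clip
        return self.head(hidden[-1])

model = CNNLSTMClassifier()
clip = torch.randn(1, 8, 3, 128, 128)  # one synthetic 8-frame EUS clip
print(model(clip).shape)               # torch.Size([1, 3])
```

The design intuition is that the CNN handles spatial texture within a frame while the LSTM accumulates evidence across consecutive frames, which single-frame classifiers cannot do.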
EUS elastography (EUS-E) is a newer diagnostic technique for malignant pancreatic lesions that renders tissue stiffness as colored pixels. Săftoiu et al[108] demonstrated that EUS-E is superior to regular EUS in sensitivity, accuracy, specificity, NPV, and PPV for detecting malignant lymph nodes. AI has been investigated in EUS-E to differentiate malignant pancreatic lesions from nonmalignant ones. Săftoiu et al[109] used an ANN-based computer-aided detection system to analyze EUS-E videos from 258 patients, including those with chronic pancreatitis, which can complicate cancer detection. The ANN-based analysis had a sensitivity of 87.59%, specificity of 82.94%, PPV of 96.25%, and NPV of 57.22%. Moreover, it showed a significantly higher AUC of 0.94 compared with 0.85 for a previously studied simple hue histogram analysis.[109]
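For readers unfamiliar with the baseline mentioned above, a simple hue histogram reduces a colored elastography frame to a fixed-length feature vector that a classifier such as an ANN can consume. The brief sketch below computes one from a random dummy RGB frame; the 16-bin choice is arbitrary.

```python
import colorsys

import numpy as np

rgb_frame = np.random.rand(64, 64, 3)  # dummy elastography frame, RGB in [0, 1]
# Convert every pixel to HSV and keep only the hue channel.
hues = np.array([colorsys.rgb_to_hsv(*px)[0]
                 for px in rgb_frame.reshape(-1, 3)])
# Summarize the frame as a normalized 16-bin hue distribution.
hist, _ = np.histogram(hues, bins=16, range=(0.0, 1.0), density=True)
print(hist)  # 16-value feature vector for a downstream classifier
```

A histogram discards the spatial arrangement of colors, which may explain why the ANN-based analysis, which can use richer inputs, achieved the higher AUC reported above.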
SUMMARY
In this review, we have summarized key studies discussing the use of DL models in various GI endoscopic procedures. Although AI implementation in endoscopy has shown promising results, limitations remain that must be addressed. AI in endoscopy raises several ethical concerns as well as the potential for AI-related errors. One of the main ethical concerns is that AI requires access to sensitive patient data for training, so it is essential to implement safe practices that ensure patient privacy. Obtaining informed consent from patients may also be a challenge, as they may not fully understand the role of AI in their healthcare. AI integration into healthcare may raise concerns about job displacement of healthcare professionals. Therefore, it is crucial to emphasize the collaborative role of AI in enhancing endoscopy quality and assisting endoscopists in clinical decisions rather than relying solely on AI without human input. It is also essential to establish clear guidelines regarding accountability and liability in case of AI-related errors.
The financial burden of integrating AI software into healthcare, including the costs of the software itself, its application, and maintenance expenses, raises numerous questions about the future financial state of healthcare. Additionally, there is ongoing concern about the challenges faced by patients of low socioeconomic status without health insurance, especially if there is a further increase in the cost of endoscopic procedures. Although AI has the potential to save costs by reducing the need for labor, it also presents an ethical dilemma regarding potential job market shrinkage and increased unemployment. Further research is required to comprehend the ethical and practical considerations in integrating AI tools into GI endoscopic procedures. Moreover, integration of AI models into endoscopic procedures depends on their acceptance by endoscopists, which can be challenging and requires adequate training to ensure comfortable daily use of AI models. Given all these challenges, it is essential to establish explicit guidelines for AI implementation in endoscopy to address all the concerns and prevent patient harm arising from potential AI-related errors.
CONCLUSION
Despite being a relatively new tool, AI has provided clinically and statistically significant improvement in the diagnostic capability of endoscopy. In GERD, AI applications include GERD detection and classification. Additionally, trained AI systems can detect and characterize dysplastic and neoplastic lesions and predict the submucosal invasion of Barrett’s esophagus, esophageal adenocarcinoma, and esophageal squamous cell carcinoma. The developed AI models have a range of applications for gastric pathologies, including GC detection and classification; they can differentiate GC from other pathologies such as gastritis, gastric ulcers, and submucosal tumors and identify its histopathologic features. Additionally, AI can detect H. pylori infection and categorize it as uninfected, currently infected, or posteradication. Trained AI models have increased the detection accuracy of small-bowel pathologies during capsule endoscopy, such as erosions, ulcers (including Crohn disease ulcers), vascular lesions (including angioectasia), and polyps. Furthermore, AI has improved the diagnostic applications of advanced endoscopic procedures such as ERCP and EUS: it helps in post-ERCP pancreatitis risk stratification, locating the ampulla of Vater, and classifying the procedure as easy or difficult, and it helps detect chronic pancreatitis, pancreatic cancer, and malignant lymph nodes during EUS procedures. Finally, AI applications in upper GI, advanced, and capsule endoscopies hold great promise. New data and studies continue to demonstrate AI’s capabilities and what it can provide in the future, and future studies are expected to develop more effective AI models that can be implemented widely in this field. However, further research is needed to address the potential challenges, errors, and ethical considerations associated with the integration of AI in GI endoscopy.
References
Competing Interests
Source of Support: None. Conflict of Interest: None.