Recent artificial intelligence (AI) advancements in cardiovascular medicine offer potential enhancements in diagnosis, prediction, treatment, and outcomes. This article aims to provide a basic understanding of AI-enabled ECG technology. Specific conditions and findings will be discussed, followed by a review of associated terminology and methodology. In the appendix, definitions of AUC versus accuracy are explained. The application of deep learning models enables the detection of diseases from normal electrocardiograms at an accuracy not previously achieved by technology or human experts. Results with AI-enabled ECG are encouraging, as they considerably exceeded current screening models for specific conditions (i.e., atrial fibrillation, left ventricular dysfunction, aortic stenosis, and hypertrophic cardiomyopathy). This could potentially lead to a revitalization of the utilization of the ECG in the insurance domain. While we are embracing the findings with this rapidly evolving technology, cautious optimism is still necessary at this point.
Editor’s comment
A companion article in the Journal of Insurance Medicine1 discusses Artificial Intelligence (AI)’s ability to read a simple PA chest radiograph (CXR) and successfully screen for type 2 diabetes mellitus, low-ejection fraction heart failure, and valvular heart disease, and to predict mortality in asymptomatic persons with chronic lung disease. This current article is a follow-up to Dr. Posan’s and Dr. Ross MacKenzie’s presentation Advanced EKGs and the Potential Role of AI-Augmented EKG interpretation at the American Academy of Insurance Medicine’s annual meeting in Washington, D.C. in October 2023, which highlighted the potential for AI-enabled ECG.
Methodology
Key articles over the past four years appearing in the National Library of Medicine database (PubMed) served as the major source of information for this treatise. Each article is well-referenced, allowing readers to delve more deeply into whatever study is most beneficial.
Background
The ECG is a more than 100-year-old technology (invented in 1901 by Willem Einthoven) and remains the most performed cardiac test worldwide. Life and disability insurance companies are less frequently requiring ECGs with applications, likely because of the cost and “hassle” of obtaining the ECG, and because of potential reader variability in interpreting ECGs.
Interpretation of an ECG requires specific skills and knowledge. Computer-generated interpretations have been used in health systems for years, but these interpretations are based on predefined rules and manual pattern or feature recognition algorithms that do not capture the complexity and nuances of an ECG. For example, ECGs have heretofore been useful at identifying specific features, such as ST-segment elevation for acute myocardial infarction, T-wave changes to suggest critical serum potassium elevations or pericarditis, and other gross deviations to identify specific clinical entities. However, AI-enabled ECGs (AI-ECG) may represent a quantum-leap not only in the reliability and consistency in interpretation but also in capabilities of diagnosing and predicting future conditions that could never be detected previously.
AI-ECGs may increase the reliability and predictability of a simple 12-lead ECG and even a 1-lead rhythm recording from a mobile or wearable device such as an iPhone. This revolution may open new opportunities in the utilization of this century-old classic tool. Failure of the insurance industry to recognize the potential for adverse risk assessment could prove costly.
AI can be applied in two ways to the ECG. In the first, AI carries out interpretation tasks currently performed by humans, such as diagnosing acute myocardial infarction or bundle branch blocks, but with diagnostic performance that is faster, more consistent, and more accurate than previously possible. The second utilization is to extract information from an ECG that is imperceptible to the human eye. This paper will focus on the latter approach.
The ECG is the aggregate of the electrical activity of millions of individual cardiomyocytes recorded from the body’s surface. In most conditions the underlying pathophysiological process begins many years in advance of clinical presentation. Numerous distinct biological factors can subtly and nonlinearly impact cardiomyocyte electrical function resulting in subclinical ECG alterations not evident on a normal 12-lead sinus rhythm ECG.
The fundamental hypothesis of AI-enabled ECG diagnosis is that the disease process will initially affect the voltage (in time) before it becomes detectable by imaging technologies, and that those changes (at a cellular level) could be reliably detected by a properly trained convolutional neural network (CNN) (see Figure 1).
“The CNN is trained without hard-coded rules by finding often subclinical patterns in huge datasets, AI transforms the ECG into a screening tool and predictor of cardiac and non-cardiac diseases, often in asymptomatic individuals.”2
The ECG is an ideal substrate for deep-learning AI applications. The ECG is widely available and yields reproducible raw data that are easy to store and transfer in a digital format. In addition to fully automated interpretation of the ECG, rigorous research programs using large databanks of ECG and clinical datasets coupled with powerful computational capabilities have been able to demonstrate the utility of the AI-enhanced ECG (the AI-ECG) as a tool for the detection of ECG signatures and patterns that are unrecognizable by the human eye.
Screening to Detect Silent Atrial Fibrillation
Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting one-quarter of patients older than 80 years.3 With an aging and growing population, the prevalence of AF is expected to rise. Patients with AF are 5 times more likely to experience a stroke and have up to a 25% risk of dying within 30 days of stroke.4 AF can be episodic, and at least one-third of cases are asymptomatic.5 Asymptomatic AF is associated with an increased risk of cardiovascular (HR 3.12, 95% CI 1.50-6.45) and all-cause mortality (HR 2.96, 95% CI 1.89-4.64) compared to typical AF after adjustment for CHA2DS2-VASc score and age.6 Other studies, however, have found no significant morbidity or mortality differences between persons with asymptomatic and symptomatic AF.7 Among patients who experience an acute stroke of unknown origin (cryptogenic stroke), one-fifth will be found to have occult AF.8 Atrial fibrillation also may cause long-term changes in cardiac structure, including atrial dilation, atrial remodeling, and ventricular function deterioration, which can result in permanent AF and heart failure.9,10
Effective clinical management can mitigate the complications of AF. Oral anticoagulation reduces the relative risk of stroke by two-thirds but only in patients proven to have non-valvular AF.11 Early intervention with antiarrhythmic medications or ablation may prevent more permanent AF and reduce symptoms and stroke risk.12,13
For further prevention of stroke in patients with cryptogenic stroke (i.e., stroke of unknown etiology) current screening measures include various cardiac monitoring systems (Holter, loop-recording, etc.), and CHA2DS2-VASc scores (see Table 1) of ≥2 in men and ≥3 in women, all of which have a poor diagnostic yield (detecting only 9% of patients with insertable cardiac monitor at 6 months and CHA2DS2-VASc having an AUC of 0.52 for stroke prediction).14,15
For asymptomatic adults 50 years or older, the USPSTF (United States Preventive Services Task Force) concludes that the current evidence is insufficient to assess the balance of benefits and harms of screening for atrial fibrillation.16,17
Since neural networks can detect multiple, subtle, non-linear-related patterns in an ECG, a group at the Mayo Clinic hypothesized that they may be able to detect the presence of intermittent AF from a normal sinus rhythm ECG. The authors theorized that patients with AF may have subclinical ECG changes associated with fibrosis or transient physiologic changes. In this retrospective study, approximately 1 million ECGs from patients with no AF (controls) and patients with episodic AF (cases) were studied. The network was never shown ECGs with AF, but only normal sinus rhythm ECGs from patients with episodic AF and from controls. After training, the AI-ECG network accurately detected paroxysmal AF from an ECG recorded during normal sinus rhythm (accuracy of 79.3%; AUC 0.87) [see Appendix B for definitions of AUC vs. accuracy]. When the ECGs from the patients’ “window of interest” (the 31-day period prior to the first ECG showing AF) were evaluated, the accuracy of the AI-ECG algorithm improved (accuracy 83%; AUC 0.90).18
In another, prospective study from the Mayo Clinic, 1003 patients (mean age 74) with stroke risk factors (CHA2DS2-VASc scores ≥2 in men and ≥3 in women) and no history of AF, but with available ECGs, underwent continuous ambulatory heart rhythm monitoring for up to 30 days. Their AI-ECG algorithm for AF detection, applied to the patients’ ECGs, divided patients into a high-risk or low-risk group. The primary outcome was newly diagnosed AF. Over a mean 22.3 days of continuous monitoring, atrial fibrillation was detected in 6 of 370 (1.6%) low-risk patients and 48 of 633 (7.6%) high-risk patients (odds ratio 4.98, 95% CI 2.11-11.75, p=0.0002). Subsequent AI-guided screening was associated with increased detection of atrial fibrillation: 10.6% (95% CI 8.3-13.2) in the high-risk group vs 3.6% (95% CI 2.3-5.4) in the low-risk group (p<0.0001) over a median follow-up of 9.9 months. Adding CHARGE-AF19 (a 5-year predictive model that includes the variables of age, race, height, weight, systolic and diastolic blood pressure, current smoking, use of antihypertensive medication, diabetes mellitus, history of myocardial infarction and heart failure) on top of the AI-ECG risk score did not improve the discrimination compared with the AI alone. This was the first study to find that the AI algorithm was able to further risk stratify this at-risk-of-stroke population beyond traditional clinical risk factors (age, CHA2DS2-VASc scores, CHARGE-AF risk scores) and the first to evaluate the effectiveness of AI-guided targeted screening programs in comparison with usual care.20
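The odds ratio quoted above can be checked directly from the reported counts. The following is a minimal sketch, assuming a standard log-odds (Woolf) confidence interval; the function name is illustrative, and the source does not state which interval method the study authors used.

```python
import math

def odds_ratio_with_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A versus group B with a Woolf (log-odds) 95% CI."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts quoted in the text: AF detected in 48 of 633 high-risk
# and 6 of 370 low-risk patients during monitoring.
odds_ratio, lower, upper = odds_ratio_with_ci(48, 633, 6, 370)
print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these counts, the calculation gives an odds ratio of about 4.98 with an interval of roughly 2.11 to 11.75.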
A large study (907,858 ECGs from 6 US Veterans Affairs (VA) hospital networks) applied a deep-learning model to predict the presence of AF within 31 days of a sinus rhythm ECG. Because the demographics at the VA sites were 94% male, the validation study was done at a large non-VA medical center (72,483 ECGs). The mean CHA2DS2-VASc score was 1.6. The AUC was 0.93 (95% CI, 0.93-0.94) with accuracy 0.87 (95% CI, 0.86-0.88). Among individuals deemed high risk by deep learning, the number needed to screen to detect a positive case of AF was 2.47 individuals for a testing sensitivity of 25%, and 11.48 for testing sensitivity of 75%. Model performance was similar in patients who were Black, female, or younger than 65 years or who had CHA2DS2-VASc scores of 2 or greater.21
Key Findings
AI-ECG applied to a normal sinus rhythm ECG may permit identification of individuals experiencing AF either previously or occurring in the future (i.e., current ECG shows sinus rhythm, but atrial fibrillation is present at other times).
In patients at risk for AF, AI-ECG is strongly predictive of concurrent AF within 30 days of an ECG recorded during normal sinus rhythm, with a high degree of accuracy – suggesting that an ECG (current or stored) may be a surrogate for prolonged rhythm monitoring.
An AI algorithm can risk-stratify a relatively uniform population (e.g., older adults at risk for stroke) to detect undiagnosed atrial fibrillation during short-term cardiac monitoring.
An AI-guided targeted screening approach that leverages existing clinical data (i.e., patients with stroke risk factors) increased the yield for atrial fibrillation detection with subsequent monitoring.
Screening for Left Ventricular Systolic Dysfunction
Asymptomatic left ventricular dysfunction is relatively prevalent, present in 3-6% of the general population and 9% in the elderly, is associated with increased mortality, and is treatable when found.22 It is, however, difficult to diagnose because there are no warning symptoms and because the current tools (echocardiogram, biomarker testing with natriuretic peptides) are not widely available and/or cost effective. Screening for left ventricular dysfunction with BNP and N-terminal pro b-type natriuretic peptide (NTproBNP) has been found to have AUCs of 0.6 for BNP and 0.7 for NTproBNP.23 Utilization of AI augmented ECG signals in screening for LV dysfunction could provide a valuable new tool.
Sangha et al.24 used 385,601 normal sinus rhythm ECGs with paired low EF <40% (HFrEF – Heart Failure [with] reduced Ejection Fraction) echocardiograms in both in-hospital and outpatient settings. Their AI-ECG model yielded AUCs of 0.88-0.94 in testing in a number of large medical centers in the U.S. An AI-ECG suggestive of LV systolic dysfunction portended >27-fold higher odds of LV systolic dysfunction on a subsequent transthoracic echocardiogram (odds ratio, 27.5 [95% CI, 22.3-33.9]). A positive AI-ECG in individuals with an LV ejection fraction ≥40% by echocardiogram at the time of initial assessment was associated with a 3.9-fold increased risk of developing incident LV systolic dysfunction in the future (hazard ratio, 3.9 [95% CI, 3.3-4.7]; median follow-up was 3.2 years).
Another study from the Mayo Clinic trained a convolutional neural network on paired 12-lead ECG and echocardiogram ejection fractions from 44,959 patients with validation testing on an independent set of 52,870 patients. The network model yielded AUC of 0.93 (see Figure 2), with sensitivity of 86.3%, specificity of 85.7%, and accuracy of 85.7%.25
In patients without ventricular dysfunction but with a positive AI screen (i.e., ‘false positives’), there was a 4-fold risk of developing future ventricular dysfunction (hazard ratio 4.1; 95% CI, 3.3-5.0, p<0.001) compared to persons with a negative screen over the following median 3.4 years (interquartile range, 1.2-6.8 years). The authors concluded that application of their AI-ECG permits a simple and inexpensive 12-lead ECG to be a powerful screening tool for asymptomatic left ventricular dysfunction with EF <35%.25
A study from South Korea used a deep learning model to detect heart failure with preserved ejection fraction (HFpEF – Heart Failure [with] preserved Ejection Fraction). Their model yielded an AUC of 0.87 (95% CI, 0.85-0.88). In individuals without HFpEF on initial echocardiography, patients whose deep learning model indicated a higher risk proved, in fact, to have a significantly higher chance of developing HFpEF in the future than those in the low-risk group (34% vs 8%, p<0.001).26
Another study examined routine ECGs, done as part of primary care in a group of patients without a prior diagnosis of heart failure, to reveal occult heart failure with application of their AI-ECG model. The purpose of the study was to evaluate, in a real-world clinical practice, the potential utility of AI-ECG results. The primary outcome was a new diagnosis of low EF (≤50%) in an asymptomatic population. The intervention of using AI-ECG analysis increased the diagnosis of low EF in the overall cohort (1.6% undergoing usual care versus 2.1% in the intervention arm; odds ratio [OR] 1.32, 95% CI, 1.01-1.61, p=0.0007) when echocardiography was ordered because of the AI-ECG. The number of echocardiograms ordered did not differ between the two arms. The OR increased to 1.43 (95% CI, 1.08-1.91, p=0.01) when the AI-ECG suggested a high likelihood of low EF.27
In another Mayo Clinic study, 2,454 individuals volunteered to send their single-lead Apple Watch ECGs from their iPhones for analysis. Of this number, 421 participants had at least one watch-classified sinus rhythm ECG within 30 days of undergoing an echocardiogram, of whom 16 (3.8%) were found to have an EF ≤40% (HFrEF). The AI algorithm detected patients with low EF with an AUC of 0.89 (95% CI, 0.82-0.95). It is remarkable that this method has been successfully modified and applied to a single-lead model (Apple Watch). “This illustrates that the 12-lead AI-ECG model can be modified and extrapolated to a single lead, portable device, showing the massive scalability and various utility of the consumer grade wearable recorded AI-ECG acquired in nonclinical environments.” More than 120,000 ECGs were analyzed in the creation of their model.28
LV dysfunction can be caused by frequent premature ventricular complexes (PVCs). PVCs are prevalent and, although often benign, they may lead to PVC-induced cardiomyopathy. A deep-learning algorithm was designed to predict HFrEF in persons with PVCs from a 12-lead ECG. The primary outcome was first diagnosis of LVEF ≤40% within 6 months. Among over 14,000 patients (age 67.6 ± 115 years) 22.9% experienced such reductions, with an AUC 0.79 (95% CI: 0.77-0.81). The weighted class activation map explainability (that we discuss later) framework highlighted the sinus rhythm QRS complex-ST segment. In patients who underwent successful PVC ablation there was a post-ablation improvement in LVEF with resolution of cardiomyopathy in 89% of patients.29
Key Findings
Application of a specific AI-ECG algorithm to the standard ECG (12-lead or single-lead) enables:
Detection/diagnosis of asymptomatic LV dysfunction (LVD)
Revelation of occult stages of heart failure (HFrEF and HFpEF)
Prediction of future risk of developing LVD.
Studies demonstrate that the 12-lead model can be modified to a single lead model (and be used in nonclinical environments) indicating the potential for massive scalability and various utility of the consumer grade wearable recorded AI-ECG.
Screening for Aortic Valve Stenosis
Aortic stenosis (AS) is usually managed by aortic valve replacement (AVR) when a patient develops symptoms.30 If AVR is not performed at an appropriate time, AS can lead to heart failure or sudden death.31 Recently, benefit of early aortic valve repair has been demonstrated even in asymptomatic patients with severe aortic stenosis and in a subgroup with moderate AS where the long-term outcome was found to be poor.32,33 Systolic murmur is documented in <50% of patients with moderate or severe AS34 suggesting a need for an inexpensive and readily available screening tool for this disorder.
Cohen-Shelly et al.35 divided 258,607 patients with paired ECG and transthoracic echocardiography (TTE) studies into a training set (50%), a validation set (10%) and a testing set (40%) to screen for aortic stenosis. The AUC was 0.85 in both the validation and testing groups. In the testing group, 3.7% of patients had moderate or severe AS on echocardiography, and the AI-ECG predicted echo-positive AS with sensitivity 78%, specificity 74%, and accuracy 74%. Because of the low prevalence (4%) of AS in the population studied, the positive predictive value was low at 10.5% but the negative predictive value was high at 98.9%. Of a total of 102,926 patients in the testing group, true positive was present in 3% (n=2995), true negative in 71% (n=73,624), false positive in 25% (n=25,469) and false negative in 1% (n=838). Patients with false positive findings more frequently had hypertension and renal disease compared with other groups (p<0.0001). Although hypertension produces ECG changes similar to those of AS, the AUC was found to be 0.81 in patients with hypertension and 0.88 in those without hypertension, and the AUC was 0.89 for those without any comorbidities (n=31,484 – 31%). The authors concluded that the AI-ECG successfully identified patients with moderate to severe AS with high performance (AUC 0.85). However, with the prevalence of AS being only 4%, the positive predictive value was low, raising the possibility of unnecessary echocardiography in AI-ECG-positive patients. This consideration would be balanced against the excellent negative predictive value, which should reduce the amount of unnecessary imaging in patients with a non-significant murmur.
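The performance figures in this paragraph all follow from the four counts reported for the testing group. A minimal sketch of that arithmetic is shown below; the function and variable names are illustrative and are not taken from the study.

```python
def screening_metrics(tp, tn, fp, fn):
    """Standard screening statistics from a 2x2 confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "prevalence": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Testing-group counts quoted in the text (total 102,926 patients).
as_metrics = screening_metrics(tp=2995, tn=73624, fp=25469, fn=838)
for name, value in as_metrics.items():
    print(f"{name}: {value:.1%}")
```

The counts reproduce the quoted sensitivity (about 78%), specificity and accuracy (about 74%), positive predictive value (about 10.5%), and negative predictive value (about 98.9%). They also show why the positive predictive value is low: false positives (25,469) outnumber true positives (2,995) by more than eight to one.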
Key Findings
An AI-ECG can identify patients with moderate or severe AS and may serve as a powerful screening tool for AS
Although the positive predictive value is low because of the low prevalence of AS, the negative predictive value is excellent (close to 99%). This has important clinical value: AI-ECG can be used for excluding asymptomatic moderate-severe AS
Screening for Hypertrophic Cardiomyopathy
Hypertrophic cardiomyopathy (HCM) is among the leading causes of sudden cardiac death among adolescents and young adults and is associated with significant morbidity in all age groups.36 Although the implications of an HCM diagnosis are important for sudden cardiac death risk stratification, genetic counseling, family screening, and longitudinal clinical follow-up, the condition is uncommon, affecting approximately 1 in 200-500 individuals.37
More than 90% of persons with HCM have ECG abnormalities (LVH, left axis deviation, prominent Q waves, and ST-changes), but these non-specific findings do not have sufficient diagnostic performance to justify routine ECG screening (and up to 10% have normal ECGs).
In a study using 3060 expertly diagnosed patients from a HCM clinic and comparing that with a control group of patients who had both ECG and echocardiographic data, 70% of this population were involved in training, 10% in validation, and then 20% in testing of the model. The validation dataset had an AUC of 0.95 (95% CI, 0.94-0.97) when using the optimal probability threshold of 11%. When applying this probability threshold in the test dataset, the AI-ECG AUC was 0.96 (95% CI, 0.95-0.96) with sensitivity 87%, specificity 90%, positive predictive value 31%, and negative predictive value 99%. When higher HCM probability thresholds were applied, the performance characteristics changed to favor specificity and to reduce the false-positive rate. For example, with a probability threshold of 75% (instead of 11%), specificity was 99% and false-positive rate was 1%. The false-positive rate was low, and the negative predictive value was high, across all tested thresholds. Their AUC for predicting HCM when the population of interest was restricted only to patients with a normal ECG was 0.95 (95% CI, 0.90-1.00), with sensitivity of 93%, specificity of 87%, positive predictive value 31% and negative predictive value 99% at an HCM probability threshold of 11%.38
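Predictive values depend on how common the condition is in the group being screened, not only on sensitivity and specificity. The sketch below applies Bayes' rule to the reported test-set sensitivity (87%) and specificity (90%); the prevalence values are assumptions chosen for illustration (5%, and the 1 in 200 and 1 in 500 population figures cited earlier), not data from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Assumed prevalences: 5%, 1 in 200, and 1 in 500 (illustrative only).
for prevalence in (0.05, 1 / 200, 1 / 500):
    ppv, npv = predictive_values(0.87, 0.90, prevalence)
    print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

Under these assumptions, a prevalence of about 5% gives a positive predictive value close to the reported 31%, while at general-population prevalence the same sensitivity and specificity would yield a positive predictive value of only a few percent. The negative predictive value stays above 99% in all three cases.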
Key Findings
ECG-based detection of HCM by an AI algorithm can be achieved with high diagnostic performance, particularly in younger persons.
Negative predictive values remained high regardless of thresholds selected for probability.
AI-ECG METHODOLOGY AND TERMINOLOGY
Review of Methodology
AI-enhanced ECG interpretation uses machine learning’s convolutional neural networks (CNNs) that are “trained” on tens of thousands or even hundreds of thousands of normal ECGs that are labelled with the specific cardiac conditions in question (as discussed above). The ultimate goal is for the trained network to find or predict that condition from an ECG that the network has never seen previously. Once the network is trained, internal and external validations are important next steps. The number of algorithms for detecting various conditions has been increasing, and eventually all may be applied simultaneously.
There have been additional or coincidental findings. For example, from an apparently normal ECG, a patient’s undisclosed sex can be determined with startling precision (AUC 0.97), and an estimated age that exceeds a subject’s actual age was found to be attributable to the existence of left ventricular dysfunction, silent arrhythmias such as atrial fibrillation, various valvular heart conditions, or perhaps other conditions.39
Some Definitions
The term machine learning (ML) refers to replacing algorithms that define the relationship between inputs and outputs using man-made rules with statistical tools that identify (learn) the most probable relationship between input and output based on repeated exposure to idealized target data elements. It finds similar features and patterns in a large dataset. Convolutional neural networks (CNNs) are modeled upon the human brain’s network of neurons, with each AI neuron being a simple mathematical equation with parameters that are adjusted during network training. When these artificial neurons are connected in many layers, the result is referred to as a deep learning (DL) network. The CNN is trained with large ECG datasets. The input is the ECG voltage (in time, a continuous input) and the output is the presence or absence of a specific condition. A specific condition is used as a label for the input (ECG); therefore, this is supervised learning for each condition. Simply put, the input is matched to a specific output. Based on set thresholds, in the last layer of the network the probability of findings is translated to a dichotomized “yes” or “no” outcome.40
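To make the idea of a neuron as "a simple mathematical equation" concrete, the sketch below passes a few voltage samples through a tiny two-layer network and dichotomizes the result. It is a toy, fully connected network rather than a convolutional one, and every weight, bias, and voltage value is a hypothetical number chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def predict(voltages, hidden_layer, output_neuron, threshold=0.5):
    """Forward pass: inputs -> hidden neurons -> output probability -> yes/no."""
    hidden = [neuron(voltages, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_neuron
    probability = neuron(hidden, out_weights, out_bias)
    return probability, ("yes" if probability >= threshold else "no")

# Hypothetical values: four voltage samples, two hidden neurons, one output.
voltages = [0.02, 0.10, 0.85, -0.20]
hidden_layer = [([0.5, -1.2, 2.0, 0.3], 0.1), ([-0.7, 0.4, 1.1, -0.9], -0.2)]
output_neuron = ([1.5, -2.0], 0.05)

probability, label = predict(voltages, hidden_layer, output_neuron)
print(f"probability {probability:.3f} -> {label}")
```

A real AI-ECG network applies the same principle to thousands of voltage samples and many more layers, with the weights learned from data rather than set by hand.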
During the learning phase, parameters (sometimes called ‘weights’) are constantly adjusted to minimize the “error”, the difference between the estimated output and the known output (e.g., the input being a normal sinus rhythm ECG and the known output being atrial fibrillation) (see Figure 3). The most common way to adjust these weights is backpropagation, which adjusts each weight based on its effect on the error, with the network weights adjusted until an error minimum is found.2 This ‘learning experience’ is subsequently utilized during ongoing training.
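The weight-adjustment loop can be sketched with a single neuron trained by gradient descent. With only one neuron the chain rule has a single step; backpropagation in a deep network repeats the same calculation layer by layer. The two-feature samples, learning rate, and number of epochs below are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=300, learning_rate=0.5):
    """Fit one neuron by repeatedly nudging weights against the error gradient."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    losses = []
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            error = p - y  # gradient of the cross-entropy error at this sample
            for i, xi in enumerate(x):
                grad_w[i] += error * xi  # each weight's effect on the error
            grad_b += error
        n = len(samples)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        losses.append(loss / n)
    return weights, bias, losses

# Hypothetical toy data: two features per "ECG", label 1 = condition present.
samples = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
labels = [0, 0, 1, 1]
weights, bias, losses = train(samples, labels)
print(f"mean error: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

The printed error falls from its starting value as the weights are adjusted, which is the behavior the text describes as searching for an error minimum.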
For binary models in which the output is “yes/no”, such as the presence of silent atrial fibrillation (AF) determined from an ECG in normal sinus rhythm, the input numbers (i.e., the digitized ECG voltages and other parameters) are used by the network to calculate the probability of silent AF. The output will be a single number, ranging from 0 (no silent AF) to 1 (silent AF present). To dichotomize the output (to ‘yes’ or ‘no’), a threshold value determines whether the model output is ‘positive’ or ‘negative.’ By adjusting the threshold, the test can be made more sensitive (with more samples considered positive and fewer missed events, but at the cost of a higher number of false positives) or more specific (with fewer false positives but more missed events).41 (See Figure 3).
Figure 3. Continuous electrocardiogram voltage points (red dots, arrows) are fed to ‘input neurons’ (x0, x1, x2, … xm), which are coded as software objects. Hidden neurons within this three-layer network (h0, h1, h2, … hn) connect input and output layer neurons by numerical weights (w). The output indicates atrial fibrillation (y1; correct, red) or non-atrial fibrillation (y0).41
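The effect of moving the threshold can be illustrated with a handful of made-up model outputs. In this sketch the probabilities and labels are hypothetical, and a case counts as positive when its probability is at or above the threshold.

```python
def sensitivity_specificity(probabilities, labels, threshold):
    """Dichotomize model outputs at a threshold and score them against labels."""
    pairs = list(zip(probabilities, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    tn = sum(1 for p, y in pairs if p < threshold and y == 0)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model outputs for ten ECGs (label 1 = silent AF present).
probabilities = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

for threshold in (0.25, 0.50, 0.75):
    sens, spec = sensitivity_specificity(probabilities, labels, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this toy data, a low threshold of 0.25 catches every positive case (sensitivity 1.00) but only half of the negatives are correctly excluded (specificity 0.50); a high threshold of 0.75 reverses the trade-off (sensitivity 0.50, specificity 1.00).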
Since the training of supervised ML models requires only labelled data (e.g., ECGs) and no explicit rules, machines learn to solve tasks that humans simply cannot ‘see.’ But what exactly is the model ‘seeing’? Extensive research is being invested in this challenge. One way to illustrate what these models rely on is to look at specific examples and highlight the parts of the input that contributed most to the final output. An example is the Grad-CAM method, which uses network gradients to produce a coarse localization of the drivers of the output.42
Because of the complexity of explaining how AI is interpreting ECGs, we commonly refer to these inner workings as a “black box” between input (normal sinus rhythm ECG) and output (e.g., probability of silent or pending or past AF). Unlike the traditional risk-prediction models that comprise predefined variables, the CNN is described as “agnostic,” because we do not know what ECG features the CNN is “seeing” and which factors drive its performance. The performance of the algorithm is likely to be based on a combination of ECG signatures that are known risk factors (for example, for AF, the P-wave amplitude, width, notches, atrial ectopy, LV hypertrophy, and heart rate variability) as well as others that are currently unknown, or are not obvious to the human eye, acting in combination in a non-linear manner.18 Besides the ‘black box’ transparency concerns, various challenges exist around AI techniques (e.g., generalizability, liability, regulatory approval) that are beyond the scope of the current paper.
The overall performance of the AI models was measured by the area under the Receiver Operating Characteristic (ROC) curve (AUC). See Appendix A for an explanation.
AI-ECG is proving to be a tool for advanced ECG interpretation and a potential tool for profiling cardiac health and diseases. AI-ECG implementation is still in its early stages. Like any medical tool, AI-ECG requires careful vetting, validation, and clinician training. When seamlessly integrated into medical practice, AI-ECG shows potential to revolutionize clinical care.
Conclusion
AI applications have now been extended to the domain of the ECG (a routine 12-lead ECG and even a single-lead tracing from an iPhone). AI enables diagnostic and predictive capabilities that exceed humans’ ability to ‘see’ important changes on ECGs. AI utilizes deep learning techniques using convolutional neural networks (CNNs) to extract and find hidden information, subtle signals, and patterns in vast datasets – without relying on predefined rules. These are leading to the discovery or forecast of subclinical conditions. AI-ECG (12-lead or single-lead wearables) is predicted to be seamlessly integrated into future clinical workflows as a better screening and prediction tool in both symptomatic and asymptomatic patient populations. But rigorous validation, appropriate training and a legal framework are still required before we see these techniques in clinical practice.
APPENDIX
Appendix A: The AUROC
Most AI-ECG studies report the area under the receiver operating characteristic curve (AUROC) as the summary measure of model performance. The term Receiver Operating Characteristic (ROC) originated with British radar operators in WWII, who analyzed radar data to differentiate between enemy aircraft and flocks of geese. As the sensitivity of the receiver was increased (to better find enemy aircraft), the number of false positives rose (i.e., the specificity worsened). The ROC curve plots sensitivity [y-axis] versus 1-specificity [x-axis]. The sensitivity is the true-positive rate, TPR = TP/(TP + FN), and 1-specificity is the false-positive rate, FPR = FP/(FP + TN). The area under the curve (AUC) quantifies the overall performance of the model across all possible thresholds; a good model achieves a high TPR while maintaining a low FPR. AUC=1 refers to a perfect classifier while AUC=0.5 is random chance (Figure 4). Put simply, the AUC measures how well the model separates two classes.
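The construction of the ROC curve and its area can be sketched in a few lines. The scores and labels below are hypothetical model outputs, the curve is built by sweeping the threshold over every observed score, and the area is computed with the trapezoidal rule; published studies would typically use a statistics package for this.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical model outputs for ten ECGs (label 1 = condition present).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

auc = area_under_curve(roc_points(scores, labels))
print(f"AUC = {auc:.3f}")
```

For this toy data the AUC is about 0.833, which equals the fraction of positive-negative pairs (20 of 24) in which the positive case received the higher score.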
Appendix B: Accuracy versus AUC
Accuracy
Think of accuracy like hitting a bullseye on a target. It's a simple and intuitive measure – it tells you the percentage of times your model makes the correct prediction (both positive and negative).
It's easy to understand – higher accuracy is generally better.
Area Under the ROC Curve (AUC)
Imagine a different kind of target, where the center represents perfectly distinguishing positives from negatives, and the edges represent purely random guessing.
The ROC curve plots how well your model performs across different thresholds for classifying something as positive. AUC tells you how much of that target area your model captures.
A higher AUC means your model can consistently differentiate positives from negatives, regardless of the chosen threshold.
Why is AUC More Commonly Used in Assessing AI-ECGs?
Imbalanced data: If you have a lot more of one class (e.g., healthy patients) than the other (e.g., sick patients), a model might just predict everyone as healthy (the majority class) and achieve high accuracy. AUC avoids this because it evaluates how well the model ranks cases across all thresholds rather than at a single cut-off.
AUC focuses on both positives and negatives: A good model needs to identify both positive and negative cases well. AUC considers both true positives and false positives, giving a more complete picture.
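The imbalanced-data point can be shown with a toy example. The sketch assumes a hypothetical cohort of 100 patients, 5 of whom are sick, and a "model" that gives everyone the same score of zero, so that everyone is predicted healthy. The AUC here is computed as the share of sick-healthy pairs in which the sick patient scores higher, with ties counted as half.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(1 for p, y in zip(predictions, labels) if p == y) / len(labels)

def pairwise_auc(scores, labels):
    """AUC as the share of positive-negative pairs ranked correctly (ties = 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical cohort: 5 sick (label 1) and 95 healthy (label 0) patients.
labels = [1] * 5 + [0] * 95
scores = [0.0] * 100                      # the model scores everyone identically
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(predictions, labels):.2f}")   # 0.95
print(f"AUC      = {pairwise_auc(scores, labels):.2f}")    # 0.50
```

The model is right 95% of the time yet has no ability to separate sick from healthy patients, which the AUC of 0.5 (random chance) makes plain.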
Note: some studies refer to AUC as AUROC – Area Under (the Curve of) the Receiver Operating Characteristic.
References
Competing Interests
Conflicts of Interest: Neither author acknowledged any financial or other conflicts of interest in producing this treatise.