Recent advances in artificial intelligence (AI) in cardiovascular medicine offer potential improvements in diagnosis, prediction, treatment, and outcomes. This article aims to provide a basic understanding of AI-enabled ECG technology. Specific conditions and findings are discussed first, followed by a review of the associated terminology and methodology. The appendix explains the difference between AUC and accuracy. Deep learning models can detect disease from apparently normal electrocardiograms with an accuracy not previously achieved by technology or human experts. Results with AI-enabled ECG are encouraging: they considerably exceed current screening models for specific conditions (atrial fibrillation, left ventricular dysfunction, aortic stenosis, and hypertrophic cardiomyopathy). This could lead to a revitalization of the use of the ECG in the insurance domain. Although the findings from this rapidly evolving technology are welcome, cautious optimism is still necessary at this point.

Editor’s comment

A companion article in the Journal of Insurance Medicine1 discusses the ability of artificial intelligence (AI) to read a simple PA chest radiograph (CXR) and successfully screen for type 2 diabetes mellitus, low-ejection fraction heart failure, and valvular heart disease, and to predict mortality in asymptomatic persons with chronic lung disease. The current article follows up on Dr. Posan’s and Dr. Ross MacKenzie’s presentation, Advanced EKGs and the Potential Role of AI-Augmented EKG interpretation, at the American Academy of Insurance Medicine’s annual meeting in Washington, D.C. in October 2023, which highlighted the potential of the AI-enabled ECG.

Methodology

Key articles over the past four years appearing in the National Library of Medicine database (PubMed) served as the major source of information for this treatise. Each article is well-referenced, allowing readers to delve more deeply into whatever study is most beneficial.

Background

The ECG is a more than 100-year-old technology (invented in 1901 by Willem Einthoven) and remains the most performed cardiac test worldwide. Life and disability insurance companies are less frequently requiring ECGs with applications, likely because of the cost and “hassle” of obtaining the ECG, and because of potential reader variability in interpreting ECGs.

Interpretation of an ECG requires specific skills and knowledge. Computer-generated interpretations have been used in health systems for years, but these interpretations are based on predefined rules and manual pattern or feature recognition algorithms that do not capture the complexity and nuances of an ECG. For example, ECGs have heretofore been useful at identifying specific features, such as ST-segment elevation for acute myocardial infarction, T-wave changes to suggest critical serum potassium elevations or pericarditis, and other gross deviations to identify specific clinical entities. However, AI-enabled ECGs (AI-ECG) may represent a quantum-leap not only in the reliability and consistency in interpretation but also in capabilities of diagnosing and predicting future conditions that could never be detected previously.

AI-ECGs may increase the reliability and predictability of a simple 12-lead ECG and even a 1-lead rhythm recording from a mobile or wearable device such as an iPhone. This revolution may open new opportunities for the use of this century-old classic tool. Failure of the insurance industry to recognize the potential for adverse risk assessment could prove costly.

AI can be applied to the ECG in two ways. The first automates interpretation skills currently performed by humans, such as diagnosing acute myocardial infarction or bundle branch blocks, but with diagnostic performance that is faster, more consistent, and more accurate than previously possible. The second extracts information from an ECG that is imperceptible to the human eye. This paper focuses on the latter approach.

The ECG is the aggregate of the electrical activity of millions of individual cardiomyocytes recorded from the body’s surface. In most conditions the underlying pathophysiological process begins many years in advance of clinical presentation. Numerous distinct biological factors can subtly and nonlinearly impact cardiomyocyte electrical function resulting in subclinical ECG alterations not evident on a normal 12-lead sinus rhythm ECG.

The fundamental hypothesis of AI-enabled ECG diagnosis is that the disease process will affect the ECG voltage (over time) before it affects findings on other imaging technologies, and that those changes (at a cellular level) can be reliably detected by a properly trained convolutional neural network (CNN) (see Figure 1).

Figure 1.

AI application to the standard ECG enables the diagnosis of conditions not previously identifiable by an ECG.

“The CNN is trained without hard-coded rules by finding often subclinical patterns in huge datasets, AI transforms the ECG into a screening tool and predictor of cardiac and non-cardiac diseases, often in asymptomatic individuals.”2 

The ECG is an ideal substrate for deep-learning AI applications. The ECG is widely available and yields reproducible raw data that are easy to store and transfer in a digital format. In addition to fully automated interpretation of the ECG, rigorous research programs using large databanks of ECG and clinical datasets coupled with powerful computational capabilities have been able to demonstrate the utility of the AI-enhanced ECG (henceforth referred to as the AI-ECG) as a tool for the detection of ECG signatures and patterns that are unrecognizable by the human eye.

Screening to Detect Silent Atrial Fibrillation

Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting one-quarter of patients older than 80 years.3 With an aging and growing population, the prevalence of AF is expected to rise. Patients with AF are 5 times more likely to experience a stroke and have up to a 25% risk of dying within 30 days of stroke.4 AF can be episodic, and at least one-third of cases are asymptomatic.5 Asymptomatic AF is associated with an increased risk of cardiovascular (HR 3.12, 95% CI 1.50-6.45) and all-cause mortality (HR 2.96, 95% CI 1.89-4.64) compared to typical AF after adjustment for CHA2DS2-VASc score and age.6 Other studies, however, have found no significant morbidity or mortality differences between persons with asymptomatic and symptomatic AF.7 Among patients who experience an acute stroke of unknown origin (cryptogenic stroke), one-fifth will be found to have occult AF.8 Atrial fibrillation may also cause long-term changes in cardiac structure, including atrial dilation, atrial remodeling, and deterioration of ventricular function, which can result in permanent AF and heart failure.9,10

Effective clinical management can mitigate the complications of AF. Oral anticoagulation reduces the relative risk of stroke by two-thirds but only in patients proven to have non-valvular AF.11  Early intervention with antiarrhythmic medications or ablation may prevent more permanent AF and reduce symptoms and stroke risk.12,13 

For further prevention of stroke in patients with cryptogenic stroke (i.e., stroke of unknown etiology) current screening measures include various cardiac monitoring systems (Holter, loop-recording, etc.), and CHA2DS2-VASc scores (see Table 1) of ≥2 in men and ≥3 in women, all of which have a poor diagnostic yield (detecting only 9% of patients with insertable cardiac monitor at 6 months and CHA2DS2-VASc having an AUC of 0.52 for stroke prediction).14,15 

Table 1.

CHA2DS2-VASc Score for Atrial Fibrillation Stroke Risk
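
The body of Table 1 is not reproduced in this text. As a hedged stand-in, the sketch below encodes the standard published CHA2DS2-VASc point assignments and the screening threshold quoted above (≥2 in men, ≥3 in women). The function and argument names are illustrative assumptions, not part of the article.

```python
def cha2ds2_vasc(chf, hypertension, age, diabetes, stroke_or_tia,
                 vascular_disease, female):
    """Return the CHA2DS2-VASc score (0-9) from the standard point assignments."""
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure / LV dysfunction
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age 75 or older scores 2 points
        score += 2
    elif age >= 65:                        # A: age 65-74 scores 1 point
        score += 1
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0     # S2: prior stroke, TIA, or thromboembolism
    score += 1 if vascular_disease else 0  # V: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score


def meets_screening_threshold(score, female):
    # Threshold quoted in the text: >=2 in men and >=3 in women.
    return score >= (3 if female else 2)


# Hypothetical applicant: 74-year-old man with hypertension only.
example = cha2ds2_vasc(chf=False, hypertension=True, age=74, diabetes=False,
                       stroke_or_tia=False, vascular_disease=False, female=False)
print(example, meets_screening_threshold(example, female=False))
```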

For asymptomatic adults 50 years or older, the USPSTF (United States Preventive Services Task Force) concludes that the current evidence is insufficient to assess the balance of benefits and harms of screening for atrial fibrillation.16,17 

Since neural networks can detect multiple subtle, non-linearly related patterns in an ECG, a group at the Mayo Clinic hypothesized that they may be able to detect the presence of intermittent AF from a normal sinus rhythm ECG. The authors theorized that patients with AF may have subclinical ECG changes associated with fibrosis or transient physiologic changes. In this retrospective study, approximately 1 million ECGs from patients with no AF (controls) and patients with episodic AF (cases) were analyzed. The network was never shown ECGs with AF, only normal sinus rhythm ECGs from patients with episodic AF and from controls. After training, the AI-ECG network detected paroxysmal AF from an ECG recorded during normal sinus rhythm with an accuracy of 79.3% and an AUC of 0.87 [see Appendix B for definitions of AUC vs. accuracy]. When the ECGs from the patients’ “window of interest” (the 31-day period before the first ECG showing AF) were evaluated, the performance of the AI-ECG algorithm improved (accuracy 83%; AUC 0.90).18

In another, prospective study from the Mayo Clinic, 1003 patients (mean age 74) with stroke risk factors (CHA2DS2-VASc scores ≥2 in men and ≥3 in women) and no history of AF, but with ECGs, underwent continuous ambulatory heart rhythm monitoring for up to 30 days. The AI-ECG algorithm for AF detection, applied to the patients’ ECGs, divided patients into a high-risk and a low-risk group. The primary outcome was newly diagnosed AF. Over a mean 22.3 days of continuous monitoring, atrial fibrillation was detected in 6 of 370 (1.6%) low-risk patients and 48 of 633 (7.6%) high-risk patients (odds ratio 4.98, 95% CI 2.11-11.75, p=0.0002). Subsequent AI-guided screening was associated with increased detection of atrial fibrillation: 10.6% (95% CI 8.3-13.2) in the high-risk group vs 3.6% (95% CI 2.3-5.4) in the low-risk group (p<0.0001) over a median follow-up of 9.9 months. Adding CHARGE-AF19 (a 5-year predictive model that includes the variables of age, race, height, weight, systolic and diastolic blood pressure, current smoking, use of antihypertensive medication, diabetes mellitus, history of myocardial infarction and heart failure) to the AI-ECG risk score did not improve discrimination compared with the AI-ECG alone. This was the first study to find that the AI algorithm could further risk-stratify this at-risk-of-stroke population beyond traditional clinical risk factors (age, CHA2DS2-VASc scores, CHARGE-AF risk scores) and the first to evaluate the effectiveness of AI-guided targeted screening programs in comparison with usual care.20
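
The odds ratio quoted for the monitoring phase can be checked from the counts given above (6 of 370 low-risk and 48 of 633 high-risk patients). The sketch below assumes a simple Wald interval on the log-odds scale; the study's own statistical method is not stated here, so treat this as an illustrative cross-check, and the function name as hypothetical.

```python
import math


def odds_ratio_wald_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A vs group B with a Wald confidence interval."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# High-risk group (48 of 633) versus low-risk group (6 of 370).
or_, lo, hi = odds_ratio_wald_ci(48, 633, 6, 370)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumptions the result should land very close to the reported odds ratio of 4.98 with a 95% CI of 2.11 to 11.75.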

A large (907,858 ECGs from 6 US Veterans Affairs (VA) hospital networks) study applied a deep-learning model to predict the presence of AF within 31 days of a sinus rhythm ECG. Because the demographics at the VA sites were 94% male, the validation study was done at a large non-VA medical center (72,483 ECGs). The mean CHA2DS2VASc score was 1.6. The AUC was 0.93 (95% CI, 0.93-0.94) with accuracy 0.87 (95% CI, 0.86-0.88). Among individuals deemed high risk by deep learning, the number needed to screen to detect a positive case of AF was 2.47 individuals for a testing sensitivity of 25%, and 11.48 for testing sensitivity of 75%. Model performance was similar in patients who were Black, female, or younger than 65 years or who had CHA2DS2-VASc scores of 2 or greater.21 

Key Findings

  • AI-ECG applied to a normal sinus rhythm ECG may permit identification of individuals experiencing AF either previously or occurring in the future (i.e., current ECG shows sinus rhythm, but atrial fibrillation is present at other times).

  • In patients at risk for AF, AI-ECG is strongly predictive of concurrent AF within 30 days of ECG during normal sinus rhythm with high degree of accuracy – suggesting that an ECG (current or stored) may be a surrogate for prolonged rhythm monitoring.

  • AI algorithm can risk-stratify a relatively uniform population (e.g., older adults at risk for stroke) to detect undiagnosed atrial fibrillation during short-term cardiac monitoring.

  • An AI-guided targeted screening approach that leverages existing clinical data (i.e., patients with stroke risk factors) increased the yield for atrial fibrillation detection with subsequent monitoring.

Screening for Left Ventricular Systolic Dysfunction

Asymptomatic left ventricular dysfunction is relatively prevalent, present in 3-6% of the general population and 9% in the elderly, is associated with increased mortality, and is treatable when found.22  It is, however, difficult to diagnose because there are no warning symptoms and because the current tools (echocardiogram, biomarker testing with natriuretic peptides) are not widely available and/or cost effective. Screening for left ventricular dysfunction with BNP and N-terminal pro b-type natriuretic peptide (NTproBNP) has been found to have AUCs of 0.6 for BNP and 0.7 for NTproBNP.23  Utilization of AI augmented ECG signals in screening for LV dysfunction could provide a valuable new tool.

Sangha et al.24 used 385,601 normal sinus rhythm ECGs with paired echocardiograms to detect low EF (<40%; HFrEF – heart failure with reduced ejection fraction) in both in-hospital and outpatient settings. Their AI-ECG model yielded AUCs of 0.88-0.94 in testing in a number of large medical centers in the U.S. An AI-ECG suggestive of LV systolic dysfunction portended >27-fold higher odds of LV systolic dysfunction on a subsequent transthoracic echocardiogram (odds ratio, 27.5 [95% CI, 22.3-33.9]). A positive AI-ECG in individuals with an LV ejection fraction ≥40% by echocardiogram at the time of initial assessment was associated with a 3.9-fold increased risk of developing incident LV systolic dysfunction in the future (hazard ratio, 3.9 [95% CI, 3.3-4.7]) over a median follow-up of 3.2 years.

Another study from the Mayo Clinic trained a convolutional neural network on paired 12-lead ECG and echocardiogram ejection fractions from 44,959 patients with validation testing on an independent set of 52,870 patients. The network model yielded AUC of 0.93 (see Figure 2), with sensitivity of 86.3%, specificity of 85.7%, and accuracy of 85.7%.25 

Figure 2.

Asymptomatic Left Ventricular Dysfunction Detection-CNN (Convolutional Neural Network) Model TEST PERFORMANCE AUC: 0.93 (see appendix for explanation of AUC).

In patients without ventricular dysfunction but with a positive AI screen (i.e., ‘false positives’), there was a 4-fold higher risk of developing future ventricular dysfunction (hazard ratio 4.1; 95% CI, 3.3-5.0, p<0.001) compared to persons with a negative screen over the following median 3.4 years (interquartile range, 1.2-6.8 years). The authors concluded that their AI-ECG allows a simple and inexpensive 12-lead ECG to become a powerful screening tool for left ventricular dysfunction with EF <35% in asymptomatic individuals.25

A study from South Korea used a deep learning model to detect heart failure with preserved ejection fraction (HFpEF – Heart Failure [with] preserved Ejection Fraction). Their model yielded an AUC of 0.87 (95% CI, 0.85-0.88). In individuals without HFpEF on initial echocardiography, patients whose deep learning model indicated a higher risk proved, in fact, to have a significantly higher chance of developing HFpEF in the future than those in the low-risk group (34% vs 8%, p<0.001).26 

Another study examined routine ECGs, obtained as part of primary care in patients without a prior diagnosis of heart failure, to reveal occult heart failure with an AI-ECG model. The purpose of the study was to evaluate the potential utility of AI-ECG results in real-world clinical practice. The primary outcome was a new diagnosis of low EF (≤50%) in an asymptomatic population. Using AI-ECG analysis increased the diagnosis of low EF in the overall cohort: 1.6% with usual care versus 2.1% in the intervention arm (odds ratio [OR] 1.32; 95% CI, 1.01-1.61; p=0.0007) when echocardiography was ordered because of the AI-ECG. The number of echocardiograms ordered did not differ between the two arms. The OR increased to 1.43 (95% CI, 1.08-1.91; p=0.01) when the AI-ECG suggested a high likelihood of low EF.27

In another study, 2,454 Mayo Clinic individuals volunteered to send single-lead ECGs from their Apple iPhone for analysis. Of these, 421 participants had at least one watch-classified sinus rhythm ECG within 30 days of undergoing an echocardiogram, of whom 16 (3.8%) were found to have an EF ≤40% (HFrEF). The AI algorithm detected patients with low EF with an AUC of 0.89 (95% CI, 0.82-0.95). It is remarkable that this method has been successfully modified and applied to a single-lead device (Apple Watch). “This illustrates that the 12-lead AI-ECG model can be modified and extrapolated to a single lead, portable device, showing the massive scalability and various utility of the consumer grade wearable recorded AI-ECG acquired in nonclinical environments.” More than 120,000 ECGs were analyzed in the creation of the model.28

LV dysfunction can be caused by frequent premature ventricular complexes (PVCs). PVCs are prevalent and, although often benign, they may lead to PVC-induced cardiomyopathy. A deep-learning algorithm was designed to predict HFrEF in persons with PVCs from a 12-lead ECG. The primary outcome was first diagnosis of LVEF ≤40% within 6 months. Among over 14,000 patients (age 67.6 ± 115 years) 22.9% experienced such reductions, with an AUC 0.79 (95% CI: 0.77-0.81). The weighted class activation map explainability (that we discuss later) framework highlighted the sinus rhythm QRS complex-ST segment. In patients who underwent successful PVC ablation there was a post-ablation improvement in LVEF with resolution of cardiomyopathy in 89% of patients.29 

Key Findings

Application of a specific AI-ECG algorithm to the standard ECG (12-lead or single-lead) enables:

  • Detection/diagnosis of asymptomatic LV dysfunction (LVD).

  • Identification of occult stages of heart failure (HFrEF and HFpEF).

  • Prediction of future risk for developing LVD.

  • Extension of the 12-lead model to a single-lead model usable in nonclinical environments, indicating the potential for massive scalability and varied utility of the consumer-grade, wearable-recorded AI-ECG.

Screening for Aortic Valve Stenosis

Aortic stenosis (AS) is usually managed by aortic valve replacement (AVR) when a patient develops symptoms.30  If AVR is not performed at an appropriate time, AS can lead to heart failure or sudden death.31  Recently, benefit of early aortic valve repair has been demonstrated even in asymptomatic patients with severe aortic stenosis and in a subgroup with moderate AS where the long-term outcome was found to be poor.32,33  Systolic murmur is documented in <50% of patients with moderate or severe AS34  suggesting a need for an inexpensive and readily available screening tool for this disorder.

Cohen-Shelly et al.35 divided 258,607 patients with paired ECG and echocardiographic studies into a training set (50%), a validation set (10%), and a testing set (40%) to screen for aortic stenosis. The AUC was 0.85 in both the validation and testing groups. In the testing group, 3.7% of patients had echo-positive AS (n=3,833, the sum of the true-positive and false-negative counts below), and the AI-ECG predicted echo-positive AS with sensitivity 78%, specificity 74%, and accuracy 74%. Because of the low prevalence (4%) of AS in the population studied, the positive predictive value was low at 10.5%, but the negative predictive value was high at 98.9%. Of a total of 102,926 patients in the testing group, 3% were true positives (n=2,995), 71% true negatives (n=73,624), 25% false positives (n=25,469), and 1% false negatives (n=838). Patients with false-positive findings more frequently had hypertension and renal disease compared with other groups (p<0.0001). Although hypertension and AS produce similar ECG changes, the AUC was 0.81 in patients with hypertension, 0.88 in those without hypertension, and 0.89 in those without any comorbidities (n=31,484; 31%). The authors concluded that the AI-ECG successfully identified patients with moderate to severe AS with high performance (AUC 0.85). However, with the prevalence of AS being only 4%, the positive predictive value was low, raising the possibility of unnecessary echocardiography in AI-ECG-positive patients. This consideration would be balanced against the excellent negative predictive value, which should reduce unnecessary imaging in patients with a non-significant murmur.
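
The performance figures quoted for the testing group follow directly from the four counts reported above. The sketch below recomputes them; the function name and dictionary keys are illustrative assumptions, while the counts are those given in the text.

```python
def screening_metrics(tp, tn, fp, fn):
    """Derive standard screening statistics from a 2x2 confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "prevalence": (tp + fn) / total,   # echo-positive AS in the cohort
        "sensitivity": tp / (tp + fn),     # true-positive rate
        "specificity": tn / (tn + fp),     # true-negative rate
        "accuracy": (tp + tn) / total,     # all correct calls
        "ppv": tp / (tp + fp),             # positive predictive value
        "npv": tn / (tn + fn),             # negative predictive value
    }


# Counts reported for the 102,926 patients in the testing group.
metrics = screening_metrics(tp=2995, tn=73624, fp=25469, fn=838)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note how a specificity of about 74% combined with a prevalence under 4% is enough to pull the positive predictive value down to roughly 10%, while the negative predictive value stays near 99%.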

Key Findings

  • An AI-ECG can identify patients with moderate or severe AS and may serve as a powerful screening tool for AS.

  • The positive predictive value is low because of the low prevalence of AS, but the negative predictive value is excellent (close to 99%). This has important clinical value: the AI-ECG can be used to exclude asymptomatic moderate to severe AS.

Screening for Hypertrophic Cardiomyopathy

Hypertrophic cardiomyopathy (HCM) is among the leading causes of sudden cardiac death among adolescents and young adults and is associated with significant morbidity in all age groups.36  Although the implications of an HCM diagnosis are important for sudden cardiac death risk stratification, genetic counseling, family screening, and longitudinal clinical follow-up, the condition is uncommon, affecting approximately 1 in 200-500 individuals.37 

More than 90% of persons with HCM have ECG abnormalities (LVH, left axis deviation, prominent Q waves, and ST changes), but these findings are too non-specific to justify routine ECG screening, and up to 10% of persons with HCM have normal ECGs.

In a study using 3060 expertly diagnosed patients from a HCM clinic and comparing that with a control group of patients who had both ECG and echocardiographic data, 70% of this population were involved in training, 10% in validation, and then 20% in testing of the model. The validation dataset had an AUC of 0.95 (95% CI, 0.94-0.97) when using the optimal probability threshold of 11%. When applying this probability threshold in the test dataset, the AI-ECG AUC was 0.96 (95% CI, 0.95-0.96) with sensitivity 87%, specificity 90%, positive predictive value 31%, and negative predictive value 99%. When higher HCM probability thresholds were applied, the performance characteristics changed to favor specificity and to reduce the false-positive rate. For example, with a probability threshold of 75% (instead of 11%), specificity was 99% and false-positive rate was 1%. The false-positive rate was low, and the negative predictive value was high, across all tested thresholds. Their AUC for predicting HCM when the population of interest was restricted only to patients with a normal ECG was 0.95 (95% CI, 0.90-1.00), with sensitivity of 93%, specificity of 87%, positive predictive value 31% and negative predictive value 99% at an HCM probability threshold of 11%.38 
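
Predictive values depend on how common the disease is in the group being tested, which matters when a result from an enriched test cohort is carried over to a general population. The sketch below applies Bayes' rule to the reported sensitivity (87%) and specificity (90%). The 5% test-cohort prevalence is an assumption back-calculated here from the reported positive predictive value of 31%, not a figure from the study; the 1 in 500 and 1 in 200 values are the population prevalence range quoted earlier.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


scenarios = [
    ("general population, 1 in 500", 1 / 500),
    ("general population, 1 in 200", 1 / 200),
    ("test cohort, assumed 5%", 0.05),   # assumption, see lead-in
]
results = {}
for label, prevalence in scenarios:
    ppv, npv = predictive_values(0.87, 0.90, prevalence)
    results[label] = (ppv, npv)
    print(f"{label}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the positive predictive value falls to a few percent at population prevalence, which is why the probability threshold would likely need to be raised for unselected screening.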

Key Findings

  • ECG-based detection of HCM by an AI algorithm can be achieved with high diagnostic performance, particularly in younger persons.

  • Negative predictive values remained high regardless of thresholds selected for probability.

Review of Methodology

AI-enhanced ECG interpretation uses convolutional neural networks (CNNs), a machine learning technique, that are “trained” on tens of thousands or even hundreds of thousands of apparently normal ECGs labelled with the specific cardiac condition in question (as discussed above). The ultimate goal is for the trained network to find or predict that condition from an ECG it has never seen before. Once the network is trained, internal and external validation are important next steps. The number of algorithms for detecting various conditions has been increasing, and eventually all could be applied simultaneously.

There have been additional, incidental findings. For example, from an apparently normal ECG, a patient’s undisclosed sex can be determined with startling precision (AUC 0.97). An ECG-estimated age that exceeds a subject’s actual age has been attributed to the presence of left ventricular dysfunction, silent arrhythmias such as atrial fibrillation, various valvular heart conditions, or perhaps other conditions.39

Some Definitions

The term machine learning (ML) refers to replacing algorithms that define the relationship between inputs and outputs using man-made rules with statistical tools that identify (learn) the most probable relationship between input and output from repeated exposure to labelled example data. ML finds similar features and patterns in a large dataset. Convolutional neural networks (CNNs) are loosely modeled on the human brain’s network of neurons, with each artificial neuron being a simple mathematical equation whose parameters are adjusted during network training. When these artificial neurons are connected in many layers, the result is referred to as a deep learning (DL) network. The CNN is trained with large ECG datasets. The input is the ECG voltage over time (a continuous input), and the output is the presence of a specific condition. Because each input ECG is labelled with that condition, this is supervised learning for each condition. Put simply, the input is matched to a specific output. In the last layer of the network, the probability of the finding is translated, based on a set threshold, into a dichotomized “yes” or “no” outcome.40
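
To make the flow from voltage samples to a “yes” or “no” answer concrete, here is a deliberately tiny sketch of a forward pass: one convolutional layer, a ReLU activation, global average pooling, and a sigmoid output. All numbers (the signal, kernels, weights, and the 0.5 threshold) are hypothetical and untrained, and real AI-ECG networks are far deeper; this only illustrates the structure described above.

```python
import math


def conv1d(signal, kernel):
    """Slide the kernel along the signal (valid cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tiny_cnn_probability(ecg, kernels, out_weights, out_bias):
    # Convolution -> ReLU -> global average pooling gives one feature per kernel.
    features = []
    for kernel in kernels:
        activated = relu(conv1d(ecg, kernel))
        features.append(sum(activated) / len(activated))
    # The output neuron combines the features into a probability between 0 and 1.
    logit = sum(w * f for w, f in zip(out_weights, features)) + out_bias
    return sigmoid(logit)


# Hypothetical voltage samples (mV) and untrained parameters, for illustration only.
ecg = [0.0, 0.1, 0.0, -0.1, 1.2, -0.3, 0.0, 0.2, 0.3, 0.1]
kernels = [[-1.0, 2.0, -1.0], [0.5, 0.5, 0.5]]
probability = tiny_cnn_probability(ecg, kernels, out_weights=[1.5, -0.8], out_bias=-0.2)
label = "yes" if probability >= 0.5 else "no"   # dichotomize at an assumed threshold
print(f"probability={probability:.3f}, label={label}")
```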

During the learning phase, parameters (sometimes called ‘weights’) are constantly adjusted to minimize the “error”, the difference between the estimated output and the known output (e.g., the input being a normal sinus rhythm ECG and the known output being atrial fibrillation) (see Figure 3). The most common way to adjust these weights is backpropagation, which adjusts each weight according to its effect on the error, repeating until an error minimum is found.2 This ‘learning experience’ is subsequently used as training continues.
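
The idea of adjusting each weight according to its effect on the error can be shown with a single artificial neuron trained by gradient descent. This is a minimal sketch on made-up data: the feature values, learning rate, and number of passes are assumptions, and a real network repeats the same chain-rule step layer by layer across millions of weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_neuron(samples, labels, learning_rate=0.5, epochs=200):
    """Fit one neuron p = sigmoid(w*x + b) by minimizing the average log-loss."""
    w, b = 0.0, 0.0
    history = []
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(w * x + b)
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Chain rule: effect of each parameter on the error for this sample.
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Move each parameter a small step in the direction that reduces the error.
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
        history.append(loss / n)
    return w, b, history


# Hypothetical one-number "ECG feature" per patient, with labels 1 = condition present.
samples = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
w, b, history = train_neuron(samples, labels)
print(f"error at start {history[0]:.3f}, at end {history[-1]:.3f}")
```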

Figure 3.

Neural network design to classify atrial fibrillation from the electrocardiogram (Modified figure from reference 41).

For binary models in which the output is “yes/no”, such as the presence of silent atrial fibrillation (AF) determined from an ECG in normal sinus rhythm, the input numbers (i.e., the digitized ECG voltage and other parameters) are used by the network to calculate the probability of silent AF. The output is a single number ranging from 0 (no silent AF) to 1 (silent AF present). To dichotomize the output (to ‘yes’ or ‘no’), a threshold value determines whether the model output is ‘positive’ or ‘negative.’ By adjusting the threshold, the test can be made more sensitive (with more samples considered positive and fewer missed events, but at the cost of a higher number of false positives) or more specific (with fewer false positives but more missed events).41 (See Figure 3).
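
The trade-off between sensitivity and specificity can be seen by applying several thresholds to the same set of model outputs. The ten probabilities and labels below are invented for illustration only.

```python
def sens_spec_at_threshold(probabilities, labels, threshold):
    """Return (sensitivity, specificity) when outputs >= threshold are called positive."""
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probabilities, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probabilities, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs; label 1 means silent AF is truly present.
probabilities = [0.05, 0.10, 0.20, 0.30, 0.35, 0.55, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

for threshold in (0.25, 0.50, 0.75):
    sensitivity, specificity = sens_spec_at_threshold(probabilities, labels, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
```

In this toy data, lowering the threshold to 0.25 catches every true case but flags half of the healthy subjects, while raising it to 0.75 does the reverse.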

Continuous electrocardiogram voltage points (red dots, arrows) are fed to ‘input neurons’ (x0, x1, x2, … xm), which are coded as software objects. Hidden neurons within this three-layer network (h0, h1, h2, … hn) connect input and output layer neurons by numerical weights (w). The output indicates atrial fibrillation (y1; correct, red) or non-atrial fibrillation (y0).41

Since the training of supervised ML models requires only labelled data (e.g., ECGs) and no explicit rules, machines learn to solve tasks that humans simply cannot ‘see.’ But what exactly is the model ‘seeing’? Extensive research is being invested in this challenge. One way to explain these models is to look at specific examples and highlight the parts of the input that contributed most to the final output. An example is the Grad-CAM method, in which network gradients are used to produce a coarse localization of the drivers of the output.42
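
The sketch below does not implement Grad-CAM, which needs gradients from a trained network. It swaps in a simpler, related idea, occlusion sensitivity: blank out one short stretch of the signal at a time and record how much the model output drops. The scoring function is a hypothetical stand-in for a trained model, and the signal values are invented.

```python
def occlusion_saliency(signal, score_fn, window=3, fill=0.0):
    """For each sample, the largest drop in score caused by hiding a window covering it."""
    baseline = score_fn(signal)
    saliency = [0.0] * len(signal)
    for start in range(len(signal) - window + 1):
        occluded = list(signal)
        occluded[start:start + window] = [fill] * window   # hide this stretch
        drop = baseline - score_fn(occluded)
        for i in range(start, start + window):
            saliency[i] = max(saliency[i], drop)
    return saliency


def toy_score(signal):
    # Hypothetical stand-in for a trained network: reacts to the largest deflection.
    return max(abs(v) for v in signal)


signal = [0.0, 0.1, 0.0, 1.2, -0.3, 0.1, 0.0, 0.2]
saliency = occlusion_saliency(signal, toy_score)
print([round(s, 2) for s in saliency])
```

Samples whose removal changes the output most (here, those around the large deflection at index 3) receive the highest values, which is the same kind of coarse localization that gradient-based maps aim to provide.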

Because of the complexity of explaining how AI interprets ECGs, we commonly refer to these inner workings as a “black box” between input (normal sinus rhythm ECG) and output (e.g., probability of silent, pending, or past AF). Unlike traditional risk-prediction models that comprise predefined variables, the CNN is described as “agnostic,” because we do not know what ECG features the CNN is “seeing” or which factors drive its performance. The performance of the algorithm is likely based on a combination of ECG signatures that are known risk factors (for example, for AF, the P-wave amplitude, width, notches, atrial ectopy, LV hypertrophy, and heart rate variability) and others that are currently unknown or not obvious to the human eye, acting in combination in a non-linear manner.18 Besides the ‘black box’ transparency concerns, various other challenges surround AI techniques (e.g., generalizability, liability, regulatory approval) that are beyond the scope of the current paper.

The overall performance of the AI models was measured by the area under the curve (AUC) of the Receiver Operating Characteristic (ROC). See the appendix for an explanation.

AI-ECG is proving to be a tool for advanced ECG interpretation and a potential tool for profiling cardiac health and diseases. AI-ECG implementation is still in its early stages. Like any medical tool, AI-ECG requires careful vetting, validation, and clinician training. When seamlessly integrated into medical practice, AI-ECG shows potential to revolutionize clinical care.

Conclusion

AI applications have now been extended to the domain of the ECG (a routine 12-lead ECG and even a single-lead tracing from an iPhone). AI enables diagnostic and predictive capabilities that exceed the human ability to ‘see’ important changes on ECGs. It uses deep learning techniques based on convolutional neural networks (CNNs) to extract hidden information, subtle signals, and patterns from vast datasets, without relying on predefined rules. These are leading to the discovery or forecasting of subclinical conditions. AI-ECG (12-lead or single-lead wearables) is predicted to be seamlessly integrated into future clinical workflows as a better screening and prediction tool in both symptomatic and asymptomatic patient populations. But rigorous validation, appropriate training, and a legal framework are still required before we see these techniques in clinical practice.

APPENDIX

Appendix A: The AUROC

Most AI-ECG studies use the area under the receiver operating characteristic curve (AUROC) to report the performance of their models. The term Receiver Operating Characteristic (ROC) was coined for British radar in WWII, where radar data were analyzed to differentiate between enemy aircraft and flocks of geese. As the sensitivity of the receiver was increased (to better find enemy aircraft), the number of false positives rose (i.e., the specificity worsened). The ROC curve plots sensitivity [y-axis] versus 1-specificity [x-axis]. The sensitivity is the true-positive rate, TPR = TP/(TP + FN), and 1-specificity is the false-positive rate, FPR = FP/(FP + TN). The area under the curve (AUC) quantifies the overall performance of the ROC across all possible thresholds, rewarding the highest TPR at the lowest possible FPR. AUC=1 refers to a perfect classifier, while AUC=0.5 is random chance (Figure 4). Put simply, the AUC measures how well the model separates two classes.
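
As a minimal sketch of how an ROC curve and its area are obtained, the code below sweeps the threshold across every distinct model output, records the (FPR, TPR) point at each step, and sums the area with the trapezoid rule. The scores and labels are invented for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the threshold over every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]   # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model outputs; label 1 means the condition is truly present.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.55, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
auc = auc_trapezoid(roc_points(scores, labels))
print(f"AUC = {auc:.3f}")
```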

Figure 4.

AUC measures how well the model classifies binary classes: https://en.wikipedia.org/wiki/Receiver_operating_characteristic.


Appendix B: Accuracy versus AUC

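Accuracy is mathematically written as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

AUC is mathematically written as:

$$\text{AUC} = \int_{0}^{1} \text{TPR} \; d(\text{FPR})$$

Here TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, TPR is the true-positive rate (sensitivity), and FPR is the false-positive rate (1-specificity).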

Accuracy

  • Think of accuracy like hitting a bullseye on a target. It's a simple and intuitive measure – it tells you the percentage of times your model makes the correct prediction (both positive and negative).

  • It's easy to understand – higher accuracy is generally better.

Area Under the ROC Curve (AUC)

  • Imagine a different kind of target, where the center represents perfectly distinguishing positives from negatives, and the edges represent purely random guessing.

  • The ROC curve plots how well your model performs across different thresholds for classifying something as positive. AUC tells you how much of that target area your model captures.

  • A higher AUC means your model can consistently differentiate positives from negatives, regardless of the chosen threshold.

Why is AUC More Commonly Used in Assessing AI-ECGs?

  • Imbalanced data: If you have a lot more of one class (e.g., healthy patients) than the other (e.g., sick patients), a model might just predict everyone as healthy (the majority class) and still achieve high accuracy. AUC avoids this trap because it measures how well the model ranks sick patients above healthy ones across all thresholds; a model that cannot tell the two groups apart scores only about 0.5.

  • AUC focuses on both positives and negatives: A good model needs to identify both positive and negative cases well. AUC considers both the true-positive rate and the false-positive rate, giving a more complete picture.
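The imbalanced-data point can be made concrete with a short Python sketch. The cohort below is hypothetical (1,000 people, 50 with disease), and the helper names `accuracy` and `auc_by_ranking` are illustrative rather than drawn from any cited study. The AUC is computed here through its ranking interpretation: the fraction of (sick, healthy) pairs in which the sick person receives the higher score, with ties counted as one half.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc_by_ranking(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # sick patient scored higher: correct ranking
            elif p == n:
                wins += 0.5   # tie: counts as a coin flip
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 50 sick (1) and 950 healthy (0).
labels = [1] * 50 + [0] * 950

# A "model" that calls everyone healthy and gives everyone the same score.
always_healthy_predictions = [0] * 1000
always_healthy_scores = [0.0] * 1000

print(accuracy(labels, always_healthy_predictions))   # 0.95
print(auc_by_ranking(labels, always_healthy_scores))  # 0.5
```

Under these assumptions the do-nothing model looks impressive by accuracy (95%) yet has an AUC of 0.5, no better than chance, because it never ranks a sick patient above a healthy one.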

Note: some studies refer to AUC as AUROC, the Area Under the (Curve of the) Receiver Operating Characteristic.


Competing Interests

Conflicts of Interest: Neither author acknowledged any financial or other conflicts of interest in producing this treatise.