Abstract
To detect unusual infusion alerting patterns using machine learning (ML) algorithms as a first step to advance safer inpatient intravenous administration of high-alert medications.
We used one year of detailed propofol infusion data from a hospital. Interpretable and clinically relevant variables were feature engineered, and data points were aggregated per calendar day. A univariate (maximum times-limit) moving range (mr) control chart was used to simulate clinicians' common approach to identifying unusual infusion alerting patterns. Three different unsupervised multivariate ML-based anomaly detection algorithms (Local Outlier Factor, Isolation Forest, and k-Nearest Neighbors) were used for the same purpose. Results from the control chart and ML algorithms were compared.
The propofol data had 3,300 infusion alerts, 92% of which were generated during the day shift and seven of which had a times-limit greater than 10. The mr-chart identified 15 alert pattern anomalies. Different thresholds were set to include the top 15 anomalies from each ML algorithm. A total of 31 unique ML anomalies were grouped and ranked by agreeability. All algorithms agreed on 10% of the anomalies, and at least two algorithms agreed on 36%. Each algorithm detected one specific anomaly that the mr-chart did not detect. The anomaly represented a day with 71 propofol alerts (half of which were overridden) generated at an average rate of 1.06 per infusion, whereas the moving alert rate for the week was 0.35 per infusion.
These findings show that ML-based algorithms are more robust than control charts in detecting unusual alerting patterns. However, we recommend using a combination of algorithms, as multiple algorithms serve a benchmarking function and allow researchers to focus on data points with the highest algorithm agreeability.
Unsupervised ML algorithms can assist clinicians in identifying unusual alert patterns as a first step toward achieving safer infusion practices.
Avoidable medication errors related to the use of intravenous (IV) medications occur more frequently than other types of medication errors, with frequencies between 48% and 81%.1,2 Smart infusion pumps with dose error reduction software (DERS) have been shown to reduce such IV medication administration errors at rates as high as 79%.3 The benefits of this safety system have facilitated the adoption of smart infusion pumps in about 90% of U.S. hospitals as of 2017.4
DERS contains a drug library that uses highly tailored, hospital system–defined medication limits. An alert is generated when the programmed infusion parameters are not within the drug's limit in the drug library. When the programmed parameters of a medication are below or above its soft limit, a soft minimum or soft maximum alert is generated, respectively. Although soft limits can be overridden, infusions outside of hard limits can only be either reprogrammed within the acceptable range or canceled.
Most current smart infusion pump systems also capture detailed data on all infusion events, which can be analyzed for continuous quality improvement (CQI) initiatives.5 Despite these benefits, clinicians can bypass using DERS and override soft limits.6 Moreover, hospital systems determine the limits in the drug library, and these might not necessarily reflect varying patient needs. Hence, DERS cannot prevent all IV medication errors and has limited effects on patient safety.7,8
Consequently, professional organizations have pushed for safer clinical practices through published guidelines and standards developed to prevent IV medication errors and improve patient safety.9–11 In 2019, the Institute for Safe Medication Practices (ISMP) published a list of safe infusion guidelines.9 These guidelines include using DERS for 95% of all infusions, monitoring drug library version updates to ensure the most current drug library is used, and analyzing CQI metrics (e.g., alert count, alert rate, alert override rate). However, meeting some of these safety guidelines could be challenging for many health systems because it can require substantial efforts and resources from cross-functional teams of clinicians to search, compile, and analyze relevant data to formulate these safety metrics.5,12 Although certain pump analytics software report some of these metrics, analyzing them to identify unusual infusion alert patterns and at-risk practices remains challenging.5 Hence, a need exists for hospital systems to prioritize rigorous CQI initiatives in IV medication administration, not just to achieve compliance with ISMP or other professional organizations but also to ultimately improve infusion safety.
In this study, we explored the potential of unsupervised machine learning (ML) algorithms in detecting unusual infusion alerting patterns and compared that approach with current practices of using traditional time-series plots. Recently, researchers have explored ML algorithms to aid medication administration and reduce medication errors.12–14 One study used ML models to identify unusual medication orders and pharmacological profiles.15 Another study devised ML approaches to characterize patient risk factors associated with self-intercepted medication ordering errors.16 Researchers have also developed a novel ML-based outlier system integrated into the electronic health record (EHR) system to identify and intercept potential medication prescription errors.17 However, none of these efforts have specifically targeted IV medication administration.
We define “unusual infusion alerting patterns” as trends or patterns that deviate in frequency, total amount, or time of generation from past alerts of the same infused drug. Detecting unusual infusion alerting patterns is especially critical for high-alert medications (HAMs), which bear a heightened risk of causing considerable patient harm when used in error.18 HAM errors are troublesome for patients, healthcare providers, and institutions because they are associated with adverse events (AEs) and can result in substantial patient harm, including the potential for permanent injury and death. When reviewing reports and basic charts, clinicians can miss some HAM infusion alerting patterns that can lead to serious AEs. These omissions may be due to limitations of the human information processing system. In addition, findings from such manual reviews might be inconsistent from person to person or vary with the duration of review. ML models can detect unusual HAM infusion alerting patterns with less time and fewer labor resources. This can serve as a preventive step for future events while providing insights into at-risk practices.
Objectives
The primary objective of this study was to detect unusual propofol alert patterns that can potentially indicate unsafe medication administration practice leading to AEs, by using traditional statistical and unsupervised ML models. The secondary objective was to compare unusual alert patterns identified by each of these models and suggest improvements in current CQI practices.
Materials and Methods
Study Setting and Dataset
This study was approved by Purdue University's Institutional Review Board. The dataset used was from one medium-sized health system of the Regenstrief National Center for Medical Device Informatics (REMEDI) community of practice. This time-stamped infusion administration dataset was extracted from the Alaris Pump system (BD/CareFusion, San Diego, CA).19
This dataset was fully deidentified, with no patient-related details and no pointers to any patient-specific patterns. The time period for the dataset ranged from January to December 2019, totaling more than three million infusion events and 38 data fields (shown in Table 1 in the supplemental material for this article, available at www.aami.org/bit).
This study focused on propofol, a drug classified as a HAM by the ISMP and American Society of Health-System Pharmacists.10,18 Propofol is a well-known high-risk medication with severe AEs.20 One such AE is propofol infusion syndrome (high doses of propofol sedation over long periods), which was the cause of death in seven adults.21 Other propofol AEs include aspiration, oxygen desaturation, central apnea obstruction, stridor, bradycardia, laryngospasm, excessive secretions, vomiting, unexpected readmissions, and postinduction hypotension.22–24 These AEs point to the need to implement powerful CQI approaches for detecting unusual propofol infusion alerting patterns as a first step toward identifying potential risk-inducing practices or the need to adjust drug limits, with the goal of improving patient safety.
Data Preparation
All data preparation steps were performed with R25 in the RStudio integrated development environment.26 We defined an infusion as a series of infusion events (e.g., infusion started, infusion stopped, infusion alerted, infusion alarmed, infusion paused) grouped by a unique infusion identifier assigned by the pump system. In this study, we aggregated the infusion dataset per calendar day because it depicts the time series nature of the data while focusing on relevant alerting variables. From the ISMP guidelines,9 we selected pertinent CQI metrics as potential input variables. In addition, we interviewed multiple medication safety pharmacists to leverage their domain-specific knowledge of the infusion administration process to refine the final variables used for the study. Consequently, seven variables were feature engineered from the propofol dataset and these served as input data for the models used in this study. Table 1 describes these variables and our justifications for their inclusion in the study.
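The per-day aggregation described above can be sketched as follows. This is a minimal illustration in Python with pandas, not the authors' R code. The column names (`timestamp`, `infusion_id`, `is_alert`, `overridden`, `times_limit`), the 07:00 to 19:00 day-shift boundary, and the exact formula for the weekly moving rate are all hypothetical assumptions; the actual seven variables are defined in Table 1.

```python
import pandas as pd

def aggregate_daily(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate event-level pump rows into one row per calendar day.

    Assumed (hypothetical) columns: timestamp (datetime), infusion_id,
    is_alert (bool), overridden (bool), times_limit (float).
    """
    df = events.copy()
    df["date"] = df["timestamp"].dt.normalize()
    hour = df["timestamp"].dt.hour
    df["day_shift"] = (hour >= 7) & (hour < 19)  # assumed shift boundary

    alerts = df[df["is_alert"]]
    daily = pd.DataFrame({
        "infusions": df.groupby("date")["infusion_id"].nunique(),
        "alerts_day": alerts[alerts["day_shift"]].groupby("date").size(),
        "alerts_night": alerts[~alerts["day_shift"]].groupby("date").size(),
        "overrides": alerts.groupby("date")["overridden"].sum(),
        "max_times_limit": alerts.groupby("date")["times_limit"].max(),
    })

    # Keep days without any infusion or alert as explicit zero rows.
    full = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full).fillna(0)

    total = daily["alerts_day"] + daily["alerts_night"]
    daily["alert_rate"] = (total / daily["infusions"]).fillna(0)
    daily["override_rate"] = (daily["overrides"] / total).fillna(0)
    # Assumed definition: alerts per infusion over a trailing 7-day window.
    weekly_alerts = total.rolling(7, min_periods=1).sum()
    weekly_infusions = daily["infusions"].rolling(7, min_periods=1).sum()
    daily["weekly_alert_rate"] = (weekly_alerts / weekly_infusions).fillna(0)
    daily["weekend"] = daily.index.dayofweek >= 5  # categorical flag (assumed)
    return daily
```

Reindexing to the full date range matters here: days with no alerts still appear in the series with zeros, which is why such days can later surface in the control chart and in the ML results.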
Statistical Process Control
Statistical process control (SPC) combines rigorous time series analysis methods with the graphical presentation of the dataset to get quick and easy-to-understand insights.28 Control charts are a powerful, convenient, and statistically rigorous SPC tool used by quality improvement researchers and practitioners.28 They are used in healthcare settings, specifically in medication administration, for process monitoring, control, and improvement.29–32 They are also useful for anomaly detection because data that fall outside the control limits depict special cause variation, indicating that something unpredictable and not inherent in the process has occurred.28,33 A moving range (mr)-chart is a control chart of the magnitudes of the differences between successive values of the data.34 It is robust because, in practice, it requires no assumptions of dataset normality.32,35
The process used to create hospital-defined drug limits can affect the type and amount of infusion alerts. Drug limits with wide ranges could decrease alerts, albeit at a potential detriment to patient safety, while more stringent limits could increase the risk of clinicians' alert fatigue.27 Clinicians commonly use a time series plot of the times-limit values from pump-generated reports to visualize how far off limits these infusions were programmed.5 Using control charts is more statistically rigorous and just as convenient. Hence, our use of an mr-chart of maximum times-limit generated by Python 3.136 simulated clinicians' common approach to identifying unusual alerting patterns.
Anomaly Detection
Anomaly detection or outlier detection refers to finding anomalies, rarities, or patterns in data that do not conform to a well-defined notion of expected behavior.37 Anomaly detection is important in different applications, including fraud detection in financial systems,38 fault detection in safety-critical systems,39 and detection of abnormal regions in image processing applications.40 Similarly, healthcare systems can benefit from anomaly detection because anomalies in any healthcare dataset can be translated to important (sometimes critical) and actionable information.
Depending on the availability of dataset labels, ML-based anomaly detection algorithms can operate as supervised or unsupervised. Supervised ML works on the premise that a training dataset exists with labeled instances (normal or anomalous class for each data point) serving as a ground truth. After a predictive model is built, unseen data instances can be compared with the model to determine their classes.
In healthcare applications, unsupervised ML algorithms are more common due to the resource-intensiveness of correct data labeling. These algorithms have been applied to reduce medication errors and, subsequently, improve safe medication practices. For example, Hogue et al.15 used an unsupervised GANomaly-based model to identify atypical pharmacological profiles. In another study, one-class support vector machine was used on prescription data obtained from an EHR system to identify overdose and underdose prescriptions.41 An unsupervised anomaly detection method called density-distance-centrality was also applied to detect potential outlier prescriptions from EHR data.42
To address the study's primary objective, we applied three commonly used unsupervised ML-based anomaly detection algorithms to identify unusual alerting patterns in the dataset: Local Outlier Factor (LOF),43 Isolation Forest (iForest),44 and k-Nearest Neighbors (kNN).45
LOF is a density-based technique that computes the local density deviation of a given data point with respect to its n neighbors and considers anomalies as data points with a substantially lower density than their neighbors. Although previous works have suggested a minimum value of n = 10, no objective criterion exists for determining an optimal value.43 However, n = 20 appears to generally work well in practice.46
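As a hedged illustration of the LOF step, the sketch below uses scikit-learn's `LocalOutlierFactor` with n = 20 neighbors. The function name `lof_flags` is hypothetical, and the default contamination of 0.0411 is taken from the 4.11% fraction reported in the Results; this is not the authors' code.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_flags(X, n_neighbors=20, contamination=0.0411):
    """Return (is_anomaly, score) per row; higher score = more anomalous."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors,
                             contamination=contamination)
    labels = lof.fit_predict(X)           # -1 marks anomalies, 1 marks inliers
    scores = -lof.negative_outlier_factor_  # flip sign so larger means more unusual
    return labels == -1, scores
```

A score near 1 means a point is about as dense as its neighbors; substantially larger values indicate a locally sparse point.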
iForest is a binary tree–based technique that detects anomalies by direct isolation based on the assumptions that they are few and different from normal classes.
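A comparable sketch for iForest, again using scikit-learn and again only an illustration: the function name, the number of trees, the fixed random seed, and the default contamination (the 4.11% reported in the Results) are assumptions rather than details confirmed by the study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def iforest_flags(X, contamination=0.0411, random_state=0):
    """Return (is_anomaly, score) per row; higher score = more anomalous."""
    forest = IsolationForest(n_estimators=100,
                             contamination=contamination,
                             random_state=random_state)
    labels = forest.fit_predict(X)      # -1 marks anomalies, 1 marks inliers
    scores = -forest.score_samples(X)   # points isolated in fewer splits score higher
    return labels == -1, scores
```

Because the trees are built from random splits, results can vary between runs unless the seed is fixed.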
kNN is generally known as a distance-based supervised ML algorithm, but it can be adapted to take an unsupervised approach for anomaly detection. In unsupervised kNN, for a given value of k (k ∈ ℕ), the mean of the distances from each data point to its k nearest neighbors is determined. Anomalies are data points with a considerably larger mean distance to their k nearest neighbors.
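The mean-of-distances score can be sketched with NumPy alone. The default k = 7 is the value reported in the Results; the use of Euclidean distance and the brute-force pairwise computation are assumptions made for this illustration.

```python
import numpy as np

def knn_mean_distance(X, k=7):
    """Mean Euclidean distance from each row to its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]  # k smallest distances per row
    return nearest.mean(axis=1)
```

For a dataset of 365 daily rows, the full pairwise distance matrix is small enough that this direct approach is practical.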
The selection of these three unsupervised ML algorithms is based on the diversity of their approaches to anomaly detection: local density based, binary tree based, and distance based.
We carried out a final preprocessing step of normalizing all numerical input variables in Table 1 before implementing the anomaly detection algorithms. Normalization refers to scaling each input variable to a [0, 1] interval and is important for unsupervised anomaly detection because variables of the dataset might have different measurement units and ranges.47 Research has also shown that implementing ML-based anomaly detection algorithms on normalized datasets yields better performance compared with unnormalized datasets.48
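A minimal sketch of scaling each variable to the [0, 1] interval is shown below. It is written with NumPy for clarity; scikit-learn's `MinMaxScaler` performs the equivalent scaling, though the study does not state which routine was used. The handling of constant columns is an assumption.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span
```

Without this step, a variable measured in counts (e.g., up to 71 alerts in a day) would dominate distance calculations over a variable measured as a rate between 0 and 1.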
Dataset normalization and anomaly detection implementations were performed with Python 3.136 using the scikit-learn library.46 Given that unsupervised anomaly detection uses unlabeled input data, the anomaly reporting style is specific to each algorithm.37 LOF and iForest assign an anomaly score to each point, while kNN assigns a mean-of-distance value. We then selected thresholds (corresponding to the number of anomalies detected by the mr-chart) beyond which points were considered anomalies.37,47,49
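Selecting a threshold that yields a fixed number of anomalies can be sketched as follows. The helper name `top_k_anomalies` is hypothetical, and the sketch assumes scores are oriented so that larger values are more anomalous. For reference, 15 anomalies out of 365 days corresponds to a fraction of about 4.11%.

```python
import numpy as np

def top_k_anomalies(scores, k=15):
    """Return (sorted indices of the k highest scores, score threshold)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices from highest to lowest score
    top = order[:k]
    threshold = scores[top[-1]]       # lowest score still counted as an anomaly
    return np.sort(top), threshold

# Fraction of days flagged when 15 of 365 days are treated as anomalies.
contamination = 15 / 365
```

Tied scores at the cutoff are not handled specially in this sketch; a real analysis would need a rule for them.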
Results
The propofol dataset used in this study spanned 365 days (January 1 to December 31). It had a total of 174,987 infusion events (11,986 of which were unique) and 3,300 alerts. Of these alerts, 92% were generated during the day shift. The alert rate in the dataset was one alert for every three unique infusions. The day shift had an alert rate of one for every two unique infusions, while the night shift had an alert rate of one per 19 unique infusions. Nearly all propofol infusions were given in adult care units under the critical care and anesthesia therapies. The maximum limit for both therapies was 100 μg/kg/min, and this limit was used as a hard maximum for critical care and a soft maximum for anesthesia. About 73% of the alerted propofol infusions were programmed at this limit. Seven alerts had a times-limit value greater than 10, and 96% of all alerts were continuous dose. Table 2 shows the five-number summary statistics, including the mean values for each of the feature-engineered numeric input variables used for this study.
Moving Range Charts for Anomaly Detection
Figure 1 shows the mr-chart for maximum times-limit over time. The center line (represented by the mean moving range) was at 1.53, while the lower and upper control limits were at 0.00 and 4.99, respectively. The lower and upper control limits were defined as 3 SD values below and above the center line, respectively. A total of 15 data points (representing 15 different calendar days) were highlighted as anomalies, depicting unusual alerting patterns (Figure 1). The highest value of maximum times-limit among the mr-chart anomalies was 35 (corresponding to that of the propofol dataset). About 20% of the anomalies had maximum times-limit value of 0.
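The control limits above can be reproduced in outline with a short sketch. It assumes the conventional moving-range constant D4 = 3.267 for ranges of two successive points, which corresponds to 3 SD of the moving range above the center line and a lower limit of zero; the study does not state the constant it used. With the reported center line of 1.53, this constant gives an upper limit of about 5.00, close to the reported 4.99 (the small gap may reflect rounding of the center line, which is an assumption).

```python
from statistics import mean

D4_N2 = 3.267  # conventional constant for moving ranges of two points (assumed)

def moving_range_chart(values):
    """Return moving ranges, center line, control limits, and flagged indices."""
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(ranges)
    ucl = D4_N2 * center  # upper control limit
    lcl = 0.0             # lower control limit for ranges of two points
    # ranges[i] compares values[i] and values[i + 1]; report the later point.
    flagged = [i + 1 for i, r in enumerate(ranges) if r > ucl]
    return ranges, center, lcl, ucl, flagged
```

Because the chart tracks differences between successive days, both a sudden spike and the drop back to baseline on the following day can exceed the upper limit. This is one plausible reason why days with a maximum times-limit of 0 can appear among the flagged points.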
Unsupervised ML-Based Anomaly Detection
Because the ML algorithms used in this study have different distinguishing attributes and anomaly reporting patterns, thresholds yielding the top 15 anomalies (corresponding to results from the mr-chart) per model were used. For LOF and iForest, a contamination fraction parameter representing the percentage of anomalies assumed to be in the dataset was selected. To match the result reporting style from the mr-chart, anomalies from the ML algorithms were visualized on a plot of maximum times-limit against time (Figure 2).
For LOF, n = 20 with a contamination fraction of 4.11% yielded the top 15 anomalies. Figure 2A shows the plot of maximum times-limit against time, with the top 15 LOF anomalies highlighted. About 13% of all LOF anomalies had no alerts. Of the LOF anomalies, 20% indicated unusual alerting patterns on weekends.
A contamination fraction of 4.11% yielded the top 15 anomalies detected by iForest. Figure 2B shows the plot of maximum times-limit against time, with the top 15 iForest anomalies highlighted. All iForest anomalies had at least one alert. Of the iForest anomalies, 40% indicated unusual alerting patterns on weekends.
Finally, for kNN, data points with mean of distances from their k nearest neighbors greater than 0.058 were considered anomalies (for k = 7). This was the threshold that yielded the top 15 anomalies. Figure 2C shows the plot of maximum times-limit against time, with the top 15 kNN anomalies highlighted. All kNN anomalies had at least one alert. None of the kNN anomalies indicated unusual alerting patterns on weekends.
Comparison of Detection Methods
To effectively compare the performance of both approaches for detecting unusual infusion patterns, we matched the anomalies from the univariate mr-chart (indexed by the calendar dates) with the multivariate input dataset used for the ML algorithms. This allowed us to get more insights into and expand our interpretation of the mr-chart anomalies. All anomalies detected by the ML algorithms then were grouped and ranked by degrees of agreeability. Without duplicating similar points, the three unsupervised ML algorithms detected a total of 31 unique anomalies. All three algorithms agreed on 10% of the 31 anomalies. At least two of the three algorithms agreed on 36% of the 31 anomalies. For each numerical input variable, Table 3 shows the averages of anomalies detected by the ML algorithms compared with those from the mr-chart.
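Grouping and ranking anomalies by agreeability can be sketched with the standard library. The function names and the input layout (one set of flagged days per algorithm) are assumptions made for illustration.

```python
from collections import Counter

def rank_by_agreement(detections):
    """Rank flagged days by how many algorithms flagged them.

    detections: dict mapping algorithm name -> iterable of flagged days.
    Returns a list of (day, votes) sorted by votes (descending), then day.
    """
    votes = Counter(day for days in detections.values() for day in set(days))
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

def agreement_shares(ranked, n_algorithms):
    """Return the share of unique anomalies flagged by all, and by at least two."""
    total = len(ranked)
    all_agree = sum(1 for _, v in ranked if v == n_algorithms)
    at_least_two = sum(1 for _, v in ranked if v >= 2)
    return all_agree / total, at_least_two / total
```

Days at the top of the ranked list are those flagged by the most algorithms, which is where review effort would be directed first.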
Of the 15 anomalies detected from the mr-chart, three (20%) had no alerts and were on weekends. Another eight (54%) mr-chart anomalies were also detected by at least one of the unsupervised ML algorithms (Figure 3). The remaining four (26%) mr-chart anomalies were not detected by any of the ML algorithms (Figure 4A).
Discussion
CQI is a constant and step-by-step quality management approach used in healthcare settings to improve processes, patient care, and safety. It can play a critical role in achieving safer IV medication administration and, consequently, improved patient safety. The effect of CQI runs the gamut, as improvements could be groundbreaking or very gradual in nature.
At its core, CQI constantly challenges healthcare teams with questions such as, “How are we doing?” and “Can we do it better?”50 Because these questions can be quite vague, loosely defined, and subjective, an effective CQI practice needs structured and relevant clinical data. Through a proof-of-concept study, we leveraged infusion administration data from smart pumps to answer those questions with respect to identifying unusual infusion alerting patterns of propofol, which is a HAM.
Our analysis showed that 92% of propofol alerts occurred during the day and at a rate of one alert for every two unique infusions, aligning with the fact that pre-scheduled perioperative procedures usually occur during the day shift.51 Surgical procedures are complex, high stakes, and fast paced.52 Hence, having such a high volume of alerts generated (up to 71 in a day) from just one of the many IV medications administered during perioperative periods is highly undesirable.
In the study dataset, the largest times-limit was 35. This means that the programmed parameters were 35 times over the propofol alert limit in the drug library. Research has shown that a times-limit value of 10 is considered a high fatal threshold beyond which an override would be unlikely to be justifiable as a legitimate dose in most patient care situations.53–56 Our results revealed 0.2% of all propofol alerts had times-limit values greater than 10, representing seven unique instances where patients were at immense risk of a fatal AE.
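As a small worked illustration, the sketch below treats times-limit as the ratio of the programmed dose to the drug library's maximum limit. This definition is an assumption inferred from the description above, not a formula given by the pump vendor, and the function names are hypothetical.

```python
FATAL_THRESHOLD = 10  # times-limit value described above as a high fatal threshold

def times_limit(programmed_dose, max_limit):
    """Ratio of the programmed dose to the maximum limit (assumed definition)."""
    if max_limit <= 0:
        raise ValueError("max_limit must be positive")
    return programmed_dose / max_limit

def share_above(values, threshold=FATAL_THRESHOLD):
    """Fraction of times-limit values strictly greater than the threshold."""
    values = list(values)
    return sum(v > threshold for v in values) / len(values)

# Seven of 3,300 alerts above the threshold is roughly 0.2%.
reported_share = 7 / 3300
```

Under this assumed definition, a times-limit of 35 against a 100 μg/kg/min limit would correspond to a programmed dose of 3,500 μg/kg/min.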
Third-party informatics tools for infusion pump systems can provide commonly used metrics (e.g., infusion alert count, override-to-reprogram ratio) with data filters on time series plots but have no advanced and customizable computer intelligence.57 To bridge this gap, seven relevant, interpretable, and clinically meaningful variables (six numerical and one categorical) were feature engineered. Feature engineering is a human-centric process that uses human knowledge and intuition to control the raw input variables of an ML solution with the eventual goal of making complex problems easier to solve.58 Because anomaly detection is a hard problem in several domains,59 feature engineering can potentially reduce its complexity.
Although the mr-chart identified 15 anomalies, those findings were from one variable that might not exhaustively define the infusion administration process. Different sets of anomalies are expected to be detected through the mr-chart as the input variable changes. About 20% of the mr-chart anomalies had a maximum times-limit of 0, which depicts calendar days with no propofol alerts. This shows clear room for improvement in using SPC tools, as identifying unusual alerting patterns becomes inconsequential in the absence of alerts. Regardless, because out-of-limit conditions in the mr-chart translate to a shift in the data that is too large to be easily explained by ordinary variation, the anomalies with positive maximum times-limit could be pointers to further investigating the infusion administration practice on those days. In addition, our analysis shows a reasonable level of overlap (54%) in anomalies identified by both the mr-chart and the unsupervised ML algorithms. Hence, if resources are available, we recommend that hospital systems implement both methods for a more comprehensive view of unusual infusion alert patterns.
Of all LOF anomalies, 13% had no alerts. Further, for each of the numerical variables, the average values of LOF outliers were less than those of the mr-chart, iForest, and kNN. These findings are in line with our understanding that LOF considers anomalies as data points with a substantially lower density than their neighbors. This same principle makes it ideal for detecting trends caused by seasonality because certain data points that are outliers relative to their local neighborhood (i.e., local outliers) might be inliers in the “global view of things.”43,60 kNN penalized points with high alert counts (both day and night) and maximum times-limit, as these anomalies had the highest values in our analysis. In contrast, iForest penalized points with high rates (daily, weekly moving, and alert override), as these anomalies had the highest values in our analysis. The three algorithms did not all detect the same anomalies. This variation in degrees of agreeability can be attributed to their different approaches to anomaly detection (local-density based, binary-tree based, and distance-based). It is therefore important to leverage the strengths of different ML-based anomaly detection approaches to identify anomalies in a dataset.
All three ML algorithms agreed that an unusual infusion alerting pattern occurred on July 30, while the mr-chart found no anomaly on that day. Although the maximum times-limit value on that day was 3 (below the fatal threshold), 71 alerts were generated at an average rate of 1.06 per unique infusion. One-half of those alerts were also overridden. By comparison, the moving alert rate for the week was 0.35 per infusion. These are all cogent reasons to further investigate the infusion administration process and practices for the day as well as any patient-related patterns. Even though some anomalies were detected by the mr-chart but not by any of the ML algorithms, the case of July 30 further hints at the need to go beyond control charts to identify unusual alerting patterns. Regardless, we also acknowledge that control charts are easier to use and interpret compared with ML algorithms.
The mr-chart highlighted peaks in the plots as anomalies, but the ML algorithms did not have such patterns of detection, lending credence to the importance of combining multiple relevant variables for anomaly detection. However, our findings suggest that both SPC tools and ML-based anomaly detection have the potential for identifying unusual alerting patterns. Although ML-based algorithms are more robust than control charts, we recommend using a combination of algorithms. This is because multiple algorithms serve as a form of benchmarking, and researchers can focus their efforts on the data points with the highest agreeability across algorithms.
We also recommend that hospital systems consider allotting dedicated in-house clinical analysts or clinicians with skill sets to meaningfully adopt, interpret, and sustain ML-based clinical decision support tools. These hypertargeted and prioritized efforts would effectively and efficiently advance safe IV medication administration.
Limitations
This study had limitations. As a proof-of-concept study, its scope was limited to one year of data from one pump vendor for a single HAM from one hospital. Hence, the alerting patterns might not generalize to other HAMs, hospitals, or vendor pumps. In addition, control charts are typically used on smaller datasets, as opposed to traditional statistical analytic tools.28 However, our methods are valid and customizable to other HAM infusion data from other vendor pumps based on specific clinical use patterns.
We also recognize that there are other unsupervised ML-based anomaly detection algorithms for time series data beyond LOF, iForest, and kNN. Because these three algorithms had reasonable levels of anomaly agreeability with one another, a similar pattern would be expected even with other algorithms with similar detection approaches.
Another limitation of this study was that we set ML anomaly thresholds to correspond to the number of anomalies identified by the mr-chart. However, this was necessary to provide a common basis for comparison in achieving the study's secondary objective.
Finally, we considered only out-of-limit alerts in this study. AEs can still occur within a drug's alert limits, but such cases were outside the scope of this study.
Future Study
We recommend that future studies involve developing more computational and clinically meaningful definitions of unusual intravenous infusion alerting patterns, as no ground truth currently exists in this area. Unusual infusion alerting patterns can vary depending on other factors, such as the patient and medication being infused. Without specific labels, there are no objective performance metrics for the ML algorithms used in this study. However, research has shown that if the goal is to predict rare occurrences, then raw classification accuracy is not a good measure of prediction quality.61 Hence, blind overreliance on ML performance metrics can be misleading.62
One study showed that improving healthcare ML predictive performance metrics is not always the biggest priority for stakeholders and that value exists in using ML as a decision support tool, then having human resources spend their limited time in a smarter way by investigating the identified high-risk areas.63 Consequently, based on the ML algorithms' results for unusual infusion alerting patterns, hospital systems can take actionable and preventive steps (e.g., addressing nuisance alerts, revising dose limit settings in the drug library to reflect changes in available infusion supplies, clinical practices, or patient populations, updating relevant policies and procedures, modifying clinical workflows, investing in additional nurse training).
Conclusion
This study presented rigorous analytic CQI practices for detecting unusual infusion alert patterns of a HAM. Our results showed that beyond reviewing pump reports, plotting aggregated univariate charts, and applying basic statistical methods, a mix of intuition, domain expertise, computational rigor, and human-ML collaboration provides more insights into the infusion administration process. As a result, a more effective means of identifying unusual infusion alert patterns can be achieved as a first step to advance safer IV medication administration.
Acknowledgments
The authors thank all REMEDI clinicians involved in the study, especially Andrew Lodolo and Naomi Barasch, for their invaluable insights and expertise.
References
Funding
This work was supported by a Mary K. Logan Research Award from the AAMI Foundation. The content is solely the responsibility of the authors and does not necessarily represent the views of AAMI and/or the AAMI Foundation.