Delayed recognition of acute kidney injury (AKI) results in poor outcomes in military and civilian burn-trauma care. Poor predictive ability of urine output (UOP) and creatinine contributes to the delayed recognition of AKI.
To determine the impact of a point-of-care (POC) AKI biomarker enhanced by machine learning (ML) algorithms in burn-injured and trauma patients.
We conducted a 2-phased study to develop and validate a novel POC device for measuring neutrophil gelatinase-associated lipocalin (NGAL) and creatinine from blood samples. In phase I, 40 remnant plasma samples were used to evaluate the analytic performance of the POC device. Next, phase II enrolled 125 adults with either burns that were 20% or greater of total body surface area or nonburn trauma with suspicion of AKI for clinical validation. We applied an automated ML approach to develop models predicting AKI, using a combination of NGAL, creatinine, and/or UOP as features.
Point-of-care NGAL (mean [SD] bias: 9.8 [38.5] ng/mL, P = .10) and creatinine results (mean [SD] bias: 0.28 [0.30] mg/dL, P = .18) were comparable to the reference method. NGAL was an independent predictor of AKI (odds ratio, 1.6; 95% CI, 0.08–5.20; P = .01). The optimal ML model achieved an accuracy, sensitivity, and specificity of 96%, 92.3%, and 97.7%, respectively, with NGAL, creatinine, and UOP as features. Area under the receiver operator curve was 0.96.
Point-of-care NGAL testing is feasible and produces results comparable to reference methods. Machine learning enhanced the predictive performance of AKI biomarkers including NGAL and was superior to the current techniques.
Acute kidney injury (AKI) occurs in up to 34.3% of combat casualties with mortality rates of about 21.7%.1 Combined trauma- and burn-related casualties are at high risk for AKI owing to hypotension complicated by fluid loss associated with thermal injury.2,3 Increased AKI risk and severity also occur under situations where combat casualties experience delayed evacuation—relying on field assessment with limited diagnostic resources where urine output (UOP) serves as the primary means for guiding acute fluid resuscitation in these patients.2 These same clinical and diagnostic challenges are mirrored in civilian practice, where, at best, point-of-care (POC) creatinine testing and UOP measurements may aid in the recognition and management of AKI.4,5
Despite common use, the predictive performance of creatinine and UOP as AKI biomarkers is suboptimal.6,7 Novel AKI biomarkers have been proposed, such as neutrophil gelatinase-associated lipocalin (NGAL), cystatin C, kidney injury molecule–1, tissue inhibitor of metalloproteinases–2, and insulin-like growth factor binding protein–7, to overcome these limitations.8–11 NGAL, in particular, has shown good performance as an AKI biomarker in several critical and emergency care populations. Briefly, NGAL is normally produced by neutrophils during inflammation and is renally cleared.9,10 During AKI, plasma NGAL increases (>100 ng/mL), and interestingly, renal tubular cells also produce NGAL. As a result, plasma and urine NGAL levels are elevated in patients at risk for renal injury. Despite literature supporting the use of NGAL, concerns have been raised previously about preexisting inflammation confounding the biomarker's performance and limiting its clinical value at the POC.
Recent studies have suggested that artificial intelligence (AI)/machine learning (ML) may improve the predictive performance of NGAL along with other biomarkers such as N-terminal B-type natriuretic peptide when performed in the laboratory for burn-injured and nonburned trauma patients,12,13 raising hopes that ML could help identify complex diagnostic patterns masked by confounding factors (eg, preexisting nonrenal injury, inflammation) and not seen by the human eye. In those studies, the sensitivity and specificity of NGAL increased when combined with ML.
Unfortunately, some logistical challenges remain with implementing ML. Machine learning development is often limited by the manual and laborious nature of programming to find an optimal model, a time-consuming endeavor that may be fraught with analytic bias. A recent study used an innovative automated ML platform to overcome these limitations by quickly identifying high-performing models while eliminating method bias. A natural extension of these recent studies is to evaluate these AI/ML models in a multicenter study and to translate them to POC testing. The goal of our study was to establish a proof of concept for a prototype NGAL and creatinine POC device with ML models generated by an automated platform.
MATERIALS AND METHODS
We conducted a study evaluating the analytic performance of a prototype handheld POC NGAL/creatinine analyzer (eLab, Nanomix, Emeryville, California), using residual clinical chemistry plasma specimens (phase I). This was followed by a 2-site (site A and site B) observational trial to evaluate the clinical performance of the POC device using prospectively collected plasma specimens from patients at risk for AKI (phase II). A total of 125 adult (age ≥18 years) patients with burn injuries or non–burn-related trauma were enrolled for this phase of the study with approval by both local institutional review boards and the Department of Defense Human Research Protection Office.
For the phase I bench analytic phase of the study, 40 residual chemistry plasma samples from unique patients were collected and banked. These specimens were stored at −70°C via site B's College of American Pathologists–accredited biorepository until testing. Subjects enrolled (by informed consent) in the phase II prospective phase of the study had 1 mL of whole blood collected into lithium heparin collection tubes at time of admission, at onset of AKI, daily during AKI, at AKI resolution, and at discharge. All whole blood samples were immediately centrifuged and plasma was aliquoted. Site A shipped processed plasma samples in frozen batches to site B where testing was performed.
Patient demographics (ie, age, sex, burn size, type of injury), vital signs, hourly UOP, and relevant laboratory results including plasma creatinine at the time of specimen collection were recorded onto an electronic case report form (REDCap, Nashville, Tennessee). Acute kidney injury status based on these data was determined by electronic chart review and using the established Kidney Disease Improving Global Outcomes (KDIGO) criteria (Table 1).
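As an illustration of how KDIGO-based AKI status can be derived from the recorded creatinine and UOP data, the following is a minimal sketch. Table 1 is not reproduced here, so the thresholds are the commonly summarized KDIGO staging cutoffs; the function name `kdigo_stage` and its parameters are hypothetical, and the sketch deliberately omits the 48-hour and 7-day time windows and renal replacement therapy.

```python
from typing import Optional


def kdigo_stage(baseline_scr: float, current_scr: float,
                uop_ml_kg_h: Optional[float] = None,
                oliguria_hours: float = 0.0) -> int:
    """Return a simplified KDIGO AKI stage (0 = no AKI, 1-3 = AKI stage).

    Assumptions (not from the study): creatinine in mg/dL, UOP in mL/kg/h,
    time windows (48 h / 7 days) and renal replacement therapy are ignored.
    """
    ratio = current_scr / baseline_scr
    rise = current_scr - baseline_scr

    # Creatinine-based staging.
    if ratio >= 3.0 or (current_scr >= 4.0 and rise >= 0.3):
        scr_stage = 3
    elif ratio >= 2.0:
        scr_stage = 2
    elif ratio >= 1.5 or rise >= 0.3:
        scr_stage = 1
    else:
        scr_stage = 0

    # Urine-output-based staging (only if UOP data are available).
    uop_stage = 0
    if uop_ml_kg_h is not None:
        anuria = uop_ml_kg_h == 0 and oliguria_hours >= 12
        if (uop_ml_kg_h < 0.3 and oliguria_hours >= 24) or anuria:
            uop_stage = 3
        elif uop_ml_kg_h < 0.5 and oliguria_hours >= 12:
            uop_stage = 2
        elif uop_ml_kg_h < 0.5 and oliguria_hours >= 6:
            uop_stage = 1

    # The overall stage is the worse of the two criteria.
    return max(scr_stage, uop_stage)


# Hypothetical patient: creatinine doubled from baseline, UOP not recorded.
print(kdigo_stage(baseline_scr=1.0, current_scr=2.2))
```

Taking the maximum of the two stages reflects that a patient qualifies for AKI by either the creatinine or the UOP criterion.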
Sample Testing Platforms
Research samples were tested on the POC analyzer, using an NGAL and creatinine assay specifically developed for this study (Figure 1). Briefly, the POC device is a handheld, test cartridge–based platform designed to be environmentally robust for field testing. Test cartridges are single use and use a carbon nanobiosensor design to provide highly sensitive analysis of whole blood, serum, and plasma specimens. For this study, NGAL antibodies from a commercially available assay (BioPorto, Hellerup, Denmark) were adapted to the POC test cartridge, while the creatinine assay used a competitive enzyme-linked immunosorbent assay (ELISA) principle. For determining analytic accuracy, NGAL measurements on the POC device were compared against the commercially available ELISA (BioPorto). Point-of-care creatinine measurements were then compared to a central laboratory method (Synchron DxC 800, Beckman Coulter, Brea, California). The central laboratory creatinine assay was standardized against National Institute of Standards and Technology Standard Reference Material (SRM 967)–calibrated isotope dilution mass spectrometry. Between-day (across 5 days) precision was determined by testing in triplicate at clinically relevant thresholds for NGAL and creatinine.
Data were analyzed by using both traditional statistical techniques and various ML methods. Details are outlined below.
Traditional Statistical Analysis
JMP software (SAS Institute, Cary, North Carolina) was used for statistical analysis. Patient demographics were compared via descriptive statistics. Normality was evaluated by using the Ryan-Joiner test. As appropriate, the 2-sample t test was used for continuous independent variables, while discrete variables were compared by using the nonparametric χ2 test. The paired t test was used to compare continuous dependent variables. For continuous nonparametric variables, the Mann-Whitney U test was used when indicated. Multivariate logistic regression (LR) was used to identify predictors of AKI with age and burn size (when applicable) serving as covariates with 95% CIs reported. Bland-Altman and least squares linear regression plots were used to assess bias and correlation, respectively. Analytic imprecision was determined by calculating the percent coefficient of variation (CV) from replicate (n = 5) testing. A P value <.05 was considered statistically significant. Receiver operator curve (ROC) analysis was also performed to compare AKI biomarker performance.
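The bias and imprecision calculations described above can be sketched as follows. This is an illustrative stand-in using the Python standard library, not the JMP analysis used in the study; the function names and the sample values are hypothetical.

```python
import statistics


def bland_altman(test, reference):
    """Return (mean bias, SD of differences, 95% limits of agreement)."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, sd, limits


def percent_cv(replicates):
    """Percent coefficient of variation: 100 * SD / mean."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


# Synthetic paired NGAL results in ng/mL (NOT study data).
poc = [102.0, 148.0, 305.0, 51.0]
elisa = [100.0, 150.0, 300.0, 50.0]
bias, sd, limits = bland_altman(poc, elisa)
print(f"bias={bias:.2f}, SD={sd:.2f}, limits={limits[0]:.2f} to {limits[1]:.2f}")

# Synthetic replicate measurements at one concentration (NOT study data).
print(f"CV={percent_cv([9.0, 10.0, 11.0]):.1f}%")
```

A mean bias close to zero with narrow limits of agreement indicates that the two methods can be used interchangeably at the tested concentrations.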
Machine learning algorithms were developed by using an automated Machine Intelligence Learning Optimizer (MILO) platform (Regents of the University of California, Oakland).14 This automated ML platform has been used previously for sepsis prediction in severely burned patients. In brief, MILO infrastructure uses an automated data processor, a data feature selector (eg, analysis of variance [ANOVA] F select percentile feature selector), and data transformer (eg, principal component analysis), followed by its custom supervised model builder, which includes our custom hyperparameter search tools (ie, grid search along with our random search tools) that help identify optimal hyperparameter combinations for deep neural network (DNN), LR, naïve Bayes (NB), k-nearest neighbor (k-NN), support vector machine (SVM), random forest (RF), and XGBoost gradient boosting machine (GBM) ML algorithms. MILO helps find the most suitable ML model(s) from user-defined datasets by evaluating multiple algorithms and feature combinations, a process not feasible through manual programming alone.
Scaling was used by MILO where appropriate, using either standard or min-max scalers. To evaluate the effect of various features within the datasets, combinations of statistically significant feature sets are used to construct new datasets with fewer features or transformed features from the original dataset. The features selected in this step are derived from established unsupervised ML techniques, including ANOVA F-statistic value select percentile and RF feature importances, or are transformed by using our principal component analysis approach. A large number of ML models are then built from these datasets with optimal parameters through various algorithms (ie, DNN, SVM, NB, LR, k-NN, RF, and GBM), scalers, hyperparameters, and feature sets.
Two datasets (A and B) were used for ML training, validation, and generalization steps in this study (Figure 2). Dataset A consisted of 76 patients (40 AKI and 36 no-AKI) and was derived from our previous ML AKI study for training and validation of MILO-produced ML algorithms.15 A total of 324,891 ML models were validated by MILO, with the top-performing model for each category identified and passed on to the next phase for generalization assessment using dataset B. Dataset B was derived from the present study, using results from the 125 patients enrolled during phase II. Model performance data were then tabulated by MILO and reported as clinical sensitivity, specificity, accuracy, ROC curves, and reliability curves.
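To give a sense of why an automated search is needed, the sketch below enumerates candidate configurations across algorithms, scalers, and feature subsets. It is not MILO code: the `none` scaler option, the dictionary layout, and the function names are assumptions made for illustration, and hyperparameter grids (which multiply the count much further) are left out.

```python
from itertools import combinations, product

ALGORITHMS = ["DNN", "LR", "NB", "k-NN", "SVM", "RF", "GBM"]
SCALERS = ["none", "standard", "min-max"]  # "none" is an assumed option
FEATURES = ["NGAL", "creatinine", "UOP"]


def feature_subsets(features):
    """Yield every non-empty subset of the feature list."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def candidate_pipelines(algorithms, scalers, features):
    """Build one configuration per (algorithm, scaler, feature subset)."""
    return [
        {"algorithm": a, "scaler": s, "features": f}
        for a, s, f in product(algorithms, scalers, feature_subsets(features))
    ]


candidates = candidate_pipelines(ALGORITHMS, SCALERS, FEATURES)
# 7 algorithms x 3 scalers x 7 non-empty feature subsets = 147 configurations
print(len(candidates))
```

Even this small grid yields 147 configurations before any hyperparameters are varied, which illustrates how quickly the search space outgrows manual programming.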
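The scaling and ANOVA F-statistic selection steps can be sketched in plain Python as shown below. This is a simplified stand-in rather than the MILO implementation: the function names are hypothetical, the percentile rule is an approximation of a select-percentile selector, and all values are synthetic.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = statistics.mean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_percentile(features, labels, percentile):
    """Keep the top `percentile` percent of features ranked by F (at least 1)."""
    ranked = sorted(features, key=lambda name: anova_f(features[name], labels),
                    reverse=True)
    keep = max(1, int(len(ranked) * percentile / 100))
    return ranked[:keep]


# Synthetic values for 3 non-AKI (0) and 3 AKI (1) patients (NOT study data).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "NGAL": [50.0, 60.0, 70.0, 200.0, 220.0, 240.0],
    "creatinine": [0.9, 1.0, 1.1, 1.0, 1.1, 1.2],
    "UOP": [1.0, 0.9, 1.1, 0.4, 0.5, 0.3],
}
print(select_percentile(features, labels, 67))
print(min_max_scale(features["NGAL"]))
```

In this synthetic example the feature with the largest separation between groups relative to its within-group spread receives the highest F value and is retained first.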
Phase I Bench Analytic Study
Point-of-care plasma NGAL results were statistically comparable to the ELISA method with a mean (SD) bias of 9.8 (38.5) ng/mL (P = .10, n = 40) (Figure 3, A). The POC NGAL correlation exhibited an R2 of .97 (Figure 3, B). The POC plasma creatinine assay was also found to be statistically similar to the central laboratory method with a mean bias of 0.28 (0.30) mg/dL (P = .18) with R2 = .81 (Figure 3, C and D). A subset of urine samples (n = 23) was also tested to determine the potential application of NGAL, using this specimen matrix. The POC device was significantly different from the ELISA method with a mean (SD) bias of 23.7 (12.8) ng/mL (P < .001). The analytic imprecision for the plasma NGAL assay was determined to have CVs of 21.7% at 18 ng/mL, 15.2% at 151 ng/mL, 10.2% at 304 ng/mL, and 8.9% at 1256 ng/mL. In contrast, the imprecision for creatinine was 3.3% and 4.2% at 1.0 and 1.5 mg/dL, respectively.
Phase II Prospective Study
Of the 125 patients, 25 were nonburn trauma patients from site A. The remaining 100 patients came from site B and included 25 subjects with severe burns (>20% total body surface area) and 75 with non–burn-related trauma. Patient demographics are highlighted in Table 2. The AKI prevalence was 31.2% (39 of 125) in this combined population. Receiver operator characteristics curve analysis showed NGAL (Figure 4, A) providing superior performance for detecting AKI versus UOP or creatinine individually (Figure 4, B and C). Specifically, NGAL served as an independent predictor of AKI occurring within the first week of hospital stay (odds ratio, 1.6; 95% CI, 0.08–5.20; P = .01). Creatinine and UOP were not significantly different between populations.
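For readers less familiar with ROC analysis, the area under the curve can be interpreted as the probability that a randomly chosen AKI patient has a higher biomarker value than a randomly chosen non-AKI patient. The sketch below computes it by pairwise comparison; the function name is hypothetical and the values are synthetic, not study measurements.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic biomarker values (NOT study data); 1 = AKI, 0 = no AKI.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))
```

An area of 0.5 corresponds to chance-level discrimination, and 1.0 to perfect separation of the two groups.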
Machine Learning Algorithm Performance
Using ML models trained via MILO on data from our previous study (dataset A), we identified 2 LR-based ML algorithms whose accuracy, sensitivity, and specificity were optimized to predict AKI better than the KDIGO criteria when applied to the features in dataset B. The first LR algorithm reported an accuracy of 96% with a clinical sensitivity and specificity of 92.3% and 97.7%, respectively, using a combination of NGAL, creatinine, and UOP as features. Area under the ROC curve for this ML algorithm was determined to be 0.89 for the training validation (Figure 5, A), and 0.96 for the generalization (Figure 5, B). The second LR algorithm achieved an accuracy of 96.8% with a sensitivity of 92.3% and specificity of 98.8% using all 3 features (ie, NGAL, UOP, and creatinine). Area under the ROC curve for this particular algorithm was determined to also be 0.89 for the training validation (Figure 5, C) and 0.97 for generalization (Figure 5, D).
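The reported percentages can be related to a confusion matrix as sketched below. The counts are not reported in the text; they are inferred under the assumption that both algorithms were evaluated on all 125 phase II patients (39 AKI, 86 non-AKI), and they are simply the integer counts consistent with the stated sensitivity, specificity, and accuracy.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts (an assumption, not reported data): 39 AKI, 86 non-AKI.
first_lr = diagnostic_metrics(tp=36, fn=3, tn=84, fp=2)
second_lr = diagnostic_metrics(tp=36, fn=3, tn=85, fp=1)

for name, (sens, spec, acc) in [("first LR", first_lr), ("second LR", second_lr)]:
    print(f"{name}: sensitivity={sens:.1f}%, specificity={spec:.1f}%, "
          f"accuracy={acc:.1f}%")
```

Under these assumed counts, the two algorithms differ by a single false-positive classification, which accounts for the small gap in specificity and accuracy.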
Early recognition of AKI can reduce the incidence and severity of renal dysfunction for these at-risk patients.16 Studies suggest AKI prevalence is as high as 67% among postoperative intensive care unit patients,17 up to 53.3% in burn-injured patients,3 and 34.3% for combat casualties from recent conflicts.1 The bidirectional extrapolation of civilian burn/trauma research, including AKI, to combat casualty care has proven valuable in recent years—often providing new perspectives and redefining standards of care for both populations.18,19 Our study leverages multicenter (2-site) observations made in the civilian burn and trauma setting to establish a proof of concept for ML-enhanced POC AKI biomarker testing.
Both creatinine and UOP remain the primary means for recognizing AKI as defined by the KDIGO criteria, with UOP being often used out of convenience during acute fluid resuscitation in burns and trauma. Numerous AKI biomarkers have since been investigated to overcome the known limitations of creatinine and UOP.7–11 NGAL in particular shows good performance in the burn population when combined with natriuretic peptide testing.9,10,12,13 Not surprisingly, in our present study, we observed NGAL concentrations were significantly higher in both burned and nonburned trauma patients presenting with AKI. Among severely burned patients, plasma creatinine and UOP were statistically similar in AKI and non-AKI populations—underscoring noted deficiencies of these contemporary biomarkers of renal function and mirroring observations identified in previous studies.9,10,12,13 Among nonburned trauma patients, creatinine and UOP were significantly different between AKI and non-AKI patients. Receiver operator characteristic curve analysis using prospectively collected samples showed NGAL outperforming other AKI biomarkers in terms of accuracy, sensitivity, and specificity. These findings are tempered because NGAL was not available as a POC test and biomarker interpretation was affected by confounding factors such as nonspecific inflammation.
Our study overcame these limitations by developing the first handheld, field deployable, quantitative NGAL and creatinine test platform for blood and urine. Analytic accuracy of the prototype POC analyzer for both NGAL and creatinine assays was comparable to the reference methods. We identified 4 outlier NGAL measurements. These samples were visually noted to have fibrin; however, there was insufficient volume to recentrifuge and retest. Imprecision of the POC platform was higher for the NGAL assay and may be attributed to the use of both prospectively collected and banked samples. Nonetheless, the performance was comparable to CVs reported in previous literature20,21 and acceptable for use as a single screening test for AKI when compared to creatinine and UOP.
Uniquely, the prototype device was further enhanced by ML models produced via the automated MILO platform, without using natriuretic peptides in conjunction with NGAL. To our knowledge, this is the first use of ML with such a POC application for AKI. Machine learning algorithms using LR enhanced AKI prediction when using NGAL, creatinine, and UOP. Interestingly, this is in contrast to previous ML AKI studies where k-NN and SVM were optimal models.12,13 Differences may be due to the multicenter nature of the study and the use of the automated MILO platform to develop AKI algorithms, which is able to identify more models than previous techniques.
Limitations include the modest sample size and the observational nature of the study. Differences in NGAL and laboratory creatinine methodologies were controlled in this study by performing all reference testing at site B. Nonetheless, creatinine exhibits high interassay variability despite standardization efforts; therefore, our site B–derived algorithms should be validated against other US Food and Drug Administration–approved test platforms. Variability due to shipping of samples may be a possibility; however, samples were stored and tested within published stability requirements for each analyte.22,23
NGAL improves recognition of AKI when used alone or in conjunction with creatinine and UOP. Machine learning algorithms further enhance the predictive power of NGAL in adult burn and trauma populations. The use of an automated ML platform also identified better-performing algorithms than previous studies. Development of a handheld, field deployable POC device for quantitative NGAL/creatinine measurement synergizes with ML algorithms to enable accurate clinical decision-making by combat casualty and emergency responders in the field. Future studies are needed to fully integrate ML algorithms into POC devices and evaluate them in prospective studies.
We thank Kelly Lima, BS, for supporting the specimen collection at UC Davis.
The study was funded by the United States Air Force/Air Force Materiel Command En Route Care Grant (Award No.: FA8650-17-2-6G15). The machine learning acute kidney injury algorithms and the Machine Intelligence Learning Optimizer (MILO) platform were developed previously by Rashidi, Albahra, and Tran through the Regents of the University of California. The MILO software is an intellectual property of the University of California. Nanomix, Inc, developed the point-of-care platform and neutrophil gelatinase-associated lipocalin and creatinine assays under the same grant.
The other authors have no relevant financial interest in the products or companies described in this article.
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of 711th Human Performance Wing or the US Government. This material is based on research sponsored by 711th Human Performance Wing under agreement No. FA8650-15-2-6605. The US Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon. This item was cleared for distribution A: Clearance No. 88 ABW-2019-6166.