Context.—

Disease courses in COVID-19 patients vary widely. Prediction of disease severity on initial diagnosis would aid appropriate therapy, but few studies include data from initial diagnosis.

Objective.—

To develop predictive models of COVID-19 severity based on demographic, clinical, and laboratory data collected at initial patient contact after diagnosis of COVID-19.

Design.—

We studied demographic data and clinical laboratory biomarkers at time of diagnosis, using backward logistic regression modeling to determine severe and mild outcomes. We used deidentified data from 14 147 patients who were diagnosed with COVID-19 by polymerase chain reaction SARS-CoV-2 testing at Montefiore Health System, from March 2020 to September 2021. We generated models predicting severe disease (death or more than 90 hospital days) versus mild disease (alive and fewer than 2 hospital days), starting with 58 variables, by backward stepwise logistic regression.

Results.—

Of the 14 147 patients, including Whites, Blacks, and Hispanics, 2546 (18%) patients had severe outcomes and 3395 (24%) had mild outcomes. The final number of patients per model varied from 445 to 755 because not all patients had all available variables. Four models (inclusive, receiver operating characteristic, specific, and sensitive) were identified as proficient in predicting patient outcomes. The parameters that remained in all models were age, albumin, diastolic blood pressure, ferritin, lactic dehydrogenase, socioeconomic status, procalcitonin, B-type natriuretic peptide, and platelet count.

Conclusions.—

These findings suggest that the biomarkers found within the specific and sensitive models would be most useful to health care providers in their initial evaluation of COVID-19 severity.

The COVID-19 pandemic has fundamentally altered the way health data are collected, analyzed, and released to healthcare professionals and the general public. From the earliest availability of COVID-19 data, it was clear that patients presented with highly variable clinical courses. Some patients recover quickly and without sequelae, while others require extensive hospitalization and prolonged supportive care.1  As of December 6, 2022, more than 640 million people had been infected and more than 6.6 million had died from this illness worldwide.2

Several early analyses attempted to utilize laboratory testing to identify the specific patient populations at risk for more severe outcomes. Many of these studies reviewed the statistical differences between the respective cohorts on individual laboratory tests. One early study identified 6 biomarkers showing differences in distributions between survivors and nonsurvivors.3  Additional studies showed that temporal changes in certain biomarkers, such as D-dimer, may identify patients at risk.4–6  Some studies focused specifically on the host inflammatory response and excessive cytokine release (cytokine storm) and had similar findings for markers such as interleukin-6 and procalcitonin (PCT).7–9  Many studies were limited to China and do not represent the population demographics of the United States of America.10–13

While these studies provide some insight into the question of identifying patients at risk for negative outcomes, they also have several drawbacks. Most used analysis of variance between 2 patient groups; however, statistical differences do not always correlate to clinically useful predictions, and these studies used data throughout the disease course.14,15  Clinicians would like to be able to identify patients at risk at their initial encounter. Therefore, we sought to use the more comprehensive predictive modeling afforded by logistic regression, across dozens of parameters at the patient point of entry to care, to assess risk of severe disease, including mortality and prolonged hospitalization. The purpose of this study was to determine which demographic, clinical, and laboratory variables, at the time of initial patient contact, might be useful in predicting severe versus mild outcome, and to subsequently develop an algorithm that could be applied to patients with newly diagnosed COVID-19. In addition, race and ethnicity were included as part of the variables of the modeling process in our study to address any potential effect.

Study Population

Montefiore Health System (MHS) is an integrated healthcare delivery network comprising 4 hospitals totaling more than 1400 beds. MHS primarily serves the Bronx, New York, one of the most diverse and poorest urban communities in the United States. The MHS has 93 000 annual hospital admissions and nearly 300 000 emergency department visits per year. With 35.4% of Bronx residents being foreign born and 27.3% living in poverty, MHS was particularly struck by the COVID-19 pandemic in March 2020 and exhibited some of the highest rates of COVID-19 hospitalizations and deaths of the 5 boroughs of New York City.16 

Deidentified data were retrospectively collected for all patients who tested positive or negative for SARS-CoV-2 with a polymerase chain reaction assay at any of the 3 main tertiary care hospitals of the MHS from March 1, 2020, to September 30, 2021—this time frame encompasses multiple waves of the pandemic. Race and ethnicity were obtained from the medical records of patients. The study was approved by the Albert Einstein College of Medicine (Bronx, New York) Institutional Review Board. Due to the retrospective nature of the study, informed consent was waived.

We studied data from the 14 147 patients (Tables 1 and 2) from MHS. Fifty-eight variables (Table 3), including demographics, clinical parameters, and biomarker test results, were available for the analysis. All data, including the biomarker testing data and emergency room admission, were queried using the institution’s electronic data warehouse. The electronic data warehouse is an industry-standard database for the common electronic medical records of laboratory, pharmacy, and admission, discharge, and transfer for inpatient, emergency, and outpatient care.

Table 1.

Study Population Demographics

Table 2.

Percentage of Patients With Severe Outcome by Demographic Groups

Table 3.

Variables Studied in the Models


The area deprivation index is a composite measure of 17 census variables designed to describe a region’s socioeconomic disadvantage based on income, education, household characteristics, and housing. This index has been validated for a range of health outcomes across several disease domains.17 

Socioeconomic status (SES) was computed as the sum of standardized variables (z-scores) related to education, occupation, and economic aspects based on the United States Census Bureau, 2015–2019 ACS 5-Year Estimate, accessed through GeoLytics (GeoLytics Inc, East Brunswick, New Jersey).
  • z_AtLeastBachelors: The z-score of the proportion of people with at least a bachelor’s degree. It is calculated by subtracting the average proportion of people with a college degree from the actual proportion, and then dividing by the standard deviation.

  • z_HighSchool: The z-score of the proportion of people with a high school diploma. It is calculated by subtracting the average proportion of people with a high school diploma from the actual proportion, and then dividing by the standard deviation.

  • z_occupation: The z-score of the proportion of people employed in management, professional, and related occupations. It is calculated by dividing the sum of the employed male and female populations in these occupations by the total employed civilian population (with a minimum value of 0.5 to prevent division by zero). The average proportion of people in these occupations is then subtracted from the result, and the difference is divided by the standard deviation.

  • z_ADO1999: The z-score of the ratio of aggregate interest to household income in 1999. It is calculated by dividing the aggregate interest by the household income in 1999 (with a minimum value of 0.5 to prevent division by zero). The average ratio is then subtracted from the result, and the difference is divided by the standard deviation.

  • z_MedHomVl: The z-score of the median home value. It is calculated by taking the logarithm base 10 of the median home value (with a minimum value of 0.5 to prevent a logarithm of zero), subtracting the average logarithm of the median home value, and dividing the difference by the standard deviation.

  • z_MedHHinc: The z-score of the median household income. It is calculated by taking the logarithm base 10 of the median household income (with a minimum value of 0.5 to prevent a logarithm of zero), subtracting the average logarithm of the median household income, and dividing the difference by the standard deviation.

The averages and standard deviations used for these z-scores were the state-level values for the state of the patient’s address.
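The composite described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary layout, and the reference means and standard deviations are hypothetical, and the 0.5 floor is applied as the text describes (to denominators before division and to values before taking the base-10 logarithm).

```python
import math

FLOOR = 0.5  # minimum value described in the text (avoids division by zero and log of zero)


def z_score(value, mean, sd):
    """Standardize a value against a reference mean and standard deviation."""
    return (value - mean) / sd


def safe_ratio(numerator, denominator):
    """Ratio with the denominator floored at 0.5."""
    return numerator / max(denominator, FLOOR)


def safe_log10(value):
    """Base-10 logarithm with the argument floored at 0.5."""
    return math.log10(max(value, FLOOR))


def ses_score(components, reference):
    """Sum the z-scores of all components.

    components: dict of name -> transformed value (proportion, ratio, or log10 value)
    reference:  dict of name -> (mean, sd) for the patient's state (hypothetical inputs)
    """
    return sum(z_score(components[name], *reference[name]) for name in components)
```

Each component is transformed first (for example, `safe_log10` for median home value or `safe_ratio` for the occupation proportion) and then passed to `ses_score` together with the matching state-level reference values.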

Patient outcome was defined as severe (death or ≥90 days of hospital stay), moderate (alive but in the hospital for 2–89 days), or mild (alive and hospital stay ≤1 day).
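The outcome definition above maps directly onto a small rule. The sketch below assumes two inputs per patient, a death flag and the length of hospital stay in days; the function and parameter names are illustrative and do not come from the study.

```python
def classify_outcome(died, hospital_days):
    """Return 'severe', 'moderate', or 'mild' using the definitions in the text."""
    if died or hospital_days >= 90:
        return "severe"      # death or a stay of 90 days or more
    if hospital_days <= 1:
        return "mild"        # alive with a stay of 1 day or less
    return "moderate"        # alive with a stay of 2 to 89 days
```

Only the severe and mild groups are used as the binary target (1 = severe; 0 = mild) in the regression models.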

Statistical Analysis

R (version 3.6.1, 2019, The R Foundation for Statistical Computing, Vienna, Austria) was used for all data processing, statistical analysis, and modeling. Logistic regression with cross validation (“cv.glmnet” from R package “glmnet”) was used to select critical variables most helpful in predicting patient outcome (1 = severe; 0 = mild). The data were randomly split into a training set (70%) and testing set (30%). The training set was used for building models and selecting variables, while the testing set was used for predicting patient outcomes and evaluating model performance. Backward stepwise regression was used to evaluate the importance of each variable in predicting patient outcome. We started with all (n = 53) variables, including demographic data, clinical parameters, and biomarkers, and then removed variables with the least significance (largest P value), and rebuilt the model with selected variables. Next, we removed, one by one, variables with P > .05 or variables known to have high clinical correlations, and then reevaluated the model. The optimized probability cutoff for each model was picked at the point where the sum of specificity and sensitivity was the highest and was used to generate the predicted binary outcome. By comparing the predicted outcome and the true patient outcome, we generated the receiver operating characteristic (ROC) curve, area under the curve (AUC), positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity for various models. This process yielded multiple models based on the various metrics, from which we selected the 3 models that used the minimum number of variables with the highest AUC, sensitivity, or specificity. The intermediate model with 37 variables serves as a reference point for these 3 selected models; therefore, it is named the “inclusive model (A).”
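The cutoff selection and the performance metrics can be illustrated with a short sketch. It assumes the cutoff is chosen to maximize sensitivity plus specificity (equivalent to maximizing the Youden index); the function names and the toy data are illustrative only and are not taken from the study, which used R.

```python
def confusion_counts(y_true, y_prob, cutoff):
    """Count TP, FP, TN, FN when predicting 1 for probabilities >= cutoff."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= cutoff else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_prob, cutoff):
    """Sensitivity, specificity, PPV, and NPV at a given cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def best_cutoff(y_true, y_prob):
    """Pick the observed probability that maximizes sensitivity + specificity."""
    def score(cutoff):
        m = metrics(y_true, y_prob, cutoff)
        return m["sensitivity"] + m["specificity"]

    return max(sorted(set(y_prob)), key=score)


# Toy data (made up): 1 = severe, 0 = mild
y_true = [0, 0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.5, 0.7, 0.9]
cutoff = best_cutoff(y_true, y_prob)
print(cutoff, metrics(y_true, y_prob, cutoff))
```

Using the observed predicted probabilities as candidate cutoffs is one common choice; a fixed grid of thresholds would work equally well for this purpose.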

Patient Baseline Characteristics

Table 1 shows the demographics for the 14 147 patients included in the study. Ages ranged from 0 to 103 years, with a mean of 57.7 years and an SD of 20.8 years. Of 14 146 patients with sex information, 7340 (52%) were female and 6806 (48%) were male. Of all 14 147 patients, 7282 (51.5%) were White, 4555 (32%) were Black, 1059 (7.5%) were of Asian descent, 460 (3%) were reported as other, and 791 (6%) did not report race. Ethnicity was reported as Hispanic for 6233 (44%) of patients and as non-Hispanic for 6876 (49%) of patients. The mean body mass index (BMI) was 29.4 (SD = 7.3). Of 10 889 patients with BMI information, 3859 (35%) had a BMI >30, 3962 (36%) had a BMI of 25 to 30, and 3068 (28%) had a BMI <25.

Older patients had a higher rate of severe outcomes. Rates of severe outcomes were higher for patients >65 years old (1829 of 3020 patients, 60.6%; 95% CI 58.8%–62.3%) than for those 40 to 64 years old (627 of 1984 patients, 31.6%; 95% CI 29.6%–33.6%), 18 to 39 years old (81 of 756 patients, 10.7%; 95% CI 8.5%–12.9%), or <18 years old (13 of 164 patients, 7.9%; 95% CI 3.8%–12.1%). Rates of severe outcomes were higher for Black patients (817 of 1869 patients, 43.7%; 95% CI 41.5%–46.0%) than for White patients (1163 of 2944 patients, 39.5%; 95% CI 37.7%–41.3%) and were lower for Hispanic patients (911 of 2446 patients, 37.2%; 95% CI 35.3%–39.2%) than for non-Hispanic patients (1253 of 2841 patients, 44.1%; 95% CI 42.3%–45.9%). Rates of severe outcomes increased with decreasing BMI category: 682 of 1352 patients (50.4%; 95% CI 47.8%–53.1%) for BMI <25 versus 763 of 1662 patients (45.9%; 95% CI 43.5%–48.3%) for BMI 25 to 30 and 629 of 1536 patients (40.9%; 95% CI 38.5%–43.4%) for BMI >30 (Table 2).
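As a worked example, the interval reported for Black patients (817 of 1869) can be reproduced with a normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 − p)/n). The text does not state which interval method was used, so the method here is an assumption, and the function name is illustrative.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


p, lower, upper = wald_ci(817, 1869)
print(f"{p:.1%} (95% CI {lower:.1%}-{upper:.1%})")  # prints 43.7% (95% CI 41.5%-46.0%)
```

The Wald interval is simple but can behave poorly for small groups or proportions near 0 or 1 (such as the <18 years group); a Wilson interval is a common alternative in those cases.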

Morbidity and Mortality of Patients

Among 11 406 patients with hospital stay information, 4019 (35%) were in the hospital for 1 day or less, 5768 (51%) for 4 days or less, and 8577 (75%) for 11 days or less. Two hundred sixty-two (2.3%) of patients were in the hospital for more than 60 days and 114 (1%) for more than 88 days. At the end of the data collection period among the total 14 147 patients, 11 786 (83%) of patients were alive, while 2214 (16%) were deceased (both inpatient and outpatient) and about 147 (1%) were lost to follow-up (Table 1).

Using the first date of a SARS-CoV-2 polymerase chain reaction–positive test result as a diagnosis date, we looked at the distribution of number of days from COVID-19 diagnosis to death. The distribution followed a negative power distribution (Figure 1, Supplemental Table 1, in the supplemental digital content, containing 5 tables, available at https://meridian.allenpress.com/aplm in the October 2023 table of contents), which showed that the number of subjects was largest at the fewest days to death and dropped quickly as the days to death increased. Among the 1883 deceased patients followed up to 1 year, 764 (40.6%) died within the first 7 days, 200 (10.6%) died between 7 and 10 days, 422 (22.4%) died between 10 and 20 days, and 470 (25%) died more than 20 days after diagnosis. Overall, of this group of deceased patients, 1699 (90.2%) died within the first 60 days after diagnosis; after 60 days, slightly less than 1% of the patients died for each additional 10 days, up to a year.

Figure 1.

The distribution of number of days from COVID-19 diagnosis to death.


Clinical Parameters, Biomarker Evaluation, and Regression Modeling

Evaluation of the variables revealed statistically significant differences and distributions of biomarkers and clinical parameters between the severe and mild outcome groups. These include albumin, C-reactive protein (CRP), D-dimer, ferritin, lactic dehydrogenase (LDH), PCT, diastolic blood pressure (DBP), and SES (Figure 2, A through I). The mean (SD) values for all the variables in the severe and mild outcome groups are shown in Supplemental Table 2. Following backward stepwise regression, 4 models were identified, each representing some aspect of superior performance in terms of predictive strength.

Figure 2.

Distribution plots comparing the severe versus mild outcome groups by biomarkers. A, Albumin. B, C-reactive protein. C, D-dimer. D, Ferritin. E, Lactic dehydrogenase. F, Platelet count. G, Procalcitonin. H, Diastolic blood pressure (DBP). I, Socioeconomic status (SES). *** indicates statistical significance in the difference of biomarker distribution between severe and mild groups.


Not all patients had all available clinical parameter and biomarker data. Within the first day of visit, a complete blood count (including the platelet count) was ordered for 90% of patients, albumin for 83%, D-dimer for 52%, ferritin for 47%, PCT for 36%, CRP for 32%, LDH for 27%, and B-type natriuretic peptide (BNP) for 20%. Therefore, the number of patients in each model varied depending on the availability of results for that model's set of variables.

Inclusive Model (A)

This model is an intermediate one with a larger number of variables than the other 3 selected models. It had 37 variables (Table 4), with an AUC of 0.81, sensitivity of 78%, specificity of 73%, PPV of 72%, and NPV of 79% (Table 5; Figure 3, A).

Table 4.

The Variables Included in Each Model

Table 5.

Impact of Removing SES From the 4 Models

Figure 3.

AUC curves for the 4 models. Model A: inclusive; model B: ROC; model C: specific; model D: sensitive; with_SES–TRUE: models with SES as a predicting variable; with_SES–FALSE: Models without SES as a predicting variable. Abbreviations: AUC, area under the curve; ROC, receiver operating characteristic; SES, socioeconomic status.


If we removed SES from this model (Model A − SES), the AUC decreased to 0.76, sensitivity decreased to 62%, and specificity increased to 81%. The NPV decreased to 74% and the PPV remained unchanged at 72% (Table 5; Figure 3, A).

Through the elimination of variables, this inclusive model could attain equivalent or higher values of individual outcome metrics. Thus, we optimized for various metrics with the following 3 models: ROC model (B), specific model (C), and sensitive model (D).

ROC Model (B)

This model prioritizes the discriminatory ability of the variables to differentiate the population of patients with severe disease against the population with mild disease. Therefore, this model demonstrates the largest AUC of the ROC. The following variables were included in this model: PCT, age, DBP, LDH, albumin, ferritin, SES, platelet count, and BNP (coefficients in Supplemental Table 3). This model had an AUC of 0.85, sensitivity of 86%, specificity of 68%, PPV of 71%, and NPV of 84% (Table 5; Figure 3, B).

When we removed SES from this model (Model B − SES), all the performance criteria decreased, the AUC to 0.74, sensitivity to 75%, specificity to 67%, PPV to 64%, and NPV to 78% (Table 5; Figure 3, B).

Specific Model (C)

This model prioritizes the ability to avoid falsely classifying a patient into the severe disease group (ie, the model with the highest demonstrated specificity) and contained the following variables: PCT, age, CRP, DBP, LDH, D-dimer, albumin, ferritin, SES, and BNP (coefficients in Supplemental Table 3). This model had an AUC of 0.83, sensitivity of 77%, specificity of 81%, PPV of 78%, and NPV of 80%. This model had the highest specificity and PPV of all 4 models (Table 5; Figure 3, C).

When we removed SES from this model (Model C − SES), all the performance criteria decreased: the AUC to 0.76, sensitivity to 71%, specificity to 68%, PPV to 63%, and NPV to 75% (Table 5; Figure 3, C).

The subsequent addition of individual variables to the specific model (C) did not show significant changes to the AUC, which remained between 0.80 and 0.83. This confirmed the nonsignificance of those variables, including aspartate aminotransferase (AST):alanine aminotransferase (ALT) ratio, BMI, creatinine, sex, glucose, heart rate, pulse oximetry, race, respiratory rate, systolic blood pressure, and temperature (Supplemental Table 4). Only the addition of estimated glomerular filtration rate decreased the AUC to 0.79. Further verification of our models was performed by the removal of individual variables from the specific model (C). Removing any of the following variables decreased the AUC: age, albumin, BNP, CRP, ferritin, LDH, and PCT. The removal of D-dimer did not impact the AUC of the specific model (C).

Sensitive Model (D)

This model, which prioritizes the need to identify patients of concern, is the model with the highest sensitivity and contains the following variables: PCT, age, CRP, DBP, LDH, albumin, ferritin, SES, and platelet count (coefficients in Supplemental Table 3). It had an AUC of 0.80, sensitivity of 91%, specificity of 60%, PPV of 62%, and NPV of 90%. This model had the highest NPV.

When we removed SES from this model (Model D − SES), the AUC decreased to 0.75, sensitivity to 61%, and NPV to 76%; however, specificity increased to 80% and so did the PPV, to 67% (Table 5; Figure 3, D).

We found models using initial biomarkers that perform well in distinguishing between severe and mild outcomes in patients with COVID-19. Most patients do well and recover in several weeks from the acute illness, but many others proceed to hospitalization or succumb. Many demographics, clinical parameters, and biomarkers of disease and infectious process have been suggested to help prognosticate how well patients will do over the course of the disease. Prior studies have been limited to small numbers of patients or examined the biomarkers during the most severe phases of the disease and not during the initial phases of the disease, which may therefore be insufficient to develop prognostic models.14,15  Our goal in this study was to examine all the variables mentioned above in context with the early initial contact with the patient, then develop models that would be able to predict which patient had a higher probability of having a mild disease course, in contrast to those who would have a higher probability of a severe course. We defined a severe course as death from the disease or a hospital stay of 90 days or more. We defined a mild course as survival with a hospital stay of 1 day or less.

The inclusive model (A), with the largest number of variables (n = 37), has a reasonable AUC for its ROC curve, and the sensitivity and specificity are acceptable. We were able to improve on the model by eliminating variables that had correlation with other variables through backward selection. As a result, 3 models stand out. The ROC model (B) had 9 variables and the highest AUC. The specific model (C) had 10 variables and the highest specificity and PPV. The sensitive model (D) had 9 variables and the highest sensitivity and NPV.

The ROC, specific, and sensitive models (B, C, and D) have minor differences in terms of the variables included. They share age, DBP, the socioeconomic marker SES, PCT, albumin, ferritin, LDH, and platelet count. Age is the most important demographic factor. The SES values are important, as well, and can be determined from the patient’s address. Since SES data were critical to all 4 models and boosted each model’s predictive power, we recommend that electronic medical record systems consider a computer-based tool to do the calculation to apply the model at the bedside. The only clinical parameter that is important in the models is the DBP. All the biomarker tests are readily available at clinical laboratories, although PCT is not frequently ordered. The ROC and specific models (B and C) share BNP, another easily available test; however, it was removed from the sensitive model (D) and CRP was added. The specific model (C) also includes CRP and D-dimer. We observed that any variable alone is sufficient to provide a good prediction but combining these small groups of variables provided a better predictive ability.

The variables that remained within the models fall in several categories, including demographics, clinical parameters, and laboratory biomarkers. Age was the only demographic variable that stayed in the models, whereas sex, race, and ethnicity did not. Black individuals fared more poorly than White individuals in terms of disease severity. Meanwhile, Hispanic individuals demonstrated a lower rate of severe disease than did non-Hispanic individuals. This is a similar finding to a study by Asch et al,18  where Black patients had a slightly greater odds of 30-day inpatient mortality or discharge to hospice care than did White patients (odds ratio [OR], 1.11; 95% CI, 1.03–1.19).18  However, the odds ratio for mortality or equivalent was not statistically significant after adjustment for hospital-level fixed effects between Black and White patients (OR, 1.02; 95% CI, 0.94–1.10).18  The cause of such differences in our data set is not readily apparent, because race and ethnicity were not included in the final models. These variables may be sufficiently covered by SES and may be related to underlying health conditions represented by laboratory biomarkers.

Of the clinical parameters, DBP is the only vital sign that stayed within the models. It is not clear why DBP is important, but its dominance may have been the reason that systolic blood pressure was eliminated. Because COVID-19 has a large impact on the respiratory system, and lung compromise is a common cause of mortality, one would expect respiratory rate and pulse oximetry (both direct measures of lung compromise) to remain as significant factors. Along the same lines, our data set showed an inverse relationship between BMI and the proportion of patients with severe disease. This observation runs counter to the prevailing consideration that being overweight or obese contributes to poorer outcomes. The cause of such differences is obscured by the other variables because BMI was not a variable that stayed in the final model. Patients with elevated BMIs tended to be younger than the others, but there is little difference in average age between patients with normal and low BMIs. Several other studies have also found an inverse relationship between BMI and mortality or disease severity.19,20 

In our study, the laboratory biomarkers associated with severe outcome that remained in the final models are associated with inflammation and sepsis, as shown in other studies. These include ferritin,21  CRP,22  D-dimer,23  PCT, LDH,24  BNP,25  and thrombocytopenia.26  Ferritin and CRP may act as acute-phase reactants, which have been shown to be increased in COVID-19.21,27  The average ferritin levels were twice as high among patients with mild disease compared to healthy individuals and twice as high among patients with severe disease as among those with mild disease. Masi et al26  noted that COVID-19 is associated with an increase in procoagulants. Elevation of D-dimer may be an indication of the early development of microthrombi. The platelet count decreases in patients with more serious disease, which may be an indication of platelet consumption.28  Likewise, the neutrophils may be acting in response to the infection, even though there is a decrease in the severe disease group (mean of 7.3 in severe versus 5.3 in mild group, P < .001). In addition to heart failure, BNP can be raised in patients with renal dysfunction and sepsis and may predict survival in patients presenting with severe sepsis.25  LDH is found in many tissues and is thus a general marker for tissue damage for erythrocytes and organs such as the kidneys, lungs, muscle, liver, and heart.24  Even though it is a nonspecific marker, LDH should be included in the initial encounter with the COVID-19 patient. Low albumin can be a marker of poor nutrition or poor hepatic function. Poor hepatic function may predispose patients to more severe outcomes or suggest a debilitated state, affecting the immune response. Since sensitive markers for sepsis were found in our severe group and remained in our models, it can be deduced that patients who had more severe outcomes with COVID-19 infection may have been experiencing early stages of sepsis.

Interestingly, many laboratory biomarkers did not remain in our models. Electrolytes, such as sodium, potassium, chloride, carbon dioxide, calcium, and anion gap, did not prove to be useful in the models. Initially, at least, there does not appear to be an acid-base disturbance. Initial glucose was not kept in the models in the process of backward stepwise selection, although diabetes is supposedly a risk factor for more serious illness. Bilirubin was not included in the models, but bilirubin is more likely to be elevated in chronic disease states or massive injury and thus not seen initially. Although renal function may be affected by COVID-19, the effect on patients in the initial phases of the disease appears minimal: the mean of the creatinine values was similar in the severe and mild disease groups, and blood urea nitrogen (BUN) was only slightly elevated in the severe disease group. Alkaline phosphatase, ALT, and AST were excluded from the models. The mean ALT was higher in the severe group (42.5 U/L, P < .001) than in the mild disease group (35.6 U/L). This finding suggests that there may have been some degree of liver injury, but any contribution of alkaline phosphatase, AST, and ALT was probably covered by LDH. Total protein was excluded from the models, probably because low albumin was included.

Even though some of the excluded biomarkers showed statistical association with severe outcome, they showed sufficient correlation with other biomarkers to fall out of the models. Bilirubin correlated with D-dimer (r = 0.4), glucose correlated with PCT (r = 0.5), and creatinine correlated with anion gap (r = 0.4), BNP (r = 0.7), and BUN (r = 0.7). In addition, BUN correlated with BNP (r = 0.4) and PCT (r = 0.3). Calcium was associated with low albumin (r = 0.6). Alkaline phosphatase showed a strong correlation with PCT (r = 0.5). ALT and AST showed a strong correlation with LDH (r = 0.5 and r = 0.4, respectively). Not all biomarkers that were significantly associated with an outcome have enough patient impact to be useful when considering multiple biomarkers (Supplemental Table 3). We are suggesting that the biomarkers of the specific and sensitive models would be most useful to order when initially diagnosing a patient with COVID-19.
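The pairwise correlations above are reported as r values. As a sketch, assuming these are Pearson correlation coefficients (the text does not name the method), r for two biomarkers measured in the same patients can be computed as follows; the function name and the sample values are illustrative, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for two markers in four patients
print(round(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # prints 0.8
```

Screening pairs of candidate variables this way gives a quick view of which markers carry overlapping information before or after stepwise selection.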

Many studies highlight only single biomarkers as risk factors for severe COVID-19 disease. In a meta-analysis regarding serum ferritin, Kaushal et al21  reported that serum ferritin was higher in nonsurvivors than survivors, as well as for patients with severe and critical versus mild to moderate disease. However, there was large heterogeneity among studies, and they did not evaluate the effect of other comorbidities/confounders.27  Meanwhile, Zinellu et al29  investigated using the De Ritis ratio (AST:ALT) on admission as a prognostic marker of in-hospital mortality in 105 patients. The AST:ALT ratio (cutoff was 1.49) was lower in survivors than nonsurvivors (1.25 versus 1.67, respectively), with an ROC AUC of 0.70 (95% CI, 0.603–0.787), sensitivity of 74%, and specificity of 70%.29  Although this study used initial lab values, the cohort was small, and larger studies are required to confirm the significance of the AST:ALT ratio. Another small study by Karsli et al30  investigated the biomarker soluble P-selectin in 80 patients with COVID-19 infection. They separated COVID-19–positive patients into those with mild to moderate pneumonia and those with severe pneumonia requiring admission into the intensive care unit.30  They found that soluble P-selectin could discriminate between patients who needed intensive care unit treatment and those who did not, with an AUC of 0.70, a sensitivity of 76.9%, and a specificity of 51.9%, at a cutoff of 6.12 ng/mL (P = .005).30  Other tests that showed a significant difference were white blood cells, platelet count, CRP, BUN, creatinine, AST, ALT, total bilirubin, D-dimer, high-sensitivity troponin T, creatine kinase–myocardial band, and lactate. However, the authors did not attempt to use these in context with their study.30  Soluble P-selectin testing is not widely available, and more common laboratory values may have proved more useful in terms of prognosis.

Other studies have reported abnormal values for biomarkers found during COVID-19 infection. Ferritin was lower for survivors versus nonsurvivors, as was the initial AST:ALT ratio, but these studies were limited by lack of evaluation for comorbidities and small size.27,28 Calprotectin could distinguish between the need for intensive care unit admission versus general ward care, but it correlates with tests such as neutrophil count, D-dimer, and CRP.26–28 Soluble P-selectin and calprotectin are not readily available in most hospitals and clinics. A machine-learning prediction model could distinguish between death and survival during disease, with the differences most pronounced near death.31–33

Our study differs from other studies because it focuses on data available on the initial encounter with a patient, rather than later during disease, and associates such initial data with the outcomes of severe or mild disease. Construction of an algorithm to predict disease course from initial results is much more difficult than one using more extreme values, because the worst values tend to occur toward the end of the disease process.

The purpose of the models is to provide a clinician with information on the relative likelihood of a patient having a severe disease course. The specific and sensitive models (C and D) can be used for different purposes. The specific model (C), with higher specificity and PPV than the other models, can be useful in determining the probability of a severe disease course. The sensitive model (D), with higher sensitivity and NPV than the other models, can be useful in determining the probability of a milder disease course.
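The four performance measures that distinguish the specific and sensitive models can be computed from a 2 × 2 table of predicted versus actual outcomes. The following is a minimal sketch with hypothetical counts; the function name and the numbers are illustrative assumptions and do not reproduce the performance of models C or D.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts.

    tp: predicted severe, actually severe
    fp: predicted severe, actually mild
    tn: predicted mild, actually mild
    fn: predicted mild, actually severe
    A metric is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # severe cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # mild cases correctly cleared
        "ppv": ratio(tp, tp + fp),          # predicted severe that were severe
        "npv": ratio(tn, tn + fn),          # predicted mild that were mild
    }


# Hypothetical counts for illustration only.
for name, value in diagnostic_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(name, round(value, 3))
```

A model tuned for high specificity and PPV makes a "severe" prediction more trustworthy, whereas one tuned for high sensitivity and NPV makes a "mild" prediction more trustworthy, which matches the distinct uses described for models C and D.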

Table 6 lists a few examples of real patients from 40 to 60 years old to demonstrate the prediction of the outcome using models C and D. Actual subject biomarker values are shown in the table, as well as the probabilities as calculated by either the specific or sensitive models. The last row, labeled “actual outcome,” has a “0” for mild or “1” for severe outcome. The table also shows the predicted outcome and the predicted probability calculated by the models.

Table 6.

Examples of Predicting the Patient Outcome Using Specific Model C

The patient in the last column with severe outcome in the specific model (Table 6) was younger than most of the other subjects in this selection, with normal platelet count and BNP and mildly elevated PCT (0.78 ng/mL). However, this patient had a CRP level threefold greater than the upper limit, an elevated LDH, slightly decreased albumin, and extremely high ferritin. The probability of severe outcome calculated by the model matched the actual outcome.
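To make the notion of a model-calculated probability concrete, the following minimal sketch shows how a fitted logistic regression converts a patient's values into a probability of severe outcome and then into a predicted class. The intercept, coefficients, patient values, and 0.5 threshold are hypothetical placeholders chosen for illustration; they are not the fitted parameters of model C or model D.

```python
from math import exp


def severe_probability(intercept, coefficients, values):
    """Probability of severe outcome from a logistic regression model.

    coefficients and values are dicts keyed by variable name.
    """
    missing = set(coefficients) - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    z = intercept + sum(coefficients[k] * values[k] for k in coefficients)
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def classify(probability, threshold=0.5):
    """Return 1 (severe) when probability >= threshold, else 0 (mild)."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients and patient values, for illustration only.
coefficients = {"age": 0.03, "albumin": -0.9, "ldh": 0.004}
patient = {"age": 52, "albumin": 3.2, "ldh": 410}
p = severe_probability(intercept=-1.0, coefficients=coefficients, values=patient)
print(round(p, 3), classify(p))
```

Because every variable enters the same linear sum, an unfavorable value for one biomarker can be offset or reinforced by the others, which is why a patient with several normal values can still receive a high predicted probability.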

Similarly, Table 7 shows values for subjects evaluated by the sensitive model (D), which has higher sensitivity and NPV than the other models and can be useful in determining the probability of the patient having the milder disease course. The probabilities matched the actual outcome. The ages are about the same for both groups. All the patients with severe outcomes had elevated PCT and CRP, but so did some of the subjects without severe outcomes. The DBP varied for both groups. Not all patients in the severe group had elevated LDH, nor did all in the nonsevere group have LDH within normal limits. Albumin tended to be lower in patients with severe disease. All patients with severe disease had elevated ferritin, but so did many of those with mild disease. Platelet counts varied considerably within both groups. No single biomarker showed a clear pattern. Because the outcome depends on many factors at once, the models offer a convenient way to predict outcome and estimate the probability of severe disease.

Table 7.

Examples of Predicting the Patient Outcome Using Sensitive Model D

Our study had a few limitations. Not all data and biomarkers were available for every patient in this cohort. For instance, albumin was tested for only 83%, ferritin for 47%, and BNP for 20% of the patients. Because we did not have information on all biomarkers for all patients, the number of patients in our modeling cohort is smaller than the overall patient population. However, our patient cohorts were still large enough to elucidate effective models. Another limitation is that we worked with only 1 data set, so the performance of the models may have been overestimated. Given the high performance of our selected models, future studies with additional independent data sets are warranted.
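The effect of missing biomarkers on cohort size can be sketched as follows, assuming that a model is fitted only to patients who have every variable it uses. The records, variable names, and helper functions are made up for illustration and do not represent the study data or its exact handling of missing values.

```python
def complete_cases(records, variables):
    """Keep only records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


def availability(records, variable):
    """Fraction of records with a non-missing value for one variable."""
    if not records:
        raise ValueError("no records")
    return sum(r.get(variable) is not None for r in records) / len(records)


# Made-up records; None marks a test that was not ordered.
records = [
    {"albumin": 3.9, "ferritin": 410.0, "bnp": None},
    {"albumin": 3.1, "ferritin": None, "bnp": None},
    {"albumin": 2.8, "ferritin": 1250.0, "bnp": 320.0},
    {"albumin": None, "ferritin": 95.0, "bnp": None},
]
for name in ("albumin", "ferritin", "bnp"):
    print(name, availability(records, name))
print(len(complete_cases(records, ["albumin", "ferritin"])))
print(len(complete_cases(records, ["albumin", "ferritin", "bnp"])))
```

Each additional variable can only shrink the usable cohort, so a rarely ordered test such as BNP limits the number of patients available to any model that includes it.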

Based on our study, the biomarkers included in the sensitive and the specific models would be useful to test when initially evaluating a patient with COVID-19. Most of these tests are inexpensive and readily available. On the other hand, variables such as demographics, race and ethnicity, and clinical parameters do not add further information. Therefore, consideration should be given to testing this limited set of biomarkers (albumin, BNP, CRP, D-dimer, ferritin, LDH, platelet count, and PCT).

Thanks to Rita Vaswani for her administrative support. We also thank Kelly Schrank, MA, ELS, of Bookworm Editing Services LLC for her editorial services in preparing the manuscript for publication.

1. Zhu N, Zhang D, Wang W, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med. 2020;382(8):727–733.
2. World Health Organization. WHO coronavirus disease (COVID-19) dashboard. https://covid19.who.int. Accessed December 6, 2022.
3. Aloisio E, Chibireva M, Serafini L, et al. A comprehensive appraisal of laboratory biochemistry tests as major predictors of COVID-19 severity. Arch Pathol Lab Med. 2020;144(12):1457–1474.
4. Tang N, Li D, Wang X, Sun Z. Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia. J Thromb Haemost. 2020;18(4):844–847.
5. Thachil J, Tang N, Gando S, et al. ISTH interim guidance on recognition and management of coagulopathy in COVID-19. J Thromb Haemost. 2020;18(5):1023–1026.
6. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054–1062.
7. Xiong Y, Liu Y, Cao L, et al. Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients. Emerg Microbes Infect. 2020;9(1):761–770.
8. Ng PC, Lam CW, Li AM, et al. Inflammatory cytokine profile in children with severe acute respiratory syndrome. Pediatrics. 2004;113(1 Pt 1):e7–14.
9. Zhang Y, Li J, Zhan Y, et al. Analysis of serum cytokines in patients with severe acute respiratory syndrome. Infect Immun. 2004;72(8):4410–4415.
10. Guan WJ, Ni ZY, Hu Y, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020;382(18):1708–1720.
11. Zhang JJ, Dong X, Cao YY, et al. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. Allergy. 2020;75(7):1730–1741.
12. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506.
13. Wang D, Hu B, Hu C, et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA. 2020;323(11):1061–1069.
14. Kermali M, Khalsa RK, Pillai K, Ismail Z, Harky A. The role of biomarkers in diagnosis of COVID-19—a systematic review. Life Sci. 2020;254:117788.
15. Malik P, Patel U, Mehta D, et al. Biomarkers and outcomes of COVID-19 hospitalisations: systematic review and meta-analysis. BMJ Evid Based Med. 2021;26(3):107–108.
16. Wadhera RK, Wadhera P, Gaba P, et al. Variation in COVID-19 hospitalizations and deaths across New York City boroughs. JAMA. 2020;323(21):2192–2195.
17. Kind AJH, Buckingham WR. Making neighborhood-disadvantage metrics accessible—the neighborhood atlas. N Engl J Med. 2018;378(26):2456–2458.
18. Asch DA, Islam N, Sheils NE, et al. Patient and hospital factors associated with differences in mortality rates among African American and white US Medicare beneficiaries hospitalized with COVID-19 infection. JAMA Netw Open. 2021;4(6):e2112842.
19. Weiss A, Beloosesky Y, Boaz M, Yalov A, Kornowski R, Grossman E. Body mass index is inversely related to mortality in elderly subjects. J Gen Intern Med. 2008;23(1):19–24.
20. Cho Y, Cho Y, Choi HJ, et al. The effect of BMI on COVID-19 outcomes among older patients in South Korea: a nationwide retrospective cohort study. Ann Med. 2021;53(1):1292–1301.
21. Kaushal K, Kaur H, Sarma P, et al. Serum ferritin as a predictive biomarker in COVID-19. A systematic review, meta-analysis and meta-regression analysis. J Crit Care. 2022;67:172–181.
22. Garnacho-Montero J, Huici-Moreno MJ, Gutierrez-Pizarray A, et al. Prognostic and diagnostic value of eosinopenia, C-reactive protein, procalcitonin, and circulating cell-free DNA in critically ill patients admitted with suspicion of sepsis. Crit Care. 2014;18(3):R116.
23. Han Y-Q, Yan L, Zhang L, et al. Performance of D-dimer for predicting sepsis mortality in the intensive care unit. Biochem Med (Zagreb). 2021;31(2):020709.
24. Szarpak L, Ruetzler K, Safiejko K, et al. Lactate dehydrogenase level as a COVID-19 severity marker. Am J Emerg Med. 2021;45:638–639.
25. Brueckmann M, Huhle G, Lang S, et al. Prognostic value of plasma N-terminal Pro-brain natriuretic peptide in patients with severe sepsis. Circulation. 2005;112(4):527–534.
26. Masi P, Hekimian G, Lejeune M, et al. Systemic inflammatory response syndrome is a major contributor to COVID-19–associated coagulopathy. Circulation. 2020;142(6):611–614.
27. Mahroum N, Algory A, Kiyak Z, et al. Ferritin—from iron, through inflammation and autoimmunity, to COVID-19. J Autoimmun. 2022;126:102778.
28. Vardon-Bounes F, Ruiz S, Gratacap M-P, Garcia C, Payrastre B, Minville V. Platelets are critical key players in sepsis. Int J Mol Sci. 2019;20(14):3494.
29. Zinellu A, Arru F, De Vito A, et al. The De Ritis ratio as prognostic biomarker of in-hospital mortality in COVID-19 patients. Eur J Clin Invest. 2021;51(1):e13427.
30. Karsli E, Sabirli R, Altintas E, et al. Soluble P-selectin as a potential diagnostic and prognostic biomarker for COVID-19 disease: a case-control study. Life Sci. 2021;277:119634.
31. Mahler M, Meroni PL, Infantino M, Buhler KA, Fritzler MJ. Circulating calprotectin as a biomarker of COVID-19 severity. Expert Rev Clin Immunol. 2021;17(5):431–443.
32. Udeh R, Advani S, de Guadiana Romualdo LG, Dolja-Gore X. Calprotectin, an emerging biomarker of interest in COVID-19: a systematic review and meta-analysis. J Clin Med. 2021;10(4):775.
33. de Guadiana Romualdo LG, Mulero MDR, Olivo MH, et al. Circulating levels of GDF-15 and calprotectin for prediction of in-hospital mortality in COVID-19 patients: a case series. J Infect. 2021;82(2):e40–e42.

Author notes

Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the October 2023 table of contents.

Kroll, Bi, Salm, and Kapoor are/were employees of Quest Diagnostics and own stock in Quest Diagnostics. The other authors have no relevant financial interest in the products or companies described in this article.
