Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) immunoglobulin G (IgG) testing is used for serosurveillance and will be important to evaluate vaccination status. Given the urgency to release coronavirus disease 2019 (COVID-19) serology tests, most manufacturers have developed qualitative tests.
To evaluate clinical performance of 6 different SARS-CoV-2 IgG assays and their quantitative results to better elucidate the clinical role of serology testing in COVID-19.
Six SARS-CoV-2 IgG assays were tested using remnant specimens from 190 patients. Sensitivity and specificity were evaluated for each assay with the current manufacturer's cutoff and a lower cutoff. A numeric result analysis and discrepancy analysis were performed.
Specificity was higher than 93% for all assays, and sensitivity was higher than 80% for all assays (≥7 days post–polymerase chain reaction testing). Inpatients with more severe disease had higher numeric values compared with health care workers with mild or moderate disease. Several discrepant serology results were those just below the manufacturers' cutoff.
Severe acute respiratory syndrome coronavirus 2 IgG antibody testing can aid in the diagnosis of COVID-19, especially when polymerase chain reaction results are negative. Quantitative COVID-19 IgG results are important to better understand the immunologic response and disease course of this novel virus and to assess immunity as part of future vaccination programs.
The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has already infected more than 113 million individuals globally, with the United States contributing more cases than any other country.1,2 As the pandemic continues to expand globally, it is imperative to have a more complete understanding of its extent and spread. Serologic testing for SARS-CoV-2 antibodies has been designed to provide vital information regarding the presence of an adaptive host immune response and can be helpful in clarifying the extent of past infections within a population.3,4
Although there are limited data regarding the presence and duration of immunity, broad adoption of serologic testing in conjunction with longitudinal clinical follow-up is necessary to further clarify this crucial information.5,6 Given increased use of convalescent plasma as an aid in the treatment of severely ill patients, the ability to identify hyperimmune individuals is another area where SARS-CoV-2 serologic testing provides a distinct clinical benefit.7–9 Because clinical findings of multisystem inflammatory syndrome in children may occur weeks after SARS-CoV-2 infection, the Centers for Disease Control and Prevention suggests serologic testing for virus-specific antibodies regardless of polymerase chain reaction (PCR) results in a child.10 Finally, with the development of vaccines, serologic testing will play a critical role in addressing vaccine efficacy. These cases each highlight areas where the utility of serologic testing directly impacts the workup, care, and management of critical patient populations.
As the prevalence of COVID-19 begins to increase in subsequent waves and more people are becoming vaccinated, serologic testing will become critical in identifying immune status. The testing performance of these serologic assays needs to be reevaluated to meet these needs. Given the urgency to develop these tests and lack of standardized material, most SARS-CoV-2 immunoglobulin (Ig) G antibody tests were developed and approved as qualitative assays with a positive result above a defined cutoff. In this study, we investigate the diagnostic performance using different cutoff values and clinical utility of numeric results (index, AU/mL, absorbance, line intensity) that underlie the qualitative result from 6 different SARS-CoV-2 IgG antibody platforms for patients undergoing testing at Montefiore Medical Center, a major New York City hospital.
METHODS
Study Design
In this diagnostic performance study of 6 different SARS-CoV-2 IgG antibody tests, clinical cohorts were selected, and remnant samples were identified to evaluate the specificity, sensitivity, and clinical uses of the SARS-CoV-2 IgG tests. Antibody assay results were compared with SARS-CoV-2 PCR status when available. Only one specimen per patient at a single collection time point was included in this study. Although the methods are qualitative or semiquantitative, numeric results were included for analysis and cutoff values were evaluated.
Testing Platforms
The following 6 COVID-19 IgG antibody tests were evaluated: Abbott Laboratories SARS-CoV-2 IgG (Abbott), SD Biosensor Standard Q COVID-19 IgM/IgG Duo lateral flow assay (LFA; SD Biosensor), Diazyme Laboratories DX-Lite SARS-CoV-2 IgG (Diazyme), Babson Diagnostics first-generation assay aC19G1 (Babson G1), Babson Diagnostics second-generation assay aC19G2 prototype (Babson G2), and the Albert Einstein Laboratories enzyme-linked immunosorbent assay (ELISA) IgG Assay (Einstein).11
The Abbott, Diazyme, and Babson tests were performed on automated analyzers (the Abbott Architect, Diazyme DZ-Lite 3000 Plus, and Siemens Atellica IM, respectively), and testing was performed as per the manufacturers' instructions. The Abbott test has a nucleocapsid target, and the cutoff is 1.4 index. The Diazyme test has both the nucleocapsid protein and the spike protein (SP) as antigens and has a cutoff of 1.0 AU/mL. The Babson assays have the SP as antigen and have a 1.0 index cutoff. Babson G1 uses spike protein S1 domain (SP1) as the antigen, and Babson G2 uses the receptor-binding domain portion of SP. The SD Biosensor Standard Q COVID-19 IgM/IgG Duo assay, an LFA, has a nucleocapsid antigen. According to the manufacturer, any visible band at the IgG test line (with visible control line) is positive. The antigen-antibody reaction produces a visible line; for the purposes of this study, to evaluate whether the intensity of the line is meaningful, the line was visually scored using a 0 to 6 grading scale: 0 was negative and 1 to 6 were positive, with 1 being extremely faint, just visible, and 6 being extremely dark.12 There were 4 independent observers, all of whom read the test result at 10 to 15 minutes as per manufacturer's instructions. When 3 or 4 of the observers came to a consensus, the agreed-on score was used. If there was not a consensus among the readers, the results were averaged for the final result. In cases when the resulting average was less than 1, the results were counted as negative. For the Einstein ELISA, the full spike ectodomain is used as the antigen and produces an absorbance value at A450 with a 0.90 cutoff; testing was performed in duplicate, and results were averaged together for the final value. Details of the different tests including antigen target and cutoff values are shown in Table 1.
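The LFA reading rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the software used in the study: the function names are hypothetical, and treating a final score exactly equal to the cutoff as positive is an assumption consistent with "less than 1 is negative."

```python
from collections import Counter


def lfa_final_score(readings):
    """Combine four observer grades (0 to 6) into one final LFA score.

    If 3 or 4 observers agree, the agreed-on grade is used;
    otherwise the four grades are averaged.
    """
    if len(readings) != 4 or any(not 0 <= r <= 6 for r in readings):
        raise ValueError("expected four grades between 0 and 6")
    grade, votes = Counter(readings).most_common(1)[0]
    if votes >= 3:
        return float(grade)
    return sum(readings) / 4


def lfa_is_positive(final_score, cutoff=1.0):
    """Assumption: a final score at or above the cutoff is called positive."""
    return final_score >= cutoff


# Hypothetical readings: two observers see no band, two see a very faint band.
split_reading = lfa_final_score([0, 0, 1, 1])
print(split_reading, lfa_is_positive(split_reading))
```

A 2-versus-2 split between grades 0 and 1 averages to 0.5, which is negative under the current rule because it is below 1.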
Study Population
After institutional review board approval, a total of 190 remnant serum specimens were obtained from patients undergoing testing at Montefiore Medical Center (Bronx, New York) from December 2019 to May 2020. Aliquots of specimens were stored at a minimum of −20°C until testing in May 2020. Limited clinical information regarding COVID-19 symptoms, dates of symptom onset, and PCR testing status and dates was obtained from the electronic medical record. For health care workers (HCWs) undergoing antibody testing, review of pretesting surveys provided limited clinical and diagnostic information related to COVID-19, including PCR results, date of PCR testing, symptoms, and exposure and symptom onset date. All antibody testing and analysis for this study were performed after removal of clinical data, anonymization, and coding of specimens. Details about the remnant sample study population cohorts are shown in Table 2.
The aim was to evaluate all samples on each platform, but some samples had an insufficient quantity to complete testing across all platforms. Because of a limited number of LFA devices for testing, inpatients, cutoff cohort samples, and HCWs with a known PCR result were prioritized for testing by LFA. All 190 specimens were tested by the Einstein SARS-CoV-2 IgG ELISA and by the Abbott SARS-CoV-2 IgG assay. One hundred eighty-three specimens were tested by the Diazyme DX-Lite SARS-CoV-2 IgG assay, 165 specimens were tested by both the Babson G1 SARS-CoV-2 IgG and Babson G2 SARS-CoV-2 IgG assays, and 124 specimens were tested by the SD Biosensor IgG LFA. The summary of groups and assays tested is shown in Figure 1.
Sensitivity Analysis
Sensitivity was calculated for each assay, with both the current test cutoff and a lower cutoff, based on results from inpatients and HCWs tested for SARS-CoV-2 IgG antibodies 7 days or more from PCR positivity. The cutoff was not changed for the ELISA. For the LFA, in cases when the resulting average was less than 1, the results were counted as negative (current cutoff <1 = negative); however, some results were 0.5, when 2 independent observers called the result negative (0) and 2 independent observers called it very faintly positive (1). For the lower cutoff, a result of 0.5 was counted as positive. For the Abbott assay, the cutoff was decreased from the current cutoff of 1.40 to the lower cutoff of 0.60, and for Diazyme and both Babson assays, the cutoff was decreased from the current cutoff of 1.00 to a lower cutoff of 0.50. The lower cutoff was chosen from the range of values below the current cutoff so as to improve sensitivity with little change in specificity.
Specificity Analysis
Specificity was determined for each assay by evaluating the results for a specificity cohort that comprised the pre–COVID-19 cohort and the common coronavirus cohort with both the current test cutoff and a lower cutoff.
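As a minimal sketch of how sensitivity and specificity change with the cutoff, the Python below recomputes both at a current and a lower cutoff. The specimen values are invented for illustration and are not study data; the function names are hypothetical, and counting a value equal to the cutoff as positive is an assumption.

```python
def sensitivity(positive_cohort_values, cutoff):
    """Fraction of known-positive specimens called positive (value >= cutoff)."""
    if not positive_cohort_values:
        raise ValueError("empty cohort")
    hits = sum(v >= cutoff for v in positive_cohort_values)
    return hits / len(positive_cohort_values)


def specificity(negative_cohort_values, cutoff):
    """Fraction of known-negative specimens called negative (value < cutoff)."""
    if not negative_cohort_values:
        raise ValueError("empty cohort")
    true_negatives = sum(v < cutoff for v in negative_cohort_values)
    return true_negatives / len(negative_cohort_values)


# Hypothetical index values on an assay with a 1.40 current cutoff
# and a 0.60 lower cutoff (the Abbott cutoffs given in the text).
pcr_positive = [6.7, 4.9, 1.2, 0.7, 0.1]   # invented sensitivity cohort
pre_pandemic = [0.02, 0.05, 0.10, 0.30]    # invented specificity cohort

for cutoff in (1.40, 0.60):
    print(cutoff, sensitivity(pcr_positive, cutoff), specificity(pre_pandemic, cutoff))
```

In this invented example, lowering the cutoff reclassifies the two specimens at 1.2 and 0.7 as positive without producing any false positives, which is the trade-off the lower cutoffs were chosen to probe.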
Numeric Result Analysis
Although all tests are reported as qualitative, except for Babson G2, which is semiquantitative, the numeric result produced for each assay was evaluated by comparing the specificity cohort and the sensitivity cohort to evaluate the signal of the response. The numeric results of each assay using 3 groups—(1) specificity cohort, (2) HCWs in the sensitivity cohort who did not require hospital admission, and (3) inpatients in the sensitivity cohort who had a more severe disease course than the HCWs—were compared. The numeric results for each assay are index for the Abbott and Babson assays, AU/mL for Diazyme, absorbance for the ELISA, and line intensity graded as 0 to 6 for the LFA.
Discrepancy Analysis
The inpatient, HCW, and cutoff cohorts were evaluated in more detail to identify discrepancies among the different assays and to identify gray zone results. Gray zone numeric results are those that fall between the current test cutoff and the lower cutoff.
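The gray zone definition can be expressed as a three-way classification. The sketch below uses the current and lower cutoffs stated in the Sensitivity Analysis section; the dictionary layout, the function name, and the handling of values exactly at a cutoff (treated as belonging to the higher category) are assumptions made for illustration.

```python
# (current cutoff, lower cutoff) per assay, as given in the text.
# The Einstein ELISA cutoff was not changed, so it has no gray zone.
CUTOFFS = {
    "Abbott": (1.40, 0.60),
    "SD Biosensor": (1.0, 0.5),
    "Diazyme": (1.00, 0.50),
    "Babson G1": (1.00, 0.50),
    "Babson G2": (1.00, 0.50),
    "Einstein": (0.90, 0.90),
}


def classify(assay, value):
    """Return 'positive', 'gray zone', or 'negative' for a numeric result."""
    current, lower = CUTOFFS[assay]
    if value >= current:
        return "positive"
    if value >= lower:
        return "gray zone"
    return "negative"


# Two Abbott values that are both reported as negative qualitatively.
print(classify("Abbott", 0.03), classify("Abbott", 1.20))
```

Both example values fall below the 1.40 Abbott cutoff, but only the second lands in the gray zone.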
Statistical Analysis
Statistical analyses were conducted using SPSS version 9.4 (SPSS, Chicago, Illinois), GraphPad Prism version 8.4.3, and MedCalc software. Diagnostic performance was assessed by calculating sensitivity and specificity in designated cohorts (https://www.medcalc.org/calc/diagnostic_test.php, accessed March 30, 2021). For the quantitative analysis of antibody results, Kruskal-Wallis tests were used to test whether the 3 groups had similar medians. If the Kruskal-Wallis test gave a P value <.05, Wilcoxon rank sum tests were used for post hoc pairwise comparisons between any 2 groups (specificity versus HCW, specificity versus inpatient, and inpatient versus HCW), with Bonferroni correction for multiple comparisons.
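For readers who want to see the shape of this procedure, the following is a standard-library Python sketch of a tie-corrected Kruskal-Wallis test for exactly 3 groups, followed by a Bonferroni adjustment of pairwise P values. It is not the study's code (the analyses were run in commercial packages); the function names and example values are hypothetical, the chi-square approximation is used for the P value (with 2 degrees of freedom the upper tail is exp(-H/2)), and the pairwise Wilcoxon rank sum P values are taken as inputs rather than computed.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, P) using the chi-square approximation with 2 df."""
    groups = [list(a), list(b), list(c)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each group needs at least one value")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, math.exp(-h / 2.0)


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented numeric results for the three groups compared in the study design.
h, p = kruskal_wallis_three_groups([0.1, 0.2, 0.3], [2.1, 3.5, 4.9], [6.2, 6.7, 6.9])
if p < 0.05:
    # Placeholder pairwise P values; in practice these come from rank sum tests.
    print(bonferroni([0.01, 0.02, 0.5]))
```

With 3 pairwise comparisons, the Bonferroni step is equivalent to testing each unadjusted pairwise P value against .05/3.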
RESULTS
Diagnostic Performance Study
The sensitivity and specificity of all 6 SARS-CoV-2 IgG assays using current and lower cutoffs, as well as the manufacturers' published sensitivities and specificities, are shown in Table 3.
Sensitivity analysis for the 6 SARS-CoV-2 IgG assays with their current cutoff ranged from 95.1% (Einstein ELISA) to 80.8% (Babson G1). By changing to the lower cutoffs, the sensitivities increased from 92.7% to 97.6% for Abbott, 82.9% to 87.8% for SD Biosensor, 82.9% to 88.6% for Diazyme, 80.8% to 88.5% for Babson G1, and 92.3% to 96.2% for Babson G2.
In specificity studies, the specificities for the 6 SARS-CoV-2 IgG assays were all above 93%, ranging from 100% (SD Biosensor and Abbott) to 93.1% (ELISA). When evaluating the lower cutoffs, there was no change in specificity for Abbott, SD Biosensor, or Babson G2 assays. Diazyme and Babson G1 specificity decreased from 96.6% to 93.1%. Using the manufacturer's cutoff, both Diazyme and Einstein ELISA had a single pre–COVID-19 specimen with a positive result. Using the lower cutoffs, an additional specimen had a positive result by Diazyme and Babson G1.
Numeric Result Analysis
Both the HCW and inpatient cohorts had significantly higher median results than the specificity cohort in all the assays (Figure 2, A through F). When comparing the HCWs in the sensitivity cohort, those who did not require hospitalization, with the inpatients who had more severe disease and required hospitalization, the inpatient cohort had significantly higher median (interquartile range) quantitative results on the Abbott IgG (6.66 [6.32, 6.86] versus 4.95 [3.18, 5.66], P < .001) (Figure 2, A), SD Biosensor IgG (2.75 [2.0, 3.25] versus 1.25 [1.0, 2.25], P = .02) (Figure 2, B), Diazyme IgG (30.40 [18.86, 56.90] versus 11.05 [1.20, 22.14], P = .02) (Figure 2, C), Babson G2 IgG (7.8 [6.87, 8.14] versus 4.41 [2.29, 6.62], P = .005) (Figure 2, E), and Einstein IgG (3.49 [3.44, 3.56] versus 2.45 [2.13, 2.88], P < .001) (Figure 2, F); the lone exception was Babson G1 IgG (2.17 [2.08, 2.27] versus 1.91 [1.04, 2.06], P = .15) (Figure 2, D).
Discrepancy Analysis
The numeric results for all 6 assays for the HCW, inpatient, and cutoff cohorts were evaluated in more detail to identify discrepancies among the different assays. Specimens with numeric results between the current and lower cutoffs on the different assays are indicated as gray zone results. The discrepancies from PCR-positive HCWs, inpatients, and the cutoff group are in Figure 3, and all other results are in Figure 4 for HCWs and Figure 5 for inpatients. In the HCW screening cohort, 23 HCWs were PCR positive, 25 were PCR negative, and 72 were not tested. Of the inpatients who were admitted to the hospital, 25 were PCR positive and 8 were PCR negative. There were 8 samples identified in the cutoff cohort; all were from PCR-positive patients.
HCW PCR Positive
Of the 9 HCW PCR-positive specimens with discrepancies in antibody tests, 6 had at least 1 IgG antibody result in the gray zone (Figure 3). These HCWs may have had antibodies recognized by some tests but just below the cutoff on others. Of the 9 PCR-positive HCWs with at least 1 negative antibody result, 6 (67%) were asymptomatic and/or afebrile patients. In contrast to this majority, only 3 of the 14 PCR-positive HCWs who were positive on all assays (21%) were asymptomatic and/or afebrile (Figure 4). There was only 1 HCW with a positive PCR result who was negative on all 6 SARS-CoV-2 IgG antibody tests (case H5).
Inpatient PCR Positive
Of the 7 PCR-positive inpatients who had at least 1 negative antibody result, 4 (57%) were patients with PCR-positive results less than 7 days before antibody sample, which may have been too early for IgG antibody detection on some assays. In contrast, of the 18 PCR-positive inpatients who had positive results on all assays tested, only 2 (11%) were patients with PCR-positive results less than 7 days before antibody sample (Figure 5). Three of the PCR-positive inpatients had a gray zone result and 1 patient (P2) had 3 gray zone results of the 4 assays tested. One PCR-positive inpatient was negative on all 6 assays (P4).
Cutoff Cohort
To evaluate samples near the cutoff on the in-house Abbott IgG assay, samples with an index just below the 1.40 Abbott cutoff (1.05–1.28) were identified, and remnant samples were analyzed on each platform (when available). Therefore, all the Abbott results in this cohort are in the gray zone because they fell between the current manufacturer's cutoff and our lower cutoff. All PCR tests were performed more than 25 days before antibody testing, except for cases 2 and 8, for which PCR testing was done 1 day before serum testing. Interestingly, 4 samples (C1, C2, C6, and C8) were positive on Einstein, Babson G2, and Babson G1, which target the spike protein, but negative at the current cutoff on the SD Biosensor and Abbott assays, which target the nucleocapsid protein. Three of these (C1, C6, and C8) were positive on Diazyme, which uses both the nucleocapsid and spike protein as antigens. Two samples (C1 and C8) had results in the gray zone by SD Biosensor because 2 independent observers called each specimen positive and 2 called it negative.
HCW PCR Negative
Of the 25 HCWs who were PCR negative, 22 (88%) had concordant results and were negative for IgG antibodies on all assays tested. One PCR-negative HCW (H24) was positive on all assays tested, and another HCW (H25) was positive on 4 of the 6 assays. Both HCWs had symptoms suggestive of COVID-19 (Figure 4).
HCW Not Tested by PCR
Of the 72 HCWs not tested by PCR, 62 (86%) had concordant antibody results among the 6 assays. Fifty specimens were negative on all assays that were tested (H71–H120), 12 were positive on all assays that were tested (H49–H60), and 10 had discordant results (H61–H70). Of the discrepant samples, 4 had gray zone results (H61, H63, H65, H66) (Figure 4).
Inpatient PCR Negative
Of the 8 PCR-negative inpatients, 5 had positive SARS-CoV-2 IgG antibodies on all assays tested (P26–P30). One PCR-negative patient had a semiquantitative value that was positive on 5 assays but in the gray zone by Diazyme (P31). There was a high clinical suspicion of COVID-19 in these cases. Two patients who were negative for SARS-CoV-2 IgG antibodies on all tested assays were eventually diagnosed with bacterial pneumonia (Figure 5).
DISCUSSION
Although SARS-CoV-2 antibody manufacturers have performed their own validation studies, the clinical utility and the diagnostic performance of these tests require ongoing assessment. Given the urgency to develop these tests and lack of standardized material, most SARS-CoV-2 IgG antibody tests were developed and approved as qualitative assays with a positive or negative result reported based on a cutoff value. This study evaluated the numeric values underlying these qualitative results and the cutoffs of 6 different SARS-CoV-2 IgG assays tested in the midst of the COVID-19 pandemic in New York City. Unlike some clinical performance studies that used a select cohort of hospitalized PCR-positive patients with severe disease,13,14 this study used different patient cohorts, including HCWs with mild to moderate disease as well as hospitalized patients. In addition, although other sensitivity studies had patients followed with multiple specimens collected over time,14 our study reflected a real-world testing scenario in which patients are only tested once. Therefore, it is not surprising that our study found a lower sensitivity with the current cutoff compared with published sensitivities on the Abbott, Diazyme, and Babson G1 assays. The assays tested in this study performed well in identifying patients with past COVID-19 infection. As COVID-19 PCR testing was limited early in the pandemic, several HCWs with symptoms who were not PCR tested did have SARS-CoV-2 IgG antibodies. This study demonstrates the important role of antibody testing to evaluate disease prevalence.
The assays evaluated in this study performed well based on the sensitivity and specificity studies. The main shortcoming of these assays was their qualitative reporting. The discrepancy analysis demonstrated that patients who had gray zone results (ie, just below the cutoff) on some assays had positive results on other assays. For example, study patients who had results below the Abbott cutoff often had positive results on other assays. In addition, by reporting only qualitative results, these assays omit the additional diagnostic and clinical insight numeric results might provide. For example, values of 0.03 and 1.20 on the Abbott assay might have different clinical interpretations, even though both numeric values would be reported as negative because of the 1.40 assay cutoff. Therefore, the numeric result should be reported. This shortcoming is analogous to numeric reporting of cycle threshold (Ct) values in SARS-CoV-2 PCR testing. A qualitative positive PCR result may have a different clinical interpretation based on the numeric Ct value: a Ct value just below the threshold cutoff could be either very early or late infection stage, whereas a lower Ct value in the 20s indicates active, high viral load infection.15 As recommended by Bryan et al,14 depending on the timing and the clinical indication for testing, it may be useful to report an inconclusive range around the index cutoff when using the Abbott SARS-CoV-2 IgG assay.
Another interesting finding was the significant difference in the numeric values between mild to moderate COVID-19–positive HCWs and severe COVID-19 inpatients for 5 of the 6 assays. Importantly, these were intra-assay comparisons; results should not be compared between assays because of lack of standardization. Zhao et al16 demonstrated a more robust antibody response in critical compared with noncritical patients, similar to our findings. Additional studies are needed to fully understand the significance of these higher antibody numeric results and disease severity.
With the advent of SARS-CoV-2 vaccines, selection of the right SARS-CoV-2 serologic assay will become important to properly assess immune responses in different clinical scenarios. In SARS-CoV-2–infected patients, both spike and nucleocapsid antibodies develop as part of a natural immune response, but in vaccinated individuals, antibodies are produced against the spike protein only. Therefore, postvaccine monitoring of an immune response can be performed only using spike antigen target assays, as testing with a nucleocapsid antigen target assay will be negative. Meanwhile, in some clinical situations, testing for antibodies to both the spike and nucleocapsid proteins may be useful to differentiate COVID-19 infection from a vaccine response, because only the natural infection should produce antibodies to the nucleocapsid protein.17 Laboratories may therefore consider using a panel with both spike and nucleocapsid target antigen assays for SARS-CoV-2 IgG serologic testing in these situations. Ideally, in subsequent studies, semiquantitative and quantitative results from different antigen target assays will also become important in assessing immune response, including the need for vaccine boosters and assurances of protective immunity.
This study shows a potential diagnostic role for SARS-CoV-2 serologic testing. In this study, PCR-negative inpatients as well as 2 PCR-negative HCWs were positive for SARS-CoV-2 IgG, confirming the COVID-19 diagnosis. Several of these patients had multiple negative PCR tests. Although PCR testing is considered the gold standard diagnostic assay, viral detection is dependent on the sample collection technique. Recommended sampling of the upper respiratory tract (nasopharyngeal and/or oropharyngeal) may miss virus reservoirs in the lower respiratory tract. In addition, patients who present with symptoms later in the COVID-19 disease course may no longer have a detectable viral load using PCR tests. In this study, a positive serologic test did not necessarily identify COVID-19 patients with active viral infection, but it aided in their COVID-19 diagnosis and subsequent treatment. As Zhao et al16 concluded in their study, combining both molecular and antibody testing improved the sensitivity of diagnosing COVID-19, a disease that has variable presentations and clinical outcomes.
The limitations of this study include that only one sample from each patient was evaluated, and therefore longitudinal trends were not assessed. This study had a smaller study cohort than the manufacturer's cohorts used in initial validation studies. In addition, all specimens could not be evaluated on each assay because of remnant specimen volumes. Not all samples were tested on the LFAs because of limited test availability. In evaluating the LFA, this study used 4 independent observers and a grading system, but in a clinical setting one person would perform the testing and subjectively interpret the line result. The subjective interpretation of the LFA in this study, including 5 samples with 2 observers' interpretations as negative and 2 as positive (very faint band), is a limitation of this method. In comparing serology testing with PCR results, this study used PCR results from several different PCR platforms with variable sensitivities and specificities for the diagnosis of COVID-19. The PCR samples were collected at various times in the disease course or exposure and may have had an impact on how the cohorts were defined. Finally, Ct values were not available on all samples in this study. Using the limited number of available Ct values, no strong conclusion could be drawn on the impact of Ct values on serology testing.
This study evaluated the cutoff values and numeric results of several SARS-CoV-2 IgG assays. Although the study neither justifies nor proposes changing the current manufacturers' cutoff values, it does highlight the importance for manufacturers and laboratories to continually evaluate the numeric results and cutoffs as they continue to expand serologic testing in a real-world clinical setting. Furthermore, this study suggests that reporting the associated numeric values may aid clinical interpretation, especially when these values are close to the assay's cutoff. The ultimate goal for SARS-CoV-2 IgG antibody testing should be to standardize testing and develop quantitative assays in order to understand what level of antibodies is protective from COVID-19 infection. Additional longitudinal studies are needed to follow an individual's antibody response over time. This includes not only identifying patients with an immune response obtained by natural immunity, but also monitoring immune responses to COVID-19 vaccines. In these scenarios, SARS-CoV-2 serologic assays using both spike and nucleocapsid protein targets may need to be implemented by laboratories. The current reporting of qualitative results suggests SARS-CoV-2 serology results are just black and white, but in reality, there is potential usefulness in the gray zone of SARS-CoV-2 IgG assays.
The authors thank Eric Olson, BS, MSE, Christopher DiPasquale, BS, and David Stein, PhD, from Babson Diagnostics for performing testing on the Babson platforms. The authors thank James Pullman, MD, Kathleen Whitney, MD, Michelle Ewart, MD, Garrison Pease, MD, Ridin Balakrishnan, MD, Daniel Casa, MD, Linlin Yang, MD, and Kevin Kuan, MD, for their participation in the evaluation of the lateral flow assays and Carlos Castrodad-Rodriguez, MD, and Roger Fecher, MD, PhD, for their participation in data collection.
Author notes
Funding: R01-AI125462 to Lai; Einstein COVID Pilot Project grant to Lai and Chandran; Malonis was supported by National Institutes of Health (NIH) Medical Scientist Training Grant T32-GM007288 and Fellowship F30-AI150055; Georgiev was supported by NIH Training Program in Cellular and Molecular Biology and Genetics T32-GM007491.
Competing Interests
Chandran, Lai, Fox, Prystowsky, and Weiss are inventors on a patent application related to a COVID-19 diagnostic antibody test. Chandran is an inventor on a patent application related to a COVID-19 neutralization assay. Both applications are assigned to Albert Einstein College of Medicine. Chandran is a member of the scientific advisory board of Integrum Scientific, LLC. Chandran and Lai are members of the scientific advisory board of the Pandemic Security Initiative of Celdara Medical, LLC. The other authors (Forest, Orner, Goldstein, Wirchnianski, Bortz, Laudermilch, Florez, Malonis, Georgiev, Vergnolle, Lo, Campbell, Barnhill, Cadoff, Wolgast) have no relevant financial interest in the products or companies described in this article.