Serologic tests on automated immunology analyzers are increasingly used to monitor acquired immunity against SARS-CoV-2. The heterogeneity of assays raises concerns about their diagnostic performance and comparability.
To test sera from formerly infected individuals for SARS-CoV-2 antibodies by using 6 automated serology assays and a pseudoneutralization test (PNT).
Six SARS-CoV-2 serology assays were used to assess 954 samples collected during a 12-month period from 315 COVID-19 convalescents. The tests determined either antibodies against the viral nucleocapsid (anti-NC) or spike protein (anti-S). Two assays did not distinguish between antibody classes, whereas the others selectively measured immunoglobulin G (IgG) antibodies. PNT was used to detect the presence of neutralizing antibodies.
Comparison of qualitative results showed only slight to moderate concordance between the assays (Cohen κ < 0.57). Significant correlations (P < .001) were observed between the antibody titers from all quantitative assays. However, titer changes were not detected equally. A total anti-S assay measured an increase in 128 of 172 cases (74%) of a suitable subset, whereas all IgG anti-S tests reported decreases in at least 118 (69%). Regarding the PNT results, diagnostic sensitivities of 89% or greater were achieved with positive predictive values of at least 93%. In contrast, specificity changed substantially over time, varying from 20% to 100%.
Comparability of serologic SARS-CoV-2 antibody tests is rather poor. Owing to different diagnostic specificities, the tested assays were not equally capable of capturing changes in antibody titers. However, with thoroughly validated cutoffs, IgG-selective anti-S assays are a reliable surrogate test for SARS-CoV-2 neutralizing antibodies in former COVID-19 patients.
The COVID-19 pandemic has motivated diagnostic manufacturers to develop tests that are useful to detect the serologic response to SARS-CoV-2 infection. The assays were developed in a very short time, limiting the possibility for thorough validation before market launch. Antibody testing has gained substantial interest since more than 300 million individuals have been infected globally. Clear recommendations for a rational use of SARS-CoV-2 antibody assays are lacking, although they may be helpful to evaluate and monitor the immune response after infection.
Existing SARS-CoV-2 antibody assays are rather heterogeneous, which causes substantial uncertainty in the interpretation of results. In general, these assays detect antibodies against either the nucleocapsid (anti-NC) or the spike protein (anti-S) in a qualitative or quantitative manner. The spike protein mediates cellular infection through binding of its receptor-binding domain (RBD) to angiotensin-converting enzyme 2 (ACE2), a membrane-bound receptor on the host cell.1 In contrast, the NC protein packages viral RNA to form ribonucleoprotein structures and facilitates infection by manipulating the host cell's innate immune processes.2 In addition, some assays capture all classes of immunoglobulins (referred to as “total”), whereas others measure only immunoglobulin (Ig) G. This analytical heterogeneity and the lack of established standard reference materials raise concerns about their diagnostic performance for COVID-19 patients and for convalescent and possibly vaccinated individuals.
The human immune response to viral infections results in the production of virus-specific antibodies. Yet these antibodies vary in their ability to bind and neutralize the virus. During the course of an infection, antibody specificity improves through a natural selection process, which results in a more efficient defense against a pathogen. The aforementioned heterogeneity of immunoglobulin classes and targets implies that serologic assays provide only a rough estimate of the protection offered by the antibodies detected in a sample. Antibodies capable of protecting host cells from infection with a specific virus are called neutralizing antibodies and can be determined with a virus neutralization test (VNT).3 This method is labor- and cost-intensive, not standardized, and in the case of SARS-CoV-2, requires a biosafety level-3 laboratory. Therefore, the VNT is not suitable for clinical practice. Pseudoneutralization tests (PNTs) are considered to be an acceptable alternative when a VNT is not available.4 These assays indirectly estimate the amount of neutralizing antibodies by measuring the inhibition of RBD-ACE2 binding in vitro.
The monitoring of SARS-CoV-2 immunity with serologic tests requires a solid characterization of their diagnostic performance and knowledge of how these assays compare to each other. Multiple blood samples collected from COVID-19 convalescents in the months after the infection are well suited for this purpose. The present study aimed to perform a head-to-head comparison of 6 widely used, fully automated SARS-CoV-2 antibody assays in a large and well-characterized longitudinal cohort of individuals who had recovered from COVID-19. Furthermore, we compared a subset of the results to those of a surrogate test (ie, PNT).
Study Population and Samples
Serum samples from 365 individuals who had recovered from COVID-19 were provided by the Biobank Graz at the Medical University of Graz (Austria). This cohort's characteristics were published elsewhere.5 Participants were observed for up to 12 months after inclusion. The date of the first positive SARS-CoV-2 polymerase chain reaction (PCR) test result was recorded as an estimate for the presumed time of infection. A total of 1109 serum samples were collected from June 2020 to March 2021. Owing to prior vaccination or incomplete data sets, 155 samples and 50 participants were excluded. To estimate whether the remaining 315 eligible participants were sufficient for the purpose of this study, the sample size required for the qualitative comparison of 6 assays was calculated. Given that the assays agree differently with the respective reference method (Table 1), we calculated the sample size (power = 0.8, α = 5%) by using the performance specifications of the worst (86% agreement) and the best (99% agreement) performing assay. Using 1-way analysis of variance for paired samples in 6 groups led to a recommended sample size of 93 cases. Considering that for all statistical analyses the eligible cases exceeded this number, we deemed our study sufficiently powered. The study protocol was approved by the Ethics Committee of the Medical University of Graz (identifier: 32-423 ex 19/20), and every participant gave written informed consent before inclusion into the study.
Venous blood samples were drawn into VACUETTE CAT Serum Sep Clot Activator tubes (Greiner Bio-One, Kremsmünster, Austria) at all visits. Samples were allowed to clot for 30 minutes at room temperature before undergoing 10 minutes of centrifugation at 2300g. Serum was aliquoted and stored at −80°C until analysis.
Automated Anti–SARS-CoV-2 Antibody Assays
In all samples, anti–SARS-CoV-2 antibodies were determined with the following quantitative tests: Elecsys Anti-SARS-CoV-2 S (Roche Diagnostics GmbH, Mannheim, Germany), SARS-CoV-2 IgG II Quant (Abbott Laboratories, Sligo, Ireland), LIAISON SARS-CoV-2 S1/S2 IgG (DiaSorin SpA, Saluggia, Italy), and LIAISON SARS-CoV-2 TrimericS IgG (DiaSorin). The same samples were also analyzed with 2 qualitative assays from Roche (Elecsys Anti-SARS-CoV-2) and Abbott (SARS-CoV-2 IgG), both detecting anti-NC. All quantitative assays determine the concentration of anti-S (note: Although not fully adequate, we also use the term titer assays in this article to differentiate the quantitative from the qualitative tests). Table 1 summarizes the characteristics and performance specifications of all assays as provided by the manufacturers in the respective package inserts.6–11 All measurements were performed according to manufacturer's instructions.
Surrogate Antibody Testing
To evaluate the diagnostic performance of the serologic anti–SARS-CoV-2 antibody assays, the presence of virus neutralizing antibodies was determined with a PNT (ACE2-RBD Neutralization Assay, DIA.PRO, Milan, Italy) in 416 samples from 104 participants who presented at 4 study visits (Table 2). This test uses a direct competitive enzyme-linked immunosorbent assay (ELISA) format to assess the ACE2-RBD neutralization capacity of antibodies in a sample. Briefly, the samples are transferred into the wells of an ELISA plate precoated with recombinant RBD proteins. Anti-S antibodies in the sample bind to the RBD. After washing, ACE2 protein is added and binds to the remaining RBD binding sites. Then, a streptavidin-horseradish peroxidase conjugate is used to induce a color reaction that allows photometric detection of ACE2-RBD binding at 450 and 620 nm. If the optical density of a sample exceeds the negative control, neutralizing anti-S are present.
The distribution of positive and negative results of all automated assays was assessed in relation to the time elapsed since infection. The whole study period (12 months) was divided into 9 time intervals with each containing at least 100 samples and only 1 sample per individual (Tables 3 and 4). This splitting aimed to address interindividual differences in the course of the immune response and the variable interval of approximately 3 months between the first positive SARS-CoV-2 PCR test result and the first study visit (median, 66 days; interquartile range, 40–114 days).
The qualitative agreement between assays was evaluated for all assay combinations and time intervals. Quantitative test results were considered positive when the titer exceeded the assay-specific cutoff (Table 1). For the LIAISON SARS-CoV-2 S1/S2 IgG (DiaSorin), which defines a “borderline” titer interval (13–15 AU/mL), we labeled all results lower than 15 AU/mL as negatives. For each set of paired results, we calculated the Cohen κ and performed the McNemar test. In addition, quantitative tests were also compared by Spearman rank correlation and the cumulative sum (CUSUM) test for linearity.
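As an illustration of the paired qualitative comparison described above, the following minimal Python sketch computes Cohen κ and a McNemar P value from 2 paired series of positive and negative calls. It is not the software used in this study (Analyse-it and SPSS were used). The function names and the `>=` rule for dichotomizing titers are assumptions made for illustration, and because the McNemar variant applied in the study is not specified, the exact binomial form is shown.

```python
import math


def dichotomize(titers, cutoff):
    """Label titers as positive (True) when they reach the cutoff.

    Assumption: 'titer >= cutoff' counts as positive. With cutoff=15 this
    treats LIAISON S1/S2 results lower than 15 AU/mL as negative.
    """
    return [t >= cutoff for t in titers]


def paired_counts(a, b):
    """Return (both_pos, a_only, b_only, both_neg) for paired boolean calls."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("paired series must be non-empty and of equal length")
    both_pos = sum(1 for x, y in zip(a, b) if x and y)
    a_only = sum(1 for x, y in zip(a, b) if x and not y)
    b_only = sum(1 for x, y in zip(a, b) if y and not x)
    both_neg = len(a) - both_pos - a_only - b_only
    return both_pos, a_only, b_only, both_neg


def cohen_kappa(a, b):
    """Cohen kappa: observed agreement corrected for chance agreement."""
    n11, n10, n01, n00 = paired_counts(a, b)
    n = n11 + n10 + n01 + n00
    p_obs = (n11 + n00) / n
    p_exp = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / n ** 2
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_exact_p(a, b):
    """Two-sided exact (binomial) McNemar P value from the discordant pairs."""
    _, n10, n01, _ = paired_counts(a, b)
    n = n10 + n01
    if n == 0:
        return 1.0  # no discordant pairs, hence no evidence of a difference
    k = min(n10, n01)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A call such as `cohen_kappa(dichotomize(titers_a, 15), dichotomize(titers_b, 50))` would compare 2 quantitative assays at their respective cutoffs; `titers_a` and `titers_b` are hypothetical lists holding 1 result per sample in the same order.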
In the next step, we investigated whether the 4 quantitative anti-S assays are equally capable of detecting titer changes over time. For this purpose, we compared the first and last results from each participant. Participants were excluded from this analysis if the first visit sample was collected earlier than 30 days after the presumed time of infection and the last visit sample was obtained earlier than 60 days after the corresponding first visit sample. Applying these criteria, 172 cases with a median interval of 140 days between the first and last visit were eligible for analysis. The difference of each pair of results was classified as “decrease,” “increase,” or “no change.” This classification took intra-assay and interassay imprecision into account.
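The exact rule used to account for imprecision is not given here. One common way to make such a classification concrete is a reference change value based on analytical imprecision alone, sketched below in Python. The function names, the coefficient of variation `cv_percent`, and the z value of 1.96 are assumptions for illustration and do not describe the procedure actually applied in this study.

```python
import math
from collections import Counter


def classify_change(first, last, cv_percent, z=1.96):
    """Classify the change between 2 titers of the same participant.

    Assumption: a change counts as real only if the relative difference
    exceeds z * sqrt(2) * CV, where CV is the assay's analytical
    coefficient of variation in percent (hypothetical input).
    """
    if first <= 0:
        raise ValueError("first titer must be positive")
    limit = z * math.sqrt(2) * cv_percent          # critical difference in %
    change = (last - first) / first * 100          # relative change in %
    if change > limit:
        return "increase"
    if change < -limit:
        return "decrease"
    return "no change"


def tally_changes(pairs, cv_percent, z=1.96):
    """Count the categories over (first, last) titer pairs."""
    return Counter(classify_change(f, l, cv_percent, z) for f, l in pairs)
```

With a hypothetical CV of 5%, the critical difference is about 13.9%, so a fall from 100 to 90 would be labeled “no change” and a fall from 100 to 50 would be labeled “decrease.”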
The diagnostic performance of the 4 quantitative anti-S assays was further explored by measuring the virus neutralization capacity using a PNT in samples from 104 participants who presented at all 4 study visits. The results from the comparative assays were only considered “true positive” when they also tested positive in the PNT. First, we explored if serologic antibody titers are related to the probability of having virus neutralizing antibodies. For this purpose, the distribution of PNT positive and negative results was analyzed in assay-specific quintiles of anti–SARS-CoV-2 antibody titers. In addition, the following parameters were calculated for each assay, using the cutoff values provided by the manufacturers: diagnostic sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In the case of the Elecsys Anti-SARS-CoV-2 S, Roche recommends 2 thresholds. The original cutoff of 0.8 U/mL has the highest sensitivity for detecting anti-S antibodies, whereas titers higher than 15 U/mL are more specific for the presence of neutralizing antibodies. Accordingly, the latter cutoff was used in all comparisons with the PNT results. Finally, we performed receiver operating characteristic (ROC) curve analyses for all assays and study visits. From the Youden index (YI), we determined the most appropriate cutoff values for each of the 4 study visits.
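The performance measures and the Youden index named above can be sketched as follows in Python, with the PNT result as reference. This is a minimal illustration rather than the ROC procedure implemented in the statistical packages used for the study. The function names and the `>=` cutoff rule are assumptions, and only observed titers are tried as candidate cutoffs.

```python
def diagnostic_metrics(titers, pnt_positive, cutoff):
    """Sensitivity, specificity, PPV, and NPV of a titer cutoff vs the PNT.

    Assumption: a titer at or above the cutoff counts as test positive.
    """
    tp = fp = tn = fn = 0
    for titer, reference in zip(titers, pnt_positive):
        test_positive = titer >= cutoff
        if test_positive and reference:
            tp += 1
        elif test_positive and not reference:
            fp += 1
        elif not test_positive and reference:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_cutoff(titers, pnt_positive):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Ties are resolved in favor of the lowest cutoff.
    """
    if all(pnt_positive) or not any(pnt_positive):
        raise ValueError("both PNT-positive and PNT-negative samples are needed")
    best_cutoff, best_j = None, float("-inf")
    for candidate in sorted(set(titers)):
        m = diagnostic_metrics(titers, pnt_positive, candidate)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j = candidate, j
    return best_cutoff, best_j
```

Applying `youden_cutoff` separately to the samples of each study visit would yield visit-specific cutoffs; how ties and interpolation between observed titers are handled may differ between software packages.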
Excel 2016 (Microsoft Corp), Analyse-it (Analyse-it Software, Ltd), and SPSS (IBM, version 27) were used for statistical calculations and the preparation of figures and tables.
In total, 954 blood samples from 315 individuals were analyzed with 6 automated anti–SARS-CoV-2 antibody assays. Figure 1, A through F, illustrates the distribution of negative and positive results for all assays across the 9 time intervals. Both anti-NC assays (Figure 1, A and B) showed good agreement during the first 2 months after infection. Thereafter, the rate of IgG anti-NC positives decreased constantly, down to only 40 of 105 (38%) at 8 months after infection, whereas the corresponding results delivered by the total anti-NC assay deemed 102 of 105 (97%) to be positive. The anti-S assays from Roche and Abbott (Figure 1, C and D) showed comparable positivity rates with at least 94% positive results at all time points. In contrast, both anti-S tests from DiaSorin (Figure 1, E and F) reported substantially lower rates of positive results.
Comparing the distribution of positive and negative results revealed great discrepancies between various assay combinations. Based on the McNemar test for paired nominal data, only the 2 Roche assays and the IgG anti-S test from Abbott had no systematic differences (P ≥ .05) between their results. This agreement was consistent throughout the observation period. However, Cohen κ showed at best moderate (0.41 ≤ κ ≤ 0.60) agreement between individual combinations of assays (Supplemental Table 1 at https://meridian.allenpress.com/aplm in the May 2022 table of contents). Cohen κ was highest when comparing the results of the 2 DiaSorin assays (κ = 0.56), whereas the worst agreement was seen between Abbott's anti-NC and Roche's anti-S test (κ = 0.02).
When comparing the quantitative results with each other, all assay combinations exhibited correlations with a significance of P < .001 (Supplemental Table 2). The highest degree of correlation was seen between the LIAISON SARS-CoV-2 S1/S2 IgG and the Roche total anti-S test with ρ ≥ 0.90 at all time intervals. In contrast, the CUSUM test showed a lack of linearity (P < .05) at several time points between the 2 DiaSorin assays as well as between both DiaSorin assays and the assays from Roche and Abbott.
The 4 quantitative tests in this study showed substantial heterogeneity in detecting changes of antibody titers during the follow-up period (Figure 2, A through D). Among 172 eligible cases, the total anti-S assay from Roche measured an increase in 129 (74%; Figure 2, A), whereas the IgG anti-S tests from Abbott and DiaSorin indicated decreases in 118 to 147 (69%–85%; Figure 2, B and D).
The determination of neutralizing antibodies by PNT in samples from 104 participants showed 5 negative results (4.8%) at visit 1. This portion increased to 11 (10.6%) at visit 4. Most PNT negative samples had anti–SARS-CoV-2 antibody titers in the lowest quintile, regardless of assay type (Figure 3, A through D).
Based on the PNT results and assay-specific cutoffs provided by the respective manufacturer, diagnostic sensitivities ranged from 89% to 100% with PPVs of 93% or greater at all time points. In contrast, specificity changed substantially over time, varying from 20% to 100% (Table 2). Between the first and fourth study visit, the total anti-S assay from Roche lost specificity from initially 100% down to 36% at study end, whereas specificity of the IgG anti-S assays increased during the same interval. However, even the IgG anti-S assays had vastly different NPVs. Abbott's SARS-CoV-2 IgG II Quant showed the best overall performance with PPV and NPV of 94% or greater at all visits.
Finally, ROC analyses showed that the appropriate cutoffs for distinguishing individuals with PNT-positive and -negative results vary over time (Table 2). Amongst the 104 individuals included in this analysis, more reliable cutoffs for the identification of virus neutralizing antibodies were as follows: 35 U/mL for the Elecsys Anti-SARS-CoV-2 S test, 98 AU/mL for the Abbott SARS-CoV-2 IgG II Quant test, 23 AU/mL for the LIAISON SARS-CoV-2 S1/S2 IgG test, and 25 AU/mL for the LIAISON SARS-CoV-2 TrimericS IgG test.
This comparison of 6 commercial SARS-CoV-2 antibody tests showed significant disagreement; not all assays detected the natural loss of anti–SARS-CoV-2 antibodies over time. Based on the results of the PNT, the IgG-selective anti-S assays were more specific for the detection of neutralizing antibodies than the total anti-S test at later study visits. Moreover, the cutoffs recommended by the manufacturers overestimated the neutralizing capacity in the samples of our cohort.
The qualitative anti-NC assays from Roche and Abbott delivered largely discordant results. Their good agreement in the early postinfectious phase deteriorated continuously until the end of the study. A decrease in positive results with the selective IgG anti-NC assay, but not with the total anti-NC assay, has also been reported in 2 previous studies.12,13 However, these studies analyzed substantially smaller cohorts and covered only the first 9 months after infection. The apparent discrepancy between total and IgG-selective anti-NC tests remains unexplained but may be due to differences in assay architecture. While both assays use recombinant NC antigen to bind all anti-NC antibodies, Abbott uses an additional antibody against human IgG, capturing only complexes formed by IgG. The lack of such a second analytical target in Roche's total anti-NC assay leaves room for the detection of non-IgG antigen complexes. Furthermore, minor modifications of the NC antigens used in the assays may result in different binding characteristics. Simply concluding that the Roche assay produces an excessive number of false positives may be inadequate, as the measurement of 10 453 prepandemic samples resulted in only 21 positives (as per package insert of the assay). Considering that the present cohort included only individuals who actually had recovered from COVID-19 implies that most positive total anti-NC results relate somehow to the previous SARS-CoV-2 infection.
Anti-S antibody assays had a better concordance of positive rates than anti-NC antibody assays, especially at later postinfectious stages. For example, the anti-S assays from Roche and Abbott revealed positive results ranging from 94% to 100% at all time points. However, the isolated consideration of positivity rates may be misleading. The direct comparison of both assays with Cohen κ indicated substantial to moderate agreement only in the first 2 months. After this period, the total anti-S assay lacked agreement not only with Abbott's IgG anti-S assay but also with all other assays. These results suggest that the anti-S assay from Roche, which does not distinguish between immunoglobulin classes, is overly sensitive for the detection of neutralizing antibodies. It has been speculated that this excessive sensitivity may be the result of circulating IgA anti-S antibodies, which persist in the circulation for more than 2 months. However, the clinical and analytical relevance of IgA anti-S antibodies in human plasma samples is insufficiently understood.14 So far, there is no consistent evidence that IgA anti-S titers continue to increase after the first 2 months post infection.15,16
Better interassay comparability has been observed amongst the selective IgG anti-S assays. However, Cohen κ showed at best moderate agreement between the 2 assays from DiaSorin. In contrast to our results, a previous study by Jung et al17 reported substantial agreement (κ = 0.65) between DiaSorin's LIAISON SARS-CoV-2 TrimericS IgG test and Abbott's anti-S test in 173 samples from COVID-19 patients. The comparability with our study is limited by the timing of blood collections and cohort size. Our samples covered a broad time frame up to 1 year, whereas Jung et al17 collected blood in many patients within the first 2 weeks after infection. When limiting our analyses to samples collected between 13 and 46 days after infection, the same assays agreed with κ = 0.55. Another recent study concluded that qualitative agreement between the anti-S tests by Roche, Abbott, and DiaSorin is substantial, with κ values between 0.60 and 0.80.18 However, this study analyzed samples from healthy individuals who underwent vaccination where the resulting antibodies are probably much more homogeneous than after a natural SARS-CoV-2 infection. The discrepant results between vaccinated individuals and COVID-19 patients support the concept that the rather heterogeneous antibody spectrum in samples from COVID-19 patients reduces assay specificity.
Further evidence for the limited comparability of serologic SARS-CoV-2 assays comes from the longitudinal analysis of anti-S results. The kinetics of anti-S titers differ profoundly when measured with different assays. Measurement of serial plasma samples collected at least 2 months apart showed increasing titers in 75% of participants with the Roche assay, but only in 12% to 27% with the other assays. The selective measurement of IgG anti-S seems to reduce variability substantially, although differences remain. Even the 2 DiaSorin assays showed significantly different kinetics in the same set of samples. Such discordant kinetics provide further evidence for substantially different specificities amongst individual assays. Our results are in line with a previous study by Schallier et al12 in which the total anti-S test from Roche and the IgG anti-S test from Abbott showed discordant kinetics during a 7-month period. In principle, the variable assay specificities may be due to differences in the antigens used for antibody capture and the antibody specificity. The reagents used by Roche and Abbott are designed for the detection of antibodies against the RBD in the S1-unit of the spike protein. In contrast, both DiaSorin tests include the S2-unit in the antigen used for antibody capture. Recently, the World Health Organization and the National Institute for Biological Standards and Control (NIBSC) have developed a standard reference material (NIBSC 20/136), which should help manufacturers in harmonizing their anti–SARS-CoV-2 antibody assays.19 However, evidence that shows a substantial improvement in interassay comparability is still lacking and has to be addressed with the next generation of assays.18,20
The different capture antigens used in the tested anti-S assays are likely to result in different specificities. Assays that include the S2-unit of the spike protein in their capture antigen may be especially susceptible to “back-boost” effects, which can occur when COVID-19 convalescent individuals are exposed to other ordinary coronaviruses. Although the resulting antibodies are detected by the assay, they lack neutralizing capacity against SARS-CoV-2.21 However, comparable titer trends obtained with Abbott's IgG anti-S assay and the 2 DiaSorin tests argue against a relevant back-boost effect in the present cohort. Moreover, during follow-up the specificity of both DiaSorin assays to detect neutralizing antibodies increased. This observation further reduces the probability of back-boosting amongst the participants of this study.
The variable performance of anti–SARS-CoV-2 antibody assays raises the question of whether or not their results correlate with the presence of virus neutralizing antibodies. Our results demonstrate that medium and high antibody titers are a reliable surrogate of virus neutralizing antibodies, regardless of the assay used. In contrast, samples with low anti–SARS-CoV-2 antibody titers lack virus neutralizing capacity in up to 35% of cases. This observation is in accordance with previous studies showing that the levels of virus neutralizing antibodies correlate with the anti–SARS-CoV-2 antibody titers of most serologic assays.17,22–24
The distinction between patients with and without adequate titers of virus neutralizing antibodies requires proper cutoffs. Owing to the lack of standardization and harmonization, these cutoffs have to be assay specific. In our cohort, higher cutoffs would have resulted in a more reliable identification of individuals with neutralizing antibodies. These results emphasize the impact of study design on the results of assay comparison studies. Over time, the specificity of IgG-selective assays increases, whereas the total anti-S assay loses specificity. This observation may, at least partly, be explained by variable virus neutralizing capacities of different immunoglobulin classes. For example, anti-S IgG3 antibodies seem to have a strong virus neutralizing effect, whereas anti-S IgAs do not.25,26 Based on our data, Abbott's anti-S test showed the best diagnostic performance. In another study of COVID-19 patients, the same assay had a sensitivity of 96% and a specificity of 99% when using the manufacturer's cutoff.17 Lowering the cutoff from 50 to 42 AU/mL slightly improved sensitivity (97%) without affecting specificity. In contrast, the same study showed that DiaSorin's LIAISON SARS-CoV-2 TrimericS IgG test and Roche's total anti-S assay performed better with significantly lower cutoffs. As this study was not longitudinal and blood samples were collected rather early, comparability with the present results is limited.
The present study has strengths and weaknesses that should be considered. The Biobank Graz COVID-19 Convalescent Cohort is one of the largest longitudinal sample collections of former COVID-19 patients with a rather long follow-up. A limitation of our study is the variable time between infection and the first study visit. However, this issue has effectively been addressed by the categorization into nonoverlapping intervals that contain comparable numbers of samples. In addition, a solid number of samples at early and later postinfectious stages lends substantial robustness to our results. Another strength of this study is the parallel evaluation of 6 widely used, fully automated anti–SARS-CoV-2 antibody assays. Moreover, the diagnostic performance of each assay was verified by a PNT as surrogate of virus neutralizing antibodies. It should be stated that we did not verify the PNT's diagnostic sensitivity with a VNT. However, even if the PNT used in this study is not fully equivalent to the VNT, it still provides valuable and solid information that aids in the interpretation of our results. Previous investigations showed that RBD-ACE2–binding inhibition assays tend to misclassify samples with low titers of neutralizing antibodies as negatives. However, high titers of neutralizing antibodies are reliably recognized by PNTs.27,28 In a pandemic environment with a high pretest probability, this results in a high PPV. Furthermore, the increasing proportion of PNT-negative results several months after infection clearly is in line with longitudinal results obtained with VNTs.29,30 The potential exposure to virus variants that feature structural changes of the spike protein may represent another limitation, as all current anti-S assays are aligned to the RBD or the S-units of the ancestral virus. These assays may react differently to antibodies induced by later virus variants, such as delta or omicron. 
As most participants were infected in 2020, when the ancestral variant was still dominant, it is unlikely that our results are influenced by variant effects. It is important to note that the present results exclusively apply to individuals who have recovered from COVID-19. In the absence of VNT results, the effective level of protection remains uncertain. Therefore, our results should not be used to predict any future outcome. As the humoral immune response upon natural infection differs from that after vaccination with mRNA vaccines, the serologic assays tested here may behave differently in vaccinated individuals. In particular, the capture of anti-S should be verified in cohorts of vaccinated individuals. Serologic tests represent a convenient tool to assess vaccine immunogenicity in clinical trials, but they should not be mistaken for a measure of vaccine efficacy.
Serologic immunoassays for the detection and quantification of anti–SARS-CoV-2 antibodies are not interchangeable. In particular, the assays tested in this study showed substantially different diagnostic performance and were not equally capable of capturing changes in antibody titers of persons who have recovered from COVID-19. Assays that do not differentiate between IgG and other immunoglobulins show analytical characteristics that require further investigation. However, when using thoroughly validated cutoffs, IgG-selective anti-spike antibody assays reliably identify individuals with high titers of neutralizing antibodies against SARS-CoV-2.
Supplemental digital content is available for this article at https://meridian.allenpress.com/aplm in the May 2022 table of contents.
The manufacturers of the serology assays (Abbott Laboratories, DiaSorin SpA, and Roche Diagnostics) provided reagents, calibrators, and controls for this study. Pseudoneutralization tests were financed with funding resources by the city of Graz, Austria. The authors have no other relevant financial interest in the products or companies described in this article.