Abstract
Context.—In 2003 the Chemistry Resource Committee of the College of American Pathologists introduced a commutable specimen in the Neonatal Bilirubin Surveys. This specimen was intended to help evaluate all bilirubin methods.
Objective.—To evaluate the effect of commutable specimens on the performance of selected clinical analyzers in measuring neonatal bilirubin from 2003 through 2006.
Design.—A human serum–based specimen enriched with unconjugated bilirubin has been included in the Neonatal Bilirubin Surveys since 2003. The bilirubin values of these specimens were determined by the reference method and used to evaluate results reported by various chemistry analyzers.
Results.—Coefficients of variation for College of American Pathologists All Data ranged from 4.9% to 6.2% for the Neonatal Bilirubin Survey. However, coefficients of variation for the 4 major instrument groups (Dimension, Olympus, Synchron, and Vitros), which report 65% of all results, ranged from 2% to 3%. College of American Pathologists All Data mean bilirubin values were within 0.46 mg/dL (7.9 μmol/L) of the reference method mean in 2003; in subsequent years these differences became larger, peaking at 1.87 mg/dL (32 μmol/L) in 2005.
Conclusions.—The large systematic error of bilirubin measurements is due primarily to the failure of instrument manufacturers to produce reliable bilirubin calibrators. Primary calibrators should consist of human serum enriched with unconjugated bilirubin. Bilirubin values must be assigned by the reference method, the performance and robustness of which are reported in this article. Secondary calibrators distributed to users must be traceable to primary calibrators.
In 2003, the Chemistry Resource Committee of the College of American Pathologists (CAP), acting on a report regarding inaccuracies in the measurement of bilirubin in clinical laboratories,1 introduced a new specimen in the Neonatal Bilirubin Surveys (NB-Surveys). This specimen, human serum enriched with unconjugated bilirubin, was expected to be commutable (free of matrix effects) because it closely resembles specimens drawn from healthy neonates. Properly calibrated clinical analyzers were expected to be accurate, providing total bilirubin (TBIL) values close to those obtained by the reference method. The noncommutable or conventional NB-Survey specimens consisted of unconjugated bilirubin and ditaurobilirubin in bovine serum.1
A guideline issued by the American Academy of Pediatrics recommended a blood exchange transfusion for infants older than 48 hours when bilirubin levels near 25 mg/dL (428 μmol/L) cannot be reduced by phototherapy.2 Because this is considered a critical bilirubin value, the CAP Chemistry Resource Committee decided to adjust the bilirubin level in the human serum–based specimen to approximately 25 mg/dL (428 μmol/L).
Results from the 2003 human serum–based specimen (NB-02) were very encouraging. The grand mean value for CAP All Data was only 2.4% higher than the reference method mean, and the coefficient of variation (CV) was 5.7%.
This communication is a progress report on the performance of laboratories participating in the NB-Surveys for the years 2004 through 2006. The practice of including the special, commutable NB-Survey specimen in the Chemistry Survey (C-Survey), as specimen C-96 or C-97, continued through the end of 2006.
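Throughout this article, bilirubin concentrations are given in mg/dL with the SI value in parentheses. The conversion factor of about 17.1 follows from the molar mass of bilirubin (about 584.66 g/mol): 1 mg/dL is 10 mg/L, and 10 mg/L divided by 584.66 g/mol is about 17.1 μmol/L. A minimal sketch of the conversion, with illustrative function names that are not part of any survey software:

```python
# Molar mass of unconjugated bilirubin (C33H36N4O6), in g/mol
BILIRUBIN_MOLAR_MASS = 584.66


def mg_dl_to_umol_l(mg_dl: float) -> float:
    """Convert a bilirubin concentration from mg/dL to umol/L."""
    mg_per_l = mg_dl * 10.0                       # 1 dL = 0.1 L
    mmol_per_l = mg_per_l / BILIRUBIN_MOLAR_MASS  # (mg/L) / (g/mol) = mmol/L
    return mmol_per_l * 1000.0                    # mmol/L -> umol/L


def umol_l_to_mg_dl(umol_l: float) -> float:
    """Convert a bilirubin concentration from umol/L to mg/dL."""
    return umol_l / 1000.0 * BILIRUBIN_MOLAR_MASS / 10.0


print(round(mg_dl_to_umol_l(25)))  # 428
print(round(mg_dl_to_umol_l(20)))  # 342
```

Small differences in the last digit of published SI values usually come from rounding the factor to 17.1 before multiplying.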
MATERIALS AND METHODS
The preparation of the human serum–based specimens and their distribution to participating laboratories have been described previously.3 All NB-Survey and C-Survey specimens (from 2003 through 2006) were analyzed for TBIL by the reference method4 at the Reference Standards Laboratory of the Children's Hospital of Wisconsin, Milwaukee. All other bilirubin results shown in this article were taken from the participant summary reports. Instruments specifically considered in this report include the BR2 (Advanced Instruments Inc, Norwood, Mass), Advia (Siemens Medical Solutions Diagnostics, Tarrytown, NY), Aeroset and Architect (Abbott Diagnostics, Abbott Park, Ill), Cobas Integra, Roche Modular/Hitachi (Roche Diagnostic Systems, Branchburg, NJ), Dimension (Dade Behring, Newark, Del), Olympus (Olympus America Inc, Melville, NY), Synchron (Beckman Coulter Inc, Fullerton, Calif), Unistat (Leica Microsystems Inc, Buffalo, NY), and Vitros (Ortho-Clinical Diagnostics, Raritan, NJ).
According to the participant summary report, methods based on the coupling of bilirubin with a diazo compound are used in the Advia, Aeroset, Architect, Cobas Integra, Dimension, Roche Modular/Hitachi, Olympus, Synchron, and Vitros (TBIL slide). Methods based on the oxidation of bilirubin are used in the Advia (vanadate oxidation) and Synchron (bilirubin oxidase). Methods based on direct spectrophotometry are used in the Aeroset, Architect, Advanced BR2, Unistat, and Vitros neonatal bilirubin (BuBc) slide, which measures the sum of unconjugated bilirubin and conjugated bilirubins (bilirubin monoglucuronide and diglucuronide). According to Abbott, the Aeroset and Architect both use a diazo method (diazotized 2,4-dichloroaniline), but for the NB-Survey, users prefer direct spectrophotometry.
RESULTS
Table 1 shows CAP All Data mean bilirubin values and CV% for the NB-Survey and C-Survey specimens consisting of unconjugated bilirubin in pooled human sera sent to participating laboratories during 2003–2006; also shown are values obtained by the reference method. For each year from 2003 through 2006, one commutable sample was prepared and sent out as a specimen in the NB-Survey and C-Survey; thus, in 2003, NB-02 was identical to C-96; in 2004, NB-06 was identical to C-97; in 2005, NB-05 was identical to C-96; and in 2006, NB-01, C-97, and NB-11 were identical. Shown in Figure 1 are differences between mean values of CAP All Data and those of the reference method for NB-Survey and C-Survey specimens 2003–2006. The best overall laboratory performance was achieved in 2003. The CAP All Data mean value for NB-02 was identical to that for C-96, and the difference, 0.46 mg/dL (7.9 μmol/L) or 2.4%, between the CAP All Data mean and that of the reference method was the smallest in 4 years.3
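The differences plotted in Figure 1 are simple arithmetic: the absolute bias is the survey mean minus the reference method mean, and the relative bias expresses it as a percentage of the reference mean. A minimal sketch, using as input the mean Vitros TBIL and reference method values for C-97 reported in this article (21.40 and 21.04 mg/dL); the function name is illustrative:

```python
def bias(survey_mean: float, reference_mean: float) -> tuple[float, float]:
    """Return (absolute bias in the input units, relative bias in %)."""
    absolute = survey_mean - reference_mean
    relative = 100.0 * absolute / reference_mean
    return absolute, relative


abs_bias, pct_bias = bias(21.40, 21.04)
print(f"{abs_bias:.2f} mg/dL, {pct_bias:.1f}%")  # 0.36 mg/dL, 1.7%
```

A positive result means the field method reads higher than the reference method, which is the direction observed for most instrument groups.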
In 2006, the commutable specimen was mailed to participants on 3 occasions, in April and November with the NB-Survey (specimens NB-01 and NB-11, respectively) and in June with the C-Survey (specimen C-97). The unusually large difference, 0.64 mg/dL (10.9 μmol/L), between the NB-01 and NB-11 CAP All Data bilirubin mean values (Figure 2) is due to large shifts in the mean values reported by the Cobas Integra, Dimension, Roche Modular/Hitachi, and Vitros, which constitute 60% of the participating laboratories. Discrepancies between NB-01 and C-97 (Figure 2) are primarily due to the large percentage of Vitros participants in the C-Survey.
Figure 3 shows differences between bilirubin mean values for selected field methods and the reference method. With 3 exceptions (Advanced BR2, Aeroset [2003–2004], and Architect [2006]), mean bilirubin values for all instrument groups are higher than those of the reference method. Note that there were no data from the Abbott instruments in 2005 and that both Abbott instruments use the same diazo and direct spectrophotometry methods. The positive bias ranges from less than 1 mg/dL to 4 mg/dL (<17.1 to 68.4 μmol/L). The Synchron group consistently shows a small positive bias. The negative bias of the BR2 has decreased over time and is smallest in 2006.
Reference methods are considered the gold standard against which field methods are compared and evaluated. It is important, therefore, to establish the long-term performance characteristics of the reference method for serum TBIL. From September 2003 to April 2007, the Reference Laboratory of the Children's Hospital of Wisconsin prepared 11 bilirubin standard solutions by adding National Institute of Standards and Technology Standard Reference Material 916 (bilirubin) to pooled human sera to a concentration of 20 mg/dL (342 μmol/L). After preparation, these solutions were analyzed by the reference method, and the molar absorptivity (ɛ) of the alkaline azopigment was compared with the values established in 2 round robin studies with participation from national and international laboratories.5,6 Collaborating laboratories in the 1988 round robin study were from the United States (3) and The Netherlands (2); in the 1998 round robin study, laboratories were from the United States (3), Germany (2), and The Netherlands (1). Data from the 2 round robins and from the Children's Hospital of Wisconsin Reference Laboratory (Table 2) indicate that the established molar absorptivity of the alkaline azopigment, because of its reproducibility and narrow range, can and should be used to assess the accuracy of bilirubin calibrators consisting of unconjugated bilirubin in human sera.
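The molar absorptivity check rests on the Beer–Lambert law, A = ɛ·c·l, so ɛ = A/(c·l). The sketch below assumes that the absorbance is already blank-corrected and that c is the molar bilirubin concentration in the cuvette after all dilutions; the 1:50 dilution and the absorbance of 0.5 are hypothetical placeholders, not values from the reference method procedure:

```python
BILIRUBIN_MOLAR_MASS = 584.66  # g/mol


def molar_absorptivity(absorbance: float, conc_mol_l: float,
                       path_cm: float = 1.0) -> float:
    """Beer-Lambert law: epsilon = A / (c * l), in L mol^-1 cm^-1."""
    if conc_mol_l <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return absorbance / (conc_mol_l * path_cm)


# Hypothetical example: a 20 mg/dL stock, assumed to be diluted 1:50 in the cuvette
stock_mol_l = 20 * 10 / BILIRUBIN_MOLAR_MASS / 1000  # mg/dL -> mg/L -> mmol/L -> mol/L
cuvette_mol_l = stock_mol_l / 50                      # assumed dilution factor
eps = molar_absorptivity(absorbance=0.5, conc_mol_l=cuvette_mol_l)
print(f"epsilon = {eps:.0f} L/(mol*cm)")
```

The resulting ɛ would then be compared with the established value and its accepted limits rather than interpreted on its own.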
The 20-mg/dL (342-μmol/L) stock solutions are dispensed in airtight plastic tubes and kept at −70°C until use. Each time the reference method is used, the stock solution is analyzed to verify that its ɛ value is within the accepted limits (mean value ± 3 SD). Next, the stock solution and 3 dilutions (2.5 mg/dL [42.7 μmol/L], 5.0 mg/dL [85.5 μmol/L], and 10 mg/dL [171 μmol/L]) are analyzed to construct a calibration curve, the regression equation of which is used to calculate the results of the specimens analyzed. Shown in Table 3 are molar absorptivities, slopes, and intercepts of calibration curves obtained during a 3.5-year period, and precision data on controls prepared in our laboratory. The data in Tables 2 and 3 demonstrate the impressive stability and robustness of the reference method in US and European laboratories.
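The two quality steps described above, the ±3 SD acceptance check on ɛ and the 4-point linear calibration, can be sketched as follows. The calibrator concentrations come from the text; the absorbances and the history of accepted values are hypothetical, and fitting absorbance on concentration (then inverting) is an assumed choice, since the text does not say which variable is regressed on which:

```python
from statistics import mean, stdev


def within_limits(value: float, history: list[float], k: float = 3.0) -> bool:
    """True if value lies within mean +/- k*SD of previously accepted values."""
    m, s = mean(history), stdev(history)
    return m - k * s <= value <= m + k * s


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Calibrator concentrations (mg/dL) from the text; absorbances are hypothetical
conc = [2.5, 5.0, 10.0, 20.0]
absorb = [0.135, 0.260, 0.510, 1.010]
slope, intercept = fit_line(conc, absorb)


def specimen_conc(a: float) -> float:
    """Invert the calibration line to obtain mg/dL from a specimen absorbance."""
    return (a - intercept) / slope


print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=0.050, intercept=0.010
print(f"{specimen_conc(0.61):.2f} mg/dL")                # 12.00 mg/dL
```

Tracking the slope and intercept from run to run, as in Table 3, is what makes a drift in the method visible.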
In 2003, the CAP All Data mean values for NB-02 and C-96 were identical and within 2.4% of the reference value (Figure 1). In our opinion, this was due to the use of calibrators with accurately assigned bilirubin values. In subsequent years, accuracy deteriorated. Results for 2006 are puzzling: there is good agreement between the CAP All Data means for specimens C-97 and NB-11, but the mean value for NB-01 is greater than that for NB-11 by 0.64 mg/dL (10.9 μmol/L). The lower mean bilirubin value for NB-11 was due to the downward shift of 4 instrument group mean values (Cobas Integra, Dimension, Hitachi/Roche Modular, and Vitros, which represent 59% of the reported results) (Figure 2). The close agreement between the mean values for C-97 and NB-11, 22.44 mg/dL (383.7 μmol/L) and 22.32 mg/dL (381.7 μmol/L), indicates little, if any, loss of bilirubin over the 6-month period (June–November). The most logical explanation for the difference between NB-01 and NB-11 would be a change in calibrators or a reassignment of bilirubin values in the calibrators used by these instruments. Consistent with this explanation, Dade Behring and Roche Diagnostics were among the diagnostic companies that lowered the values assigned to their bilirubin calibrators (Diane Stille, Dade Behring Inc, Newark, Del, written communication, May 7, 2006, and Melanie Swartzentuber, Roche Diagnostics Corporation, Indianapolis, Ind, written communication, August 18, 2006). In addition, the Roche line of instruments showed dramatic improvement from 2005 to 2006 (Figure 3), consistent with an earlier change to lower assigned bilirubin calibrator values (Melanie Swartzentuber, Roche Diagnostics Corporation, written communication, October 5, 2005).
The large difference between C-97 and NB-01 mean bilirubin values for the Vitros indicates a lapse in the calibration of the TBIL and neonatal bilirubin (Bu+Bc) methods; because NB-01 contained only unconjugated bilirubin, one would expect the TBIL and neonatal bilirubin values to be very close.
Precision
Coefficients of variation for CAP All Data increased in 2004 and 2005 but returned to the 2003 level in 2006 (Table 1). The lack of further improvement is attributed to the large differences among the mean bilirubin values reported by the various instrument groups (Figure 3). Notably, the CVs for the major instrument groups (Dimension, Olympus, Synchron, and Vitros), which report about 65% of all results, range from 2% to 3%.
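Why between-group differences inflate the All Data CV can be illustrated with the law of total variance: the variance of the pooled results equals the weighted within-group variance plus the weighted spread of the group means around the grand mean. The sketch below uses hypothetical group sizes, means, and CVs, and treats each group SD as a population SD, which is an approximation:

```python
from math import sqrt


def pooled_cv(groups: list[tuple[int, float, float]]) -> float:
    """Overall CV (%) of all results, given (n, mean, cv_percent) per group.

    Law of total variance: total variance = weighted within-group variance
    + weighted variance of group means around the grand mean.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    within = sum(n * (m * cv / 100.0) ** 2 for n, m, cv in groups) / total_n
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups) / total_n
    return 100.0 * sqrt(within + between) / grand_mean


# Hypothetical: two equal groups, each with a 2.5% CV, means 2 mg/dL apart
print(f"{pooled_cv([(100, 22.0, 2.5), (100, 24.0, 2.5)]):.1f}%")  # 5.0%
```

In this made-up case, two groups that are each quite precise produce a pooled CV about twice as large, purely because their means disagree.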
Accuracy
The goal of limiting the total error to ±10% of the reference method value3 can be realized only by improving accuracy. Since 2003 the gap between the reference values and the CAP All Data has generally widened (Table 1); this applies to most instrument groups, with the notable exceptions of Synchron and, more recently, Advanced BR2 and Cobas Integra. Stated simply, the calibrators used by most manufacturers are inaccurate; that is, their assigned values are much higher than the actual bilirubin concentrations. Because precision is quite good for most major instrument groups (CVs, 2%–3%), further improvement in accuracy will depend on the use of accurate calibrators.
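The text does not state how inaccuracy and imprecision are combined into total error. One common model, assumed here, is TE = |bias| + z·CV with z = 1.96 (covering about 95% of results). Under that assumption, the bias budget left by a 2% to 3% CV can be sketched as:

```python
def total_error(bias_pct: float, cv_pct: float, z: float = 1.96) -> float:
    """Total error (%) modelled as |bias| + z * CV (assumed model, z assumed)."""
    return abs(bias_pct) + z * cv_pct


def max_bias_for_goal(cv_pct: float, goal_pct: float = 10.0, z: float = 1.96) -> float:
    """Largest |bias| (%) that keeps total error within the goal under this model."""
    return goal_pct - z * cv_pct


for cv in (2.0, 3.0):
    print(f"CV {cv:.0f}%: bias must stay within {max_bias_for_goal(cv):.1f}%")
# CV 2%: bias must stay within 6.1%
# CV 3%: bias must stay within 4.1%
```

With imprecision this small, most of the ±10% budget is available for bias, which is why calibrator accuracy becomes the limiting factor.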
Instruments from the same manufacturer using methods based on different principles exhibit the widest variation in results. Examples are as follows: (1) By direct spectrophotometry (neonatal bilirubin method), the differences between the reference method results and those reported by the Vitros are 2.74 mg/dL (46.9 μmol/L) (NB-01) and 2.06 mg/dL (35.2 μmol/L) (NB-11), but only 0.36 mg/dL (6.1 μmol/L) by the TBIL diazo method; the reference method and mean Vitros TBIL values for C-97 were 21.04 mg/dL (360 μmol/L) and 21.40 mg/dL (366 μmol/L), respectively. (2) Synchron values obtained by the diazo method differ by less than 1.0 mg/dL from those of the reference method, whereas those obtained by the bilirubin oxidase method differ (except in 2003) by 1.97 mg/dL (33.7 μmol/L) to 2.96 mg/dL (50.6 μmol/L) (Figure 3). (3) According to Abbott package inserts, 2 methods are available for TBIL on both the Aeroset and Architect analyzers: a diazo method (diazotized 2,4-dichloroaniline) and direct spectrophotometry. However, because these are “open platform” instruments, users may have chosen a variety of other methods (listed in the participant summary report as “JG w/blank,” “Spectrophot w/o blank” or “w/blank,” “DZ Salt,” “oxidation,” “vanadate oxidation”), which yield highly variable results. It is important to emphasize that calibrators containing ditaurobilirubin or made in a protein base other than human serum may not be suitable for calibrating all bilirubin methods;1 that is, the values assigned to calibrators may have to differ for methods based on different principles. A calibrator suitable for all methods is one made with human serum enriched with unconjugated bilirubin, because it has the same composition as specimens obtained from healthy neonates.
COMMENT
Because of the importance of accuracy in the measurement of bilirubin in neonates, we proposed setting a goal for limiting the total error (inaccuracy plus imprecision) to ±10% of the reference method value at bilirubin concentrations greater than 10 mg/dL (171 μmol/L).3
In 2003, the first year in which the commutable survey specimen consisting of human serum enriched with unconjugated bilirubin was used, that goal appeared to be within reach for at least 80% of the reported results, presumably because better bilirubin calibrators were used that year, as judged by the small difference between the CAP All Data and reference method mean values. In 2006, for specimen NB-01, only the Advanced BR2 and the Synchron CX4/5CE met the upper limit of the ±10% goal; all instruments except the Architect met the lower limit. The ±20% error allowed by the Clinical Laboratory Improvement Amendments of 1988 implies that acceptable values for a specimen having a bilirubin concentration of 25 mg/dL (428 μmol/L) would range from 20 mg/dL (342 μmol/L) to 30 mg/dL (513 μmol/L); this very large window should be unacceptable to pediatricians and neonatologists. To improve bilirubin assays, manufacturers of clinical analyzers and reagent kits will need to improve the quality of their bilirubin calibrators. A bilirubin reference method and reference material, together with an established molar absorptivity for the alkaline azopigment, can be used to assess the accuracy of calibrators made with unconjugated bilirubin in human serum. These calibrators may then be used to assign values to secondary calibrators for field methods.
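The contrast between the proposed ±10% goal and the ±20% CLIA limit is easiest to see as acceptance windows around a target value. A minimal sketch, taking the target to be the reference method value and using the 25 mg/dL critical value from the text; the function names and the 28.5 mg/dL result used below are hypothetical:

```python
def acceptance_window(target: float, tolerance_pct: float) -> tuple[float, float]:
    """Return (low, high) limits for target +/- tolerance_pct percent."""
    delta = target * tolerance_pct / 100.0
    return target - delta, target + delta


def acceptable(result: float, target: float, tolerance_pct: float) -> bool:
    """True if result falls inside the acceptance window."""
    low, high = acceptance_window(target, tolerance_pct)
    return low <= result <= high


target = 25.0  # mg/dL, the critical value discussed in the text
for tol in (10.0, 20.0):
    low, high = acceptance_window(target, tol)
    print(f"+/-{tol:.0f}%: {low:.1f} to {high:.1f} mg/dL")
# +/-10%: 22.5 to 27.5 mg/dL
# +/-20%: 20.0 to 30.0 mg/dL
```

For example, a hypothetical result of 28.5 mg/dL passes the ±20% window (`acceptable(28.5, 25.0, 20.0)` is `True`) but fails the ±10% goal (`acceptable(28.5, 25.0, 10.0)` is `False`).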
References
The authors have no relevant financial interest in the products or companies described in this article.
Author notes
Reprints: Stanley F. Lo, PhD, Reference Standards Laboratory, Children's Hospital of Wisconsin, 9000 W Wisconsin Ave, Wauwatosa, WI 53226