Disease guidelines specify universal alanine aminotransferase (ALT) thresholds for clinical decision-making, yet the effect of variability among ALT analyzers remains unclear.
To compare ALT results from different analyzers from 2012–2017.
Veterans Health Administration (VHA) laboratories perform external ALT proficiency testing using standardized College of American Pathologists (CAP) samples in analyzers by 5 manufacturers. In this operational analysis, we evaluated 22 950 ALT values from 80 independent CAP samples tested at 223 laboratories. Using mixed effects modeling, we estimated the association between analyzer manufacturer and CAP outcome, adjusting for manufacturer, facility, and calendar year. We performed subgroup analyses on CAP samples with overall means near clinical guideline-specified thresholds, including less than 50 U/L (n = 10) and less than 35 U/L (n = 5).
The VHA used Abbott Laboratories (n = 3175; 14%), Beckman Coulter Diagnostics (n = 8723; 38%), Roche Diagnostics (n = 2595; 11%), Siemens Healthineers USA (n = 5713; 25%), and Vitros/Ortho Clinical Diagnostics (n = 2744; 12%) analyzers. The CAP samples (n = 80 samples, n = 22 950 tests) covered a wide range of mean ALT values (21–268 U/L). The average difference in mean ALT value per sample between the highest-reading and lowest-reading manufacturers was 15.4 U/L (SD = 1.8) for the 10 samples with mean ALT less than 50 U/L, and it was 10.4 U/L (SD = 3.6) overall (n = 80). In linear mixed effects modeling, we found statistically significant differences in ALT values between the different manufacturers in each year.
We found statistically and clinically meaningful differences between analyzers across the ALT spectrum in each year, including at ALT levels lower than 50 U/L and lower than 35 U/L. Universal ALT thresholds should be avoided as a trigger for clinical action until differences between analyzers can be resolved.
Alanine aminotransferase (ALT) is a marker of hepatocyte injury widely used to screen for and monitor liver disease.1,2 Alanine aminotransferase features prominently in clinical algorithms for detecting and managing liver disease and for triggering subspecialty referral. For example, US and European guidelines recommend evaluation for suspected nonalcoholic fatty liver disease among individuals with elevated liver enzymes and fatty liver,3,4 which is the most common cause of incidental liver enzyme elevations in patients presenting to primary care.5,6 The US guidelines recommend using ALT elevations to determine the need for antiviral therapy in patients with chronic hepatitis B virus infection.7 Expressed as a multiple (or “fold”) of upper limit of normal (ULN), ALT elevations are commonly used to identify liver injury in clinical trials as well as in drug-induced liver injury.
Known variability exists in laboratory determination of ALT.1 The American College of Gastroenterology (ACG) specifies a range of “normal” ALT values based on published results of healthy individuals in blood donor populations, healthy liver transplant donors, and the National Health and Nutrition Examination Survey. Recent ACG guidelines identify the ULN for ALT as 29 to 33 U/L (males) and 19 to 25 U/L (females) (to convert to microkatal per liter, multiply by 0.0167), irrespective of ALT assay or analyzer.2 Similarly, the American Association for the Study of Liver Diseases recommends that chronic hepatitis B patients with ALT exceeding 2 times ULN (defined as 35 U/L [males] and 25 U/L [females]) be considered for hepatitis B virus antiviral treatment.7
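As a small illustration of the arithmetic in the preceding paragraph, the sketch below converts U/L to microkatal per liter with the quoted factor of 0.0167 and expresses an ALT value as a multiple of ULN. The function names and the example inputs are hypothetical and chosen only for illustration; the ULN values are those quoted above for the hepatitis B treatment recommendation.

```python
# Conversion factor quoted in the text: U/L x 0.0167 = microkatal per liter.
U_PER_L_TO_UKAT_PER_L = 0.0167

# ULN values (U/L) quoted in the text for the hepatitis B recommendation.
ULN_U_PER_L = {"male": 35.0, "female": 25.0}


def to_ukat_per_l(alt_u_per_l):
    """Convert an ALT value from U/L to microkatal per liter."""
    return alt_u_per_l * U_PER_L_TO_UKAT_PER_L


def fold_uln(alt_u_per_l, sex):
    """Express an ALT value as a multiple ("fold") of the sex-specific ULN."""
    return alt_u_per_l / ULN_U_PER_L[sex]


def exceeds_two_times_uln(alt_u_per_l, sex):
    """True when ALT is strictly greater than 2 times ULN."""
    return fold_uln(alt_u_per_l, sex) > 2.0


# Hypothetical inputs, for illustration only.
print(to_ukat_per_l(33))                    # upper male ULN from the ACG range
print(exceeds_two_times_uln(71, "male"))    # 71 U/L is above 2 x 35 U/L
print(exceeds_two_times_uln(70, "male"))    # 70 U/L equals 2 x 35 U/L, not above
```

Because the cutoff is a fixed number, a shift of a few U/L between analyzers can move the same specimen across it.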
Variability in ALT measurement by analyzer manufacturer and over time has important implications if universal “normal” cutoffs are to be applied. Alanine aminotransferase variability across analyzer platforms in the United States has received only limited study.8 Concerningly, a recent large national Canadian laboratory study reported a 25% coefficient of variation in ALT results across laboratories (n = 40) and a 22% variation in the ALT reference range across analyzer manufacturers using a single standardized ALT sample.9
As the largest integrated US health care network, the Veterans Health Administration (VHA) presents a unique opportunity to examine the performance of ALT analyzers on a national scale. The VHA has more than 9 million enrollees, 170 medical centers, and 1061 outpatient sites of care.10 All VHA laboratories operate independently and are Clinical Laboratory Improvement Amendments (CLIA) certified.11 All VHA laboratories perform externally validated laboratory proficiency testing through the College of American Pathologists (CAP) 3 times per year using standardized ALT samples.12 Using CAP test results from 2012–2017, we sought to: (1) compare ALT results produced by 5 different analyzer manufacturers; and (2) examine whether trends in analyzer performance remained stable over time.
MATERIALS AND METHODS
The VHA Office of Pathology and Laboratory Medicine oversees all VHA laboratories and requires them to meet the rigorous accreditation standards of the CAP, in addition to the US federal standards of the CLIA for acceptable analytic performance.11,12 As part of its accreditation procedures, the CAP independently prepares blinded aliquots of 5 ALT samples and supplies them to each participating VHA and non-VHA laboratory 3 times per year, for a total of 15 CAP samples per year. CAP testing was performed on every ALT analyzer operating within the VHA at the time of testing. Testing was carried out in compliance with federal regulations and adhered to the analyzer manufacturers' instructions.
The VHA Office of Pathology and Laboratory Medicine provided data on ALT values and analyzer manufacturers for CAP testing performed from 2012–2017 within the VHA system. It also provided characteristics of the laboratories for 2017, including VHA-defined complexity of the parent facility (a surrogate for tertiary status), and the number of full-time equivalent clinical pathologists on staff. Volume of ALT tests performed per facility for fiscal year 2016 was obtained from the VHA Corporate Data Warehouse, a comprehensive repository of data from the VHA's universal electronic medical record system. National descriptive statistics for each CAP sample (including mean, standard deviation, coefficient of variation, and range) for each manufacturer are calculated and published by the CAP using ALT results received from all accredited laboratories across the United States.
This operational evaluation project was sponsored by the HIV, Hepatitis, and Related Conditions Program Office/Office of Specialty Care Services. The activities undertaken in the conduct of this project were in support of VHA operational programs and did not constitute research, in whole or in part, in compliance with VHA Handbook 1058.05. Therefore, Institutional Review Board approval was neither required nor sought. All procedures conformed to the ethical guidelines of the 1975 Declaration of Helsinki.
The Office of Pathology and Laboratory Medicine received a total of 22 950 valid CAP test results from 223 laboratories within 140 parent facilities during the 2012–2017 period. During this observation period, 80 unique CAP samples were tested in analyzers from 5 manufacturers. Although an ALT analyzer by the manufacturer Biolis was used briefly at 1 site, all 20 results from this machine were rejected for technical issues, and these data are not presented.
We tabulated counts of CAP test results for all 80 samples tested across the VHA, overall and by manufacturer. We computed descriptive statistics to characterize the geographic location, complexity, and pathologist full-time equivalents for clinical care (ie, excluding research and administrative time) per parent facility in 2017. We computed the total volume of ALT tests run per parent facility for 2016.
To identify the differences in ALT results between manufacturers, we first calculated means for each of the 80 CAP samples, within manufacturers. Next, we calculated the range in means between the highest-reading and lowest-reading analyzers for each individual CAP sample.
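The two steps above can be sketched as follows. This is a minimal illustration rather than the analysis code used for the project (which was run in SAS); the function names, the manufacturer labels "A" and "B", and all ALT values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def manufacturer_means(results):
    """Step 1: mean ALT per CAP sample within each manufacturer.

    results: iterable of (sample_id, manufacturer, alt_value) tuples.
    Returns {sample_id: {manufacturer: mean_alt}}.
    """
    grouped = defaultdict(list)
    for sample_id, manufacturer, alt in results:
        grouped[(sample_id, manufacturer)].append(alt)
    means = defaultdict(dict)
    for (sample_id, manufacturer), values in grouped.items():
        means[sample_id][manufacturer] = mean(values)
    return dict(means)


def range_of_means(means):
    """Step 2: highest-reading minus lowest-reading manufacturer mean, per sample."""
    return {s: max(m.values()) - min(m.values()) for s, m in means.items()}


# Hypothetical data: two CAP samples, two manufacturers.
results = [
    ("S1", "A", 30.0), ("S1", "A", 32.0),
    ("S1", "B", 40.0), ("S1", "B", 42.0),
    ("S2", "A", 100.0), ("S2", "B", 108.0),
]
means = manufacturer_means(results)
ranges = range_of_means(means)
average_range = mean(list(ranges.values()))
print(ranges, average_range)
```

Averaging the per-sample ranges over all samples (or over a subgroup of samples) gives a summary of the kind reported as the mean difference between the highest- and lowest-reading manufacturers.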
To evaluate differences between manufacturers, we used a linear mixed effects model to estimate the association between analyzer manufacturer and ALT results, using the sample average ALT values within manufacturer. We adjusted for manufacturer, laboratory, and calendar year as fixed effects. The CAP sample was included as a random effect to account for repeated observations of each of the 80 samples tested during 2012–2017. We included an interaction term for year and manufacturer to account for the possibility that the analyzer model or other technical factors, such as reagents, might have changed over time within manufacturers. We report average adjusted mean ALT values for each manufacturer across all years and their associated significance tests. We tested the significance of the fixed effects and interaction term using F-tests.
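One plausible way to write the model described above is shown below. The notation is an assumption introduced here for illustration; the exact parameterization and covariance structure specified in SAS are not stated in the text.

```latex
% Assumed notation: s = CAP sample, m = manufacturer, l = laboratory, t = calendar year.
\[
  y_{smlt} = \beta_0 + \alpha_m + \gamma_l + \delta_t + (\alpha\delta)_{mt} + u_s + \varepsilon_{smlt},
  \qquad u_s \sim N(0, \sigma_u^2), \quad \varepsilon_{smlt} \sim N(0, \sigma^2)
\]
```

Here $y_{smlt}$ is the ALT result, $\alpha_m$, $\gamma_l$, and $\delta_t$ are the fixed effects for manufacturer, laboratory, and calendar year, $(\alpha\delta)_{mt}$ is the year-by-manufacturer interaction, and $u_s$ is the random intercept for the CAP sample. Under this reading, the F-test of $(\alpha\delta)_{mt}$ asks whether between-manufacturer differences changed from year to year.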
To determine whether differences between analyzers were present at all levels of ALT, we repeated the linear mixed effects modeling for subgroups deemed clinically meaningful (in the determination of the normal range), including less than 50 U/L and less than 35 U/L. We used the VHA-wide mean ALT for each CAP sample to select these 2 subgroups because the “true” ALT values were unknown.
We also identified the CAP sample with the lowest overall ALT mean. This sample was selected because it was closest to the range for ULN recommended in ACG practice guidelines for healthy males (29 to 33 U/L).2 The proportion of VHA results exceeding the recommended ULN threshold of 33 U/L was calculated overall and by analyzer manufacturer for this single CAP sample.
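The proportion calculation described above can be sketched as follows. This is a minimal illustration with hypothetical manufacturer labels ("X", "Y", "Z") and hypothetical ALT values, not the project data; only the 33 U/L threshold comes from the text.

```python
from collections import Counter

# Upper end of the ACG ULN range for healthy males quoted in the text (U/L).
ACG_ULN_MALE_UPPER = 33.0


def proportion_above(results, threshold=ACG_ULN_MALE_UPPER):
    """Proportion of results strictly above the threshold.

    results: non-empty iterable of (manufacturer, alt_value) tuples
             for a single CAP sample.
    Returns (overall_proportion, {manufacturer: proportion}).
    """
    totals = Counter()
    above = Counter()
    for manufacturer, alt in results:
        totals[manufacturer] += 1
        if alt > threshold:
            above[manufacturer] += 1
    by_manufacturer = {m: above[m] / totals[m] for m in totals}
    overall = sum(above.values()) / sum(totals.values())
    return overall, by_manufacturer


# Hypothetical results for one CAP sample.
example = [
    ("X", 25.0), ("X", 31.0),
    ("Y", 33.0), ("Y", 36.0),
    ("Z", 40.0), ("Z", 42.0),
]
overall, by_manufacturer = proportion_above(example)
print(overall, by_manufacturer)
```

A result equal to the threshold is not counted as exceeding it, which matches the "higher than 33 U/L" wording used when the results are reported.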
Analyses were performed using SAS version 9.4 (SAS Institute, Cary, North Carolina). All means and descriptive statistics are unadjusted, unless specified. All years are calendar years, unless specifically indicated as fiscal years.
VHA Facilities and Analyzer Manufacturers
Although the number of laboratories and analyzer machines in use varied throughout 2012–2017, a total of 223 laboratories in 140 parent facilities completed CAP testing of 22 950 specimens. The ALT measurements were performed on analyzers using standard procedures from 5 manufacturers: Abbott Laboratories (n = 3175 [14%]), Beckman Coulter Diagnostics (n = 8723 [38%]), Roche Diagnostics (n = 2595 [11%]), Siemens Healthineers USA (n = 5713 [25%]), and Vitros/Ortho Clinical Diagnostics (n = 2744 [12%]). In total, we excluded 345 CAP results (1.5%) that were classified as nonviable or were rejected because of technical issues (Table 1).
In 2017, results from 5 CAP samples were processed by 369 machines in 210 laboratories (Table 2; data provided by VHA National Pathology and Laboratory Medicine Services). For inpatient and outpatient care, each VHA parent facility performed a mean of 61 481 (SD = 39 458) ALT tests the preceding year; 40 of 141 sites (28%) were classified as tertiary (highest complexity). The locations of the 5 types of analyzers varied in the 9 VHA regions in 2017, ranging from 4 regions with the Vitros/Ortho analyzer up to 8 regions for Beckman (Table 2).
Differences in CAP Results by Analyzer Manufacturer
From the 80 individual CAP samples, the mean VHA-wide ALT results ranged from 28.5 to 236.3 U/L per sample (Figure 1; Supplemental Table [see supplemental digital content at www.archivesofpathology.org in the June 2020 table of contents]). These ALT means were nearly identical to the US published means for these same CAP samples (Supplemental Table).
Across all 80 CAP samples, we found a mean difference of 10.4 U/L (SD = 3.6) between the highest- and lowest-reading manufacturers. Vitros/Ortho analyzers consistently produced higher results than analyzers by other manufacturers (Figure 2; Table 3).
Mean ALT for the 10 individual CAP samples in the less than 50 U/L subgroup ranged from 28.5 to 48.7 U/L across different analyzer manufacturers (Supplemental Table). Within this subgroup, we found a mean difference of 15.4 U/L (SD = 1.8) between the highest- and lowest-reading manufacturers. Across different analyzer manufacturers, mean ALT for the 5 individual CAP samples in the less than 35 U/L subgroup ranged from 28.5 to 34.2 U/L. In this subgroup, we found a mean difference of 16.4 U/L (SD = 1.2) between the highest- and lowest-reading manufacturers. In both the less than 35 U/L and less than 50 U/L ranges, Vitros/Ortho machines produced higher results than other analyzers, and Roche produced the lowest (Figures 2 and 3; Table 3; Supplemental Table).
In adjusted models, the interaction between year and manufacturer was statistically significant (P < .001), indicating that differences in ALT values between manufacturers differed by year. Within each given year, there were statistically significant differences in ALT values between the different manufacturers (P < .001 for all years). Unadjusted mean results from Roche analyzers were consistently lower and Vitros/Ortho consistently higher compared with the other analyzers.
These statistically significant differences between manufacturers persisted after restricting our sample population to: (1) samples with mean ALT less than 50 U/L (all P < .001), and (2) samples with mean ALT less than 35 U/L (all P < .001), and after excluding results from Vitros/Ortho.
Comparison of VHA ALT Results to ACG Practice Guideline ALT ULN Thresholds for Healthy Males
The CAP sample with the lowest mean ALT (28.5 U/L [SD = 5.9]) was selected for comparison with ALT elevations exceeding the normal thresholds recommended in published guidelines.2 Across all 5 analyzers, 43 of 227 ALT results (19%) exceeded the ACG threshold of ULN (33 U/L) for this CAP sample and would be considered an elevated ALT in males. Of the 227 analyzers testing this CAP specimen, Abbott, Beckman, and Roche analyzers did not produce any ALT results higher than the 33 U/L threshold; 11 of 47 (23%) ALT values were above 33 U/L for Siemens analyzers, and 31 of 31 (100%) ALT values were above this threshold for Vitros/Ortho analyzers.
Alanine aminotransferase analyzers produced by different manufacturers yield statistically different results that can be clinically relevant, particularly at lower ALT ranges. After controlling for the effects of facility and year in a national health care system, we found a statistically significant association between manufacturer and ALT proficiency testing results during our 6-year analysis. Among samples with a mean ALT lower than 50 U/L, we noted an average 15.4 U/L difference between the highest- and lowest-reading analyzers. We additionally applied the ALT thresholds recommended in US liver disease guidelines to our data and found substantial differences by analyzer manufacturer. For example, the CAP sample with the lowest national mean result (28.5 U/L) produced a mean ALT of 24.3 U/L on Roche analyzers and ALT 41.1 U/L on Vitros/Ortho analyzers. Applying the ACG “normal” threshold of ALT 29 to 33 U/L for healthy males to proficiency testing data,2 we found that nearly 1 in 5 results from this “normal” ALT sample would have been classified as abnormal (>33 U/L) in clinical practice. Although proficiency test samples for ALT are not necessarily commutable with human specimens and therefore cannot be extrapolated literally to real-world results, our findings raise significant concern that cross-platform variability may also occur in clinical practice.13 At a minimum, our findings suggest a need to further investigate the source of the observed variability and to consider the need for standardization before applying universal ALT cutoffs in clinical practice.
With guidelines recommending the evaluation of unexplained ALT elevations, our findings suggest that many patients are at risk for misclassification as having “abnormal” ALT simply as a result of analyzer characteristics. Misclassification could potentially launch a negative cascade of events, including needless anxiety and uncertainty for patients and families, costly additional testing, and unnecessary specialist referral and treatment. Clinical pathologists have already highlighted their concerns for the use of a universal ALT threshold (with its potential for overdiagnosis of liver disease) and the importance of pathologist input in guideline development.14 Our results suggest that analyzer-specific normal ALT ranges might be more appropriate than an absolute cutoff for normal range, at least until cross-platform analyzer differences can be further explored.
When examining the broader spectrum of ALT data from 29 to 236 U/L, we found a mean difference of 10.4 U/L (SD = 3.6) between the highest- and lowest-reading manufacturer means for each sample. Our ALT findings mirror a recent Canadian national study (performed with pooled human blood samples) as well as a state-wide study in Indiana; Siemens analyzers were “biased high” compared with Abbott, Beckman, and Roche analyzers, which were “biased low.”8,9 We found Vitros/Ortho analyzers to be the highest-reading, followed by Siemens, Abbott, Beckman, and Roche. When we excluded the results of the Vitros/Ortho analyzer, there was greater consistency between the other 4 manufacturers, yet statistically significant variation persisted. These observed differences in ALT values across analyzer manufacturers highlight the need for provider awareness of the ALT reference range when assessing ALT test results. In patients with markedly elevated ALT, between-manufacturer differences of less than 20 U/L are unlikely to influence management but could potentially cause misclassification at lower ALT levels.
Among our study's strengths are its national scope and use of externally certified, high-quality ALT proficiency tests during 6 years. The VHA proficiency testing results from our integrated national health care system are nearly identical to those from US-wide proficiency data, supporting their reliability. All VHA sites are certified by CAP and adhere to analyzer manufacturers' instructions. We examined more than 22 000 CAP test results over a wide range of ALT values (far more than any previously published work) and accounted for facility-level effects and year. We accounted for potential changes in analysis technique over time by including an interaction term representing the product of analyzer and year.
Our conclusions should be interpreted within the context of several limitations. Most importantly, although the CAP publishes the national mean and other statistics for its standardized specimens, the objective “target” result for each CAP sample is unknown. This lack of a gold standard precludes an assessment of the relative superiority of one analyzer manufacturer over another. Without an international certified ALT reference standard and ALT assay standardization (eg, use of coenzyme pyridoxal 5′-phosphate), we cannot provide insight into the many other factors that may influence the determination of what constitutes a “normal” ALT, such as patient characteristics (eg, age, sex, ethnic background), and laboratory factors, such as ALT assay reagents.1,14–16 In the future, an international certified reference standard for ALT could be used to investigate and address the cross-platform variability we observed. A second limitation is that our analysis is restricted to the 5 ALT analyzer manufacturers used in the VHA system through 2017. Future work is needed to assess the performance of other manufacturers. Third, any changes in analyzer technology after 2017 are not reflected in our results. Finally, proficiency test data for ALT are not known to be commutable with true human samples for multiple potential reasons, including the use of nonhuman matrix or modification for stability.17 Future study is needed to assess the commutability of reference materials used in external quality assurance for ALT. Future VHA work on ALT quality control could examine, for example, whether each manufacturer's calibration is determined from a known absorptivity value in the indicator reaction.
In conclusion, significant ALT measurement differences occur by analyzer manufacturer, particularly with an ALT in or near the “normal” range. Use of universal thresholds to define the “normal” ALT values will potentially result in some patients being misclassified as having ALT elevations, with the possibility for unnecessary medical intervention due to measurement differences. To address ALT measurement differences by analyzer, manufacturers could standardize their analyzers against an objective reference population, as has been attempted in Europe.18,19 In the United States, the Food and Drug Administration has recently started the Food and Drug Administration–Systemic Harmonization and Interoperability for Enhancement of Laboratory Data (SHIELD) program, a pilot for inclusion of the unique device identifier for each lab result as metadata.20 This would allow health care organizations to implement decision support and reference ranges based on the type of device used. Finally, over and above laboratory harmonization and standardization, continued work is required to determine what constitutes “true normal” ALT in healthy individuals.
The authors have no relevant financial interest in the products or companies described in this article.
Supplemental digital content is available for this article at www.archivesofpathology.org in the June 2020 table of contents.
Presented as a poster at the American Association for the Study of Liver Diseases; November 9, 2018; San Francisco, California.
The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs.