In the XN series of hematology analyzers (Sysmex, Kobe, Japan), the probability of the presence of abnormal cells is indicated by flags based on Q values.
To evaluate the Q value performance of the Sysmex XN-20 modular analyzer.
The interinstrumental concordance, intrainstrumental precision, and diagnostic accuracy of Q values, with tested flags of “blasts/abnormal lymphocytes,” “atypical lymphocytes,” and “blasts,” were investigated.
Absolute concordance rates in flagging between 2 analyzers ranged from 69.8% to 80.8%, and κ values ranged from 0.43 to 0.61. In samples with absolute related cell counts lower than 100/μL, the values ranged from 0.31 to 0.52. For intrainstrumental precision, standard deviations ranged from 4.8 to 23.9 for the blasts/abnormal lymphocytes, from 18.7 to 59.1 for the blasts, and from 11.0 to 23.0 for the atypical lymphocytes. Using a default Q value cutoff, diagnostic accuracy values based on the area under the curve, sensitivity, and specificity were, respectively, 0.910, 90.9%, and 72.2% for blasts/abnormal lymphocytes; 0.927, 84.9%, and 89.8% for blasts; and 0.865, 74.4%, and 84.9% for atypical lymphocytes. The diagnostic accuracy of Q values was much lower in samples with absolute related cell counts lower than 100/μL than in those 100/μL or higher.
Q values of the Sysmex XN-20 analyzer were found to be imprecise and irreproducible, especially with samples containing a small number of pathologic cells. Adjustments in the Q value threshold may help in the detection of these cells.
A complete blood count, including a differential count of white blood cells (WBCs), is the main test used to detect hematologic anomalies and monitor a patient's disease status during or after treatment.1 Combining hematologic analysis with sophisticated algorithms for data interpretation has led to dramatic improvements in the results from such tests. The utility of automated analyzers and newer instrumentation for patient care far surpasses that of past screening tools.2–4
The Sysmex XN modular analyzer (Sysmex, Kobe, Japan), which was introduced in 2011, uses newer analytic channels than the previous XE series. The WBC precursor channel (WPC), one of the new channels, is based on fluorescence flow cytometry technology and was developed particularly to detect myeloblasts or lymphoblasts more accurately than the immature myeloid information channel used in the XE series.5 Another new channel, the WBC differential channel (WDF), is used for WBC differential counts; a similar channel was also used in the XE series, but the WDF in the XN series includes the basophil count. There is no longer a separate basophil channel, as there was in the XE series. In addition, separation of lymphocytes and monocytes has been optimized in the XN series, with a milder lysis reaction than that used in the XE series. The WDF has new shape recognition software, flag graph areas, and algorithms.5,6
In automated hematology analyzers, a flag is generated either by instrument-defined criteria or by user-defined criteria. Flagging indicates that the technologist should review the peripheral blood smear for abnormal cells or mark it for further investigation.2,3 In Sysmex instruments, the probability of the presence of abnormal cells is indicated by the Q value, which is based on a scale from 0 to 300, with increments of 10 arbitrary units. The default threshold setting is 100. At this setting, abnormal cells, such as “blasts” or “atypical lymphocytes,” are flagged with Q values 100 or higher and are not flagged with Q values lower than 100.7 The Q value cutoff can be adjusted by users, based on individual laboratory needs.
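The flagging rule described above can be sketched in a few lines. This is only an illustration of the stated rule (scale of 0 to 300 in steps of 10, flag at or above the cutoff); the function name `is_flagged` and its argument names are hypothetical and are not part of any Sysmex software.

```python
def is_flagged(q_value: int, cutoff: int = 100) -> bool:
    """Return True when a Q value meets or exceeds the cutoff.

    Assumes the scale described in the text: 0 to 300 arbitrary units,
    reported in increments of 10. The default cutoff of 100 is the
    default threshold setting; laboratories may choose another value.
    """
    if not 0 <= q_value <= 300 or q_value % 10 != 0:
        raise ValueError("Q values are expected on a 0-300 scale in steps of 10")
    return q_value >= cutoff


print(is_flagged(100))            # flagged at the default cutoff
print(is_flagged(90))             # not flagged at the default cutoff
print(is_flagged(90, cutoff=80))  # flagged once the cutoff is lowered
```

Because the rule is a hard threshold, a sample whose Q value fluctuates around the cutoff can change flag status between repeated measurements.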
Several studies have evaluated the analytic performance of the Sysmex XN modular analyzer5–12; however, studies that focus on the Q value of the XN module are rare. We found 1 study, by Eilertsen et al,7 addressing the reliability of the Q value, but it examined the flag performance of the previous analyzer, the XE-5000, not an XN module; the authors concluded that the performance of the Q flag in the XE series was questionable because of its imprecision. Their study therefore did not encompass the Q value performance of either the newly developed WPC or the WDF available in the XN series. The analytic performance of the XN series, including the analysis algorithm, reagent reactions, signal processing, and flagging, is reported to have been improved from that of the previous XE series13,14; therefore, the reliability of the Q value may have changed. The aim of our study was to evaluate the flagging performance of the Sysmex XN modular analyzer on the basis of Q values.
MATERIALS AND METHODS
Analytic Methods
We studied the performance of Q value flagging based on interinstrumental concordance, intrainstrumental precision, and diagnostic accuracy. The flags under study were obtained using 2 channels: the WDF for “blasts/abnormal lymphocytes” and “atypical lymphocytes,” and the WPC for “blasts.” We evaluated 2 Sysmex XN-20 modules and designated them XN1 and XN2. These instruments were purchased at the same time and were calibrated and harmonized by the manufacturer. Manual WBC differential counts were performed for all samples. A total of 200 cells were counted by 2 highly trained technicians, each counting 100 cells. The presence of even a single cell (1 of 200; 0.5%) associated with a flag (blasts for the “blasts/abnormal lymphocytes” and “blasts” flags, or atypical lymphocytes for the “atypical lymphocytes” flag) qualified a blood smear as a true-positive result. This criterion is supported by guidelines in the Clinical and Laboratory Standards Institute documents H20-A2 and H26-A2.15,16 Enlarged dysmorphic lymphocytes, with an irregular monocyte-like nucleus and abundant bluish cytoplasm, that came from patients without evidence of hematologic malignancies were counted as atypical lymphocytes. This study was conducted at Chung-Ang University Hospital (Seoul, Korea) from August 2015 to December 2016. The study protocol was approved by the Chung-Ang University Hospital Institutional Review Board (IRB No. C2016033) and was compliant with the ethics standards codified in the 1964 Declaration of Helsinki and later amendments.
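The true-positive criterion for a smear can be expressed as a short sketch. The function names are hypothetical. The second function assumes that an absolute related cell count is the analyzer WBC count multiplied by the manual differential fraction; that is a common convention, but the text does not state how the counts were derived, so it should be read as an assumption.

```python
def is_true_positive(target_cells: int, cells_counted: int = 200) -> bool:
    """Smear is true positive if at least 1 flag-related cell is seen.

    Mirrors the criterion in the text: 1 of 200 cells (0.5%) is enough.
    """
    if cells_counted <= 0 or not 0 <= target_cells <= cells_counted:
        raise ValueError("target_cells must lie between 0 and cells_counted")
    return target_cells >= 1


def absolute_count_per_ul(wbc_per_ul: float, target_cells: int,
                          cells_counted: int = 200) -> float:
    """Absolute related cell count per microliter.

    ASSUMPTION: WBC count x manual differential fraction; the study does
    not spell out this calculation.
    """
    if cells_counted <= 0:
        raise ValueError("cells_counted must be positive")
    return wbc_per_ul * target_cells / cells_counted


# Hypothetical sample: 1 blast among 200 cells with a WBC count of 6000/uL.
print(is_true_positive(1))
print(absolute_count_per_ul(6000, 1))
```

Under this assumption, a single blast in a 200-cell differential at a WBC count of 6000/μL corresponds to 30/μL, which would fall in the group with counts lower than 100/μL.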
Materials
To study interinstrumental concordance, clinical samples initially analyzed in the XN1 module were reanalyzed with the XN2 within 2 hours of the initial analysis. The numbers of samples with the flags “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes” were 500 (including 35 true-positive samples), 338 (including 45 true-positive samples), and 354 (including 136 true-positive samples), respectively.
To evaluate intrainstrumental precision, 18 clinical samples (3 samples for Q values of 50–150 and 3 samples for Q values ≥150 for each flag, designated S1–S18) were randomly selected, and complete blood count analyses were repeated 10 times per sample using the XN1 module. All replicates were performed within 2 hours of the initial analysis.
The diagnostic accuracy of each flag was determined using 500 (including 95 true-positive samples), 334 (including 127 true-positive samples), and 408 (including 114 true-positive samples) samples with the flags “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes,” respectively. Q values ranged from 0 to 300 for all 3 flags. All samples were analyzed within 4 hours from blood collection.
All samples were obtained consecutively during the study period rather than selected.
Statistical Analysis
To determine interinstrumental concordance, a regression analysis including Pearson correlation was performed to evaluate the relation between the absolute Q values of the 2 analyzers, and results with P values <.05 were regarded as evidence of a significant correlation. A Pearson χ² test was carried out to evaluate the concordance between flagging by the modules. From these data, absolute concordance rates, κ values, and P values were calculated; differences between the 2 analyzers at P < .05 were considered significant. True-positive samples were analyzed separately according to absolute related cell counts (blasts for the “blasts/abnormal lymphocytes” and “blasts” flags, and atypical lymphocytes for the “atypical lymphocytes” flag) of lower than 100/μL, or 100/μL or higher, to determine whether cell counts affected interinstrumental concordance.
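As an illustration of the two agreement measures named above, the following sketch computes the absolute concordance rate and Cohen's κ from a 2 × 2 cross-tabulation of flag results. The function name and the counts in the example are hypothetical; they are not data from this study, and the study's own calculations were done in SPSS.

```python
def concordance_and_kappa(both_pos: int, only_first: int,
                          only_second: int, both_neg: int):
    """Return (absolute concordance rate, Cohen's kappa) for 2 analyzers.

    both_pos    : samples flagged by both analyzers
    only_first  : flagged by the first analyzer only
    only_second : flagged by the second analyzer only
    both_neg    : flagged by neither analyzer
    """
    n = both_pos + only_first + only_second + both_neg
    if n == 0:
        raise ValueError("the table is empty")
    observed = (both_pos + both_neg) / n
    first_pos = (both_pos + only_first) / n
    second_pos = (both_pos + only_second) / n
    # Agreement expected by chance from the marginal flag rates.
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts, for illustration only.
rate, kappa = concordance_and_kappa(40, 10, 10, 40)
print(f"concordance = {rate:.1%}, kappa = {kappa:.2f}")
```

The example shows why both numbers are reported: κ discounts the agreement that would occur by chance, so it can be modest even when the raw concordance rate looks acceptable.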
To evaluate intrainstrumental precision, standard deviations, mean Q values, and coefficients of variation based on 10 replicates for each sample were analyzed.
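A minimal sketch of these precision statistics follows. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and it also counts how many replicates would be flagged at a given cutoff. The function name and the replicate values are invented for illustration and are not the study's measurements.

```python
from statistics import mean, stdev


def replicate_precision(q_values, cutoff=100):
    """Summarize repeated Q value measurements of one sample."""
    if len(q_values) < 2:
        raise ValueError("at least 2 replicates are needed for an SD")
    avg = mean(q_values)
    sd = stdev(q_values)  # ASSUMPTION: sample SD with n - 1 denominator
    cv_percent = sd / avg * 100 if avg else float("nan")
    flagged = sum(1 for q in q_values if q >= cutoff)
    return {
        "n": len(q_values),
        "mean": avg,
        "sd": sd,
        "cv_percent": cv_percent,
        "flagged": flagged,
    }


# Invented replicate Q values for a single sample (10 repeats).
replicates = [60, 10, 110, 120, 100, 130, 90, 100, 40, 150]
summary = replicate_precision(replicates)
print(summary)
```

Reporting the number of flagged replicates alongside the SD makes the practical consequence of imprecision visible: a sample with a mean Q value near the cutoff can be flagged in some runs and missed in others.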
The diagnostic accuracy of each flag was determined by means of a receiver operating characteristic curve analysis. A statistical analysis according to absolute cell counts (<100/μL or ≥100/μL) was also performed. The statistical analyses were conducted in SPSS version 19 (SPSS, Chicago, Illinois).
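For readers unfamiliar with these measures, the sketch below shows one way to compute sensitivity and specificity at a Q value cutoff, and the area under the receiver operating characteristic curve through its rank-based (Mann-Whitney) equivalent. This is a stand-in written for illustration, not the SPSS procedure used in the study; the function names and the example data are hypothetical.

```python
def sensitivity_specificity(q_values, truth, cutoff=100):
    """Sensitivity and specificity of the flag (Q >= cutoff) vs. the smear."""
    pairs = list(zip(q_values, truth))
    tp = sum(1 for q, t in pairs if t and q >= cutoff)
    fn = sum(1 for q, t in pairs if t and q < cutoff)
    tn = sum(1 for q, t in pairs if not t and q < cutoff)
    fp = sum(1 for q, t in pairs if not t and q >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least 1 positive and 1 negative sample")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(q_values, truth):
    """AUC as the probability that a positive sample outranks a negative one.

    Ties count as half, which matches the trapezoidal area under the curve.
    """
    pos = [q for q, t in zip(q_values, truth) if t]
    neg = [q for q, t in zip(q_values, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least 1 positive and 1 negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical Q values and smear results (True = related cells present).
q = [150, 90, 120, 40, 100, 20, 80, 10]
smear = [True, True, True, False, False, False, True, False]
print(sensitivity_specificity(q, smear, cutoff=100))
print(sensitivity_specificity(q, smear, cutoff=80))
print(roc_auc(q, smear))
```

Running the first function at more than one cutoff shows how lowering the threshold trades specificity for sensitivity, whereas the area under the curve summarizes performance across all cutoffs.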
RESULTS
Interinstrumental Concordance
Table 1 shows results of the regression analysis of flagging by 2 XN modules for “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes.” Although there was a significant correlation between Q values from the 2 modules for each suspect flag (P = .007, <.001, and <.001, respectively, for the tested flags), Pearson correlation coefficients (r) for the 3 flags were low at 0.76, 0.56, and 0.73, respectively. Using true-positive samples, only “atypical lymphocytes” flags showed a significant correlation between the 2 modules, with r = 0.66 (P < .001).
Table 2 shows results of the flagging concordance analysis of the 2 analyzers. According to Pearson χ² tests with cross-tabulation, absolute concordance rates ranged from 69.8% to 80.8%. The κ values for the “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes” flags were 0.43, 0.46, and 0.61, with P values of <.001, <.001, and .006, respectively. Among true-positive samples, those with absolute cell counts lower than 100/μL showed significantly lower agreement between the 2 modules than those with absolute cell counts of 100/μL or higher for the “blasts/abnormal lymphocytes” (87.0% versus 100.0%; P = .02) and “blasts” (60.9% versus 86.4%; P < .001) flags. The “atypical lymphocytes” flags were not different between the 2 modules.
Intrainstrumental Precision
Table 3 shows the intrainstrumental variability of an XN module. For samples with Q values of 50 to 150 (S1–S9), the SD ranged from 4.8 to 23.9 for the “blasts/abnormal lymphocytes” flag, from 18.7 to 59.1 for the “blasts” flag, and from 11.0 to 23.0 for the “atypical lymphocytes” flag. Among true-positive clinical samples revealed with Q values of 50 to 150 (S3, S5, S6, and S9), only 2 samples showed consistency in flagging for “blasts/abnormal lymphocytes” (S3) and “blasts” (S6) in 10 replicates; all Q values calculated from 10 consecutive analyses of these 2 samples were 100 or higher. In S5 for “blasts” and S9 for “atypical lymphocytes,” flags were absent (Q values lower than 100) 4 times in 10 replicates, despite the presence of these cells. No samples showed consistency in the “atypical lymphocytes” flag, regardless of whether atypical lymphocytes were present in the peripheral blood smear. Based on the first and the second measurements alone, 5 samples (S1, S2, S4, S5, and S9) showed no consistency in the presence or absence of flags.
For samples with Q values higher than 200, coefficients of variation (CVs) ranged from 1.4% to 11.2% for the “blasts/abnormal lymphocytes” flag, from 9.9% to 29.5% for the “blasts” flag, and from 2.1% to 5.5% for the “atypical lymphocytes” flag. Unlike samples with Q values of 50 to 150, all samples with Q values greater than 150 (S10–S18) showed consistency in flagging performance.
Diagnostic Accuracy
Table 4 shows the diagnostic accuracy of flagging in the Sysmex XN module, based on a receiver operating characteristic analysis. For the flags “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes,” the area under the curve values were 0.910, 0.927, and 0.865, respectively. At the default Q value cutoff setting (Q value of 100), sensitivity and specificity for the “blasts/abnormal lymphocytes” flag were 90.9% and 72.2%; for the “blasts” flag, 84.9% and 89.5%; and for the “atypical lymphocytes” flag, 74.4% and 84.9%, respectively. The diagnostic accuracy of the XN series for samples with absolute cell counts lower than 100/μL was lower than that for samples with cell counts of 100/μL or higher. The area under the curve values of samples with cell counts lower than 100/μL for the “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes” flags were 0.863, 0.862, and 0.815, respectively, and those for samples with cell counts of 100/μL or higher were 0.944, 0.968, and 0.905, respectively.
DISCUSSION
In this study, we determined the reliability of Q flag performance by a Sysmex XN modular system based on evaluations of diagnostic accuracy, intrainstrumental precision, and interinstrumental concordance.
The XN module showed generally acceptable diagnostic accuracy in the receiver operating characteristic analysis. The area under the curve values of the flags “blasts/abnormal lymphocytes,” “blasts,” and “atypical lymphocytes” were 0.910, 0.927, and 0.865, respectively. Nonetheless, the performance of each flag should be assessed for its own purpose. Because hematology analyzers are primarily used for screening, especially for the presence or absence of immature cells or blasts, they should show high sensitivity in detecting pathologic cells. In our study, when we designated 100 as the threshold Q value, the sensitivities of flags for “blasts/abnormal lymphocytes” and “blasts” were 90.9% and 84.9%, respectively, and these data were consistent with (or slightly lower than) the values reported in another study.6 The sensitivity of these flags is low; thus, this assay may miss the presence of blasts in whole-blood samples. Therefore, we recommend adjusting the cutoff thresholds for Q flags. For example, when we lowered the cutoff from 100 to 80, the sensitivities of “blasts/abnormal lymphocytes” and “blasts” flags increased to 97.3% and 89.6%, respectively. In contrast, because the “atypical lymphocytes” flag is necessary to identify the presence of atypical lymphocytes, it may be useful to set a higher cutoff value (120 or more). It is important to adjust the cutoff values based on the needs of individual laboratories; the specific methods for optimizing the thresholds have been thoroughly documented.17
In other studies of XN module performance,5,6,8–10,12 XN modules have shown comparable results among modules, especially in WBC differential counts and flagging performance. Findings of these studies were similar to ours; however, they did not evaluate the performance of Q values. In our study, we found that the reliability of Q values was poor.
A regression analysis of Q values from 2 XN modular systems showed a statistically significant correlation between the modules. However, the Pearson correlation coefficients were relatively low. In the concordance analysis, absolute concordance rates of the “blasts/abnormal lymphocytes” and “blasts” flags were 69.8% and 78.4%, with low κ values of 0.43 and 0.46, respectively; these results indicated that the performance of the 2 modules was different. Based on positive samples alone, especially samples with absolute blast counts lower than 100/μL, κ values decreased to 0.31 and 0.39. Only 60.9% of the samples with absolute blast counts lower than 100/μL showed concordant results between the 2 analyzers. These results indicate the weak reliability of Q flagging for “blasts/abnormal lymphocytes” from the WDF and “blasts” from the WPC in samples with few blasts.
In 2013, Eilertsen et al7 reported that the concordance among 3 modules of the Sysmex XE-5000 series was poor. The κ values for the detection of “blasts” by the 3 modules ranged from 0.73 to 0.74, which are higher than the κ value of 0.46 found in this study. We believe that this discrepancy comes from a difference in the true-positive sample fraction, in addition to differences between the analyzers. Even considering this difference between the 2 studies, our results indicated that an improvement in performance of Sysmex XN module flags over that of XE module flags is questionable.
Intrainstrumental precision was also poor based on samples with Q values of 50 to 150. Only 2 samples (S3 for “blasts/abnormal lymphocytes” and S6 for “blasts”) showed consistency among the 10 replicates. Of the 3 flags, the “blasts” flag from the WPC showed an unusually large standard deviation, especially when measuring S5, the true-positive sample containing a small number of blasts. This sample showed no consistency in flagging for “blasts”; “blasts” were flagged only 6 times in 10 replicates. Sample S5 also showed Q values lower than 100 in both the first and second measurements (60 and 10, respectively). These results indicate that even if each laboratory routinely or selectively repeated tests, it would miss the “blasts” flag. Similarly, in S9, “atypical lymphocytes” were flagged only 6 times in 10 replicates. The imprecision of the XE-5000 series for samples with low Q values has been reported,7 and our results showed that flagging by the XN modules was as imprecise as with the XE series.
The manufacturer claimed that XN modules perform better than XE series ones, because the reagent reactions, signal processing, analysis algorithms, cell differentiation, and pathologic-cell detection of XN modules are improved.11,13,14 Several research groups that have conducted performance evaluations of the XN series concluded that the analytic performance, including flagging, was improved relative to that of the XE series.5,6,8,10,12 In contrast, our study revealed that the performance of Q values in the XN series, which are based on novel algorithms and technology, was not better than that of Q values in the XE series. We found them to be imprecise and irreproducible, especially for samples with few pathologic cells; in these samples, pathologic cells could be overlooked if we relied solely on flags. This problem can be mitigated by each clinical laboratory's own decision-making rules, based on the present complete blood count profile as well as the patient's past results and clinical department; optimizing the Q value threshold according to the laboratory's workload may also help ensure detection of pathologic cells.
One limitation of our study is that we did not use the “low WBC mode” of the XN modules. In contrast to the previous version (the XE series), the XN series has this mode for samples with WBC counts lower than 1500/μL, and the “low WBC mode” is known to yield more precise differential counts than those based on conventional analysis.9,10 The concordance between the 2 analyzers may have been better if we had used this mode with low WBC counts. Another limitation was the possible bias from manual counts. Even if we strictly followed the guidelines, manual cell counting is known to be subjective, with low interobserver and intraobserver reproducibility.18 These limitations make our “standard” imperfect and may have led to a bias in our analysis. However, the regression analysis of manual counts generated by our technicians revealed that the intercept, slope, r, and P values were, respectively, 0.984, 0.1717, 0.995, and <.001 for the blasts, and 0.8792, 0.2077, 0.836, and <.001 for atypical lymphocytes. Therefore, the impact of interobserver reproducibility would have been small. Finally, the XN modules flag “abnormal lymphocytes” from the WPC for immature lymphocytes, but we did not evaluate this flag. This is because, during the study period, clinical samples with lymphoblasts or lymphoma cells were hard to find; thus, we excluded these parameters from our evaluation. The performance of the “abnormal lymphocytes” flag from the WPC should be evaluated in the near future using a sufficient number of positive samples.
In conclusion, our study revealed that the Q values of samples with a small number of pathologic cells were imprecise and irreproducible. Pathologic cells could be overlooked if laboratories relied solely on these flags. This problem can be serious in clinical laboratories of health screening centers or hospitals that do not encounter pathologic cells frequently. Therefore, such clinical laboratories should establish their own decision-making rules and adjust their Q value thresholds. These adjustments may help ensure detection of samples with a trace number of pathologic cells.
Author notes
The authors have no relevant financial interest in the products or companies described in this article.