Context.—Many production systems employ standardized statistical monitors that measure defect rates and cycle times as indices of performance quality. Clinical laboratory testing, a system that produces test results, is amenable to such monitoring.
Objective.—To demonstrate patterns in clinical laboratory testing defect rates and cycle time using 7 College of American Pathologists Q-Tracks program monitors.
Design.—Subscribers measured monthly rates of outpatient order-entry errors, identification band defects, and specimen rejections; median troponin order-to-report cycle times and rates of STAT test receipt-to-report turnaround time outliers; and rates of critical value reporting event defects and corrected reports. From these submissions, Q-Tracks program staff produced quarterly and annual reports. These charted each subscriber's performance relative to other participating laboratories and aggregate and subgroup performance over time, dividing participants into best performers, median performers, and performers with the most room to improve. Each monitor's pattern of change presents percentile distributions of subscribers' performance in relation to monitoring duration and number of participating subscribers. Changes over time in defect frequencies and in the cycle duration quantify the effects of monitor participation on performance.
Results.—The 7 monitors ran variously for 6, 6, 7, 11, 12, 13, and 13 years, and all showed significant improvement: defect rates decreased and the cycle time shortened. The most striking decreases occurred among performers who initially had the most room to improve and among subscribers who participated the longest. Participation effects ranged from 0.85% to 5.1% improvement per quarter of participation.
Conclusions.—Using statistical quality measures, collecting data monthly, and receiving reports quarterly and yearly, subscribers to a comparative monitoring program documented significant decreases in defect rates and shortening of a cycle time over 6 to 13 years in all 7 ongoing clinical laboratory quality monitors.
Clinical laboratory testing is a production system that turns inputs—orders for tests, patient identifiers, and specimens—into outputs—reporting events, some of potentially critical value, and result reports stored in patient records—within relevant cycle times, usually referred to as turnaround times (TATs).
Background
More than 80 years ago, Walter Shewhart showed that quantitative techniques can assess production systems by measures that came to be called statistical quality control.1,2 Thirty years ago, Shewhart's student and colleague, W. Edwards Deming, argued that in production systems, ongoing provision of statistically valid information decreased defect rates and shortened cycle times in systematic ways.3,4 The principles and techniques that Shewhart developed and Deming advanced have long been used within clinical laboratories to assess and improve result generation.5
Q-Tracks Monitor Program
For more than a decade, the College of American Pathologists Q-Tracks monitoring program has facilitated hundreds of laboratories as they examined preanalytic inputs and postanalytic outputs, as well as process cycle times using the approach of Shewhart and Deming.6–9 Some Q-Tracks monitors measure attributes specific to particular sorts of clinical laboratory testing: blood culture contamination in clinical microbiology,9 blood product wastage for blood banks,10,11 and cytologic-histologic diagnostic correlation in cytology.12 However, 7 Q-Tracks studies measure generic indices that apply across almost all clinical laboratory testing.13–28 From the beginning to the end of the laboratory testing process, the generic monitors were QT17–outpatient order-entry error rates,13,14 QT1–identification (ID) band defect rates,15,16 QT3–specimen rejection rates,17–20 QT15–median troponin order-to-report times,21,22 QT8–STAT test receipt-to-report TAT outlier rates, 23,24 QT10–critical value reporting event-defect rates,25,26 and QT16–corrected report rates.27,28
Rationale for Monitor Format and Analysis Design
The rationale for assessing 6 rates of defects and a cycle time is the proven utility of these 2 kinds of measures in many settings, including laboratories.29–37
In the Q-Tracks monitors, the measures are based on ongoing data collection, reported monthly, then displayed quarterly and annually as individual performance measures, along with overall averages and peer-group performance indices. There were 3 rationales for analyzing the indices as we did: (1) to provide reasonably rapid feedback to subscribers on their performance, (2) to set individual subscribers' performance in the context not only of the overall average but also of the performance indices of the 3 peer groups, and (3) to present those indices in formats that follow subscribers' performance over time, quarter to quarter, year to year, and over multiple years.
Relation of Q-Tracks Monitoring to Process Improvement
Monitors resembling the 7 Q-Tracks monitors are among indices advocated by Plebani and others38–45 as plausible quality indicators in laboratory medicine. All the Q-Tracks monitors assess defects that disrupt laboratory operations, as staff sort out order defects, investigate confused patient or specimen identifications, arrange recollection of specimens, respond to clinicians' complaints about median STAT TATs, field telephone calls about delayed STAT results, persist in attempting to report critical values, and correct results reported in error. A distinctive characteristic of Q-Tracks monitors is that they allow statistical comparisons among participating laboratories and permit longitudinal tracking of performance trends among many participants over long periods.
MATERIALS AND METHODS
Q-Tracks Monitor Design
The 7 Q-Tracks monitors were designed and are overseen by the College of American Pathologists' Quality Practices Committee (QPC). The design of all but one of the Q-Tracks has common features. The first common feature is that the 6 rate-based Q-Tracks detect events that each individual monitor specifically defines; the defined events are the numerators of fractions. The second common feature is that the directions for 5 of those 6 monitors also specify, operationally, the opportunities for the events to occur; the opportunities, observed on occasions when the events could occur, are the denominators of the fractions. In the sixth rate-based monitor (QT16), surrogate units that sum up such opportunities serve as the denominators.
The seventh monitor, rather than measuring events per opportunities, measures a cycle time: the median order-to-report interval for STAT troponin tests.
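To make the two kinds of indices concrete, the short Python sketch below computes an events-per-opportunities defect rate and a median order-to-report cycle time. It is an illustration only, not part of the Q-Tracks tooling, and the counts and timings shown are hypothetical.

```python
# Illustration only (not Q-Tracks code): the two kinds of indices the monitors report.
# The counts and timings below are hypothetical.
from statistics import median

def defect_rate_percent(events, opportunities):
    """Rate monitor: specifically defined events divided by operational opportunities."""
    return 100.0 * events / opportunities

def median_cycle_time(order_to_report_minutes):
    """Duration monitor: median order-to-report interval, in minutes."""
    return median(order_to_report_minutes)

# One hypothetical month for one subscriber laboratory:
print(defect_rate_percent(events=12, opportunities=3150))      # e.g., ID band defects per patient identified
print(median_cycle_time([42.0, 55.5, 61.0, 48.5, 70.0]))       # e.g., STAT troponin order-to-report minutes
```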
Q-Tracks Monitor Data Collection
Subscriber laboratories paid yearly fees to participate in an individual Q-Tracks monitor. Following detailed directions supplemented by telephone advice from QPC staff at the College of American Pathologists, participants recorded the events in the 6 of 7 monitors that produced rates (outpatient order-entry errors, patient ID band defects, rejected specimens, STAT test receipt-to-report TAT outliers, defective critical value reporting events, and corrected reports). For the 6 rate monitors, subscribers simultaneously tracked the opportunities for the events' potential occurrence: outpatient test orders entered into a laboratory computer system, patients presenting to be identified for specimen collection, blood specimens collected, STAT test specimens processed, critical value reporting events attempted, and—as a surrogate unit for reports issued—units of 10 000 billable test results.
Special Features of 3 Monitors
The billable test results notation, used to produce a workable denominator for corrected result reports, deals with the relative infrequency of corrected reports by employing a summary unit to represent event opportunities. For the duration monitor, subscribers recorded the cycle time, defined as the interval from order to report of STAT troponin tests. Troponin was selected for the median-duration TAT monitor because it was the most widely available test that met the STAT condition: a test for which there is time pressure to report results as soon as possible.
Similarly, in the STAT TAT outlier monitor, potassium was selected as the index test because that analyte is part of the most frequently ordered STAT clinical chemistry panels.
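As a concrete illustration of the surrogate-denominator convention, the following sketch (with hypothetical monthly counts) normalizes a correction count to a rate per 10 000 billable test results.

```python
# Hypothetical illustration of the surrogate-denominator convention used for corrected
# reports: the count is expressed per 10 000 billable test results rather than per
# directly observed opportunities.
corrected_reports = 45        # corrected reports issued in the month (hypothetical)
billable_results = 180_000    # billable test results in the same month (hypothetical)

rate_per_10k = corrected_reports / (billable_results / 10_000)
print(f"{rate_per_10k:.1f} corrected reports per 10 000 billable tests")  # 2.5
```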
Transmitting Data From Q-Tracks Monitors
Employing a standard data-transfer form, subscribers submitted data to the QPC staff each month.
Reporting of Q-Tracks Monitors' Indices
Quality Practices Committee staff entered the participants' data, along with a standard grid of identifying information about submitting subscribers, into a prospectively designed Q-Tracks database. Each quarter, subscribers received the results of the QPC staff's standardized queries of the database. The quarterly reports consisted of (1) the individual subscriber's own performance on the index in question, (2) the performance of all subscribers as an average, and (3) the performance of subscriber subgroups.
Among the subscriber subgroups, at one extreme (in this article called the 10th percentile) was the median performance of the 10th of subscribers with the fewest defects (the best performers). In the middle of the range was the performance of the overall median subscriber group. At the other extreme, in this article called the 90th percentile, was the median performance of the 10th of subscribers with the most room to improve. Subscribers followed their own, the average, and the stratified participant groups' performances from quarter to quarter.
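The quarterly comparison groups described above can be sketched as follows; this is not the QPC's actual query code, and the variable names and sample rates are hypothetical.

```python
# Sketch of the quarterly comparison groups: the median of the best tenth of subscribers
# ("10th percentile"), the overall median, and the median of the tenth with the most room
# to improve ("90th percentile"). Variable names and sample rates are hypothetical.
import pandas as pd

def quarterly_summary(rates: pd.Series) -> dict:
    """rates: one defect rate (%) per subscriber for a given monitor and quarter."""
    r = rates.sort_values()
    tenth = max(1, len(r) // 10)  # size of the best and worst tenths
    return {
        "best_10th_median": r.iloc[:tenth].median(),           # fewest defects
        "overall_median": r.median(),
        "most_room_to_improve_median": r.iloc[-tenth:].median(),
        "overall_average": r.mean(),                           # the all-subscriber average
    }

# Hypothetical subscriber rates (percent defects) for one quarter:
print(quarterly_summary(pd.Series([0.4, 0.9, 1.2, 2.5, 3.1, 4.8, 6.0, 7.5, 12.0, 21.0])))
```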
Annual Reporting of Q-Tracks Monitors
At year's end, the QPC provided an annual summary of specific monitors' indices for each quarter and for the whole year. The committee provided summaries, along with QPC staff's analysis of the performance indices, in relation to potential stratifying variables. Stratifying variables were demographic or practice characteristics that sorted groups of participants into subgroups. Variables with significant associations that indeed stratified subscribers into distinct groups were also included in the annual reports.
Analysis of the Q-Tracks Monitors Over Time
Aggregations of quarterly and yearly reports, the first items in the long-term analysis, are abstracted in Tables 1 and 2 of this article.
Characterization of Monitors
Each monitor is characterized by initial year of availability, average numbers of subscribers per year, and total number of participants (Table 1, columns 3 through 5).
Range of Performance
Analysis of Trends
As presented in Table 3, the authors calculated starting performance and participation effect for subscribers to each of the Q-Tracks monitors. The authors tested changes over time for evidence of improvement or deterioration. These tests correlated 2 variables with subscribers' performance over time: (1) participants' starting level of performance (taken to be a participant's performance in the second quarter of the first year of participation) and (2) participants' length of participation in the monitor (in quarters). Table 3 presents assessments of those 2 direct measures and of a calculation that combines the starting performance and length-of-participation measures. Rising and falling trends were tested for significance as indicators of improving or deteriorating performance quality.
Tests for Statistical Associations
A linear mixed model was fitted to test individual associations. The model specified a spatial power covariance structure for the correlation between repeated measurements. Covariates significant at the .10 level were included in the final model. The final linear mixed model included the covariates identified from the preliminary analysis and the 3 terms of interest (starting performance, length of participation, and the calculation that combined them), using a significance level of .05 for the analyses. All statistical calculations were run using SAS 9.2 statistical software (SAS Institute, Cary, North Carolina).
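The published analysis was run in SAS (PROC MIXED with a spatial power covariance structure for the repeated measurements). The sketch below shows the general shape of such a trend model in Python's statsmodels; it approximates the repeated-measures correlation with a random intercept per subscriber laboratory rather than the spatial power structure, and the column names are hypothetical.

```python
# Hedged sketch only: the article's analysis used SAS PROC MIXED with a spatial power
# covariance structure. statsmodels does not expose that structure directly, so this
# approximation uses a random intercept per subscriber laboratory. Column names
# (log_rate, start_perf, quarters, lab_id) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def fit_trend_model(df: pd.DataFrame):
    # log_rate: natural-log-transformed indicator for one monitor
    # start_perf: performance in the second quarter of the first year of participation
    # quarters: quarters of participation at the time of each measurement
    model = smf.mixedlm(
        "log_rate ~ start_perf + quarters + start_perf:quarters",
        data=df,
        groups=df["lab_id"],
    )
    return model.fit()

# result = fit_trend_model(df)
# print(result.summary())  # terms with P < .05 would be read as significant trends
```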
Other Analyses
In data not shown in this article, variations in the Q-Tracks monitors were further analyzed to determine whether demographic features, practice characteristics, or, for the patient ID-accuracy monitor, status before or after the 2007 Joint Commission selection of patient ID as a patient safety goal had any effects on the trends. Because the distributions of the indicators were slightly skewed, natural log transformations were applied to produce approximately Gaussian distributions. The monitors were tracked over variable ranges of 6 to 13 years, so a saturated Cox proportional hazards model was used to test for attrition bias. Covariates were made time dependent by creating interactions between the listed factors and the quarterly time component. Because none of the covariates was statistically significant (at P < .05), there was no additional bias adjustment.
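The two supporting analyses described above can be sketched as follows, under assumed column names; the published work used a saturated Cox model with time-dependent covariates in SAS, whereas this simplified Python version only illustrates the form of the log transformation and a basic attrition check.

```python
# Hedged sketch of the two supporting analyses, under assumed column names. The published
# work used a saturated Cox model with time-dependent covariates; this simplified version
# illustrates only the log transformation and a basic proportional hazards attrition check.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def log_transform(rates: pd.Series, offset: float = 0.01) -> pd.Series:
    # Natural log to pull in the right skew of a rate indicator. The small offset for
    # zero rates is an assumption; the article does not specify how zeros were handled.
    return np.log(rates + offset)

def attrition_check(df: pd.DataFrame) -> CoxPHFitter:
    # Duration = quarters of participation; event = dropping out of the monitor.
    # Covariates (hypothetical numeric columns bed_size and teaching_flag) are examined
    # for association with dropout; nonsignificance argues against attrition bias.
    cph = CoxPHFitter()
    cph.fit(
        df[["quarters_participated", "dropped_out", "bed_size", "teaching_flag"]],
        duration_col="quarters_participated",
        event_col="dropped_out",
    )
    return cph

# cph = attrition_check(df); cph.print_summary()
```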
Generation of Figures
The standard format of the Q-Tracks monitors permitted construction of comparable graphs of subscribers' performance over time. As presented in Figures 1 through 7, the graphs provide signature patterns for each monitor. To generate the suites of signature curves for each of the 7 monitors—because different monitors functioned at various times with different numbers of participants—the B-spline plot was adopted as a way to fit smoother curves. The B-spline plot allowed variable-sized subsets of data to fit in similar formats. In this mode of data display, the x-axis presents quarters of participation, and the y-axis provides a scale for the quantified performance indicator. The rationale for displaying the 7 monitors' data as trends in a series of B-spline plots was that the convention depicted, in similar patterns, rates of varying magnitudes over long time spans.
Thus, for 5 of the 7 monitors, patterns were generated across the various spans, each consisting of an average trend line and 3 component trend lines for the best performers, the median performers, and the performers with the most room to improve. The other 2 monitors—median troponin order-to-report times and critical value reporting event-defect rates—produced 3, rather than the usual 4, curves. In the first case, there were just 2 component curves because cycle time proved to be dependent on the testing instrument. In the second case, the loss of the third component curve was due to the collapse of subgroups into just 2 cohorts—of better and worse performers.
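The signature-curve convention can be sketched with a smoothing spline built on a B-spline basis, as below; the published figures were produced by the program's own tooling, and the quarterly series and smoothing factor here are hypothetical.

```python
# Sketch of the smoothing-spline convention behind the signature curves; the published
# figures were generated by the program's own tooling, and the quarterly series and
# smoothing factor below are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
quarters = np.arange(1, 25)                                      # 6 years of quarters
rates = 12 * np.exp(-0.08 * quarters) + rng.normal(0, 0.4, 24)   # hypothetical defect rates (%)

spline = UnivariateSpline(quarters, rates, k=3, s=4.0)           # cubic B-spline basis, smoothed
dense = np.linspace(quarters.min(), quarters.max(), 200)

plt.plot(quarters, rates, "o", label="quarterly rate")
plt.plot(dense, spline(dense), "-", label="smoothed curve")
plt.xlabel("Quarters of participation")
plt.ylabel("Defect rate (%)")
plt.legend()
plt.show()
```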
Graphic Presentation of Trends
In the graphs for the first 3 monitors, order-entry errors, ID-band defects, and rejected specimens (QT17, QT1, and QT3), 4 performance trend lines are presented (Figures 1 through 3). A solid, average trend line tracked aggregates of subscribers. Variously dashed and dotted 10th percentile, median, and 90th percentile lines tracked the quarterly performance of the 3 component subgroups of subscribers. For the fourth monitor—median order-to-report time for STAT troponin (QT15)—3 trend lines are presented in Figure 4: the solid, average line and 2 dashed, subsidiary trend lines, one for each instrument type. The subsidiary curves were for point-of-care and laboratory instruments. In the fifth monitor, STAT test receipt-to-report TAT outliers (QT8), the 4 trend lines reappear: the solid average and the variously dashed 10th percentile, median, and 90th percentile subsidiary lines (Figure 5). In the sixth monitor, defects in critical-value reporting events (QT10), only 2 subsidiary trends were traced, one for performance below, and the other for performance above, the average (Figure 6). For the seventh monitor, test-result correction rates (QT16), yearly average result-correction rates per 10 000 billable tests performed are presented, for each year of the monitor's operation, for all 4 trends: a solid average line and the 10th, median, and 90th percentile subsidiary dashed and dotted component trend lines (Figure 7).
More Detailed Accounts of Each Monitor
The authors have attached online supplemental material that provides in detail the definition of each individual monitor listed in Table 1 (the supplemental digital content is available at www.archivesofpathology.org in the June 2015 table of contents).
RESULTS
Definitions
Durations of Monitoring
The 2 monitors with the longest duration—ID-band defect rates (QT1) and specimen rejection rates (QT3)—were charted for 13 years (1999–2011); one monitor—STAT test receipt-to-report TAT outlier rates (QT8)—was tracked for 12 years (2000–2011), and another—rates of critical-value reporting event defects (QT10)—was tracked for 11 years (2001–2011). Median troponin order-to-report times (QT15) were tracked for 7 years (2005–2011), and the 2 monitors with the shortest durations—outpatient order-entry error rates (QT17) and corrected report rates (QT16)—were charted for 6 years (2006–2011) (Table 1, column 3).
Average Subscription Rates
The average number of subscribers for the different monitors varied during each monitor's career. The 2 longest running indices—ID-band defect rates (QT1) and specimen rejection rates (QT3)—had the highest average subscriber rates (n = 141 for ID-band defects; n = 159 for specimen rejections). Four of the studies had average participant rates that clustered around 100 subscriptions: outpatient order-entry error rates (QT17), n = 106; median troponin order-to-report times (QT15), n = 97; and both STAT test receipt-to-report TAT outlier rates (QT8) and corrected report rates per 10 000 billable tests (QT16), n = 103. The remaining index—critical value reporting event-defect rates (QT10)—had an in-between average subscriber rate of n = 123 (Table 1, column 4).
Fraction of All Subscribers Participating Each Year
For outpatient order-entry error rates (QT17), 35% (106 of 305) of all subscribers contributed data in an average year. For ID-band defect rates (QT1) and specimen-rejection rates (QT3), the average participation fractions were both lower, at 23% (141 of 620 and 159 of 702, respectively). For median troponin order-to-report times (QT15), participation was higher at 33% (97 of 298). For STAT test receipt-to-report TAT outlier rates (QT8) and critical value reporting event-defect rates (QT10), the percentages were lower and similar to each other: 21% (103 of 487) and 25% (123 of 498), respectively. For the last index, corrected report rates per 10 000 billable tests (QT16), the average participation fraction was again higher, at 35% (103 of 292) (Table 1).
Sequence of Monitored Steps in the Testing Process
Table 2, column 1, lists the 7 Q-Tracks monitors in the order in which they appeared during the testing process with the individual Q-Tracks monitor's QT number.
Time Spans of Monitoring
Table 2, column 2, lists the years that each monitor functioned.
Yearly Variation in Subscriber Numbers
For outpatient order-entry error rates (QT17), participation ranged, for 6 years, from 101 to 67 subscribers (mean, 89; median, 92). For ID-band defect rates (QT1), participation ranged, for 13 years, from 155 to 79 subscribers (mean, 124; median, 120). For specimen rejection rates (QT3), participation ranged, for 13 years, between 200 and 98 subscribing laboratories (mean and median, both 139). For 7 years, participants in the median troponin order-to-report times monitor (QT15) ranged from 101 to 70 subscribers (mean, 87; median, 91). For a dozen years, the STAT test receipt-to-report TAT outlier rates monitor (QT8) had participants in a range from 118 to 69 subscribers (mean, 91; median, 87). In the monitor for critical value reporting event-defect rates (QT10), participation varied during the 11 years from 139 to 91 subscribers (mean, 110; median, 104). Finally, for 6 years, subscribers to the corrected report rates index (QT16) varied narrowly between 93 and 82 participants (mean, 90; median, 91) (Table 2, column 3).
Best Performers
Table 2, column 4, records performance for the best 10th of the subscribers.
Median Performers
Table 2, column 5, lists the same events/opportunities quotients and the median TAT duration for the median performers for each monitor, year, and number of subscribers.
Subscribers With Most Room to Improve
Table 2, column 6, lists the fractions and duration for the specified monitor and year for the participant pool made up of the 10th of the participants with the most room to improve.
Trends in Performance
Regarding the ranges of performance documented in Table 2, 2 salient characteristics appeared among all 7 monitors: (1) the ranges between the 10th and 90th percentile performance groups remained relatively wide but migrated to lower (fewer defects, shorter duration) intervals during the years of function, and (2) subscribers in all 3 performance groups (10th, median, and 90th percentile) tended to improve, but they improved in patterns that varied from index to index and, within each index, from performance group to performance group.
Monitor-by-Monitor Analysis of Range Width and Improvement Patterns
Outpatient Order-Entry Error Rates (QT17)
(1) From the 10th to the 90th percentile, the performance of subscribers ranged 20-fold, from 1.1% to 22%; and (2) during the 6 years of availability, the defect fraction dropped from 1.9% to 1% among subscribers in the 10th percentile, whereas median performers improved from 6.3% defects to 4% defects, and defect rates among 90th percentile subscribers remained around 21%.
Identification Band Defect Rates (QT1)
(1) The initial range, from the 10th to the 90th percentile, began almost 40-fold wide, but during the 13 years of monitoring, that range increased to a width of more than 1000-fold; and (2) improvement appeared at all 3 levels of performance during the 13 years, with a more than 30-fold improvement among 10th-percentile subscribers, an 18-fold improvement among median performers, and a 7.5-fold improvement among 90th-percentile participants.
Specimen Rejection Rates (QT3)
(1) The initial range from the 10th to the 90th percentile was more than 20-fold wide and remained about the same 13 years later; and (2) improvement at the 10th-percentile level was from 10 specimens rejected per 1000 received to 6 specimens per 1000; for the median participant, rejection rates hovered around 50 per 1000 (in a range between 43 and 58 per 1000); and among the 90th-percentile subscribers, the rate fell 1.7-fold, from 215 per 1000 to 124 per 1000.
Median Troponin Order-to-Report Times (QT15)
Reported as a duration in minutes, this monitor differed from the other indices: (1) the first year's range was 29 minutes, from 37 to 66 minutes, and in the monitor's final year, the range remained similar at 27 minutes but with different limits, from 33 minutes to 60 minutes; and (2) median performance improved by 8 minutes (falling from 55 to 47 minutes). This latter measure was fundamentally different from the median performance in the other monitors' analyses because subscribers divided into 2 distinct groups by instrument type—one group of subscribers used point-of-care instruments, and their TATs hovered around 30 minutes; the other group used laboratory instruments, and their median troponin order-to-report times hovered around 60 minutes. Both groups improved performance by 4 to 6 minutes during the 7 years of monitoring.
STAT Test Receipt-to-Report TAT Outlier Rates (QT8)
(1) The initial performance range had a width of 20-fold, from 2% outliers among the best performers to 40% outliers among the subscribers with the most room to improve, and 12 years later, that performance range remained wide at more than 50-fold (from 0.64% to 33.62%); and (2) during the 12 years of monitoring, the best performers' outlier fractions fell 3-fold (from 1.91% to 0.64%), whereas the median performers' outliers fell 1.5-fold (from 10.19% to 6.85%), and the performers with the most room to improve saw their defect fractions fall modestly, from 40% outliers to 34% outliers.
Critical Value Report Event Defect Rates (QT10)
(1) The initial range width for performance was almost 400-fold (from 0.04% in the 10th percentile to 15.18% in the 90th percentile), and in the last year of reported monitoring, the range had narrowed, running from 0 defects in the best 10th of subscribers to 4.05% defects in the 10th of subscribers with the most room to improve; and (2) the best performers moved to the less than 1/10 000 defect level after year 3 of monitoring, whereas median performers improved from 2 defects per 100 reporting events to 0.35 defects per 100 attempts to report critical values, and subscribers in the 90th percentile saw their reporting event defect rate fall from 15.2% to 4% during the monitor's 11-year span.
Corrected Report Rates Per 10 000 Billable Tests (QT16)
(1) The initial range width was 1.2 to 11.5 corrected reports per 10 000 billable tests, and the final range—6 years later—was only slightly narrower, at 0.8 to 9.1 corrections per 10 000 tests reported; and (2) the best performers, in the 10th percentile, improved during the 6 years from 1.2 defects per 10 000 reports to 0.8 defects per 10 000 reports, whereas median subscribers improved from 5.2 to 2.7 defects per 10 000, and subscribers in the 90th percentile improved from 11.5 to 9.1 corrections per 10 000 billable tests reported.
Trend Analysis
Table 3, column 1—labeled Quality Indicator—lists, in the order in which each Q-Tracks monitor appears in the clinical laboratory testing process, each monitor's name and how it is quantified: by percentage of events per opportunities for events to occur for the first 3 monitors; by minutes of duration for the fourth monitor; by percentage of events per opportunities for the fifth and sixth monitors; and by events per surrogate for opportunities (10 000 billable test results) for the seventh monitor.
Table 3, column 2, lists 3 significant factors, which are 3 characteristics tested for their influence on process-quality measures: (1) starting performance, defined as a subscriber's performance from the second quarter of participation, (2) number of quarters of participation, and (3) the combination of starting performance and quarters of participation.
Table 3, column 3, lists P values of statistical significance for the effect of each of the 3 characteristics on participants' performance. For all 7 monitors, the 2 direct measures—starting performance and number of quarters of participation—and the calculation—quarters of participation combined with starting performance—were all significantly associated with improvement.
Overall Effect of Participation
Table 3, column 4, lists, for each monitor, the overall fall in defect rates per quarter of participation.
The significant improvement effects were a 1.9% decrease in defects per quarter of participation for outpatient order-entry error rates, a 5.1% fall per quarter of participation in ID-band defect rates, a 2.9% decrease per quarter for specimen rejection rates, a 1.3% decrease per quarter in STAT test receipt-to-report TAT outlier rates, a 0.42% decline per quarter in critical value reporting event-defect rates, and a 1.4% decline per quarter in corrected report rates per 10 000 tests.
For median troponin order-to-report times in minutes, the effect of participation in the monitoring program was a 0.85% decrease in troponin TAT per quarter of participation.
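As illustrative arithmetic only (not a result reported in the article), and assuming the per-quarter participation effects compound multiplicatively—consistent with modeling the indicators on a log scale—the cumulative effect of several years of participation can be projected as follows.

```python
# Illustrative arithmetic only (not a result from the article): projecting the cumulative
# effect of a steady per-quarter decline, assuming the effect compounds multiplicatively.
def cumulative_decline(per_quarter_pct, quarters):
    """Total percentage decline after `quarters` of a steady per-quarter percentage fall."""
    remaining = (1 - per_quarter_pct / 100) ** quarters
    return 100 * (1 - remaining)

for label, effect in [("ID-band defects (5.1%/quarter)", 5.1),
                      ("Order-entry errors (1.9%/quarter)", 1.9),
                      ("Troponin TAT (0.85%/quarter)", 0.85)]:
    print(f"{label}: about {cumulative_decline(effect, 24):.0f}% lower after 24 quarters")
```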
Graphic Presentations of Suites of Performance Curves
A final major feature of the Q-Tracks performance-improvement model is a graphic presentation of change over time for the suites of curves that were generated to compare the signature patterns of performance for each monitor. Figures 1 through 7 illustrate that feature. Tracking began, for all monitors, in the second quarter of a subscription.
Figure 1 tracks outpatient order-entry error rates as percentages. All 4 curves derive from data that were adequate for all 24 quarters (6 years). The average curve (solid line) traces an S-curve shape that starts between 11% and 12% defects and declines to just above 4% at the end of 6 years. The lowest of the 3 dashed lines follows the group of participants with the best initial performance; those subscribers aggregated around a starting performance level of less than 2%. Note that in the graph the best performers' average group trend began above that 2% level and fell from just above 4% to around 2%. The middle of the 3 dashed lines is the curve of the median performers. Their starting performances defined a range between 2% and 15%. Their average error rate initially stayed stable—just below 8%—then fell during the second half of the tracking period to between 4% and 5%. The top-most dashed line is the curve for the performances of those with the most room to improve. Error rates in that group of subscribers, which initially aggregated around a level of greater than 15% order-entry errors, fell precipitously during the first 5 years of monitoring—from almost 30% to less than 14%—and then climbed back slightly to just above the 14% level in the last year of observation.
Figure 2 tracks ID-band defect rates. The curves run for all 52 quarters (13 years) for the overall average and for the trend lines of 2 subgroups: participants with the most room to improve (those starting with more than 1% ID band defects per patient ID opportunity) and participants starting in the range around the median (those starting with between 0.25% and 1% ID band defects). For the participants with the best initial performance (those starting with a median of less than 0.25% ID band defects), because of technical limits (the B-spline plot's presentation of small data sets), the curve could be charted only out to 40 quarters (10 years), rather than the 52 quarters (13 years). The average (solid line) curve charts an initial abrupt drop in ID band defects from 4% to around 1% during the first 6 years of monitoring, then traces a subsequent slow decline from 1% to 0.6% during the last 7 years reported. The lowest subordinate trend line (subscribers with the best initial performance) charts change over time for a group that began with an average ID band defect rate of less than 0.25% per opportunity. That curve also fell steadily. Note, again, that the average trend line, beginning just above 0.3%, starts higher than the group median of 0.25%. During the 10 years that could be charted, the best performer group's rate fell from just above 0.3% to just above 0.1%. The dashed trend line for the median performers, those with initial ID band defect rates ranging between 0.25% and 1.0%, fell from 1% to 0.3% more rapidly during the first 3 years, then more slowly across the subsequent decade. The curve for participants with the most room to improve, who started with defect rates greater than 1%, followed the same pattern but over a wider range: their average defect rates fell from 6% to 1.2% during the first 6 years, then declined from 1.2% to 0.8% during the last 7 years of monitoring.
Figure 3 follows specimen rejection rates in percentages. As with Figure 2, the overall average trend and the trend lines for subscribers with the most room to improve and for the median performers were traced for all 13 years, but the smaller subgroup of best performers, because of the constraints of the B-spline convention, could be followed for only 12 years. The overall average followed an S-curve, from between 1.0% and 1.1% at the beginning to between 0.5% and 0.6% at the end of the 13-year period. The lowest dashed line follows the initial best performers, the group that averaged, at the outset, less than 0.5%. They stayed in that range, below 1.0%, for the dozen years that they could be followed. The middle dashed line traces median performers, those with starting performances ranging between 0.5% and 1.50%. They began with an average specimen rejection rate of just below 0.8%. That rate dipped below, and then returned to, that level. The topmost dashed line follows performers who started with more than 1.50% specimen rejections; those were the subscribers with the most room to improve. They began at a 2.9% average defect rate that fell steadily during the first 4 years. Their rate was then stable at 1.6% for 5 years, before it fell steeply to 0.9% during the final 4 years of monitoring.
Figure 4 differs from the previous 3 figures. First, it tracks durations, rather than rates. Second, it is made up of only 3 curves. The durations turned out to be dependent on the testing instrument, with differences in them due to whether participants used a laboratory instrument (abbreviated in the key as Lab Inst), or a point-of-care device (abbreviated in the key as POC). The average line declined from an initial duration of 54 minutes to a final duration of 44 minutes along a fairly constant slope. The lower point-of-care device line began at 40 minutes' duration and fell slowly during the first 3 years to 36 minutes, then more steeply to 31 minutes by the 27th quarter (the last quarter that could be entered in the B-spline plot was at 6.75 years). The higher laboratory instrument curve began just above the average, between 54 and 55 minutes, stayed around 53 minutes for 5 years, then, during the last 2 years, fell to around 45 minutes.
Figure 5 returns to tracking rates. Those rates—of STAT test TAT outliers—are of STAT testing events that took longer than the STAT test TAT intervals designated by the participating laboratories. Again, for this measure, curves trace the overall average, the best performers, the median performers, and the subscribers with the most room to improve. Also, once again, for technical reasons of the B-spline plot convention, the curve of the relatively few best performers came up short—in this case, by one-half of a year (2 quarters). The overall average's initial downward slope, from 16% to 12% during the first 3 years, is marginally steeper than the slower decline from 12% to 10% during the subsequent 9 years. The best performers—the subgroup associated with less than 5% defects—began at a starting level of 3%. The defect rate remained stable at that level for 9 years, then rose slightly toward 4% at the end of its slightly truncated charting period. The median performers, the subgroup with outlier rates between 5% and 20%, produced rates that fell consistently from an initial average between 11% and 12% to rates between 5% and 6%. The last subgroup, those with the most room to improve, started with defect rates greater than 20%. That initially challenged group's rate first fell steeply, from around 35% to just below 15%, during the first 6 years. The rate then stabilized between 15% and 16% for 2 years. During the final 4 years, the rate rose slightly toward 18%.
Figure 6 shows critical value reporting-event defect rates measured for 11 years. This monitor is different from the other 5 rate-based monitors in that only 2 component curves, besides the overall average line, could be generated. The 2 component curves were composed of, among better performers, those starting with fewer than 3% defective critical-value reporting events and, among worse performers, those starting with 3% or more such events. The overall average line began between 5% and 6% defects, fell at a moderate slope to just above 2% during the first 4 years, then declined more slowly to around 1% during the remaining 7 years of monitoring. The better performers started at 2% defects and fell slowly but steadily to the 1% level over the entire 11-year monitoring span. In contrast, the performers with the most room to improve, who started at 13% defects, fell for 6 years in a steep descent to the 3% level, then fell slowly, also to the 1% level, at the end of the 11-year span. Thus, the combined pattern is of curves of slight, moderate, and steep slopes converging on the 1% defect level.
Figure 7 returns to the 4-curve array seen in Figures 1 through 3 and 5. The overall average began at approximately 7 corrected reports per 10 000 billable tests and fell in a slightly concave trajectory to just below 4. The best performers, those starting with fewer than 2 corrections per 10 000 billable tests, followed, in contrast, a slightly convex track from just above 1 correction per 10 000 tests to just below the same level.
The curve for median participants (those beginning with between 2 and 7 corrected reports per 10 000 billable tests) began just above 7 corrected reports per 10 000 billable tests, then descended to just above 4 corrected reports for 3 years. Finally, that rate rose slightly to around 5 corrected reports per 10 000 billable tests during the subsequent 3 years. The subgroup with the most room to improve, made up of participants starting with more than 7 corrected reports per 10 000 billable tests, began much higher at between 13 and 14 corrected reports per 10 000 billable tests. The numbers then fell much more steeply to 4 corrected reports for 4 years and then declined slowly toward 3 corrected reports during the final 2 years. In this monitor, we see the median and most-room-to-improve cohorts converge; however, the best performers remained not only consistently better but also on a different path.
COMMENT
Previous Publications of Q-Tracks Experience
Twelve years ago, a summary6 of the first 2 years' experience of the initial 6 Q-Tracks studies included evaluation of 2 monitors presented in this article: ID band defects (QT1) and specimen rejection (QT3). In their first 2 years of operation, both monitors showed significantly improved performance: the ID band monitor subscribers who participated for all 7 quarters demonstrated a continuous fall in defects; for the specimen rejection monitor, the downward trend in defects was not only significant overall but also more significant for the subgroup of participants who had subscribed to the monitor in both years.
Also in 2002, a second article15 examined the ID band monitor in greater detail. That article emphasized an initial steep downward slope in ID band defects, from 7.4% to 3.05% of encounters.
In 2004, a third study23 of Q-Tracks data investigated similarities and differences between STAT and routine receipt-to-report time outlier rates. For the STAT receipt-to-report interval, the 2004 study reported that the outlier rate fell from 11.2% to 7.1% during the first 4 years of that monitor's performance.
A 2007 study25 quantified statistically significant declines in critical-value reporting-event defects in 3 categories during the Q-Tracks monitor's first 4 years (2001–2004) of operation. Defective reporting events fell among Q-Tracks subscribers overall. They fell to a lesser degree among events involving outpatients and, to a greater degree, among events involving inpatients.25 The 2007 study25 also stressed that lower rates of defective reporting events were significantly associated with longer participation in the monitor program. Specifically, decrements in defects were proportionate to the number of years (4 versus 2–3 versus 1) that subscribers spent in the program.
For those 4 monitors, whose results QPC members have reported previously, our article confirms and extends the data over longer time spans. Three observations were made in those previous publications: (1) participants in the Q-Tracks monitor programs consistently improved, (2) improvement depended both on the levels of performance at which subscribers started and on how long they stayed in the program, and (3) participants could be stratified into performance groups and tracked over time. This article extends those same observations to 3 other monitors; together, this group of 7 monitors covers the entire clinical laboratory testing process from test ordering to result reporting.
The Statistical Approach to Improvement in the Clinical Laboratory Testing Process
The monitors apply the statistical approach that industrial quality control has shown to be of value in many production processes. The clinical laboratory testing process, as understood in this way, consists of the sequence of sets of conditions and causes that produce clinical laboratory test results. The statistical approach assesses each process step with the help of defined numeric indices. Among the 7 monitors, those indices were 5 rates of events per opportunities, one rate of events per surrogate measure of opportunities, and one duration in minutes. Quality in this context consists of achieving the intended characteristics of each step that makes up the process: (1) accurate order entry; (2) correct patient ID; (3) adequate patient specimens; (4) short TATs; (5) consistent TATs; (6) timely, complete critical value reporting events; and (7) accurate result reports in patient records.
Distinction Between Process and Product Quality
Process quality is defined by defect rates and cycle times. That definition of quality is familiar to laboratorians from internal laboratory quality control of test methods. In the 7 Q-Tracks monitors, this sort of control measurement is extended to preanalytic and postanalytic process steps and cycle times, with the objectives of getting the component steps to behave as intended and of shortening the cycle time.
Goodness or badness of a product also defines quality. In medical environments, product-based definitions of quality focus on what patients experience as the results of medical care.46
Connections Between Clinical Laboratory Process and Test Results in Medical Care
The main justification for the effort that goes into improving process quality is its connection to product quality. Some plausible connections between the 2 sorts of quality are those between process defects and harmful results:
Incorrectly entered test orders delay patients' physicians' receipt of test results; less often, they lead to failure to perform appropriately ordered tests or to wrong tests being performed. Among 181 closed cases of ambulatory malpractice suits in which diagnostic errors were judged to have harmed patients, 17% were classified as due to either adequate diagnostic or laboratory tests being ordered but not performed or diagnostic or laboratory tests being performed incorrectly.47 Among 182 adverse event reports in a Veterans Health Administration study48 of sentinel events that were due to patient misidentification, 31 events were due to laboratory tests ordered on the wrong patients.
Wrong patient ID at specimen collection can also lead to test results attributed to the wrong patient. In rare cases, misattribution of results may provoke wrong or unnecessary interventions or cause opportunities for right interventions to be missed. In the Veterans Health Administration study just cited,48 patients with wrong wristbands accounted for 8 of the 182 adverse events (4%) attributed by root-cause analysis to patient misidentification.
Specimens that cannot be processed usually lead to patients being troubled by another specimen collection but, perhaps surprisingly often, recollection may not ensue. Jacobsz and colleagues20 reviewed 481 rejected specimens received during a 2-week period in a large hospital laboratory in South Africa; they found that only 52% of rejected specimens were recollected, but 5% of the recollected specimens yielded critical values. The contrast between those 2 rates raises the possibility that inpatient information may be lost by failing to recollect specimens after specimens are rejected. In any case, rejected specimens may have downstream adverse effects. From their review of patient records, the Cape Town, South Africa, authors20 concluded that 40% of the rejected samples had an effect on patient care.
Rapid availability of troponin results (documented by short STAT troponin order-to-report times) is an expectation in emergency department investigations of patients presenting with chest pain.49 Emergency department physicians also sometimes argue that longer order-to-report troponin TATs slow patients' progress to the next steps in the clinical diagnostic process, including the critical next step of cardiac catheterization.50,51
The more clearly defined defect of delay beyond the anticipated interval in STAT test receipt-to-report times stimulates even more clinician dissatisfaction.52 That dissatisfaction follows from experience that such longer TAT events delay appropriate medical interventions or, in other circumstances, force actions that turn out in retrospect to have been premature or misguided. The effect of STAT TAT on emergency department length of stay has emerged as a measure of this friction in the process of emergency department care delivery.53
Enforcement of standards for critical value reporting events as a national patient-safety goal rests on the presumption that delayed or otherwise inadequate critical value reporting events exert malign influences on patient care.26 That presumption is based on experience and individual reports of dramatic cases; however, a systematic study54 of the effect of sodium critical values on length of stay and mortality argues that the presumption is well founded.
Increased numbers of reports that later require correction increase opportunities for physicians to fail to notice corrections in clinical laboratory test results. Those failures, in rare situations, can lead to either lack of intervention or mismanagement of patients based on previously reported, erroneous results.27,55 The clinical effect of report errors is also rare; it is easier to demonstrate for microbiology results that must be altered55 but may also appear after erroneous chemistry results.27,51
Rationale for the Q-Tracks Focus on Process Defects
The product defects in the 7 scenarios just described rarely produce measurable effects in the complicated processes of medical care. Clinical impact can be dramatic, but its relative rarity makes the relation between cause and effect difficult to study systematically. It is easier to investigate and ameliorate the more-common process defects than it is to study and prevent the rare sequences of specific process events contributing to rare bad outcomes. The differences between clinical laboratory testing process defects and defective products of clinical laboratory testing, especially those that harm patients, are analogous to those between latent errors and active errors in James Reason's study56 of human error's contribution to system disasters. Latent errors are defects that lie dormant within a system until they become evident as they combine with other factors to breach a system's defenses. Active errors emerge when combinations of circumstances, some quite extraordinary, expose or amplify the effects of latent defects.56
On this basis, the rationale for the Q-Tracks' focus on process defects is 2-fold. First, monitoring defects in process steps exposes and reduces latent errors before they can combine with other factors to make active errors. Second, monitoring process defects presents opportunities to assess the effects of countermeasures aimed at reducing errors. Those 2 features are at the heart of the Q-Tracks' approach.
Strengths and Limitations of the Q-Tracks' Approach
The strengths and limitations of the Q-Tracks' approach to quality improvement correlate with one another. The first strength of the approach is that the Q-Tracks monitors function in the real world of daily laboratory testing. The cognate limitation is that data collection in real-world conditions inevitably varies from one subscriber site to another. Because of that limitation, results of Q-Tracks monitors can always be dismissed as arising from data acquired without adequate data-collection controls. A second strength of the Q-Tracks format is that the results of the monitors are available to participants on a quarterly basis to influence, in the short term, alterations in the local testing process. Those practical alterations can be made in that short time frame to reduce defects or decrease cycle time in relatively close temporal proximity to the indexing of the defects and delays themselves. The corresponding weakness is that those alterations are necessarily specific to individual subscribers. Such unstandardized countermeasures are not themselves open to direct analysis for effectiveness. Because of that limitation, changes in rates of Q-Tracks monitors can always be dismissed as not necessarily connected to the process alterations that they trigger. A third Q-Tracks strength is that the quarterly and yearly Q-Tracks comparative reports facilitate interlaboratory comparison of participant performance. The cognate limitation is that laboratories in the comparative array are not matched in a controlled way, by demographic or other stratifying parameters, as they would be in a prospective study. The objection can always be lodged that such matching is necessary before any assertion can be advanced that performance at one laboratory is better than performance at another laboratory. These, and no doubt other, uncontrolled variations always intrude in the real world where Q-Tracks monitors operate.
Other Features of Q-Tracks Monitors
In its focus on changes in defect rates and cycle times, this report does not cover the breadth of information that the 7 Q-Tracks monitors produced. Other valuable features of these 7 Q-Tracks monitors include (1) identifying 6 types of test-entry defects that account for more than 80% of order-entry errors, (2) identifying and assigning relative frequencies to 5 causes that account for essentially all ID band defects, and (3) establishing that 6 defects account for more than 80% of specimen rejections.57
Still other useful features not covered in the present report include (4) identification of practice features associated with shortened STAT order-to-report times,57 (5) best performance in avoiding STAT collection-to-report time outliers,23 (6) policies that reduce defective critical value reporting events,25 and (7) features of information transfer associated with fewer corrected result reports.57
ANSWERS TO FREQUENTLY ASKED QUESTIONS ABOUT Q-TRACKS
The results presented in this article address 7 frequently asked questions about the Q-Tracks as a voluntary, subscription-based, continuous improvement program.
1. Can Q-Tracks monitors continue to deliver improvement over long periods?
The Q-Tracks in this study continued to show improvement for periods of 6 to 13 years (Table 1, column 3).
2. Can the monitors manage many subscribers?
The 7 generic Q-Tracks monitors, covering the clinical laboratory testing process from order to report, maintained subscription bases of around 100 (range, 97–157) participants from year to year (Table 1, column 4).
3. Will subscribers stay with the monitors for the long term?
Between 21% and 35% of all subscribers to a generic Q-Tracks monitor contributed in an average year (Table 1, column 5).
4. How steady is the subscription rate from year to year?
Median participation rates ranged from 87 to 120 subscribers from year to year (Table 2, column 3).
5. What are the measurable benefits of the program in the long run?
The 7 monitors showed significant reductions in defect rates (and shortening of a cycle time) during the entire period of each monitor's operation. For the 7 monitors, participation effects, which quantified quarterly decreases in defect rates, varied from monitor to monitor between 5% and 0.4% (Table 3). The Q-Tracks performance ranges now offer themselves as published benchmarks to which laboratorians adopting this quality-improvement approach can compare their own defect rates and STAT troponin cycle times.
6. How are the similarities and differences among the monitors captured?
A series of graphs presented in this article depicts the courses of the various monitors across the years of the Q-Tracks monitoring operation (Figures 1 through 7). Developed from the finding that 2 factors—initial rates or duration and number of quarters of participation in a monitor—influence performance over time, the suites of curves show change both on average and, for most monitors, for 3 component groups: initial best performers, initial median subscribers, and participants beginning with the most room for improvement.
7. What is the overall benefit from participation in the 7 monitors?
The 7 monitors produced documented process quality improvements that may or may not have decreased harmful effects on patients (that is, improved product quality). Q-Tracks subscribers did receive a clear, overall, systematic benefit from participation. That benefit was decreased rework: laboratorians spent less time and effort dealing with defects. Specifically, fewer outpatient order-entry errors meant that less test reordering was required. In a similar way, fewer patient misidentifications meant fewer results on wrong patients that had to be identified, labeled as erroneous, and placed in patient records alongside later corrected results. Along the same lines, fewer rejected specimens meant fewer specimens recollected (and fewer specimens that should have been recollected but were not obtained). There were other systemic benefits as well. Shorter median STAT troponin TATs and fewer STAT TAT outliers both directly addressed major occasions for clinician dissatisfaction with laboratory cycle times.21 Fewer delayed or otherwise inadequate critical value reporting events demonstrated increased compliance with regulatory standards. Finally, fewer reports in patient records needing correction meant, once again, fewer occasions when laboratorians had to enter medical records to identify wrong or incomplete reports, label them, and add new corrected reports.
The changes that the 7 monitors documented at each step of the clinical laboratory testing process are 7 changes that, in themselves, take steps in the right direction. Together, the 7 monitors offer the most comprehensive and effective approach available to improve process quality in clinical laboratory testing.
Author notes
Supplemental digital content is available for this article at www.archivesofpathology.org in the June 2015 table of contents.
Competing Interests
The authors have no relevant financial interest in the products or companies described in this article.