The analysis described in part 1 of this two-part series of articles addressed whether a device could pose a risk of serious injury or death to a patient or staff member if the device should fail from a planned maintenance (PM)-preventable cause. In effect, the analysis in part 1 separated the entire universe of medical devices into two categories. One category, which is similar to what the Centers for Medicare & Medicaid Services (CMS) characterizes as “critical equipment,”2 makes up a subinventory of devices that, to be compliant with the intent of the CMS regulation, should continue to be maintained according to manufacturer recommendations. With the exception of four CMS-specified subcategories,2 the second category of devices can be incorporated into what the MPTF calls a phase 1 AEM program.
In part 1, we mentioned a second AEM program inclusion criterion that the MPTF is calling the “likelihood of PM-preventable failures” criterion, which will identify manufacturer-model versions of “potential high PM-risk devices” that have been shown to be unlikely or very unlikely to fail from a PM-preventable cause. Although no specific language in the CMS regulation addresses this possibility, the MPTF believes that a good case can be made for these particular versions of device types, previously identified as potential high PM-risk devices, being made eligible for an AEM program because of a substantial, documented record of acceptable PM-related reliability. In this article, we describe the process for identifying these additional devices by manufacturer-model.
By combining the MPTF's “likelihood of PM-preventable failures” criterion with the “severity of PM-related harm” criterion (Figure 1), the number of devices that need to be maintained according to manufacturer recommendations can be reduced even further than is achieved with just a basic phase 1 AEM program. This more comprehensive analysis divides the various manufacturer-model versions of each of the different device types into seven categories of PM risk, ranging from high PM risk to zero PM risk. Using three levels of likelihood that the device will fail from a PM-preventable cause (quite likely, unlikely, and very unlikely) increases the number of categories of PM-related risk from the five achieved with a phase 1 program to seven (Figure 1, left column).
Measuring PM-Related Reliability
The basic measure of the likelihood that a device will fail from a PM-preventable cause (its PM-related reliability) is the frequency with which PM-preventable device failures are encountered during everyday use. In addition to noting the frequency of PM-preventable total failures of the devices during everyday use, tallying the frequency with which hidden failures are discovered during routine PM inspections is equally important. The failure count should include hidden failures such as when a device does not pass one or more critical, manufacturer-recommended performance or safety checks, as well as when one or more critical, nondurable parts for which the manufacturer recommends restoration are found to be already past their optimum restoration point. The MPTF's recommended codes for these findings are described below.
The PM-related reliability of each make-model version of a particular device type can be expressed as either its PM-related failure rate (i.e., how many PM-related failures are encountered during everyday use over a certain time period, including when PMs are performed) or as the corresponding mean time between failures (MTBF). Using the MTBF metric is preferred because the failure rates usually will be fractional, whereas the corresponding MTBF is a larger, more readily comprehended number (see sidebar).
Acceptable Levels of PM-Related Reliability
Although CMS2 appears to accept the premise that device risk is a combination of the worst-case outcome severity of the device failure and the likelihood that such a failure will occur, no debate has appeared in the published literature about measuring PM-related reliability and, more importantly, about what levels should be considered acceptable and unacceptable.
Based on a relatively small amount of initial data collected for just a few manufacturer-models of defibrillators (Table 5.4 on the MPTF website), the MPTF has set an initial placeholder for the threshold for an acceptable level of PM-related reliability for potential PM-critical devices, such as defibrillators, at not more than one failure every 75 years. In other words, if a particular manufacturer-model defibrillator demonstrates that it develops a PM-preventable failure no more frequently than once every 75 years, it should be considered sufficiently reliable to be included in an AEM program. Defibrillators are in the category of devices that potentially have the most serious (level of severity [LOS] 3) adverse outcomes when they fail. The MPTF also believes that it is reasonable to set the thresholds for devices with less serious levels of adverse outcome severity at somewhat lower levels. Accordingly, we have set the MTBF threshold placeholder for devices with less serious (LOS 2) levels of outcome severity at not more than one failure every 50 years and, for devices with even less serious (LOS 1) levels of outcome severity, at not more than one failure every 25 years (Table 1).
To define the seven levels of PM risk shown in Figure 1, the MPTF chose to use three ranges for the likelihood (probability) of a device failing from a PM-preventable cause (quite likely, unlikely, and very unlikely). We also defined tentative ranges of MTBF values for each of those three levels (Table 1). Implicit in these threshold values is the idea that the transition point between “quite likely” and “unlikely” for a critical (LOS 3) device is a value beyond which the “critical” device should be considered sufficiently reliable that it can be included in an AEM program. As noted below, the MPTF is planning to use actual maintenance data as a more rational basis for determining what the threshold levels should be.
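The placeholder thresholds just described can be expressed as a simple lookup. The following is a minimal sketch, not the MPTF's own tooling: the names MTBF_THRESHOLD_YEARS and meets_aem_threshold are hypothetical, and only the three placeholder values stated in the text (75, 50, and 25 years) are used. The boundary between "unlikely" and "very unlikely" is given in Table 1 and is not reproduced here.

```python
# Placeholder MTBF thresholds (in years) keyed by level of severity (LOS),
# taken from the text: LOS 3 -> 75, LOS 2 -> 50, LOS 1 -> 25.
MTBF_THRESHOLD_YEARS = {3: 75.0, 2: 50.0, 1: 25.0}


def meets_aem_threshold(los: int, mtbf_years: float) -> bool:
    """Return True if the observed PM-related MTBF is at or above the
    placeholder threshold for the given level of severity.

    "Not more than one failure every N years" is read here as MTBF >= N.
    """
    if los not in MTBF_THRESHOLD_YEARS:
        raise ValueError(f"unknown level of severity: {los}")
    return mtbf_years >= MTBF_THRESHOLD_YEARS[los]


# A LOS 3 device (e.g., a defibrillator) with an observed MTBF of 45 years
# would not yet meet the 75-year placeholder.
print(meets_aem_threshold(3, 45.0))
print(meets_aem_threshold(2, 50.0))
```

Because the thresholds are described as placeholders, keeping them in a single table makes them easy to revise once actual maintenance data are available.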
Mean time between failures (MTBF) is the inverse of the failure rate. For example, a device that has failed twice in nine years is demonstrating a failure rate of 0.22 failures per year and an MTBF of 4.5 years. Average failure rates also can be derived by dividing the total number of device failures occurring during the observation period by the number of device-years making up the total device experience. For example, if a batch of 10 devices experiences two failures during nine years, then the failure rate is 0.022 failures per year and the MTBF is 45 years. The larger the experience base (in device-years), the greater the number of devices in the sample, and the longer the observation period, the closer the observed failure rate will be to the device's true failure rate.
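The arithmetic in this sidebar can be reproduced in a few lines. This is a minimal sketch; the function names failure_rate and mtbf_years are hypothetical, and returning None for zero observed failures is an assumption made here to represent an undetermined MTBF.

```python
from typing import Optional


def failure_rate(failures: int, device_years: float) -> float:
    """Average number of failures per device-year of experience."""
    if device_years <= 0:
        raise ValueError("device_years must be positive")
    return failures / device_years


def mtbf_years(failures: int, device_years: float) -> Optional[float]:
    """MTBF in years (the inverse of the failure rate).

    Returns None when no failures were observed, because the MTBF
    cannot be determined from zero failures.
    """
    if device_years <= 0:
        raise ValueError("device_years must be positive")
    if failures == 0:
        return None
    return device_years / failures


# One device observed for nine years, two failures.
print(round(failure_rate(2, 1 * 9), 2), mtbf_years(2, 1 * 9))    # 0.22 4.5

# A batch of 10 devices observed for nine years (90 device-years), two failures.
print(round(failure_rate(2, 10 * 9), 3), mtbf_years(2, 10 * 9))  # 0.022 45.0
```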
What level of PM-related reliability is achieved by using the manufacturer's recommendations? It seems reasonable to presume that a device maintained according to the manufacturer's recommendations will demonstrate a level of PM-related reliability that the manufacturer considers to be safe and acceptable. Further, because the Food and Drug Administration (FDA) has approved the device as safe and effective, it also seems reasonable to assert that the FDA has tacitly approved this same level. Therefore, the MPTF is planning to explore what actual levels are found for various devices maintained according to their manufacturer-recommended procedures.
We expect to find that the actual levels will vary over a range. If the range is broad, we propose to adopt as the standard either the average value or an average that is weighted according to the relative sizes of the samples.
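The difference between the two proposed averages can be illustrated with a short sketch. The MTBF values and sample sizes below are hypothetical numbers chosen only for illustration, and the function names are not from the source. Weighting is applied to the MTBF values directly here; weighting the failure rates instead would be another reasonable choice and generally gives a different number.

```python
from typing import Sequence


def simple_mean(values: Sequence[float]) -> float:
    """Unweighted average of the reported values."""
    if not values:
        raise ValueError("no values supplied")
    return sum(values) / len(values)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average weighted by the relative size of each sample."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical MTBF values (years) reported for three samples, with the
# hypothetical size of each sample in device-years.
mtbf_values = [60.0, 80.0, 100.0]
sample_sizes = [100.0, 300.0, 600.0]

print(simple_mean(mtbf_values))                   # 80.0
print(weighted_mean(mtbf_values, sample_sizes))   # 90.0
```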
Communitywide Database Needed
Collecting sufficient data to provide a statistically meaningful body of evidence to support the use of particular alternate maintenance strategies may prove difficult for many healthcare facilities, for the following reasons:
Because they are designed and constructed by different entities, different manufacturer-model versions of devices with the most severe (LOS 3) outcomes (e.g., defibrillators, critical care ventilators) likely will display different levels of reliability. This means that the maintenance findings for each manufacturer-model version of these device types will need to be analyzed separately.
Devices that have the most severe (LOS 3) outcomes are presumably designed to be very reliable; therefore, they likely will demonstrate a correspondingly low PM-related failure rate. This anticipated high reliability will reduce the number of failures that an individual facility will be able to document over a reasonable time period.
Many healthcare facilities will have only a small number of different manufacturer-model versions of the device types that have the most severe (LOS 3) outcomes. To illustrate this quandary, suppose that a facility has three similar (same manufacturer, same model) heart-lung units and only three years of maintenance history for each unit. This amounts to an experience base of only nine device-years. If the actual PM-related MTBF of the units is greater than nine years, then the facility may not have experienced even one PM-preventable failure during the three-year observation period. (The MPTF expects to find that the PM-related MTBF values for typical high-reliability devices will be at least 75 years.)
In this case, the facility would have to report its finding with respect to the devices' indicated failure rate (zero failures experienced during the nine device-years of exposure) as “undetermined.” Even if the devices experienced one or more failures during this relatively short exposure, the indicated MTBF (reported as “up to nine years”) will appear to be unacceptably short for a device that is potentially high PM risk (PM priority 1). With an indicated MTBF this low, it would be prudent for the facility to look at the PM-related reliability for this device type in the database on the MPTF website to determine whether its experience is typical. For more on this possible situation, see Ridgway and Fennigkoh3 and Ridgway and Lipschultz.4
The bottom line is that many individual facilities will have difficulty generating enough failure data to get a good indication of each device's true PM-related failure rate and, therefore, the device's true level of PM-related safety. To get accurate measures of the true PM-related failure rate of PM priority 1 devices, creating a pool of maintenance statistics containing a minimum number of device-years of experience for each manufacturer-model of each device type will be necessary.
The MPTF has selected 50 device-years as a reasonable benchmark for the minimum amount of maintenance-related failure data needed in the experience base to properly characterize the PM-related reliability of each particular device. Of course, more data are always better (Table 2).
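The heart-lung unit example and the 50 device-year benchmark can be combined into one small check. This is a sketch under stated assumptions: the function name summarize_experience and the dictionary layout are hypothetical, and the string "undetermined" mirrors how the text says a zero-failure finding would have to be reported.

```python
MIN_DEVICE_YEARS = 50.0  # MPTF benchmark for the minimum experience base


def summarize_experience(units: int, years_per_unit: float, failures: int) -> dict:
    """Summarize a facility's experience base for one manufacturer-model."""
    device_years = units * years_per_unit
    if failures == 0:
        indicated_mtbf = "undetermined"  # no failures observed
    else:
        indicated_mtbf = device_years / failures
    return {
        "device_years": device_years,
        "indicated_mtbf": indicated_mtbf,
        "meets_minimum_experience": device_years >= MIN_DEVICE_YEARS,
    }


# Three similar heart-lung units with three years of history each and
# no PM-preventable failures: only nine device-years of experience.
print(summarize_experience(units=3, years_per_unit=3, failures=0))

# The same units with one failure: indicated MTBF of nine years, still
# far short of the 50 device-year benchmark.
print(summarize_experience(units=3, years_per_unit=3, failures=1))
```

In both cases the experience base falls short of the benchmark, which is the situation in which pooled, communitywide data become necessary.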
Aggregating the Data
We are appealing to the healthcare technology management (HTM) community to provide the MPTF with summaries of findings from ongoing maintenance of devices that have been classified as potential PM priority 1. To allow the findings to be properly aggregated, the maintenance, testing, and reporting should be performed in accordance with the following standardization guidelines:
For all potential PM priority 1 device types, the maintenance entity must use a manufacturer-recommended PM procedure or one that includes, at minimum, all of the device restoration and safety verification tasks listed in the manufacturer's procedure.
Although regulatory constraints exist, for the purpose of this project, it is not necessary for the maintenance entity to perform the PM tasks at the same interval as that recommended by the manufacturer. In the absence of regulatory mandates, diversity is welcome because one of the goals of the project is to compare levels of PM-related device reliability achieved at different maintenance intervals.
The maintenance entity must use some form of repair call coding similar to that described in Ridgway et al.5 and in HTM ComDoc 1 on the MPTF's website. This will allow a separate count of the failures that are judged to be PM preventable.
The maintenance entity also must use some form of coding for the PM findings similar to that described below. This will allow a separate count of the number of times that a hidden failure was detected (PM code F), as well as the number of times that a nondurable part was found to have deteriorated beyond the optimum (PM code 9).
Preferred System for Coding
Equipment systems fail for a variety of reasons, and recognizing that only a few of these failures can be prevented by periodic maintenance is important. Ridgway et al.5 point out that equipment failures can be classified into three general types depending on which part of the equipment system has failed. For a more detailed description of this repair call coding system, see section 1.4 (“What are the causes of medical device failures?”) in HTM ComDoc 1 on the MPTF website.
For PM findings, the MPTF recommends the following codes:
PM code A (passed). Safety verification testing to detect hidden failures found the device to be in complete compliance with the relevant specifications, and any other functions tested were within expectations.
PM code B (minor out-of-spec [OOS] condition[s] found). One or more of the tests revealed a slightly OOS condition. The purpose of this rating is to create a watch list to monitor for future adverse trends (particularly performance or safety failures), even though the discrepancy is not considered to be significant at present. A PM code B finding is considered a passing grade.
PM code F (failed). One or more of the tests found that one or more of the device's performance or safety features were considerably OOS. This is a failing grade, and if this is a PM priority 1 device, it should be removed from service immediately.
The service person also should indicate, by circling one of four numbers (1, 5, 9, or 0), the physical condition of any parts of the device that were restored (as called for in the procedure):
PM code 1 (still good/better than expected). Restored parts showed little or no deterioration.
PM code 5 (about as expected). Minor deterioration was observed, but it probably was not affecting the device's function adversely.
PM code 9 (already worn out/serious physical deterioration). One or more of the restored parts were found to be considerably worse than expected. They were worn out and probably having an adverse effect on the device's function.
PM code 0 (no physical restoration required). The device has no parts requiring physical restoration.
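For facilities that record these findings electronically, the two codes could be stored together as one record per PM. The following is a minimal sketch, assuming a hypothetical PMFinding record type; the code letters, numbers, and descriptions come from the list above, while the class and field names are illustrative only.

```python
from dataclasses import dataclass

# Safety verification / performance test result codes.
TEST_RESULT_CODES = {
    "A": "passed",
    "B": "minor out-of-spec condition(s) found",
    "F": "failed",
}

# Physical condition of restored parts.
PART_CONDITION_CODES = {
    "1": "still good/better than expected",
    "5": "about as expected",
    "9": "already worn out/serious physical deterioration",
    "0": "no physical restoration required",
}


@dataclass(frozen=True)
class PMFinding:
    """One PM finding: a test result code plus a part condition code."""
    test_result: str
    part_condition: str

    def __post_init__(self):
        if self.test_result not in TEST_RESULT_CODES:
            raise ValueError(f"unknown test result code: {self.test_result!r}")
        if self.part_condition not in PART_CONDITION_CODES:
            raise ValueError(f"unknown part condition code: {self.part_condition!r}")

    @property
    def passed(self) -> bool:
        # Codes A and B are both passing grades; F is a failing grade.
        return self.test_result in ("A", "B")

    @property
    def counts_as_pm_related_failure(self) -> bool:
        # Hidden failures (code F) and worn-out parts (code 9) are both
        # included in the PM-related failure count.
        return self.test_result == "F" or self.part_condition == "9"


finding = PMFinding(test_result="B", part_condition="5")
print(finding.passed, finding.counts_as_pm_related_failure)
```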
Systematically documenting these findings each time a PM is performed, and then aggregating the data, will make it possible to obtain two important pieces of information:
An indication of how well the PM interval matches the optimum. The optimum PM interval is when the parts being restored have deteriorated but not to the point where the deterioration has started to affect the functioning of the device. The indicators for how close the interval is to this optimum are as follows. A preponderance of:
PM code 1 findings (still very good) is an indicator that the interval is too short.
PM code 5 findings (about as expected) is an indicator that the interval is about right.
PM code 9 findings (already worn out) is an indicator that the interval is too long.
A numerical MTBF indicating the device's level of PM-related reliability. This indicator is the lesser of the following MTBF values (representing the lower level of PM-related reliability):
The MTBF based on the total of any overt failures caused by inadequate device restoration (from the repair cause coding) and any PM code 9 findings (which are immediate precursors of the overt failures caused by inadequate restoration).
The MTBF based on the total of any hidden performance and safety degradations detected by the safety verification tasks (PM code F findings).
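Both pieces of information can be derived from aggregated counts. The sketch below makes two assumptions that are not in the source: "a preponderance" is read as the most frequent of codes 1, 5, and 9 (ties are not handled specially), and an MTBF with zero recorded events is treated as undetermined (None). The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

INTERVAL_INDICATION = {
    "1": "interval too short",
    "5": "interval about right",
    "9": "interval too long",
}


def interval_indicator(part_condition_codes: Iterable[str]) -> Optional[str]:
    """Indicate how well the PM interval matches the optimum, based on the
    most frequent part condition code (code 0 findings are ignored)."""
    counts = Counter(c for c in part_condition_codes if c in INTERVAL_INDICATION)
    if not counts:
        return None  # no restoration findings to judge from
    most_common_code, _ = counts.most_common(1)[0]
    return INTERVAL_INDICATION[most_common_code]


def pm_related_mtbf(device_years: float,
                    overt_restoration_failures: int,
                    code_9_findings: int,
                    code_f_findings: int) -> Optional[float]:
    """Return the lesser of the restoration-related MTBF and the hidden-failure
    MTBF, or None if neither can be determined."""
    def mtbf(events: int) -> Optional[float]:
        return device_years / events if events else None

    restoration_mtbf = mtbf(overt_restoration_failures + code_9_findings)
    hidden_failure_mtbf = mtbf(code_f_findings)
    determined = [m for m in (restoration_mtbf, hidden_failure_mtbf) if m is not None]
    return min(determined) if determined else None


print(interval_indicator(["5", "5", "1", "9", "0"]))   # interval about right
# 300 device-years: 1 overt failure + 2 code 9 findings, and 1 code F finding.
print(pm_related_mtbf(300.0, 1, 2, 1))                 # 100.0
```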
Compiling Data into Organized Batches
To streamline the reporting, the MPTF will be asking certain organizations to volunteer to act as data-aggregating intermediaries. Organizations that are candidates for this data aggregator role include independent service organizations, national or regional hospital systems with in-house maintenance services, and computerized maintenance management system companies. For additional information, see section 7.5 (“Guidelines for compiling the data into organized batches”) in HTM ComDoc 7 on the MPTF website.
Key Database Tables
The summary proof tables are the most important part of the community database on the MPTF website. These are numbered as subsidiary tables grouped under Table 5 on the website. Each table catalogs PM-related failure rates calculated from aggregated maintenance data submitted for each of the potential PM priority 1 device types.
The tables display the accumulated data for each device and the MTBF corresponding to its PM-related failure rate. These values were derived by totaling the number of reported overt failures that were judged to be PM preventable and the number of PM code 9 findings (which are immediate precursors of overt failures) recorded during the reporting period.
Generally speaking, all devices will exhibit different levels of PM-related reliability and an associated level of PM-related risk when maintained at different intervals. Devices that exhibit an unacceptably high risk of an adverse outcome when they fail from a PM-preventable failure usually will exhibit a lower, more acceptable level of risk when the PM interval is reduced.
After this information becomes available on the website, guessing at what would be a “safe” PM interval for any particular device will no longer be necessary. The answer will be apparent from the numbers in the summary proof tables. In time, the results will show whether the manufacturer's recommendations result in a fairly consistent level of PM-related reliability or if some appear to require adjustment.6
Improving the Efficiency of an Equipment Maintenance Program
PM priority 1 devices with parts that the manufacturer indicates need periodic restoration. These are potentially hazardous devices with either overt or hidden PM-preventable failures that could cause a life-threatening injury and that are demonstrating PM-related failure rates greater than the currently acceptable level (not more than one failure every 75 years). For these devices, it would be prudent to continue to follow the manufacturer-recommended PM procedure (for both the interval and the scope of the tasks) and to routinely monitor the levels of patient safety being achieved (as described in part 1¹). This should be continued until acceptable evidence exists in the national database that some other procedure with more efficient tasks and/or a longer interval demonstrates the same or better level of PM-related reliability or a comparable level of patient safety.
PM priority 1 devices with no parts the manufacturer says need periodic restoration. These are potentially hazardous devices with hidden PM-detectable failures capable of causing a life-threatening injury that are demonstrating PM-related failure rates greater than the currently acceptable level (not more than one failure every 75 years). For these devices, for which the only “maintenance” that the manufacturer recommends is periodic safety verification, it would be prudent to continue to follow the manufacturer-recommended safety verification testing schedule and routinely monitor the levels of patient safety being achieved (as described in part 1¹) until evidence exists that testing at a longer interval results in the same or better level of PM-related reliability or a comparable level of patient safety.
When testing for possible hidden failures with potential high-severity outcomes, there is no optimum interval—shorter is always better. However, it has been shown7 that for safety verification–related (hidden) failures with MTBF values greater than about 50 years, the increase in the time that the patient would be exposed to potentially hazardous hidden failures if the testing interval was increased from six months to as long as five years is very small.
All PM priority 2–5 devices. These lower PM-risk devices qualify for inclusion in an AEM program either because of the lower level of severity of the outcomes of potential failures or because they have demonstrated an acceptable level of PM-related reliability. Therefore, they can be maintained using a maintenance procedure or strategy other than that recommended by the manufacturer. They can be transitioned immediately to less stringent PM strategies, such as the cost-efficient light maintenance (run-to-failure) strategy mentioned in Appendix A of the CMS memo.2 At the very least, the manufacturer-recommended procedures can be modified (e.g., by omitting electrical safety checks that the facility has found to be nonproductive, by extending the testing interval to make it coincide with a more convenient or more efficient routine).
The logical rule here is to explore the national database for evidence of more efficient maintenance procedures. It would be prudent to monitor the levels of patient safety (as described in part 1¹) being achieved by the current procedure (or any of the more efficient procedures, if chosen) for devices categorized as PM priority 2 (moderate PM-risk) devices. Monitoring those in the lower risk categories is much less important but can be undertaken if the facility chooses.
All negligible or zero PM-risk devices. Should these devices fail, there is a negligible or zero additional risk to patient safety. Therefore, in the absence of other regulatory mandates, unless there is a convincing case that periodic PM can be justified through lower maintenance costs, these devices are excellent candidates for the very efficient light maintenance (run-to-failure) strategy. By adopting this run-to-failure maintenance strategy in the early 1960s, the civil aviation industry was able to reduce its maintenance costs by 50% while, unexpectedly, also improving the reliability and safety statistics for civilian aircraft by a factor of 200.7
Final Cautionary Note
Patient and staff safety has long been the primary justification in medical equipment maintenance programs for performing routine PM on the hospital's frontline patient care equipment. Regular PM also has become a deeply rooted symbol of institutional caution and caring. After all, if the equipment doesn't look well cared for, what does that imply about how well the organization takes care of its patients?
This series of articles has addressed long-standing misunderstandings about how much regular PM contributes to keeping modern medical equipment safe. If this analysis is accepted as a way to support a reduction in PM, we urge that careful thought be given to replacing those services with more efficient or less technically intensive alternative routines (e.g., department rounds) to ensure that clinical staff remain confident in the equipment and that it still looks well cared for and ready to do its job.
More detailed discussions of this and other topics mentioned in this series of articles can be found in explanatory documents on the MPTF website (www.HTMCommunitydB.org).
Malcolm Ridgway, PhD, CCE, FAIMBE, is a retired clinical engineer who has been a leader in several initiatives aimed at elevating the healthcare technology management field and advancing the works of its professionals. Email: email@example.com
Matthew F. Baretich, PE, PhD, is president of Baretich Engineering, Inc., based in Fort Collins, CO. Email: firstname.lastname@example.org
Matthew Clark, MBA, CHTM, is a clinical engineer at Advocate Health in Downers Grove, IL. Email: email@example.com
Stephen Grimes, FACCE, FHIMSS, FAIMBE, is managing partner at Strategic Health Care Technology Associates, LLC in Swampscott, MA, and a member of the BI&T Editorial Board. Email: firstname.lastname@example.org
Bhaskar Iduri, MS, CCE, CHTM, is director of clinical engineering and quality assurance at Renovo Solutions, LLC in Irvine, CA. Email: email@example.com
Michael W. Lane, CHTM, is director of Technical Services Partnership at the University of Vermont in Burlington, VT. Email: firstname.lastname@example.org
Alan Lipschultz, CCE, PE, CSP, CPPS, is president of HealthCare Technology Consulting, LLC in North Bethesda, MD, and vice president of the American College of Clinical Engineering. Email: email@example.com
Nancy Lum, MHSc, is a clinical engineer project manager at Massachusetts General Hospital in Boston. Email: firstname.lastname@example.org
Editor's note: This article is part 2 in a two-part series. Part 1¹ described how to create a basic alternate equipment management (AEM) program using a one-criterion risk analysis developed by the AAMI-supported Maintenance Practices Task Force (MPTF).