Public and internal databases were examined to evaluate software-related recalls in the medical device industry. In an analysis of recalls reported from 2005 through 2011, 19.4% of medical device recalls were related to software. This paper includes analysis results, challenges faced in determining the causes, and examples and trends in software-related recalls. This information can enhance our understanding of why medical devices fail, and it can help to improve medical device safety and patient and public health.
More and more medical device systems rely on software to perform their intended functions. Software also has become an integral part of the design and manufacturing processes. Recalls related to software have increased from approximately 6% in the late 1980s [1] to nearly 20% in recent years, as found in this analysis. Various definitions and methodologies have been used to derive these statistics over time. Results also are influenced by the increasing use and complexity of software, and the relative change in the number of devices containing software over this time period.
Analyzing and leveraging recall information is a strategic priority for the Center for Devices and Radiological Health (CDRH) in order to identify and address risks inherent in device designs as a means to protect public health. Recall data from the U.S. Food and Drug Administration (FDA) can provide a wealth of information on trends, risks associated with unsafe devices, and adequacy of correction and removal actions if it is assessed in a systemic manner.2 Understanding how software-related problems can result in recalls provides medical device manufacturers with the opportunity to incorporate this information into their design life cycle and potentially produce safer products.
Researchers interested in medical device quality have used the FDA's publicly available data to explore product quality problems and recalls related to software. Two commonly used sources include the Medical & Radiation Emitting Device Recalls database (known as the Recall Enterprise System or RES)3 and the weekly Enforcement Reports.4 Both provide information about each recall from the same source documentation (such as the manufacturer, product, and reason for recall), although the information presented by each varies somewhat based on the communication purpose. The distinction is illustrated in the Methodology section below. The methodology employed by researchers may vary based on the available data and the specific objectives of each study. All involve a degree of reasonable inference concerning the nature of the software issues underlying each recall.
In 1992, the FDA completed an evaluation of software-related recalls for fiscal years 1983–1991 [1] and found that 5.9% (165 of 2,792) of medical device recalls were attributed to software quality problems. A similar analysis for fiscal years 1992–1998 showed an increase to 7.6% (240 of 3,168).5 Results published in General Principles for Software Validation; Final Guidance for Industry and FDA Staff were nearly identical for the same time period (242 out of 3,140, or 7.7%).6 Using data provided by the FDA, Wallace and Kuhn reported in 2001 that 5.9% (165) of recalls received from 1983–1991 were related to computer software, and that the values for 1994, 1995, and 1996 were 11%, 10%, and 9% respectively.7 Bliznakov et al. reported in 2006 that 11.3% of recalls (425 of 3,771) from 1999 through 2005 were due to software, using information available from weekly FDA Enforcement Reports.8 In 2012, Kramer et al. reported that 15.1% of quality problems (279 of 1,845) listed in the Enforcement Reports between January 2009 and May 2011 were due to software problems.9 Keyword searches have also been used with varying degrees of success. Yang and Hyman in 2010 searched the 2009 recall database for the term “software” in the “Reason for Recall” field and identified 71 out of 2,355 recall records (3.0%).10
The term “recall” is not used consistently in the literature. Terms used to classify the role that software may play in a medical device failure often are undefined and used inconsistently. They include software fault, software failure, software error, software problems, and software related. For this analysis, recalls have been divided into two mutually exclusive groups: software-related recalls and recalls not related to software. Within the group of software-related recalls is a subset of recalls that are due to software errors. These groups are described more fully below.
Recall: A “recall” is defined in 21 CFR 7.3(g) as “a firm's removal or correction of a marketed product that the FDA considers to be in violation of the laws it administers and against which the agency would initiate legal action, e.g., seizure.” [italics added]11 The FDA's RES database provides information on these individual marketed product recalls using a recall number or “Z number” for each affected product. However, it should be noted that more than one marketed product can be affected by a single quality problem or recall event. For example, a single quality problem in the packaging design for guide catheters affected 147 different types of catheter products (e.g., different diameter catheters). Information about a single quality problem or recall event is provided in the Enforcement Reports using a recall ID number, which includes references to all marketed products affected by that recall event. Counting individual product recalls rather than recall events can give an inflated view of the magnitude of some types of quality problems. For the purposes of this analysis, the term recall refers to a single quality problem or recall event and not to individual product recalls.
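To make the counting distinction concrete, the sketch below groups hypothetical product-level records (the Z numbers and event IDs are invented; the real RES and Enforcement Report schemas differ) by recall event:

```python
from collections import defaultdict

# Hypothetical product-level recall records: (recall number, recall event ID).
# One packaging problem (event E100) affects three different catheter products.
product_recalls = [
    ("Z-0001", "E100"),
    ("Z-0002", "E100"),
    ("Z-0003", "E100"),
    ("Z-0004", "E200"),  # an unrelated, single-product recall event
]

# Counting individual product recalls (Z numbers) inflates the apparent
# magnitude of the quality problem...
product_count = len(product_recalls)

# ...while grouping by recall event ID counts distinct quality problems,
# which is the unit of analysis used in this paper.
events = defaultdict(list)
for z_number, event_id in product_recalls:
    events[event_id].append(z_number)
event_count = len(events)

assert product_count == 4 and event_count == 2
```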
Recall due to a Software Error: The term software error is used to describe recalls caused by defects and faults introduced at any point in the software design and development process, such as those introduced during development of the software requirements, architecture and/or specifications, and during implementation and coding.
Software-Related Recall: The term software-related recall is not defined in FDA regulations. For this analysis, a recall is classified as software related anytime that software is somehow involved. At a minimum, these include cases in which a software error was the cause of the reported problem. Software-related recalls include a spectrum of reported problems such as when software is a contributing factor in a device failure, when it may be part of an inadequately-defined system level risk control measure, and when new software is released as part of a corrective or preventive action for a non-software error. Release of a software update does not necessarily imply that the recall is due to a software error.
The difference between a recall due to a software error and a software-related recall is cause. If the cause is known to be or to include a software error, the recall is considered to be both software related and due to a software error. If the cause is known and not due to software but software is somehow involved (as described above), the recall is considered software related. If the cause is not known but software is somehow related, the recall is also considered software related. Making this distinction allows us to capture situations and process phases in which defects are introduced and which might not be obvious, and to identify the role that software can play in recalls attributed to other causes. Several illustrative examples are included in the Examples section.
Software-related recalls (including those due to software errors) may involve any of several software sources, including software in the medical device or software as a medical device; off-the-shelf (OTS) software or supporting software used in or by the device; product development tools; software used and files created in electronic design automation; software used in verification and validation testing, manufacturing, installation, and upgrading of the medical device or software in the medical device; software in a device component; and software and devices that interact with the medical device in the use environment.
To perform this analysis, each recall in the database was assessed manually to determine if the recall was related to software, and, if possible, which were due to software errors. In many cases, a determination was obvious from the publicly available information. For example, a firm reported that its transport monitor stopped communicating with its patient data module after 414 days of continuous run time and displayed a cryptic message; after an additional 19 days of service, parameter information (e.g., waveforms and alarms) was no longer displayed. This happened because an internal timer that had been used to remind users to perform preventive maintenance had rolled over. This recall is both software related and due to a software error.
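The reported failure is a classic fixed-width counter rollover. The sketch below is a hypothetical simplification (the actual counter width and tick units were not reported); it shows how a naive elapsed-time calculation breaks once the counter wraps, while a modular subtraction survives a single wrap:

```python
COUNTER_BITS = 16
COUNTER_MAX = 1 << COUNTER_BITS  # counter wraps modulo 2**16

def tick(counter, elapsed):
    """Advance a fixed-width hardware-style counter, wrapping on overflow."""
    return (counter + elapsed) % COUNTER_MAX

def elapsed_naive(start, now):
    """Buggy: assumes `now` >= `start`; goes negative after a rollover."""
    return now - start

def elapsed_wrap_safe(start, now):
    """Correct across a single wrap: subtract modulo the counter width."""
    return (now - start) % COUNTER_MAX

start = 65_000
now = tick(start, 1_000)  # 65_000 + 1_000 wraps around to 464

assert elapsed_naive(start, now) == -64_536    # nonsense negative interval
assert elapsed_wrap_safe(start, now) == 1_000  # true elapsed ticks
```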
Often, use of source documentation not available to the public was necessary to make a determination. This includes information that may be received during the course of voluntary recalls, mandatory recalls, corrections and removals of medical devices,12 adverse event reports, and any other information requested by or provided to the FDA. For example, publicly available information for one recall indicated that a component used for total hip arthroplasty was recalled because the product packaging listed an incorrect sterilization method. This device (the component and labeling) contains no software, and the reported problem might not appear to be software related. However, the firm reported that “a software error occurred during the label printing operation that led to the labels for this lot being incorrect. … [T]he program used to identify data characteristics did not find the data correctly and consequently default data was used. … The problem was not detected during the label printing and verification operation… a software patch correcting this issue and additionally to detect and prevent labels with missing or defaulted data characteristics from being printed was put in to place.”* Analysis based only on publicly available information may therefore fail to identify, or may incorrectly identify, recalls that are associated with software or actually due to software errors.
A research database was created by querying the public interface to the RES database for all product recalls posted for each month between January 2005 and December 2011. The following fields were identified: recall number (Z number), recall class (I, II, III), trade name/product, recalling manufacturer, date posted and the public reason for recall. Because the recall numbers increase sequentially for each product recall, observed gaps suggested the possibility of missing or deleted records. Two additional queries were performed on internal FDA databases to extract more comprehensive recall information and identify the source of any record gaps. The following fields were identified: classification date, recall event ID, recall number, center classification, root cause, root cause narrative, public reason for recall, complete reason for recall, and recall strategy.
The results from these queries were merged manually using the recall number, and each record was assessed for consistency and completeness. Twenty-one corrections were made to erroneous recall numbers, device classifications, and classification dates (possibly due to ongoing recall activities). Twelve recalls were removed during the construction of the research database because of missing or incomplete dates, cancellations, or reclassifications (no longer Class I, II, or III recalls). Thirty-two recalls were unique to either the internal or external database query results and were retained for analysis. The resulting database contains 5,792 Class I, Class II, and Class III medical device recalls classified by the center from 2005 through 2011.
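A merge of this kind can be sketched as follows; the field names and values are hypothetical stand-ins, since the actual queries and schemas are internal to the FDA:

```python
def merge_recall_records(public_rows, internal_rows):
    """Merge two query results keyed on recall number.

    Records unique to either source are retained, mirroring the handling
    of the 32 recalls found in only one of the two databases.
    """
    merged = {row["recall_number"]: dict(row) for row in public_rows}
    for row in internal_rows:
        # Enrich an existing record, or create one for internal-only recalls.
        merged.setdefault(row["recall_number"], {}).update(row)
    return list(merged.values())

public_rows = [
    {"recall_number": "Z-0001", "public_reason": "software defect"},
    {"recall_number": "Z-0002", "public_reason": "labeling error"},
]
internal_rows = [
    {"recall_number": "Z-0001", "root_cause": "software design"},
    {"recall_number": "Z-0003", "root_cause": "process control"},  # internal only
]

records = merge_recall_records(public_rows, internal_rows)
assert len(records) == 3  # the union of both sources is retained
```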
Assessment of the research database combined keyword searches, manual review and assessment of individual reports, automated identification of recalls meeting the inclusion criteria, comparisons between databases, and guidelines established so that the criteria for “software related” were applied consistently.
Included in this analysis were 5,792 medical device recalls that were classified between 2005 and 2011 and that met the criteria described in the Methodology section. Over this seven-year period, 1,122 (19.4%) of the recalls were determined to be software related. The yearly breakdown is shown in Table 1.
Recalls related to software were also broken down into seven device categories, which capture the 19 device classification panels as defined in 21 CFR 862–892 [13] and used elsewhere.1,5,7,8 These are grouped as follows:
Anesthesiology: devices providing anesthesia functionality
Cardiovascular: devices such as pacemakers and blood pressure monitors
Imaging/Radiology: devices such as magnetic resonance imaging (MRI) systems and ultrasound devices; treatment planning software
In Vitro Diagnostics: devices for chemistry, hematology, immunology, microbiology, pathology, and toxicology
General Hospital: general purpose devices
Surgery: devices such as electrosurgical, cutting, and coagulation devices
Other: devices used for dental, ear nose and throat, gastroenterology and urology, neurology, obstetrics and gynecology, ophthalmic, orthopedic, and physical medicine
Figure 1 shows the breakdown of software-related recalls by device category. Nearly half (48.6%) were reported for imaging and radiological devices, followed by in vitro diagnostics (19.3%), cardiovascular (12.7%), other (7.8%), general hospital (7.5%), anesthesiology (2.7%), and surgery (1.4%).
The following examples were extracted from the FDA's recall records to illustrate a range of problem types and phases of the product development life cycle. All but the last are due to software errors, although the actual defect or fault for each may not have been determined or reported.
Linear accelerator imaging software was recalled due to inadequate software requirements. The firm reported, “Defects in the auto launch functionality make it possible for a mismatch of patient data … resulting in a missed misdiagnosis or inappropriate treatment decision.” Additional records stated, “When users change patient context while a study is being automatically launched, the system could display images or reporting options for the incorrect patient. … [t]he root cause of the code defect was determined to be incomplete software requirements documentation which failed to address possible exception pathways. … [It is a] defect that is highly dependent upon individual user workflow habits.”
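One plausible shape for such a defect is a check-then-use gap: the patient context is captured when the auto launch is queued but never re-validated when the study is finally displayed. The sketch below is a hypothetical simplification, not the firm's actual code:

```python
class Workstation:
    """Hypothetical imaging workstation with an auto-launch feature."""

    def __init__(self):
        self.current_patient = None   # patient currently selected by the user
        self._queued_patient = None   # snapshot taken when a launch is queued

    def queue_auto_launch(self):
        # Patient context is captured once, when the launch is scheduled.
        self._queued_patient = self.current_patient

    def display_study_buggy(self):
        # Buggy: trusts the stale snapshot even if the user switched patients
        # while the study was being launched (the "exception pathway").
        return self._queued_patient

    def display_study_fixed(self):
        # Safer: re-validate the context at display time, refuse on mismatch.
        if self._queued_patient != self.current_patient:
            raise RuntimeError("patient context changed; launch aborted")
        return self._queued_patient

ws = Workstation()
ws.current_patient = "PATIENT-A"
ws.queue_auto_launch()
ws.current_patient = "PATIENT-B"  # user changes patient mid-launch

assert ws.display_study_buggy() == "PATIENT-A"  # images shown for wrong patient
```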
Radiotherapy treatment planning software used with linear accelerators was recalled due to inadequate validation of off-the-shelf software. When an optional module used for radiation therapy in cancer patients is enabled, “a software bug will allow the [Creatine Clearance dose calculation] algorithm to assume a patient's age to be in years when it is entered in months … which may result in patients less than 2 years old to be seriously overdosed with radiation.”
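The standard Cockcroft-Gault formula for creatinine clearance expects age in years; whether the recalled module used exactly this form was not published. The sketch below (hypothetical patient values) shows how an age entered in months and silently read as years skews the estimate, and how an explicit unit parameter guards against it:

```python
def creatinine_clearance(age_years, weight_kg, serum_cr_mg_dl):
    """Cockcroft-Gault estimate (mL/min); expects age in YEARS."""
    return ((140 - age_years) * weight_kg) / (72 * serum_cr_mg_dl)

# An 18-month-old patient, hypothetical weight and serum creatinine values:
correct = creatinine_clearance(1.5, 11.0, 0.4)  # age correctly given in years
buggy = creatinine_clearance(18, 11.0, 0.4)     # "18" months read as 18 years

# The two estimates diverge by a clinically meaningful margin.
assert abs(correct - buggy) > 5

def creatinine_clearance_checked(age, age_unit, weight_kg, serum_cr_mg_dl):
    """Defensive variant: make the unit explicit, convert at the boundary."""
    if age_unit == "months":
        age = age / 12.0
    elif age_unit != "years":
        raise ValueError(f"unsupported age unit: {age_unit!r}")
    return creatinine_clearance(age, weight_kg, serum_cr_mg_dl)

assert creatinine_clearance_checked(18, "months", 11.0, 0.4) == correct
```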
Vision-correction software for an excimer laser system was recalled due to inadequate device software coding and inadequate installation software for use in upgrades. “Two software-caused errors; in combination … will result in an erroneous treatment calculation (overcorrection) in patients. (1) Installation of software for a certain brand of computer … can cause a software registry setting for an algorithm that identifies image reflection to be erroneously set in the “off” position upon installation …, and (2) Software error in the [product] application produces erroneous treatment calculations.” Additional records stated that it “allows invalid data to be used for calculating treatment when that data should be ignored.”
A cardiac and vascular fluoroscopic X-ray imaging system was recalled for both a hardware defect and an inadequate software mitigation. “During an acquisition …an image [can become] “frozen” on the [product] live monitor screen. In such cases, the system continued to send out X-ray without reporting an error message…[and] an operator could be misled to believe that the “frozen” image is instead a live dynamic image.” Additional records stated, “a defect has been identified on the backplane of the bootable chassis … several resistors need to be changed …there is a watchdog mechanism that will lead to [image] blacken[ing] only in case of image frozen for the first frame. The mechanism does not work during acquisition.” Although the firm reported that the defect lay in hardware, the control measure was implemented incorrectly in software, and it did not place the system in a safe state.
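The incomplete mitigation can be sketched as a guard that is evaluated only for the first frame of an acquisition (a hypothetical simplification of the reported watchdog behavior):

```python
def blank_if_frozen_buggy(frames):
    """Watchdog as reported: only the FIRST frame is checked for freezing."""
    blanked = []
    for i, frame in enumerate(frames):
        if i == 0 and frame == "frozen":
            blanked.append("blank")  # safe state: black out the display
        else:
            blanked.append(frame)    # later frozen frames pass through as-is
    return blanked

def blank_if_frozen_fixed(frames):
    """Check every frame so a mid-acquisition freeze also blanks the display."""
    return ["blank" if frame == "frozen" else frame for frame in frames]

# A freeze that begins mid-acquisition, while X-ray is still being emitted:
frames = ["live", "live", "frozen", "frozen"]
assert blank_if_frozen_buggy(frames) == ["live", "live", "frozen", "frozen"]
assert blank_if_frozen_fixed(frames) == ["live", "live", "blank", "blank"]
```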
A bedside patient monitor was recalled because “[u]se of these devices at their maximum volume setting may result in the premature failure of the internal speaker,” which may contribute to a “delay in responding to a SpO2 or arrhythmia alarm.” Additional records stated, “[these monitors] allow the device speaker volume to be adjusted by the user. ... If the speaker is routinely operated at its maximum volume setting of 10, premature failure of the speaker may occur. The default alarm volume setting is 6. [The firm] will distribute a software upgrade that lowers the maximum volume.” Software was most likely not a cause or contributing factor, as the firm reported that the software upgrade was released to mitigate the unexpected hardware vulnerability.
Between 1983 and 2011, recalls related to software have been trending upwards: 5.9% (fiscal 1983–1991),1 7.6% (fiscal 1992–1998),5 11.3% (1999–2005),8 and 19.4% (2005–2011) (this study). Over this same time period, device categories presented in Table 2 show a downward trend for cardiology and anesthesiology and an upward trend for radiology.
Directly comparing results from different analyses, such as those presented above, should be done with care because not all terms or methodologies are adequately defined or described. This makes it difficult to understand the scope of any analysis and to interpret and compare results. Accessibility plays a significant role; not all recall information is available publicly, and what is available may not contain enough information to make a definitive assessment regarding causes and contributing factors.† The public RES interface provides access to recall records, but its basic search capabilities are not robust enough to support the complex automated queries over multiple years of data needed for this type of forensic research. On a positive note, a recent upgrade of the FDA Enforcement Report interface has significantly improved the ability to assess recall event information. Keyword searches provide little context, and the presence of the term “software” in recall records cannot be taken as an indication that a recall was caused by a software error. Another challenge is deciding which interface is most appropriate for the proposed analysis. FDA Enforcement Reports list recall events and quality problems, whereas the RES interface lists every product affected by a particular recall event.
No fields in the recall database could be used to consistently identify software's possible involvement in the reported problems. The most enlightening information (when available) came from the firm's own root cause or hazard analyses, communications to customers, work instructions, and unedited logged communications between the firm and the customer during active investigation of a reported problem. This “firsthand” information often provides critical details that may be lost in summarizations. Without consistent and complete causal information, it is difficult to recommend where to focus analysis efforts or how to automate the type of manually intensive and unsustainable forensic analysis performed here. At times, firms may identify the product development phase or process at which the defect was introduced or detected, such as “missing requirement” or “inadequate verification and validation.” While this information is useful for exploring process failures, it provides little insight into the defect itself. The FDA recently began efforts to improve the recall process, including systematically assessing recall information to identify trends and underlying causes of recalls.
To begin addressing these challenges, guidelines were provided to categorize recalls: those related to software, recalls not related to software, and the subset of software-related recalls due to software errors. Performing this first-level stratification can guide further decomposition of problems into common defect classes and by processes that have allowed these defects to be introduced into the device. Identification of the exact root cause is not always required to perform this activity. For example, software-related recalls that are not caused by software errors are often associated with inadequate risk-management activities and with corrective actions for previously unidentified non-software-related vulnerabilities. Problems with mixed patient data often are associated with database access errors and unexpected workflow situations. Problems attributed to use or user error can occur when design requirements are not appropriate to address the intended use of the device.
While each of the 5,792 recall events was assessed to determine if it was software-related, forensic analysis of each recall record to determine all causes and contributing factors has not been completed for the 1,122 software-related recalls due to the manually intensive nature of the analysis. Still, it is possible to discern trends in the types of software errors that have occurred (common design errors and coding errors), and trends in the types of problems observed (common problem types).
Common design errors and coding errors
Common coding errors include inadequate bounds checking, variable usage/initialization, logic, variable conversion/units, and error detection and recovery. Common design errors include architectural and real-time characteristics such as timing, context switching, interfaces, and messaging and state transitions. The following examples from the FDA's recall records illustrate a range of errors, from simple coding mistakes to defects in architectural design.
An image storage device was recalled because timestamps after Jan. 1, 2010 restarted at Oct. 1, 2001.
A printer-driver module in slide maker software did not recognize lower case characters and symbols, and printed barcodes incorrectly.
A product update CD-ROM could not be loaded because a search function used in generating the new CD-ROM Configuration Master File failed.
Imaging system software was recalled because left and right annotations were reversed, and incorrect measurement values displayed in exam review may not match live measurement values.
A database conversion utility for a radiation treatment planning system was recalled because it corrupted existing patient records by setting any non-zero gantry start values to zero instead of the originally prescribed value.
A patient monitor was recalled because the host sent hemoglobin values in the wrong units, preventing user-updated values from being used in calculations.
An error in software logic that controlled common heart rate alarm limits caused limits on a patient monitor to be changed or reset to factory defaults.
A vital signs monitor was recalled after ceasing to function because an upgraded firmware-based component sent unsigned data rather than signed data in brown-out conditions.
Patient data was lost because a buffer overflow occurred after multiple ultrasound images were deleted from an image repository and archiving system.
An unexpected shutdown of a medical device was caused because an interrupt service routine cleared all pending interrupts rather than just the one that had been serviced.
A radiographic tilting table was recalled because software controlling two solid state relays (SSRs) used for driving and braking tabletop movement did not provide a time delay when switching SSRs; this allowed an overcurrent situation that could damage the SSRs and disable or significantly reduce the speed of the tabletop movement.
A computed tomography (CT) scanner was recalled because incorrect real time operating system task priorities allowed image reconstruction of large volumes of data on a secondary console to delay the start of a helical scan.
An implantable cardioverter defibrillator was recalled after it failed to charge high voltage capacitors. After an unexpected series of events, an interrupt service routine was disabled, which prevented a flag that enables battery measurements from being reset.
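Several of these errors reduce to a single misplaced line. The pending-interrupt example can be reproduced with a simulated flag register (Python stands in for the device firmware here; the actual code and register layout were not published):

```python
# Simulated interrupt-pending register, one bit per interrupt source.
TIMER_IRQ = 0b001
UART_IRQ  = 0b010
ADC_IRQ   = 0b100

def service_buggy(pending, handled):
    """Buggy ISR exit: writing zero clears ALL pending interrupts,
    silently discarding requests that have not yet been serviced."""
    return 0

def service_fixed(pending, handled):
    """Correct ISR exit: clear only the bit that was actually serviced."""
    return pending & ~handled

pending = TIMER_IRQ | UART_IRQ | ADC_IRQ

# Servicing the timer should leave the UART and ADC requests pending:
assert service_fixed(pending, TIMER_IRQ) == UART_IRQ | ADC_IRQ
# The buggy version drops them, so those events are never handled:
assert service_buggy(pending, TIMER_IRQ) == 0
```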
Common problem types
Interfacing between devices and information management systems (incorrect/mixed/missing patient data, database access problems, data refresh issues, device/module interfaces and communications)
Use and environment (not considering workflow or all states of operation)
Maintaining and upgrading devices (incorrect software, configuration data or calibration data loaded/installed, inadequate software installation or software upgrade procedures, and loaded/installed software does not run)
Algorithms and calculations (incorrect or inadequate dose calculators, and battery charging algorithms)
Image processing (including image merging, annotation, and scaling)
Error detection and recovery (inadequate or missing error detection and recovery methods, or mitigations for non-software vulnerabilities)
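Many of these problem types surface at interface boundaries. A signed/unsigned mismatch of the kind reported for the vital signs monitor can be reproduced in a few lines (the byte order and field width here are hypothetical):

```python
import struct

# The sending component transmits a signed 16-bit value, e.g. a negative
# reading observed during brown-out conditions.
raw = struct.pack("<h", -3)  # little-endian signed short

# Buggy receiver: decodes the same bytes as UNSIGNED, turning -3 into 65533.
(buggy_value,) = struct.unpack("<H", raw)

# Correct receiver: decodes as signed, matching what the sender transmitted.
(fixed_value,) = struct.unpack("<h", raw)

assert buggy_value == 65533
assert fixed_value == -3
```

The defect is invisible as long as all transmitted values are non-negative, which is one reason mismatches like this survive verification and surface only in unusual conditions such as brown-outs.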
Results are available for nearly 30 years of recall data. It is difficult to quantify how environmental factors may have influenced the causes, occurrences, and types of recalls over this time frame. Technology has changed radically, including the increasing use of product design software tools, increased complexity and interconnectedness of medical devices, and the concomitant challenges in cybersecurity. The role that improved tools for software development and testing, industry standards, and FDA guidance documents play in reducing the incidence of software-related recalls has not been quantified. However, from the analysis and the examples presented here, it is clear that opportunities exist for improvement in the quality of medical devices containing software. Straightforward design and coding defects continue to be observed as causal factors in recalls. These are defects that often could be detected using basic tools known to increase software quality.
A specific goal at the FDA is to identify systemic trends more proactively, reinforce software quality practices and detect and communicate factors that contribute to recalls. The results presented here required significant analysis, and additional manual effort would be required to complete a quantitative assessment of all failure modes. Higher quality cause-related information would help regulators target areas of poor software quality and establish better quality metrics. Development of a more automated process (and proper data for metric-based analysis) would also support understanding the role and effect of process failures and inadequacies in the system design and risk management processes. This is an area that deserves more attention, but may be overlooked because it requires a systems-level view rather than an isolated hardware or software view. Better information-gathering and analysis methods to assess recall causes and trends will help meet these goals.
*Here and later in the paper, the author quotes from recall documents filed with the FDA, but has not provided references, so as to avoid identifying companies by name. The recalls are examples of problems that are widespread, and the focus of this paper is not intended to be on specific manufacturers.
†The FDA is required under the Freedom of Information Act (5 U.S.C. § 552) to delete, prior to public disclosure, portions of records containing trade secrets, and confidential, commercial, or financial information; and is permitted by FOIA to withhold any personal, medical, and similar files whose release would constitute a clearly unwarranted invasion of personal privacy.
About the Author
Lisa Simone, PhD, is a biomedical and software engineer with the Center for Devices and Radiological Health at the U.S. Food and Drug Administration. E-mail: firstname.lastname@example.org