What is Fuzz Testing?
“The opportunity to secure ourselves against defeat lies in our own hands.”
—Sun Tzu, The Art of War
Fuzz testing is a type of negative software testing. In contrast to positive software testing, which checks whether the software behaves as it should, negative testing checks whether the software misbehaves when given input it was never meant to receive. Fuzz testing typically applies test vectors that are almost correct, such as an invalid packet-length field in an otherwise perfectly formed IP packet. This method could be compared with someone telling a story that has enough valid facts to make it believable but also contains a few parts that are incorrect. The listener hears and accepts the entire story (or data packet) without questioning it. In fuzz testing, the “test” is to see whether these almost-correct packets cause the device to behave unacceptably.
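As an illustrative sketch (not taken from any particular fuzzing tool), the following Python fragment builds the kind of almost-correct test vector described above: an IPv4 header that is valid in every respect, including its checksum, except for a deliberately wrong total-length field. The field layout follows RFC 791; the addresses and the malformed length value are arbitrary choices.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words (RFC 1071)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(total_length: int, src: bytes, dst: bytes) -> bytes:
    """Minimal 20-byte IPv4 header; every field valid except
    possibly total_length, which the caller controls."""
    ver_ihl = (4 << 4) | 5              # IPv4, 5 x 32-bit header words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        ver_ihl, 0, total_length,       # version/IHL, DSCP/ECN, total length
        0x1234, 0,                      # identification, flags/fragment offset
        64, 17, 0,                      # TTL, protocol (UDP), checksum placeholder
        src, dst,
    )
    checksum = ipv4_checksum(header)    # seal the header so only the
    return header[:10] + struct.pack("!H", checksum) + header[12:]  # length lies

# A well-formed header claims 20 bytes; the fuzzed one lies about its length
# while the checksum still verifies, so it travels deep into the stack.
good = build_ipv4_header(20, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
bad = build_ipv4_header(0xFFFF, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
```

Because the checksum is recomputed after the bad length is inserted, a receiver cannot reject the packet on checksum grounds; it must handle the inconsistent length itself, which is exactly the code path the fuzz test probes.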
To learn about applying fuzz testing and the features of a good fuzzer, refer to the article by Knudsen1 on page 48 of this issue of Horizons.
Why Fuzz Testing?
Imagine an enemy with a weapon that threatens someone you care about. What do you do? Find a way to neutralize that weapon. For Luke Skywalker in the movie Star Wars, that meant shooting a proton torpedo down a thermal exhaust port to blow up the Death Star. When the enemies are unseen hackers, it means reducing their opportunity to exploit your software. One method to do this is to use the same tool used by the hackers: fuzzing.
Fuzzing is a type of negative testing that bombards interfaces with malformed inputs. The most effective fuzzing technique is generational fuzzing, in which the fuzzing tool is aware of the data structure and systematically alters each field, pushing boundaries on data types, injecting over- or underflows, flipping signs, breaking checksums, sending messages out of sequence, and so forth. For instance, altering a single field of a message could cause a server to crash. A fuzzer with these features is sometimes called a “smart fuzzer.”
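A minimal sketch of the generational idea follows, assuming a hypothetical three-field message format (the field names, widths, and valid ranges are invented for illustration). The generator keeps every field valid except one, which it pushes to its boundaries, one past its boundaries, and through a sign flip, just as described above.

```python
# Hypothetical message layout (an assumption for illustration):
# 1-byte message type, 2-byte unsigned length, 4-byte signed value, big-endian.
FIELDS = [
    ("msg_type", 1, 0, 0xFF),
    ("length",   2, 0, 0xFFFF),
    ("value",    4, -(2**31), 2**31 - 1),
]

def encode(values):
    """Pack each field into its wire width, wrapping out-of-range values
    so over- and underflows actually land on the wire instead of raising."""
    out = bytearray()
    for (name, width, lo, hi), v in zip(FIELDS, values):
        out += (v & ((1 << (8 * width)) - 1)).to_bytes(width, "big")
    return bytes(out)

def mutations(lo, hi):
    """Boundary probes a generational fuzzer would try: the limits,
    one past each limit (over/underflow), and a sign flip."""
    return [lo, hi, lo - 1, hi + 1, -1]

def generate_cases(valid):
    """Yield test vectors that break exactly one field at a time,
    keeping every other field valid."""
    for i, (name, width, lo, hi) in enumerate(FIELDS):
        for bad in mutations(lo, hi):
            vals = list(valid)
            vals[i] = bad
            yield name, bad, encode(vals)

cases = list(generate_cases([1, 8, 42]))  # 3 fields x 5 mutations = 15 vectors
```

A real smart fuzzer would also vary sequencing and checksums, but even this toy enumeration shows how systematic field-by-field mutation scales: the case count is the product of fields and mutation strategies.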
Earlier this year, the National Institute of Standards and Technology, the Department of Homeland Security, and the Government Accountability Office warned of vulnerabilities in medical devices.2 In response, the U.S. Food and Drug Administration (FDA) reported that it will start a cybersecurity lab, and the first tool it will use is Defensics, a fuzz testing platform provided by Codenomicon.3 (Note: Fuzz testing is sometimes referred to as fuzzing, and a device that performs fuzz testing is sometimes referred to as a fuzzer.) Fuzzing is one of the most common weapons used by hackers, and the FDA likely wants to investigate its efficacy in detecting vulnerabilities in medical devices. As of this writing, the agency has not released any information, not even a draft guidance document, offering insight into its plans for the cybersecurity lab. Nonetheless, if the FDA is moving into active testing of medical devices, manufacturers would do well to find and mitigate security vulnerabilities ahead of time in order to comply with agency requirements.
How Does Fuzz Testing Work?
Fuzz testing can be applied at multiple levels. Consider a device with an 802.11 radio and a web interface. In the OSI model, 802.11 is implemented in layers 1 and 2 (the physical layer and the MAC sublayer of the data link layer). Moving up the stack, we reach the network layer and, eventually, layer 7 (the application layer), where the HTML file is generated or decoded. Each layer is represented by some code in the device, and that code can contain vulnerabilities. Packets can be generated that fuzz each interface and layer. Fuzzing of 802.11 layer 2 could include a beacon packet with bits set in a reserved field or a packet with an undefined value in one of its fields. Does the receiver properly ignore the invalid packet? If something goes wrong, the test vector and the target's response (e.g., freezing, rebooting) are recorded to provide feedback for the development team. Testing the IP layer involves sending completely valid 802.11 frames that contain invalid fields in the IP layer. This method continues up to the application layer and is repeated for each interface (e.g., Ethernet, USB, 802.11, cellular), so that all layers of all interfaces are tested.
Because an effectively infinite number of improperly formed packets can be generated, it is unrealistic to believe that any form of testing, including negative testing, will find all bugs in the software. However, because fuzzing creates properly formed packets for the layers it is not testing, it exercises more code per test vector than purely random test vectors do. Fuzzing does have difficulty reaching bugs that are “hidden” behind multiple conditional statements,4 a place where another tool, such as static analysis,5 shines. For the specific protocol layer under test, generational fuzzing is extremely effective because all aspects of the protocol can be exercised. The term “generational fuzzing” describes testing in which the fuzzing tool has complete knowledge of the protocol or file format being fuzzed. It understands the rules for holding a conversation in that protocol and appears to the target software as a full-fledged network endpoint. Fuzzing delivers inputs that are mostly correct and therefore able to travel down code pathways and burrow deep into the software before the anomaly reaches the target code.
One may wonder: How is fuzzing accomplished when nothing is known about the communication protocol, as was likely the case for the infusion pump6 and implantable defibrillator7 attacks? In this case, template fuzzing is used to detect vulnerabilities. The hacker records the communication to and from the target device while it is in normal operation, then tweaks (fuzzes) the captured packets and sends them back to the target device. The hacker can gain insight into where the different layer boundaries lie within a packet by generating a cyclic redundancy code for different sections of the packet and comparing the result with the remainder of the packet. In this way, different layers of the communication can be fuzzed. Of course, any part of the protocol that the hacker doesn't record can't be fuzzed, which is a weakness of this method. When template fuzz testing one's own device, the device can be driven to exercise a rich set of its messages, yielding a larger set of test vectors.
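The boundary-finding trick above can be sketched in a few lines of Python. The sketch assumes a simple framing in which a layer's payload is followed directly by its big-endian CRC-32; real devices may use other checksums and layouts, and the captured "frame" below is invented for illustration. Once the boundary is found, each mutant is re-sealed with a fresh CRC so that only the mutation, not a broken checksum, reaches the parser.

```python
import zlib

def find_crc_boundary(packet: bytes):
    """Guess where a layer's payload ends by finding the longest prefix
    whose CRC-32 equals the 4 bytes that immediately follow it."""
    for end in range(len(packet) - 4, 0, -1):
        stored = int.from_bytes(packet[end:end + 4], "big")
        if zlib.crc32(packet[:end]) == stored:
            return end
    return None

def template_fuzz(packet: bytes, boundary: int):
    """Flip one bit at a time in the covered region, then recompute the
    CRC so the mutated frame still passes the receiver's checksum test."""
    for i in range(boundary):
        for bit in range(8):
            mutated = bytearray(packet)
            mutated[i] ^= 1 << bit
            crc = zlib.crc32(bytes(mutated[:boundary]))
            mutated[boundary:boundary + 4] = crc.to_bytes(4, "big")
            yield bytes(mutated)

# A captured "frame": a made-up payload followed by its CRC-32 trailer.
payload = b"\x02SETRATE\x00\x64"
frame = payload + zlib.crc32(payload).to_bytes(4, "big")
```

This is exactly why an unrecorded portion of the protocol cannot be fuzzed: the template fuzzer can only mutate bytes it has actually seen on the wire.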
What Devices and Software Are Vulnerable?
Some believe that software with secret protocols is not vulnerable, that proprietary security solutions are strong, and/or that long-established protocols have mature software that is robust. However, these assertions are not supported by the evidence. Barnaby Jack, a hacker and accomplished security expert, showed that security through obscurity failed for insulin pumps and implantable defibrillators and that proprietary security mechanisms used for ATMs are easily compromised.8
In addition, consider the vulnerabilities found in software from other industries: Of five commercially available network-attached storage devices tested, none performed well in fuzz testing.9 The worst offenders crashed, didn't recover, and were “highly likely” to allow personal data to be accessed. The “best” performers came back up after the attack, but exploits allowing access to personal data were still “highly likely.” Web browsers10 and wireless access points11 showed essentially the same pattern, with fuzz testing results ranging from bad to ugly (Table 1). Even the well-known Adobe Reader has vulnerabilities that have been “exploited in the wild”12 and could “potentially allow an attacker to take control of the affected system.”13 The hacker's beachhead is eliminated only by removing a vulnerability. Removing a vulnerability, however, first requires knowledge of its existence.
Readers might be surprised to learn that some vulnerabilities exist by design or are knowingly left in place. What happens if fuzzing exposes them? Consider the similar cases of an insulin pump and a portable lightweight GPS receiver (PLGR) targeting system. The insulin pump incorrectly calculates the amount of insulin to dispense after a battery change.14 Because the manufacturer reports that the owner's manual contains instructions for recalibrating the pump, we can infer that this is a known operating feature and that the manufacturer decided it is reasonable to have the user recalibrate the pump upon battery change (perhaps not considering the worst case of children and/or hypoglycemic users). Given the high likelihood that fuzz testing would cause a reboot, which in turn has a 90% probability of requiring a pump recalibration (Jay Radcliffe, personal communication, Aug. 21, 2013), a proper response15 is to reconsider whether the vulnerability should be mitigated. In an analogous case, upon battery change, a PLGR targeting system initialized to its own current location, resulting in a bomb drop on the combat controller's current location instead of the enemy's.16,17 The military's response was, “We need to know how our equipment works”18 (translation: read the manual). Eliminating these vulnerabilities would both improve the usability of the device and prevent hackers from exploiting it. The point of these examples is that unless the device manufacturer has an inclination to improve the software, nothing will happen.
Fuzz testing exposes software faults not uncovered by traditional testing, and remediation of these faults makes the device more robust. It is therefore inaccurate to believe that the only reason for fuzz testing is to thwart hackers. The mindset may change if the cost of fuzz testing and fixing the detected vulnerabilities is less than the cost of leaving the bugs in place. To explore this, we need to know what tools are available and gain some idea of the complexity involved.
Fuzz Testing Options and Costs
One option is to develop a home-brewed fuzzer: code the framework and write all the test vectors. Free framework tools such as Peach (http://peachfuzzer.com), Spike (www.immunitysec.com/resourcesfreesoftware.shtml), and Protos (www.ee.oulu.fi/research/ouspg/Protos) are available, along with a few test vectors. With these tools, the user must create the full set of test vectors and apply the framework; the tool then records the results. The testing is limited by the quality of the test vectors; therefore, the developers need a full understanding of the protocols being tested. Otherwise, the tool development and testing are for naught. If considering this option, developers must ensure that they have the resources not only for developing the test system but also for maintaining it. To estimate the number of test cases, consider that the Microsoft Security Development Lifecycle process requires a minimum of 500,000 iterations per file parser.19 Assuming 500,000 iterations per protocol and per file format, the number of test cases can easily reach many tens of millions.
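To make that scale concrete, here is a back-of-the-envelope count. The 500,000-iteration floor comes from the Microsoft SDL figure cited above; the interface, protocol, and file-format tallies are illustrative assumptions, not figures from this article.

```python
# Back-of-the-envelope count of fuzz iterations for a hypothetical device.
SDL_MIN_ITERATIONS = 500_000   # Microsoft SDL floor per parser/protocol

interfaces = {                 # protocols fuzzed per interface (assumed)
    "Ethernet": 4,             # e.g., MAC, IP, TCP/UDP, application
    "802.11":   5,
    "USB":      3,
    "cellular": 5,
}
file_formats = 3               # e.g., config, log, firmware-image parsers (assumed)

protocol_cases = sum(interfaces.values()) * SDL_MIN_ITERATIONS
file_cases = file_formats * SDL_MIN_ITERATIONS
total = protocol_cases + file_cases
print(f"{total:,} test cases")  # prints 10,000,000 with these assumptions
```

Even this modest device configuration lands at ten million iterations; a product family with more interfaces or parsers quickly reaches many tens of millions, which is why maintaining a home-brewed test system is a serious resource commitment.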
A Forrester Total Economic Impact Study prepared for Codenomicon (available upon request from firstname.lastname@example.org) indicated that the payback period for the customer studied was six months. Most of this was attributable to avoiding $2.16 million in remediation costs of fixing defects after software release. The savings for not having to write the test scripts was about $400,000, while the license and subscription costs together were about $600,000. If in-house expertise and bandwidth exist to support a custom fuzz-testing solution, that may be a good option. Otherwise, viable alternatives include outsourcing or using an off-the-shelf solution.
Fuzz testing need not be restricted to final device testing; preferably, fuzzing should be used even when selecting software components. Imagine if part of the selection criteria for a software component included automated testing that detects vulnerabilities in that component. If cybersecurity test results are published, health delivery organizations could reasonably shy away from medical devices found to be vulnerable. Large hospital chains might follow the lead of communication companies and include fuzz testing as part of their purchase decision. Note that fuzz testing should not be performed on equipment currently in use or on equipment that will be used for patient care.
Former Secretary of Defense Donald H. Rumsfeld made the now famous statement about “unknown unknowns” in his February 12, 2002, press briefing.20 The unknown unknowns are the most difficult ones to address. With these, we cannot begin to understand the hazards that medical devices present to the patients they are designed to help. Fuzz testing decreases the uncertainty by moving many unknown unknowns to known issues that can be addressed.
About the Author
Steven D. Baker, PhD, is a senior principal engineer at Welch Allyn in Beaverton, OR. E-mail: email@example.com