Death by Rounding Error

I took a numerical analysis course in grad school where the professor opened with this story about the Patriot missile failure and how a simple rounding error cost 28 lives. On February 25, 1991, an Iraqi Scud missile struck a U.S. Army barracks in Dhahran, Saudi Arabia. 28 soldiers died and around 100 others were injured. The Patriot missile defense system that was supposed to protect them failed to fire. The cause was a rounding error in how the system counted time. Math errors don't usually kill people, but this one did.

What Happened

The Patriot system was originally designed to track and shoot down Soviet aircraft moving at around Mach 2. During the Gulf War, the Army repurposed it to intercept Iraqi Scud missiles traveling at Mach 5 (yes, two and a half times its original design speed; surely nothing can go wrong).


The system tracked targets using radar, predicting where an incoming missile would appear next based on its velocity and the time since the last radar sweep. Time was kept by the system's internal clock, which counted in tenths of a second and stored that count as an integer. To calculate the missile's predicted position, the software needed to multiply the velocity by time. But velocity was a real number (like 3750.2563 mph), and time was an integer. So the software had to convert the integer time to a real number by multiplying by 0.1.


The software used a 24-bit fixed-point register for this calculation. But here's the problem: the number 0.1 cannot be represented exactly in binary.

The 0.1 Problem

We use base 10 in our daily lives. When we write 0.5, we mean "five tenths" or $\frac{5}{10}$. Computers, however, work in base 2, so they build fractions out of powers of two: halves, quarters, eighths, sixteenths, and so on.


Many decimal fractions translate nicely into binary. For example, 0.5 is just $\frac{1}{2}$, which is $2^{-1}$, so in binary it's simply 0.1. The number 0.25 is $\frac{1}{4} = 2^{-2}$, so it's 0.01 in binary.


But 0.1 is $\frac{1}{10}$. And 10 isn't a power of 2. So when you try to write it in binary, you get this:

$$0.1_{10} = 0.0001100110011001100110011\ldots_{2}$$

That pattern 0011 repeats forever. The computer has to stop somewhere and round off. And that's where the trouble begins.
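If you want to see the repetition for yourself, here is a minimal Python sketch that does the base-2 long division with exact fractions. The helper name `binary_fraction_digits` is made up for this illustration; it has nothing to do with the Patriot software.

```python
from fractions import Fraction


def binary_fraction_digits(value: Fraction, n_bits: int) -> str:
    """Return the first n_bits binary digits after the point, for 0 <= value < 1."""
    digits = []
    for _ in range(n_bits):
        value *= 2                 # shift one binary place to the left
        bit = int(value >= 1)      # the digit that crossed the binary point
        digits.append(str(bit))
        value -= bit               # keep only the fractional part
    return "".join(digits)


print("0.5  -> 0." + binary_fraction_digits(Fraction(1, 2), 8))
print("0.25 -> 0." + binary_fraction_digits(Fraction(1, 4), 8))
print("0.1  -> 0." + binary_fraction_digits(Fraction(1, 10), 24))
```

The first two run out of nonzero digits right away. The last one never does, no matter how large you make `n_bits`.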


Mathematically, the binary expansion of $\frac{1}{10}$ is:

$$\frac{1}{10} = \frac{1}{2^4} + \frac{1}{2^5} + \frac{1}{2^8} + \frac{1}{2^9} + \frac{1}{2^{12}} + \frac{1}{2^{13}} + \cdots$$

The Patriot's 24-bit register stored 0.00011001100110011001100 instead of the infinite sequence. The chopped-off part introduced an error of about 0.000000095, which works out to that many seconds lost for every tick converted. Seven zeros after the decimal point before you hit anything. Negligible, you'd think.
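The size of that error is easy to check. This sketch assumes the register kept 23 bits after the binary point, which is how many digits the stored value quoted above has, and that the extra bits were chopped rather than rounded. The variable names are mine, not the Patriot's.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: digits kept after the binary point

exact = Fraction(1, 10)
# Chop (truncate, don't round) everything past FRACTION_BITS binary places.
chopped = Fraction(int(exact * 2**FRACTION_BITS), 2**FRACTION_BITS)
error = exact - chopped

print(f"stored value: {float(chopped):.12f}")
print(f"error:        {float(error):.3e}")
```

Using `Fraction` keeps the arithmetic exact, so the only error in play is the chopping itself.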

How It's Actually Stored

Modern systems use IEEE 754, a standard for floating-point arithmetic. A 64-bit number (the kind most programming languages use by default) gets split into three parts: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the fraction (the actual digits). The Patriot used older 24-bit fixed-point arithmetic, but the core issue is the same: finite bits, infinite expansion.

64-bit IEEE 754 representation of 0.1:

Sign (1 bit): 0 (positive)
Exponent (11 bits): 01111111011 (biased value 1019, which means -4)
Fraction (52 bits): 1001100110011001100110011001100110011001100110011010

Rounded to 17 significant digits, what the computer actually stores is 0.10000000000000001.

When you store 0.1 in a 64-bit float, the actual value is:

0.1000000000000000055511151231257827021181583...

That tiny error at the end is the rounding error. For everyday purposes, nobody cares about 5.5 quintillionths of error. But the Patriot wasn't doing everyday arithmetic.
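You can pull the same three fields out of a Python float. This is a small sketch using only the standard library: `struct` reinterprets the 8 bytes of the double as an integer, and `Decimal` prints the exact value that was stored.

```python
import struct
from decimal import Decimal

x = 0.1
# Reinterpret the 8 bytes of the double as one 64-bit unsigned integer.
(bits,) = struct.unpack(">Q", struct.pack(">d", x))
pattern = f"{bits:064b}"
sign, exponent, fraction = pattern[0], pattern[1:12], pattern[12:]

print("sign:    ", sign)
print("exponent:", exponent, "->", int(exponent, 2) - 1023, "after removing the bias of 1023")
print("fraction:", fraction)
print("exact:   ", Decimal(x))  # every digit of the value actually stored
```

Note that the fraction ends in 1010 instead of continuing 1001: that last group is where IEEE 754 rounded up.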


100 Hours

If the error in representing 0.1 is so tiny, why did it matter? Because the Patriot system at Dhahran had been running continuously for about 100 hours when the Scud hit. Every tenth of a second, the system's clock ticked, and every tick meant another conversion from integer time to real time, each one slightly wrong.


100 hours is 360,000 seconds, or 3,600,000 tenths of a second. Multiply that by the per-tick error:

$$3{,}600{,}000 \times 0.000000095 \approx 0.34 \text{ seconds}$$

A third of a second. That's the accumulated drift.
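Here is the same arithmetic as a short Python sketch, so you can try other uptimes. It assumes 0.1 was chopped to 23 bits after the binary point. The function name and constants are made up for the illustration and are not taken from the Patriot code.

```python
from fractions import Fraction

FRACTION_BITS = 23                      # assumption: bits kept after the binary point
TENTH_FIXED = 2**FRACTION_BITS // 10    # chopped 0.1 as an integer numerator (838860)
TICKS_PER_HOUR = 3600 * 10              # the clock ticks every tenth of a second


def clock_drift_seconds(hours: int) -> float:
    """True elapsed time minus the time computed with the chopped 0.1."""
    ticks = hours * TICKS_PER_HOUR
    true_seconds = Fraction(ticks, 10)
    computed_seconds = Fraction(ticks * TENTH_FIXED, 2**FRACTION_BITS)
    return float(true_seconds - computed_seconds)


for hours in (1, 8, 100):
    print(f"{hours:>3} h uptime -> clock behind by {clock_drift_seconds(hours):.4f} s")
```

The drift is proportional to uptime, so restarting the system resets the tick count and brings the error back to zero.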

Adding 0.1 + 0.1 + 0.1 + ... 100 times: each addition introduces a tiny error, and over 100 iterations these errors compound into something measurable.
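The same experiment takes a few lines of Python with ordinary 64-bit floats (not the Patriot's fixed-point format). `math.fsum` is included for contrast because it tracks the lost low-order bits as it sums.

```python
import math

total = 0.0
for _ in range(100):
    total += 0.1          # each addition rounds to the nearest double

print(repr(total))                    # close to 10, but not exactly 10
print(total == 10.0)
print(math.fsum([0.1] * 100) == 10.0)  # compensated summation
```

The naive loop misses 10 by a hair, while `fsum` lands on it. The drift is tiny here because doubles carry 52 fraction bits. With fewer bits, the same effect gets big fast.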

The Range Gate

A Scud missile travels at approximately Mach 5, or about 1,676 meters per second. In 0.34 seconds, it covers over 500 meters. But the actual problem was more subtle than just "the clock was off."


The Patriot used something called a range gate, which was basically a window in the sky where it expected to find the target on the next radar sweep. Think of it like looking for a car through a narrow gap in a fence. If the gap is pointed at the wrong spot, you won't see the car even though it's right there on the road.


According to the GAO report (the agency was then called the General Accounting Office and is now the Government Accountability Office), a 20% shift in the range gate was significant enough to degrade accuracy. A 50% shift meant complete failure to track. The Israelis, who were also using Patriots, noticed problems after 8 hours of continuous operation. They had data recorders attached to their systems. The U.S. didn't because some commanders worried recorders might cause unexpected shutdowns.


The following table is from the actual GAO report (Appendix II of IMTEC-92-26):

Data from GAO Report IMTEC-92-26, Appendix II

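| Hours | Time (sec) | Calculated time (sec) | Inaccuracy (sec) | Range gate shift (m) |
|-------|------------|-----------------------|------------------|----------------------|
| 0     | 0          | 0.0000                | 0.0000           | 0                    |
| 1     | 3,600      | 3599.9966             | 0.0034           | 7                    |
| 8     | 28,800     | 28799.9725            | 0.0275           | 55                   |
| 20    | 72,000     | 71999.9313            | 0.0687           | 137                  |
| 48    | 172,800    | 172799.8352           | 0.1648           | 330                  |
| 72    | 259,200    | 259199.7528           | 0.2472           | 494                  |
| 100   | 360,000    | 359999.6667           | 0.3433           | 687                  |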

8 hours: Israeli forces reported targeting problems

20 hours: System can no longer track targets

100 hours: Dhahran incident

At 100 hours, the range gate had shifted 687 meters. The system detected the Scud initially, predicted where it would be next, pointed the range gate there, and found nothing. It concluded the first detection was noise. It never fired.

The Timeline

On February 11, 1991, two weeks before the Dhahran attack, the Patriot Project Office got the Israeli data showing the range gate problem.


Army officials analyzed the Israeli data, confirmed the problem, and on February 16 released a software patch.


On February 21, four days before the attack, the Patriot Project Office sent a message to all Patriot users warning that "very long run times could cause a shift in the range gate." The message said a fix was coming. But the message didn't define what "very long" meant. Officials later said they assumed nobody would run the system continuously long enough for it to matter. They didn't think more specific guidance was needed.


On February 25, Alpha Battery (the name of the missile battery that was protecting the air base) at Dhahran had been online for over 100 hours. The Scud hit.


On February 26, 1991, one day after the attack, the patch arrived.


There's one more detail from the post-incident analysis: the timing calculation had been fixed in some parts of the code but not others. If the whole system had been consistently wrong, the errors might have canceled out. Instead, the partial fix made things worse.
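To see why inconsistency is worse than being consistently wrong, here is a toy model in Python. It is only a sketch: the post doesn't describe how the real code was structured, so the one-second spacing between radar looks and the choice of which timestamp got the accurate conversion are assumptions made up for the illustration.

```python
from fractions import Fraction

EXACT_TENTH = Fraction(1, 10)
CHOPPED_TENTH = Fraction(int(EXACT_TENTH * 2**23), 2**23)  # 0.1 chopped to 23 binary places


def interval_error(t0_ticks, t1_ticks, tenth_for_t0, tenth_for_t1):
    """Error, in seconds, of the computed interval t1 - t0."""
    true_interval = Fraction(t1_ticks - t0_ticks, 10)
    computed = t1_ticks * tenth_for_t1 - t0_ticks * tenth_for_t0
    return float(computed - true_interval)


uptime_ticks = 100 * 3600 * 10   # 100 hours of tenth-of-a-second ticks
t0 = uptime_ticks
t1 = uptime_ticks + 10           # hypothetical: next radar look one second later

# Both timestamps converted with the same chopped constant: the big errors cancel.
consistent = interval_error(t0, t1, CHOPPED_TENTH, CHOPPED_TENTH)
# One timestamp converted accurately, the other not: the full drift shows up.
mixed = interval_error(t0, t1, CHOPPED_TENTH, EXACT_TENTH)

print(f"consistent conversion: {consistent:+.7f} s")
print(f"mixed conversion:      {mixed:+.7f} s")
```

When both timestamps carry the same error, subtracting them leaves about a microsecond. When only one is corrected, the whole third of a second lands in the interval.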

Why This Continues to Get Taught

The Patriot missile failure shows up in numerical analysis courses, software engineering courses, and even ethics courses. The GAO report (IMTEC-92-26) is publicly available and provides a detailed post-mortem. It's only 16 pages and definitely worth reading if you work on anything safety-critical.


Part of what makes it useful pedagogically is that every link in the chain was completely reasonable in isolation. Using fixed-point arithmetic? Normal for embedded systems in the 1970s. Storing time as an integer and converting? Standard practice. Not defining "very long run time" in the warning message? People were busy, it was a war, and everyone assumed someone else would handle the details. Keeping the system online for 100 hours straight? Dude, Scuds were falling, and rebooting meant 60-90 seconds of vulnerability. Nobody wanted to be the one who took the defense system offline when a missile hit.


The system was designed to be mobile: operate for a few hours, move to a new position, and reboot in the process. Desert Storm turned it into a fixed installation running continuously, and nobody fully worked through the implications of this.


Rounding errors compound over time, whether the arithmetic is fixed-point or floating-point. That's the direct technical lesson. But there's a human lesson too: the bug was known and a fix existed, but the people who could deploy it didn't have the information they needed to understand the urgency. Software systems fail inside organizational systems.

Further Reading

Most of the technical details come from GAO Report IMTEC-92-26. If you want the full story with the actual assembly code analysis, it's there. Also, credit to my numerical analysis professor at TCU, Dr. Hanson, who gave a compelling lecture on this story and inspired me to write this post.


If you want to dig deeper into floating-point arithmetic in general, Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" is the standard reference. It's dense, but comprehensive.


And if you're curious about other software bugs that had serious consequences, look into the Therac-25 radiation therapy incidents, the Ariane 5 Flight 501 explosion, or the Boeing 737 MAX MCAS failures. Different failure modes, same lessons about testing assumptions. Thanks for reading!