The anomaly of cheap complexity

Freedom to Tinker 2022-08-04

Why are our computer systems so complex and so insecure?  For years I’ve been trying to explain my understanding of this question. Here’s one explanation, which happens to be in the context of voting computers but describes a general phenomenon affecting all our computers:

There are many layers between the application software that implements an electoral function and the transistors inside the computers that ultimately carry out computations. These layers include the election application itself (e.g., for voter registration or vote tabulation); the user interface; the application runtime system; the operating system (e.g., Linux or Windows); the system bootloader (e.g., BIOS or UEFI); the microprocessor firmware (e.g., Intel Management Engine); disk drive firmware; system-on-chip firmware; and the microprocessor’s microcode. For this reason, it is difficult to know for certain whether a system has been compromised by malware. One might inspect the application-layer software and confirm that it is present on the system’s hard drive, but any one of the layers listed above, if hacked, may substitute a fraudulent application layer (e.g., vote-counting software) at the time that the application is supposed to run. As a result, there is no technical mechanism that can ensure that every layer in the system is unaltered and thus no technical mechanism that can ensure that a computer application will produce accurate results. 

[Securing the Vote, pages 89-90]

So, computers are insecure because they have so many complex layers.

But that doesn’t explain why there are so many layers, and why those layers are so complex–even for what “should be a simple thing” like counting up votes.

Recently I came across a really good explanation: a keynote talk by Thomas Dullien entitled “Security, Moore’s law, and the anomaly of cheap complexity” at CyCon 2018, the 10th International Conference on Cyber Conflict, organized by NATO.

A video of Thomas Dullien’s talk is available online, as are the slides if you would rather just read them.

As Dullien explains,

A modern 2018-vintage CPU contains a thousand times more transistors than a 1989-vintage microprocessor.  Peripherals (GPUs, NICs, etc.) are objectively getting more complicated at a superlinear rate. In his experience as a cybersecurity expert, the only thing that ever yielded real security gains was controlling complexity.  His talk examines the relationship between complexity and failure of security, and discusses the underlying forces that drive both.

Transistors-per-chip is still increasing every year; there are 3 new CPUs per human per year.  Device manufacturers are now developing their software even before the new hardware is released.  Insecurity in computing is growing faster than security is improving.

The anomaly of cheap complexity.  For most of human history, a more complex device was more expensive to build than a simpler device.  This is not the case in modern computing. It is often more cost-effective to take a very complicated device, and make it simulate simplicity, than to make a simpler device.  This is because of economies of scale: complex general-purpose CPUs are cheap.  On the other hand, custom-designed, simpler, application-specific devices, which could in principle be much more secure, are very expensive.  

This is driven by two fundamental principles in computing: universal computation, meaning that any computer can simulate any other; and Moore’s law, the prediction that the number of transistors on a chip grows exponentially, doubling roughly every two years.  ARM Cortex-M0 CPUs cost pennies, though they are more powerful than some supercomputers of the 20th century.
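To get a feel for that exponential growth, the thousandfold increase in transistors that Dullien cites between 1989 and 2018 can be converted into an implied doubling period. The following is a back-of-the-envelope sketch in Python that uses only those two figures from the talk; the function name is an illustrative assumption.

```python
import math

def doubling_period(growth_factor: float, years: float) -> float:
    """Years per doubling implied by a total growth factor over a span of years."""
    doublings = math.log2(growth_factor)
    return years / doublings

# Figures quoted from the talk: 1000x more transistors, 1989 -> 2018.
period = doubling_period(1000, 2018 - 1989)
print(f"{math.log2(1000):.1f} doublings, one every {period:.1f} years")
# prints "10.0 doublings, one every 2.9 years"
```

The thousandfold figure is a round number, so this is an order-of-magnitude estimate. It comes out somewhat slower than the two-year doubling period usually quoted for Moore’s law, but the growth is exponential all the same.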

The same is true in the software layers.  A (huge and complex) general-purpose operating system is free, but a simpler, custom-designed, perhaps more secure OS would be very expensive to build.  Or as Dullien asks, “How did this research code someone wrote in two weeks 20 years ago end up in a billion devices?”

Then he discusses hardware supply-chain issues: “Do I have to trust my CPU vendor?”  He discusses remote-management infrastructures (such as the “Intel Management Engine” referred to above):  “In the real world, ‘possession’ usually implies ‘control’. In IT, ‘possession’ and ‘control’ are decoupled. Can I establish with certainty who is in control of a given device?”

He says, “Single bitflips can make a machine spin out of control, and the attacker can carefully control the escalating error to his advantage.”  (Indeed, I’ve studied that issue myself!)

Dullien quotes the science-fiction author Robert A. Heinlein:

“How does one design an electric motor? Would you attach a bathtub to it, simply because one was available? Would a bouquet of flowers help? A heap of rocks? No, you would use just those elements necessary to its purpose and make it no larger than needed — and you would incorporate safety factors. Function controls design.” 

 Heinlein, The Moon Is A Harsh Mistress

and adds, “Software makes adding bathtubs, bouquets of flowers, and rocks, almost free. So that’s what we get.”

Dullien concludes his talk by saying, “When I showed the first [draft of this talk] to some coworkers they said, ‘you really need to end on a more optimistic note.’”  So Dullien gives optimism a try, discussing possible advances in cybersecurity research; but still he gives us only a 10% chance that society can get this right.


Postscript:  Voting machines are computers of this kind.  Does their inherent insecurity mean that we cannot use them for counting votes?  No. The consensus of election-security experts, as presented in the National Academies study, is: we should use optical-scan voting machines to count paper ballots, because those computers, when they are not hacked, are much more accurate than humans.  But we must protect against bugs, against misconfigurations, against hacking, by always performing risk-limiting audits, by hand, of an appropriate sample of the paper ballots that the voters marked themselves.
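To make “risk-limiting” concrete, here is a minimal sketch of one published audit method, the ballot-polling audit known as BRAVO, for a two-candidate contest. The National Academies study does not prescribe this particular method. The function name, the 5% risk limit, and the simulated ballots are all illustrative assumptions. The idea is to draw paper ballots at random, read each one by hand, and keep a running likelihood ratio comparing the reported result with a tie. The audit stops once that ratio reaches 1 / (risk limit).

```python
import random

def bravo_audit(sampled_ballots, reported_winner_share, risk_limit=0.05):
    """Sequential ballot-polling test for a two-candidate contest.

    sampled_ballots: iterable of "W" (reported winner) or "L" (loser), as read
    by hand from randomly drawn paper ballots. Returns the number of ballots
    examined when the reported outcome is confirmed, or None if the sample
    runs out (the audit would then escalate, ultimately to a full hand count).
    """
    if not 0.5 < reported_winner_share < 1:
        raise ValueError("reported winner share must be between 0.5 and 1")
    ratio = 1.0  # likelihood of the sample under the reported result vs. a tie
    for n, ballot in enumerate(sampled_ballots, start=1):
        if ballot == "W":
            ratio *= reported_winner_share / 0.5
        elif ballot == "L":
            ratio *= (1 - reported_winner_share) / 0.5
        # Any other marking (e.g. an invalid ballot) leaves the ratio unchanged.
        if ratio >= 1 / risk_limit:
            return n
    return None

# Simulated contest with hypothetical numbers: the machines report 60% for
# the winner, and the paper ballots agree.
rng = random.Random(0)
paper = ["W"] * 6000 + ["L"] * 4000
sample = [rng.choice(paper) for _ in range(2000)]  # random draws with replacement
print(bravo_audit(sample, reported_winner_share=0.60))
```

If the reported winner did not in fact win, the chance that this test confirms the outcome anyway is at most the risk limit. The software here only does arithmetic; the evidence is the paper ballots as read by people.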