Timing Leaks Everything

Gödel’s Lost Letter and P=NP 2018-03-12

Facing the awful truth that computers are physical machines

Paul Kocher as moderator of an RSA 2016 panel

Paul Kocher is the lead author on the second of two papers detailing a longstanding class of security vulnerability that was recognized only recently. He is an author on the first paper. Both papers credit his CRYPTO 1996 paper as originating the broad kind of attack that exploits the vulnerability. That paper was titled, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems.” Last week, the world learned that timing attacks can jeopardize entire computing systems, smartphones, the Cloud, everything.

Today we give a high-level view of the Meltdown and Spectre security flaws described in these papers.

Both flaws are at processor level. They are ingrained in the way modern computers operate. They are not the kind of software vulnerabilities that we have discussed several times before. Both allow attackers to read any memory location that can be mapped to the kernel—which on most computers allows targeting any desired memory contents. Meltdown can be prevented by software patches—at least as we know it—but apparently no magic bullet can take out Spectre.

Kocher was mentioned toward the end of a 2009 post by Dick titled, “Adaptive Sampling and Timed Adversaries.” This post covered Dick’s 1993 paper with Jeff Naughton on using timing to uncover hash functions. Trolling for hash collisions and measuring the slight delays needed to resolve them required randomized techniques and statistical analysis to extract the information. No such subtlety is needed for Meltdown and Spectre—only the ability to measure time in bulky units.

Running On Spec

The attacks work because modern processors figuratively allow cars—or trolleys—to run red lights. An unauthorized memory access will raise an exception but subsequent commands will already have been executed “on spec.” Or if all the avenue lights are green but the car needs to turn at some point, they will still zoom it ahead at full speed—and weigh the saving of not pausing to check each cross-street versus the minimal backtrack needed to find a destination that is usually somewhere downtown.

Such speculative execution leverages extra processing capacity relative to other components to boost overall system speed. The gain in time from having jumped ahead outweighs the loss from computing discarded continuations. The idea of “spec” is easiest to picture when the code has an if-else branch. The two branches usually have unequal expected frequencies: the lesser one may close a long-running loop that the other continues, or may represent failure of an array-bounds test that generally succeeds. So the processor applies the scientific principle of induction to jump always onto the fatter branch, backtracking when (rarely) needed.

Meltdown applies to the red-light situation, Spectre to branches. Incidentally, this is why the ghost in the logo for Spectre is holding a branch:

The logos were designed by Natascha Eibl of Graz, Austria, whose artistic website is here. Four authors of both papers are on the faculty of the Graz University of Technology, which hosts the website for the attacks. The Graz team are mostly responsible for the fix to Meltdown called KPTI for “kernel page-table isolation,” but the Spectre attack is different in ways that make KPTI inapplicable to it.

There have been articles like this decrying the spectre of a meltdown of the whole chip industry. We’ll hold off on speculating about impending executions and stay with describing how the attacks work.

Meltdown Example in C-like Code

The Meltdown paper gives details properly in machine code, but we always try to be different, so we’ve expressed its main example in higher-level C code to convey how an attacker can really pull this off.

To retrieve the byte value {b} at a targeted location {x} in the kernel’s virtual memory map {K}, the attacker can create a fresh array {A} of objects {Q} whose width is known to be the page size of the cache. The contents of {A} don’t matter, only that initially no member has been read from, hence none is cached. The attacker then runs the following code in a forked subprocess or inside a try-catch block:

  object Q;      // loaded into chip memory
  byte b = 0;
  while (b == 0) {
    b = K[x];    // violates privilege---so raises an exception
  }
  Q = A[b];      // should not be executed but usually is

  // continue process after subprocess dies or exception is caught:
  int T[256];
  for (int i = 0; i < 256; i++) {
    T[i] = the time to find A[i];
  }
  if T has a clear minimum T[i] output i, else output 0.

What happens? Let’s first suppose {b \neq 0}. By “spec” the while-loop exits and the read from {A[b]} happens before the exception kills the child. This read generally causes the contents of {A[b]} to be read into the cache and causes the system to note this fact about the index {b}. This note survives when the second part of the code is (legally) executed and causes the measured time {T[i]} to be markedly low only when {i = b}, because only that page is cached. Thus the value of {b} is manifested in the attacker’s code, which can merrily continue executing to get more values.

The reason for the special handling of {b = 0} is that a “race condition” exists whose outcome can zero out {b} or leave its initial {b = 0} value untouched. The while-loop keeps trying until the race is won. If the secret value {b} really is zero then the loop will either raise the exception or iterate until a segmentation fault occurs. The latter causes Q = A[0] not to be executed, but then the initial condition that no page of {A} is cached still holds, so no time {T[i]} is markedly lower, so {0} is returned after all.

A second key point is that Intel processors allow the speculatively executed code the privilege needed to read {b}. Again, the Meltdown paper has all the details expressed properly, including how cache memory is manipulated and how to measure each {T[i]} without dislodging {A[b]} from its cache. The objects {Q} need not be as big as the page size {P}—they only need to be spaced {P} apart in {A}. This and some other tunings enabled the Meltdown paper’s experiment to read over 500,000 supposedly-protected bytes per second.

Spectre

The Spectre attack combines “spec” with the old bugaboo of buffer overflows. Enforcing array bounds is not only for program correctness but also for securing boundaries between processes. The attacker uses an array {B} with size bound {s} and an auxiliary array {A} of size {256s}. The attacker needs to discover some facts about the victim’s code and arrange that addresses {B[x]} for overflowing {x} will map into the victim’s targeted memory. The first idea is to induce enough accesses {B[x]} with valid {x < s} to train the branch predictor to presume compliance, so that it will execute in advance the body of the critical line of code:

  if (x < s) { y = A[256*B[x]]; }

To create the delay during which the body will execute speculatively, the attacker next thrashes the cache so that {s} will likely be evicted from it. Finally, the attacker chooses some {x \geq s} and re-enters the critical line. Fetching {s} for the bounds check {x < s} now misses the cache, delaying the check long enough for the “spec” to run. Not only will it deliver the targeted byte {b = B[x]}, it will cache it at the spaced-out location {256b} (the page size {P} is not involved). Then the value of {b} is recovered much as with Meltdown.

Spectre is more difficult to exploit, but what makes it scarier is that now the out-of-bounds access is not treated as guarded by a higher privilege level: it involves the attacker’s own array {B}. Even if the attacker has limited access to {B}, there is a way: call the critical line with random valid {x' < s} a few hundred times. There is a good chance that a valid value {B[x']} equal to {b} will be found. Since {A[256b]} is in the cache, the line y = A[256*B[x']] will execute faster than in the other cases. Detecting the faster access again leaks {b}.

The paper concludes with variants that allow similar timing attacks to operate under even weaker conditions. One does not even need to manipulate the cache beforehand. The last one is titled:

Leveraging arbitrary observable effects.

That is, let {O(y)} stand for “something using {y} that is observable when speculatively executed.” Then the following lines of code are all it takes to compromise the victim’s data:

  if (x < s) {
    y = A[256*B[x]];
    O(y);
  }

Oy.

Open Problems

We’ve talked about timing attacks, but can we possibly devise a concept of timing defenses? On first reflection the answer is no. Ideas of scrambling data in memory space improve security in many cases, but scrambling computations in time seems self-defeating. Changing reported timings randomly and slightly in the manner of differential privacy is useless because the timing difference of the cached item is huge. Computers need to assure fine-grained truth in timing anyway.

Besides timing, there are physical effects of power draw and patterns of electric charge and even acoustics in chips that have been exploited. Is there any way the defenders can keep ahead of the attackers? Can the issues only be fixed by a whole new computing model?
