Last week, developers on OpenBSD—the open source operating system that prioritizes security—disabled hyperthreading on Intel processors. Project leader Theo de Raadt said that a research paper due to be presented at Black Hat in August prompted the change, but he would not elaborate further.
The situation has since become a little clearer.
In a proof of concept, researchers ran a program calculating cryptographic signatures using the Curve 25519 EdDSA algorithm implemented in libgcrypt on one logical core and their attack program on the other logical core. The attack program could determine the 256-bit encryption key used to calculate the signature with a combination of two milliseconds of observation, followed by 17 seconds of machine-learning-driven guessing and a final fraction of a second of brute-force guessing.
Those observations are made using a side channel. Side channels are features of computer systems that inadvertently leak information due to the way the system has been implemented. Side channels have long been a concern for cryptography software, where attributes such as the power draw of the processor or the behavior of the processor’s cache can be used to reveal encryption keys. Side channels are also key elements when using the Spectre and Meltdown exploits revealed earlier this year; in those attacks, features of the processor’s speculative execution machinery can be used to make measurable changes to the cache.
A different kind of cache
The previous focus for side-channel attacks has been the processor’s data cache, a small piece of high-speed memory that’s used to hold recently used data. TLBleed uses a new side channel: the processor’s translation lookaside buffer (TLB).
The TLB is a special kind of cache that’s used by the processor to determine memory addresses. Specifically, it holds mappings from virtual memory addresses to physical memory addresses. Every byte of RAM in a system has a physical address: a number that uniquely identifies the byte. For several reasons—both security and convenience—programs and operating systems don’t use these physical addresses. Instead, they use virtual addresses. Each running program has its own set of virtual addresses, and the operating system maintains a set of mappings from these virtual addresses to the underlying physical memory. These mappings typically have a granularity of 4kB, though 2MB and 1GB are also supported.
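To make the page granularity concrete, here is a minimal Python sketch, assuming 4KB pages, of how a virtual address splits into a page number (the part that gets translated) and an offset (the part that carries over unchanged). The page table contents and the `split` and `translate` helpers are hypothetical, made up purely for illustration.

```python
PAGE_SIZE = 4096                 # 4KB pages
OFFSET_BITS = 12                 # 2**12 == 4096

# Hypothetical per-process page table: virtual page number -> physical frame number.
page_table = {0x00400: 0x1A2B3, 0x00401: 0x0FF10}

def split(vaddr):
    """Return (virtual page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

def translate(vaddr):
    """Map a virtual address to a physical one using the toy page table."""
    vpn, offset = split(vaddr)
    frame = page_table[vpn]      # a missing key would correspond to a page fault
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x00400ABC)))   # 0x1a2b3abc
```

Every address inside the same 4KB page shares one mapping, which is why a single cached mapping can serve many nearby accesses.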
These mappings are big and complex; they involve multiple lookups in multiple tables, with each table yielding part of the physical address that corresponds to a given virtual address. Because this is relatively slow, the processor stores the most recently used mappings. That way, if it has to repeatedly look up the same handful of addresses, it can do so near-instantly rather than having to trawl through all the tables. This store of recent mappings is the TLB. As with other caches, processors have a hierarchy of TLBs. Unlike other caches, which have their size measured in bytes, TLBs are instead measured in terms of how many mappings they can store. For example, each physical core on the Ryzen processor I’m using at the moment has a 64-entry level 1 TLB and a 1536-entry level 2 TLB. Only if a mapping isn’t found in one of these does the processor have to look at the tables.
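As an illustration of the "multiple lookups in multiple tables," the sketch below slices a 48-bit x86-64 virtual address the way four-level paging with 4KB pages does: four 9-bit table indices followed by a 12-bit offset. The function name is made up for this example, and the tables themselves are left out; only the index arithmetic is shown.

```python
def walk_indices(vaddr):
    """Split a 48-bit virtual address into four 9-bit table indices and a 12-bit offset."""
    offset = vaddr & 0xFFF
    # One index per table level, from the top-level table down to the last one.
    indices = [(vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, offset

indices, offset = walk_indices(0x00007F1234567ABC)
print(indices, hex(offset))
```

Each index selects an entry in one table, and that entry points at the next table, so a full translation without a cached mapping costs several dependent memory reads.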
This structure is what provides the side channel. Looking up a memory address that’s in the TLB will be fast: just a handful of processor cycles. Looking up a memory address that isn’t in any TLB will be much slower: hundreds of processor cycles, possibly even more. These performance differences can be measured, allowing inferences to be made about the mappings currently stored in the TLB. Each new mapping that the processor looks up means that an existing mapping has to be discarded from the TLB, though the policy on how exactly the processor chooses the entry to discard to make space for a new one will vary from processor to processor.
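The fast/slow distinction can be modeled with a toy TLB. This is only a sketch: the `ToyTLB` class is hypothetical, the cycle counts are illustrative placeholders rather than measurements, and least-recently-used eviction is an assumption, since real processors use their own replacement policies.

```python
from collections import OrderedDict

HIT_COST = 1      # illustrative cycle counts, not measurements
MISS_COST = 200

class ToyTLB:
    """A tiny fully associative TLB with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> True, oldest first

    def access(self, vpn):
        """Look up a page and return the simulated cost of the translation."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            return HIT_COST
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # discard the least recently used entry
        self.entries[vpn] = True
        return MISS_COST

tlb = ToyTLB(capacity=2)
costs = [tlb.access(vpn) for vpn in (10, 11, 10, 12, 11)]
print(costs)   # [200, 200, 1, 200, 200]
```

The final access to page 11 is slow again because loading page 12 pushed it out, and that change in cost is exactly what an observer can measure.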
More specific details of TLBleed aren’t (yet) available, but we can guess that TLBleed probably works in a similar fashion to other cache-based side channels. The attacking program will prime the TLB in some way: it will try to access a range of memory addresses that pre-populate the TLB with the mappings for those addresses. The encryption program, which performs its own accesses to memory addresses, will cause some of those TLB entries to be evicted and replaced with new mappings. The attacker can then attempt to access its range of addresses again, and it can time how long each access takes.
If an access is “slow,” the attacker knows that the TLB entry holding that mapping was discarded. For conventional cache attacks, there are a number of variations of this basic approach. For example, instead of looking for a “slow” lookup, the attack might look for a “fast” lookup (indicating that a specific piece of data was accessed).
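Since the details of TLBleed aren't public, the following is only a guess at the general prime-and-probe shape described above, not the researchers' method. It simulates a shared, direct-mapped TLB; the class, the victim function, the page numbers, and the set count are all hypothetical.

```python
NUM_SETS = 8   # toy value; real TLBs are larger and organized differently

class DirectMappedTLB:
    """One entry per set; a page lands in set (vpn % NUM_SETS). Assumed layout."""

    def __init__(self):
        self.sets = [None] * NUM_SETS

    def access(self, vpn):
        """Return True on a hit (fast) or False on a miss (slow), then install the page."""
        index = vpn % NUM_SETS
        hit = self.sets[index] == vpn
        self.sets[index] = vpn
        return hit

def victim(tlb, secret_bit):
    # Hypothetical victim: the page it touches depends on a secret bit.
    tlb.access(0x1003 if secret_bit else 0x1005)

def attack(secret_bit):
    tlb = DirectMappedTLB()
    attacker_pages = [0x9000 + i for i in range(NUM_SETS)]   # one page per set
    for vpn in attacker_pages:        # prime: fill every set with attacker mappings
        tlb.access(vpn)
    victim(tlb, secret_bit)           # the victim evicts one attacker entry
    # probe: a miss reveals which set the victim used
    return [vpn % NUM_SETS for vpn in attacker_pages if not tlb.access(vpn)]

print(attack(1), attack(0))   # [3] [5]
```

In this model the attacker never reads the victim's memory; it only learns which of its own lookups became slow, and from that, which pages the victim touched.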
In both regular cache side channels and TLBleed, these minute performance variations allow inferences to be made about exactly which data (for cache side channels) or memory addresses (for TLBleed) have been accessed by the victim program. Precisely how these inferences can be used to determine a victim program’s encryption keys is yet to be disclosed, but the researchers told The Register that a key element was not which TLB entries were changed but rather when those TLB entries changed: when looking up a particular address goes from fast to slow (or vice versa).
The other feature of TLBleed is the one that caused the OpenBSD changes to be made. Processors with simultaneous multithreading (SMT) support two or more logical cores, each of which can run a thread, on each physical core. These logical cores share the physical core’s resources, including the caches and the TLB. With the attacker program running on the same physical core as the victim program, the attacker can detect changes to the TLB as they’re made.
The research looked at Intel processors with hyperthreading (Intel’s name for SMT), but others may well be affected, too. AMD’s Ryzen processors also have SMT, for example, and depending on how Ryzen’s TLB works, they may be susceptible to similar attacks.
Closing the side channels
The ability of side channels to leak encryption keys has been known for a long time. The response by the developers of cryptography software is to implement their algorithms in such a way as to defend against these data leaks. For example, a naive encryption algorithm might test one bit from the encryption key; it might do one thing if the bit is 1 and another thing if the bit is 0. This, in turn, can cause measurable differences in execution time. The solution is to write the algorithm in such a way that it takes the same path and performs the same instructions, regardless of the bits of the key.
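The difference between the naive and the hardened approach can be sketched with modular exponentiation. Both functions below are illustrative, and their names are made up. Python's big-integer arithmetic is not constant-time, so this shows only the structure of the fix (same operations on every iteration), not a usable implementation.

```python
def modexp_branching(base, exponent, modulus, bits):
    """Square-and-multiply that does extra work only when a key bit is 1."""
    result = 1
    for i in reversed(range(bits)):
        result = (result * result) % modulus
        if (exponent >> i) & 1:                  # secret-dependent branch
            result = (result * base) % modulus
    return result

def modexp_uniform(base, exponent, modulus, bits):
    """Same result, but every iteration performs the same sequence of operations."""
    result = 1
    for i in reversed(range(bits)):
        result = (result * result) % modulus
        multiplied = (result * base) % modulus   # always computed
        bit = (exponent >> i) & 1
        mask = -bit                              # 0 if bit is 0, all ones if bit is 1
        result = (multiplied & mask) | (result & ~mask)
    return result

print(modexp_branching(5, 0xB7, 1000003, 8) == modexp_uniform(5, 0xB7, 1000003, 8))   # True
```

The second version replaces the `if` with a mask-based selection, so the key bit decides which value is kept rather than which code runs.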
Similarly, many encryption algorithms include lookups in data tables, with different data looked up depending on the encryption key, which can leak information through cache side channels. Again, the solution is to ensure that the pattern of data lookups remains consistent regardless of the bits of the key.
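For table lookups, one common way to keep the access pattern fixed is to read every entry and select the wanted one arithmetically. The sketch below assumes a small table of non-negative integers; the function names are hypothetical, and a production version would also need a constant-time comparison, which plain Python does not guarantee.

```python
def lookup_direct(table, secret_index):
    """Touches only one entry, so the access pattern reveals the index."""
    return table[secret_index]

def lookup_scan(table, secret_index):
    """Reads every entry in order and keeps the wanted one with a mask."""
    result = 0
    for i, value in enumerate(table):
        mask = -int(i == secret_index)   # all ones for the match, 0 otherwise
        result |= value & mask
    return result

table = [0x63, 0x7C, 0x77, 0x7B]
print(all(lookup_scan(table, i) == lookup_direct(table, i) for i in range(len(table))))   # True
```

The scan is slower, but the sequence of memory accesses is identical for every index, so neither a data cache nor, presumably, a TLB would see a key-dependent pattern.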
The Spectre and Meltdown attacks are particularly significant because they can be used to attack cloud infrastructure and because they’ve prompted hardware modifications to prevent the data leaks and close the side channels.
TLBleed doesn’t seem likely to do the same. Ben Gras, one of the researchers, tweeted “don’t panic” and said that TLBleed “is not the new Spectre.” Just as was done for previously known side channels, encryption algorithms can be implemented such that their pattern of data accesses is the same regardless of the encryption key. Removing this variation removes the side channel. Gras said that implementations that do this are currently rare, but if it were done, TLBleed would no longer work.
This has provoked a rather dismissive response from Intel. Although the company operates a bug bounty program, it has declared that this problem isn’t eligible for an award. Intel argues that countermeasures against cache side channels can also protect against TLBleed.
Are we likely to see the same kind of industry-wide mobilization to address TLBleed as we saw with Spectre and Meltdown? If history is any guide, the answer here is “no.” Gras tweeted that while TLBleed is a new side channel, it’s not fundamentally any more powerful than cache side channels. Since at least 2005, it’s been known that SMT makes cache-based side channels much easier to exploit. The industry’s response to this?
Neither processors nor operating systems have been modified in response. Williams’ suggestion, that operating systems not schedule processes belonging to different users on the same physical core, could be used to close these cache side channels (albeit with the same concerns as de Raadt has raised). It hasn’t been. Implementations of crypto algorithms have been developed to ensure that their data accesses don’t have a dependence on the encryption key, and operating systems and processors have continued to work the same as they always have. It’s a problem for crypto; it’s probably not a problem for everyone.
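The scheduling rule described above, never placing different users' processes on the same physical core, amounts to a simple check. The sketch below is a hypothetical model of that check, not any operating system's scheduler; the function name, CPU numbering, and user names are invented.

```python
def violations(core_of, owner_of):
    """Return the physical cores that are running threads of more than one user.

    core_of:  logical CPU -> physical core
    owner_of: logical CPU -> user name, or None if the logical CPU is idle
    """
    users_by_core = {}
    for cpu, user in owner_of.items():
        if user is not None:
            users_by_core.setdefault(core_of[cpu], set()).add(user)
    return sorted(core for core, users in users_by_core.items() if len(users) > 1)

# Two physical cores, each exposing two logical CPUs.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}
owner_of = {0: "alice", 1: "bob", 2: "alice", 3: "alice"}
print(violations(core_of, owner_of))   # [0]
```

A scheduler following this rule would leave a sibling logical core idle rather than hand it to a second user, which is where the performance cost comes from.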
As such, the OpenBSD solution—disabling the use of SMT entirely—looks unlikely to proliferate.