It’s been almost a full year since we published the last part of our UEFI blog posts series. During that period, the firmware security community has been was more active than ever and produced several high-quality publications. Notable examples include the discovery of new UEFI implants such as MoonBounce and ESPecter, and the recent disclosure of no less than 23 high-severity BIOS vulnerabilities by Binarly.
Here at SentinelOne, we haven’t been sitting idle either. In the past year, we tried our hand at hunting down and exploiting SMM vulnerabilities. After spending several months doing so, we noticed some repetitive anti-patterns in SMM code and developed a pretty good intuition regarding the potential exploitability of bugs. Eventually, we managed to conclude 2021 after having disclosed 13 such vulnerabilities, affecting most of the well-known OEMs in the industry. In addition, several more vulnerabilities are still moving through the responsible disclosure pipeline and should go public soon.
In this blog post, we would like to share the knowledge, tools, and methods we developed to help uncover these SMM vulnerabilities. We hope that by the time you finish reading this article, you too will be able to find such firmware vulnerabilities yourselves. Please note that this article assumes a solid knowledge of SMM terminology and internals, so if your memory needs a refresher we highly recommend reading the articles in the Further Reading section before proceeding. And now, let’s get started.
While in theory SMM code is isolated from the outside world, in reality, there are many circumstances in which non-SMM code can trigger and even affect code running inside SMM. Because SMM has a complex architecture with lots of “moving parts” in it, the attack surface is pretty vast and contains among other things data passed in communication buffers, NVRAM variables, DMA-capable devices, and so on.
In the following section, we will go through some of the more common SMM security vulnerabilities. For each vulnerability type, we will provide a brief description, some recommended mitigations as well as a strategy for detecting it while reversing. Note that the list of vulnerabilities is not exhaustive and contains only vulnerabilities that are specific to the SMM environment. For that reason, it will not include more generic bugs such as stack overflows and double-frees.
The most basic SMM vulnerability class is known as an “SMM callout”. This occurs whenever SMM code calls a function located outside of the SMRAM boundaries (as defined by the SMRRs). The most common callout scenario is an SMI handler that tries to invoke a UEFI boot service or runtime service as part of its operation. Attackers with OS-level privileges can modify the physical pages where these services live prior to triggering the SMI, thus hijacking the privileged execution flow once the affected service is called.
Besides the obvious approach of not writing such faulty code in the first place, SMM callouts can also be mitigated at the hardware level. Starting from the 4th generation of the Core microarchitecture (Haswell) Intel CPUs support a security feature called SMM_Code_Chk_En. If this security feature is turned on, the CPU is prohibited from executing any code located outside the SMRAM region once it enters SMM. One can think of this feature as the SMM equivalent of Supervisor Mode Execution Prevention (SMEP).
Querying for the status of this mitigation can be done by executing the smm_code_chk module from CHIPSEC.
Static detection of SMM callouts is pretty straightforward. Given an SMM binary, we should analyze it while looking for SMI handlers that have some execution flow that leads to calling a UEFI boot or runtime service. This way, the problem of finding SMM callouts is reduced to the problem of searching the call graph for certain paths. Luckily for us, no additional effort is required at all since this heuristic is already implemented by the excellent efiXplorer IDA plugin.
As we mentioned in previous posts in the series, efiXplorer is a one-stop-shop and serves as the de-facto standard way of analyzing UEFI binaries with IDA. Among other things, it takes care of the following:
LocateProtocol()or its SMM counterpart
A note to Ghidra users: We also want to add that the Ghidra plugin efiSeek takes care of all the changes in the list above. However, it doesn’t include the UI elements like the protocols window and the vulnerability detection capabilities offered by efiXplorer.
After analysis of the input file is complete, efiXplorer will move on to inspect all calls carried out by SMI handlers, which yields a curated listing of potential callouts:
For the most part, this heuristic works great, but we’ve encountered several edge cases where it might generate some false positives as well. The most common one is caused due to the usage of
EFI_SMM_RUNTIME_SERVICES_TABLE. This is a UEFI configuration table that exposes the exact same functionality as the standard
EFI_RUNTIME_SERVICES_TABLE, with the only significant difference being that, unlike its “standard” counterpart, it resides in SMRAM and is therefore suitable to be consumed by SMI handlers. Many SMM binaries often re-map the global
RuntimeServices pointer to the SMM-specific implementation after completing some boilerplate initialization tasks:
Calling runtime services via the re-mapped pointer yields a situation that appears to be a callout at first glance, though a closer examination will prove otherwise. To overcome this, analysts should always search the SMM binary for the GUID identifying
EFI_SMM_RUNTIME_SERVICES_TABLE. If this GUID is found, chances are that most of the callouts involving UEFI runtime services are false positives. This does not apply to callouts involving boot services, though.
Another source of potential false positives is various wrapper functions which are “dual-mode”, meaning they can be called from both SMM and non-SMM contexts. Internally, these functions dispatch a call to an SMM service if the caller is executing in SMM, and dispatches a call to the equivalent boot/runtime service otherwise. The most common example we’ve seen in the wild is
FreePool() from EDK2, which calls
gSmst->SmmFreePool() if the buffer to be freed resides in SMRAM, or calls
As this example demonstrates, bug hunters should be aware of the fact that static code analysis techniques are having a hard time determining that certain code paths won’t be executed in practice, and as such are likely to flag this as a callout. Some tips and tricks for identifying this function in compiled binaries will be conveyed in the Identifying Library Functions section.
Under normal circumstances, the communication buffer used to pass arguments to the SMI handler must not overlap with SMRAM. The rationale for this restriction is quite simple: if that wasn’t the case, any time the SMI handler would write some data into the comm buffer — for example, in order to return a status code to the caller — it would also modify some portion of SMRAM along the way, which is undesirable.
In EDK2, the function responsible for checking whether or not a given buffer overlaps with SMRAM is called
SmmIsBufferOutsideSmmValid(). This function gets called on the communication buffer upon each SMI invocation in order to enforce this restriction.
Alas, since the size of the communication buffer is also under the attacker’s control this check on its own is not enough to guarantee sound protection and some additional responsibilities lay on the shoulders of the firmware developers. As we will see shortly, many SMI handlers fail here and leave a gap attackers can exploit to violate this restriction and corrupt the bottom portion of SMRAM. To understand how, let’s take a closer look at a concrete example:
Above we have a real-life, very simple SMI handler. We can divide its operation into 4 discrete steps:
MSR_IDT_MCR5register into a local variable.
The astute reader might be aware of the fact that during step 3, an 8-byte value is written to the Comm Buffer, but nowhere during step 1 does the code check for the prerequisite that the buffer is at least 8 bytes long. Because this check is omitted, an attacker can exploit this by:
As far as
SmmEntryPoint is concerned, the Comm Buffer is just 1 byte long and does not overlap with SMRAM. Because of that,
SmmIsBufferOutsideSmmValid() will succeed and the actual SMI handler will be called. During step 3, the handler will blindly write a QWORD value into the Comm Buffer, and by doing so it will unintentionally write over the lower 7 bytes of SMRAM as well.
Based on EDK2, the bottom portion of TSEG (the de-facto standard location for SMRAM), contains a structure of type
SMM_S3_RESUME_STATE whose job is to control recovery from the S3 sleep state. As can be seen below, this structure contains a plethora of members and function pointers whose corruption can benefit the attacker.
To mitigate this class of vulnerabilities, SMI handlers must explicitly check the size of the provided communication buffer and bailout in case the actual size differs from the expected size. This can be achieved in one of two ways:
CommBufferSizeargument and then comparing it to the expected size. This method works because we already saw that
SmmIsBufferOutsideSmmValid(CommBuffer, *CommBufferSize), which guarantees
*CommBufferSizebytes of the buffer are located outside of SMRAM.
SmmIsBufferOutsideSmmValid()on the Comm Buffer again, this time with the concrete size expected by the handler.
To detect this class of vulnerabilities, we should be looking for SMI handlers that don’t properly check the size of the Comm Buffer. That suggests the handler does not perform any of the following:
SmmIsBufferOutsideSmmValid()on the communication buffer.
Condition 1 is straightforward to check because efiXplorer already takes care of locating SMI handlers and assigning them their correct function prototype. Condition 2 is also easy to validate, but the crux is this: since
SmmIsBufferOutsideSmmValid() is statically linked to the code, we must be able to identify it in the compiled binary. Some tips and tricks for doing so can be found in the next section.
While certainly a big step forward in our analysis of SMM vulnerabilities, the previous bug class still suffers from several significant limitations that hinder it from being easily exploited in real-life scenarios. A better, more powerful exploitation primitive will allow us to corrupt arbitrary locations within SMRAM, not only those that are adjacent to the bottom.
Such exploitation primitives can often be found in SMI handlers whose communication buffers contain nested pointers. Since the internal layout of the communication buffer is not known apriori, it is the responsibility of the SMI handler itself to correctly parse and sanitize it, which usually boils down to calling
SmmIsBufferOutsideSmmValid() on nested pointers and bailing out if one of them happens to overlap with SMRAM. A textbook example for properly checking these conditions can be found in the
SmmLockBox driver from EDK2:
To report back to the OS that certain best practices have been implemented in SMM, a modern UEFI firmware usually creates and populates an ACPI table called the
Windows SMM Mitigations Table, or
WSMT for short. Among other things, the WSMT maintains a flag called
COMM_BUFFER_NESTED_PTR_PROTECTION that, if present, asserts that no nested pointers are used by SMI handlers without prior sanitization. This table can be dumped and parsed using the chipsec module common.wsmt:
Unfortunately, practice has shown that more often than not, the correlation between reported mitigations and reality is scarce at best. Even when the WSMT is present and reports all the supported mitigations as active, it’s not uncommon to discover SMM drivers that completely forget to sanitize the communication buffer. Leveraging this, attackers can trigger the vulnerable SMI with a nested pointer pointing to SMRAM memory. Depending on the nature of the particular handler, this can result in either corruption of the specified address or disclosure of sensitive information read from that address. Let’s take a look at an example.
In the snippet above, we have an SMI handler that gets some arguments via the communication buffer. Based on the decompiled pseudocode, we can deduce that the first byte of the buffer is interpreted as an OpCode field that instructs the handler what it should do next (1). As can be seen (2), valid values for this field are either 0, 2, or 3. If the actual value differs from those, the default clause (3) will be executed. In this clause, an error code is written to the memory location pointed to by the 2nd field of the comm buffer. Since this field is under the attacker’s control along with the entire contents of the communication buffer, he or she can set it up as follows prior to triggering the SMI:
As the handler executes, the value of the OpCode field will force it to fall back into the default clause, while the address field will be selected in advance by the attacker depending on the exact portion of SMRAM he or she wants to corrupt.
To mitigate this class of vulnerabilities, the SMI handler must sanitize any pointer value passed in the communication buffer prior to using it. The pointer validation can be performed in one of two ways:
SmmIsBufferOutsideSmmValid(): As was already mentioned,
SmmIsBufferOutsideSmmValid()is a utility function provided by EDK2 that checks whether or not a given buffer overlaps with SMRAM. Using it is the recommended way to sanitize external input pointers.
SmmIsBufferOutsideSmmValid(), but rather expose a similar functionality via a dedicated protocol called
AMI_SMM_BUFFER_VALIDATION_PROTOCOL. Besides the semantic differences of calling a function versus utilizing a UEFI protocol, both approaches work roughly the same. Please check out the next section to learn how to correctly import this protocol definition into IDA.
The basic idea to detect this class of vulnerabilities is to look for SMI handlers that don’t call
SmmIsBufferOutsideSmmValid() or utilize the equivalent
AMI_SMM_BUFFER_VALIDATION_PROTOCOL. However, some edge cases must also be taken into consideration. Failing to do so might introduce unwanted false positives or false negatives.
SmmIsBufferOutsideSmmValid()on the comm buffer itself: this merely guarantees that the comm buffer does not overlap with SMRAM (see Low SMRAM corruption below), but it says nothing about the nested pointers. As a result, when trying to assess the robustness of a handler against rouge pointer values, these cases should not be taken into consideration.
SmmIsBufferOutsideSmmValid()simply because the communication buffer does not hold any nested pointers, but rather other data types such as integers, boolean flags, etc. To distinguish between this benign case from the vulnerable case, we must be able to figure out the internal layout of the communication buffer.
While this can be done manually as part of the reverse engineering process, fortunately for us, nowadays automatic type reconstruction is far from being science fiction, and various tools for doing so are readily available as off-the-shelf solutions. The two most prominent and successful IDA plugins in this category are
HexRaysCodeXplorer. Using any of these tools lets you transform raw pointer access notation such as the following:
Into a more friendly and comprehensible point-to-member notation:
Even more importantly, these plugins keep track of how individual fields are being accessed. Based on the access pattern, they are fully capable of reconstructing the layout of the containing structure. This includes extrapolating the number of members, their respective sizes, types, attributes, and so on. When applied to the Comm Buffer, this method lets you quickly discover if it holds any nested pointers.
Sometimes, even calling
SmmIsBufferOutsideSmmValid() on nested pointers is not enough to make an SMI handler fully secure. The reason for this is that SMM was not designed with concurrency in mind and as a result, it suffers from some inherent race conditions, the most prominent one being TOCTOU attacks against the communication buffer. Because the comm buffer itself resides outside of SMRAM, its contents can change while the SMI handler is executing. This fact has serious security implications as it means double-fetches from it won’t necessarily yield the same values.
In an attempt to remedy this, SMM in multiprocessing environments follows what’s known as an “SMI rendezvous”. In a nutshell, once a CPU enters SMM a dedicated software preamble will send an Inter-Processor-Interrupt (IPI) to all other processors in the system. This IPI will cause them to enter SMM as well and wait there for the SMI to complete. Only then can the first processor safely call the handler function to actually service the SMI.
This scheme is highly effective in preventing other processors from meddling with the communication buffer while it is being used, but of course, CPUs are not the only entities that have access to the memory bus. As any OS 101 course teaches you, nowadays many hardware devices are capable of acting as DMA agents, meaning they can read/write memory without going through the CPU at all. These are great news performance-wise but are terribly bad news as far as firmware security is concerned.
To see how DMA operations can assist exploitation, let’s take a look at the following snippet taken from a real-life SMI handler:
As can be seen, this handler references a nested pointer that we named
field_18 in at least 3 different locations:
SmmIsBufferOutsideSmmValid()is called on the local variable to make sure it does not overlap SMRAM.
CopyMem()as the destination argument.
As was mentioned earlier, nothing guarantees consecutive reads from the comm buffer will necessarily yield the same value. That means an attacker can issue this SMI with the pointer referencing a perfectly safe location outside of SMRAM:
However, right after the SMI validates the nested pointer and just before it is being fetched again, there exists a small window of opportunity where a DMA attack can modify its value to point somewhere else. Knowing that the pointer will soon be passed to
CopyMem(), the attacker could make it point to an address in SMRAM he wants to corrupt.
If configured properly by the firmware, SMRAM should be shielded from tampering by DMA devices. To make sure that’s the case on your machine, run the smm_dma module from CHIPSEC.
Because of that, mitigating TOCTOU vulnerabilities can be performed merely by copying data from the communication buffer into local variables that reside in SMRAM. Like always, a good reference for the proper coding style is EDK2:
Once all the required pieces of data are copied into SMRAM that way, DMA attacks won’t be able to influence the execution flow of SMI handlers:
Detecting TOCTOU vulnerabilities in SMI handlers requires reconstructing the internal layout of the communication buffer, then counting how many times each field is being fetched. If the same field is being fetched twice or more by the same execution flow, chances are the respective handler is susceptible to such attacks. The severity of these issues greatly depends on the types of individual fields, with pointer fields being the most acute ones. Again, properly reconstructing the structure of the Comm Buffer greatly helps in assessing the potential risk.
As was mentioned by previous posts in the series, the de-facto standard location for SMRAM memory is the “Top Memory Segment”, often abbreviated as TSEG. Still, on many machines, a separate SMRAM region called CSEG (Compatibility Segment) co-exists with TSEG for compatibility reasons. Unlike TSEG whose location in physical memory can be programmed by the BIOS, the location of the CSEG region is fixed to the address range 0xA0000-0xBFFFF. Some legacy SMI handlers were designed with only CSEG in mind, a fact that can be abused by attackers. Below is an example of one such handler:
Unlike the handlers we reviewed so far, this SMI handler does not get its arguments via the communication buffer. Instead, it uses the
EFI_SMM_CPU_PROTOCOL to read registers from the SMM save state, created automatically by the CPU upon entering SMM. Therefore, the potential attack surface in this example is not the communication buffer, but rather the general-purpose registers of the CPU, whose values can be set almost arbitrarily prior to issuing the SMI.
The handler goes as follows:
EBXregisters from the save state.
16 * ES + (EBX & 0xFFFF).
Note that the handler essentially re-implements common utility functions such as
SmmIsBufferOutsideSmmValid(), only it does so in a poor way that completely neglects SMRAM segments other than CSEG. Theoretically, attackers can set the ES and BX registers such that the computed linear address will point to some other SMRAM region such as TSEG and will surely pass the safety checks imposed by the handler.
In practice, however, chances are this vulnerability is not realistically exploitable. The reason for this is that the maximal linear address we can reach is limited to 16 * 0xFFFF + 0xFFFF == 0x10FFEF, and experience shows that TSEG is usually located at much higher addresses. Nevertheless, it is a good thing to be aware of such handlers and the danger they impose.
Mitigating these vulnerabilities is entirely up to the developers of the SMI handler.
A good strategy to pinpoint these cases is to look for SMI handlers that make use of “magic numbers” that reference some unique characteristics of CSEG. These include immediate values such as 0xA0000 (the physical base address of CSEG), 0x1FFFF (its size), and 0xBFFFF (last addressable byte). Based on our experience, a function that uses two or more of these values is likely to have some CSEG-specific behavior and must be examined carefully to assess its potential risk.
All the bug classes described so far were centered around hijacking the SMM execution flow and corrupting SMM memory. Yet another very important category of vulnerabilities revolves around disclosing the contents of SMRAM. It is a known fact that SMRAM cannot be read from outside of SMM, which is why it is sometimes used by the firmware to store secrets that must be kept hidden from the outside world. In addition to that, disclosing the contents of SMRAM can also help with the exploitation of other vulnerabilities that require accurate knowledge of the memory layout.
A common scenario for SMRAM disclosure happens when SMM code tries to update the contents of an NVRAM variable. In UEFI, updating an NVRAM variable is not an atomic operation, but rather a composite one made out of the following steps:
GetVariable()service to read the contents of the variable into the stack buffer.
SetVariable()service to write the modified stack buffer back to NVRAM.
GetVariable(), note that the 4th parameter is used as an input-output argument. Upon entry, this argument signifies the number of bytes the caller is interested in reading, while on return it is set to the number of bytes that were read from NVRAM in practice. In case the actual size of the variable matches the expected one, both values should be the same.
A problem arises when developers implicitly assume the size of a variable to be immutable. Due to this assumption, they completely ignore the number of bytes read by
GetVariable() and just pass a hardcoded size to
SetVariable() when writing the updated contents:
Since the contents of some NVRAM variables (at least those that have the
EFI_VARIABLE_RUNTIME_ACCESS attribute) can be modified from the operating system, they can be abused to trigger information disclosures in SMM while also serving simultaneously as the exfiltration channel. Let’s see how this can be done in practice.
First, the attacker would use an OS-provided API function such as SetFirmwareEnvironmentVariable() to truncate the variable, thus making it shorter than expected. Then, it will move on to trigger the vulnerable SMI handler. The SMI handler will:
GetVariable()service to read the contents of the variable into the stack buffer. Normally, the size of the variable is equal to the size of the stack buffer, but since the attacker just truncated the variable in NVRAM, the buffer is surely longer. This in turn means it will continue to hold some uninitialized bytes even after
SetVariable()service to write back the modified stack buffer into NVRAM. Because this call is done using the hardcoded, constant size of the stack buffer, it will also write to NVRAM its uninitialized part.
To complete the process, the attacker can now use an API function such as
GetFirmwareEnvironmentVariable() to fully disclose the contents of the variable, including the bytes that originate from the uninitialized portion.
The moral of this story is that NVRAM variables are not to be trusted blindly and should be taken into account when reasoning about the attack surface of the handler. If applicable, use compiler flags such as InitAll to make sure stack buffers will be zero-initialized. More tactically, when updating the contents of NVRAM variables the code must always take into account the actual size of the variable and not rely on a static, pre-computed value.
Yet another possible direction to mitigate these issues is to limit access to NVRAM variables. This can be done either by removing the EFI_VARIABLE_RUNTIME_ACCESS attribute entirely or using a protocol such as
EDKII_VARIABLE_LOCK_PROTOCOL to make variables read-only.
It’s reasonable to assume that an NVRAM variable update operation will take place during the course of one function. That means we can usually ignore scenarios in which one function reads the variable and another one writes it. To locate these functions, after analyzing the input file with efiXplorer, navigate to the “services” tab and search for pairs of calls where
SetVariable() is immediately followed by
For each such pair of calls, check that:
SetVariable()is an immediate value
This post freely references library functions such as
SmmIsBufferOutsideSmmValid() and naively assumes we can locate them without any hassle. The problem is these functions are statically linked to the binary, and normally SMM images are stripped of any debug symbols before being shipped to end-users. Due to that, locating them inside the IDA database is quite challenging.
During our work, we researched multiple approaches to tackle this problem, including automated diffing using Diaphora as well as experimentation with some lesser-known plugins such as rizzo and fingermatch. Eventually, we decided to stick to the KISS principle and perform the matching using plain and simple heuristics that take into consideration some of the unique characteristics of the target function. Below are some rules-of-thumb for matching the functions referenced earlier. Note that we assume the binary was already analyzed by efiXplorer, which makes things a bit easier.
FreePool() is pretty straightforward. All it takes is to scan the IDA database for a function that:
gSmst->FreePool()(but never both)
SmmIsBufferOutsideSmmValid() is a bit trickier. To successfully pull this off, we need to have some background information about a UEFI protocol called
EFI_SMM_ACCESS2_PROTOCOL. This protocol is used to manage and query the visibility of SMRAM on the platform. As such, it exposes the respective methods to open, close, and lock SMRAM.
In addition to those, this protocol also exports a method called
GetCapabilities(), which can be used by clients to figure out exactly where SMRAM lives in physical memory.
Upon return, this function fills an array of
EFI_SMRAM_DESCRIPTOR structures that tell the caller what regions of SMRAM are available, what is their size, state, etc.
In EDK2, the common practice is to store these
EFI_SMRAM_DESCRIPTORS as global variables so that other functions could easily access them in the future. As you probably guessed, one of these functions is no other than
SmmIsBufferOutsideSmmValid(), which iterates over the descriptors list to decide if the caller-provided buffer is safe:
Taking this into consideration, our strategy to identify
SmmIsBufferOutsideSmmValid() would be that of reverse lookup – first, we’ll find the global SMRAM descriptors initialized by
EFI_SMM_ACCESS2_PROTOCOL and only then, based on the functions that use them, deduce who’s the most promising candidate to be
Technically, one can do so by following these simple steps:
EFI_SMM_ACCESS2_PROTOCOL. This will cause IDA to jump to the location where this GUID is utilized (usually the call to
EfiSmmAccess2Protocol) and hit ‘x’ to search for its xrefs:
GetCapabilities(), check if the 3rd parameter (the SMRAM descriptor) is a global variable. If it is, do the following:
SmramDescriptor_XXX, where XXX is an ordinal) to allow for easy reference in the future
If all three conditions are met, chances are the function you’re looking at is actually
Currently, efiXplorer does not support the definition of
AMI_SMM_BUFFER_VALIDATION_PROTOCOL out of the box, so we must import the protocol definition separately.
To accomplish this, follow these steps:
#defined manually before importing it.
“The more you look, the more you see.”
– Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance
Firmware-level attacks seem to pose a significant challenge to the security community. As part of the everlasting cat-and-mouse game between attackers and defenders, threat actors are starting to shift their spotlight to the firmware, considered by many the soft belly of the IT stack. In recent years, awareness of firmware threats is constantly increasing and some promising approaches are emerging to combat them:
Even in the face of these advancements, firmware security still bears lots and lots of issues, design flaws, and of course vulnerabilities to uncover. The ability of the security community to successfully pull this off depends on three fundamental pillars: knowledge, tooling, and diligence.
In this blog post, we were focused on promoting knowledge by shedding light on unfamiliar territory. In the next post, we’ll cover tooling and reveal:
As for diligence, unfortunately, no known recipe exists for producing such human qualities. It is, therefore, the responsibility of each and every one of us to just try our best and make sure that no stone is left unturned in this exciting and challenging domain.