An out-of-bounds read/write in FreeBSD’s bhyve hypervisor. The vulnerability is in the emulator for the E82545 gigabit Ethernet controller, specifically e82545_transmit(). As the name suggests, it is responsible for transmitting packets: it iterates over a ring buffer of packet descriptors and writes out iovecs. Two descriptor types matter here: (d)ata descriptors, which contain payload data, and (c)ontext descriptors, which describe header length, payload length, and checksum offsets. If TCP segmentation offload is enabled, the packet header length from the context descriptor is used.
e82545_transmit() has to validate that the checksum offset fields (ck_off) don’t go beyond the bounds of the header. For TCP packets, the offset is validated. For non-TCP packets (such as UDP), the checksum offset isn’t validated and an OOB read/write is possible. The OOB read occurs when the pseudo-header checksum is saved, and the OOB write happens when the checksum is carried over. Effectively, this gives an attacker a constrained 2-byte write primitive at an offset of 0-255 (as the offset field is only 8 bits wide).
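The validation gap can be modelled in a few lines. The sketch below is a simplified Python model, not the bhyve source: the flat `frame` buffer standing in for stack memory, the 64-byte header size, and every function and parameter name are assumptions chosen for illustration. It only shows how an unchecked 8-bit offset turns the 2-byte checksum store into a write past the header.

```python
import struct

HDR_SIZE = 64  # assumed size of the on-stack header buffer (hypothetical)


def store_checksum(frame, hdr_base, hdr_len, ck_off, csum, is_tcp, patched=False):
    """Model of storing a 16-bit checksum at hdr[ck_off].

    `frame` stands in for the stack memory around the header buffer.
    Only the TCP path validates ck_off, mirroring the bug described above;
    patched=True applies the same bounds check to every protocol.
    """
    if not 0 <= ck_off <= 0xFF:
        raise ValueError("ck_off is an 8-bit field")
    if (is_tcp or patched) and ck_off + 2 > hdr_len:
        return False  # descriptor rejected
    # Unchecked for non-TCP packets: the offset may point past the header.
    struct.pack_into("<H", frame, hdr_base + ck_off, csum)
    return True


# Header buffer followed by 256 bytes of neighbouring stack data.
frame = bytearray(HDR_SIZE + 256)

# UDP-style descriptor whose ck_off lies beyond the 64-byte header.
ok = store_checksum(frame, 0, HDR_SIZE, ck_off=200, csum=0xBEEF, is_tcp=False)
corrupted = frame[HDR_SIZE:] != bytes(256)
```

In this model the non-TCP call succeeds and modifies bytes outside the header, while the same descriptor is rejected on the TCP path or when the check is applied to all protocols.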
Exploitation was a bit tricky, as the header is too far away from the return address or saved base pointer on the stack. However, the pointer to the header itself (hdr) can be corrupted, and it is used in two loops later on to construct packet data and update IP headers. While the authors demonstrate that the first loop can be used to leak stack contents, this is unnecessary as FreeBSD 13’s bhyve doesn’t have ASLR by default. The second loop can be used to get a more useful relative write of a DWORD to smash the return address and ROP. As the function that calls e82545_transmit() doesn’t return, they couldn’t target the saved frame pointer to easily stack pivot, so they used their write primitive multiple times to build a small ROP chain that pivots the stack to their controlled hdr, which continues the chain to call system().
The Vulnerability
A fairly complex exploit of a use-after-free in netfilter. The vuln is detailed more in other posts linked by Exodus, but effectively the bug is a lifetime issue with expressions attached to netfilter sets that don’t have the NFT_EXPR_STATEFUL flag set but contain a reference to another set (such as lookup and dynset expressions). If the expression associated with the set doesn’t have the NFT_EXPR_STATEFUL flag set, the operation aborts and destroys the expression, but the referenced set’s binding list isn’t updated to remove the reference. Whenever the binding list is later updated (i.e. to add or remove something), a UAF occurs as the kernel dereferences the dangling pointer to update the doubly linked list. This gives an attacker the ability to write the address of a linked-list node into freed memory.
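To make the dangling-list write concrete, here is a toy Python model of a doubly linked list living in a flat byte buffer. It is only a sketch: the addresses, the 0x10 offset of the list node inside an expression, and the helper names are made up for illustration, and only the pointer updates in list_add_tail() follow the usual kernel list semantics.

```python
import struct

heap = bytearray(0x400)  # toy heap; "addresses" are offsets into this buffer


def rd(addr):
    return struct.unpack_from("<Q", heap, addr)[0]


def wr(addr, value):
    struct.pack_into("<Q", heap, addr, value)


def list_init(head):
    wr(head, head)      # head->next = head
    wr(head + 8, head)  # head->prev = head


def list_add_tail(new, head):
    """Same pointer updates as a kernel-style list_add_tail()."""
    prev = rd(head + 8)
    wr(new, head)       # new->next = head
    wr(new + 8, prev)   # new->prev = prev
    wr(prev, new)       # prev->next = new (prev may be a dangling node)
    wr(head + 8, new)   # head->prev = new


SET_BINDINGS = 0x100  # list head inside the long-lived set (hypothetical)
EXPR_A = 0x200        # expression that fails the NFT_EXPR_STATEFUL check
EXPR_B = 0x300        # a later, legitimate expression
BINDING_OFF = 0x10    # assumed offset of the list node inside an expression

list_init(SET_BINDINGS)
list_add_tail(EXPR_A + BINDING_OFF, SET_BINDINGS)

# Expression A is destroyed without unlinking its binding. Its slot is then
# reused by an unrelated object (modelled here as all zero bytes).
heap[EXPR_A:EXPR_A + 0x40] = bytes(0x40)

# Any later list update writes through the stale node.
list_add_tail(EXPR_B + BINDING_OFF, SET_BINDINGS)
written = rd(EXPR_A + BINDING_OFF)  # address of B's node, now inside the reused slot
```

After the second insertion, the object that reclaimed A's slot contains the address of B's list node at the assumed binding offset, which is the write primitive the exploit builds on.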
In kernel v5.18, these objects are allocated in the kmalloc-cg-* caches, as they’re allocated with the GFP_KERNEL_ACCOUNT flag set. This makes exploitation a little trickier than on previous kernel versions, where they were in the generic caches and could be targeted with more universal objects for primitives.
Infoleak heap pointer
The exploit chain involves three stages: an infoleak of a heap pointer, an infoleak of a .text pointer to defeat kASLR, and finally code execution. The heap infoleak involves triggering the UAF on a dynset expression and overlapping it with a msg_msg object (which was moved to the kmalloc-cg caches in 5.14). A pointer to the netfilter set object gets written into the overlapped msg_msg’s msg_data, which is leaked when msgrcv() is called.
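Whether a stale write is readable depends on where it lands relative to the msg_msg header. The Python sketch below models that reasoning only; the 96-byte slot size, the 0x40 write offset, and the helper names are assumptions for illustration, while the 48-byte header size matches struct msg_msg on x86_64.

```python
import struct

MSG_MSG_HDR = 48  # size of struct msg_msg on x86_64; message data follows it


def msgrcv_view(slot, m_ts):
    """Bytes a receiver gets back: the m_ts bytes stored after the header."""
    return bytes(slot[MSG_MSG_HDR:MSG_MSG_HDR + m_ts])


def leak_from_overlap(write_off, value, slot_size=96):
    """Return the pointer recovered via msgrcv(), or None if it is not exposed."""
    slot = bytearray(slot_size)                     # freed slot, reclaimed by a msg_msg
    struct.pack_into("<Q", slot, write_off, value)  # stale list update writes a pointer
    data = msgrcv_view(slot, slot_size - MSG_MSG_HDR)
    idx = write_off - MSG_MSG_HDR
    if idx < 0:
        return None  # the write landed in the header, which is never copied out
    return struct.unpack_from("<Q", data, idx)[0]


FAKE_SET_PTR = 0xFFFF888012345678  # made-up heap address for the demo
in_data = leak_from_overlap(0x40, FAKE_SET_PTR)    # past the header: leaked
in_header = leak_from_overlap(0x00, FAKE_SET_PTR)  # inside the header: not leaked
```

A pointer written past the first 48 bytes comes back in the received message, whereas anything at the very start of the object is hidden behind the msg_msg header.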
Infoleak kernel text pointer
A kernel text pointer is much trickier, as the ops field of dynset can’t be leaked since it falls inside the msg_msg header upon overlap. For leaking a text pointer, a lookup object is used for the initial UAF, and it’s overlapped with an fdtable, which aligns the linked list entry with the table’s open_fds pointer. It’s a good candidate object because it can be sprayed by forking child processes, and a free on open_fds can be triggered by terminating the child processes to get a partial free. They then trigger another UAF on a dynset object, get that pointer written into fdtable->open_fds, and spray msg_msg objects to occupy the now freed dynset. Finally, the child processes are terminated, which frees the dynset and with it part of the SysV msg_msg, which they then replace with a time_namespace object that contains an ops table pointer.
Code exec
Code execution is achieved by using the partial free via fdtable to corrupt a set object’s ops pointer. This kickstarts a ROP chain to overwrite MODPROBE_PATH to get root.