07 Jan 2025 - Posted by Norbert Szetei
At Doyensec, we decided to perform a vulnerability research activity on the SMB3 Kernel Server (ksmbd), a component of the Linux kernel. It was initially enabled as an experimental feature, but the experimental flag was removed in kernel version 6.6, and ksmbd has been considered stable since then.
Ksmbd splits tasks to optimize performance, handling critical file operations in kernel space and non-performance-related tasks, such as DCE/RPC and user account management, in user space via ksmbd.mountd. The server uses a multi-threaded architecture to efficiently process SMB requests in parallel, leveraging kernel worker threads for scalability and user-space integration for configuration and RPC handling.
Ksmbd is not enabled by default, but it is a great target for learning the SMB protocol while also exploring Linux internals, such as networking, memory management, and threading.
The ksmbd kernel component binds directly to port 445 to handle SMB traffic. Communication between the kernel and the ksmbd.mountd user-space process occurs via the Netlink interface, a socket-based mechanism for kernel-to-user space communication in Linux. We focused on targeting the kernel directly due to its direct reachability, even though ksmbd.mountd operates with root privileges.
An illustrative diagram of the architecture, taken from the mailing list, is shown below:
               |--- ...
       --------|--- ksmbd/3 - Client 3
       |-------|--- ksmbd/2 - Client 2
       |       |         ____________________________________________________
       |       |        |- Client 1                                          |
<--- Socket ---|--- ksmbd/1   <<= Authentication : NTLM/NTLM2, Kerberos      |
       |       |      | |     <<= SMB engine : SMB2, SMB2.1, SMB3, SMB3.0.2, |
       |       |      | |                SMB3.1.1                            |
       |       |      | |____________________________________________________|
       |       |      |
       |       |      |--- VFS --- Local Filesystem
       |       |
KERNEL |--- ksmbd/0(forker kthread)
---------------||---------------------------------------------------------------
USER           ||
               || communication using NETLINK
               ||  ______________________________________________
               || |                                              |
        ksmbd.mountd <<= DCE/RPC(srvsvc, wkssvc, samr, lsarpc)   |
               ^  |  <<= configure shares setting, user accounts |
               |  |______________________________________________|
               |
               |------ smb.conf(config file)
               |
               |------ ksmbdpwd.db(user account/password file)
                            ^
  ksmbd.adduser ------------|
Multiple studies on this topic have been published, including those by Thalium and pwning.tech. The latter contains a detailed explanation of how to approach fuzzing from scratch using syzkaller. Although the syzkaller grammar in that article is quite simple, it provided an excellent starting point that we built upon.
We began by intercepting and analyzing legitimate communication using a standard SMB client. This allowed us to extend the syzkaller grammar to include additional commands implemented in smb2pdu.c.
During fuzzing, we encountered several challenges, one of which was addressed in the pwning.tech article. Initially, we needed to tag packets to identify the syzkaller instance (procid). This tagging was required only for the first packet, as subsequent packets shared the same socket connection. To solve this, we modified the first (negotiation) request by appending 8 bytes representing the syzkaller instance number. Afterward, we sent subsequent packets without tagging.
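The tagging step can be pictured with a small user-space sketch. The helper names `tag_first_packet` and `read_tag`, the little-endian encoding, and the buffer handling are assumptions made for illustration; the text above only states that 8 bytes carrying the instance number are appended to the negotiation request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_LEN 8 /* 8 bytes as described above; the rest is assumed */

/*
 * Copy `pkt` into `out` and append the instance number as 8 bytes.
 * Little-endian byte order is an assumption of this sketch.
 * Returns the tagged length, or 0 if `out` is too small.
 */
static size_t tag_first_packet(const uint8_t *pkt, size_t len,
                               uint64_t procid,
                               uint8_t *out, size_t out_cap)
{
    if (out_cap < TAG_LEN || len > out_cap - TAG_LEN)
        return 0;

    memcpy(out, pkt, len);
    for (size_t i = 0; i < TAG_LEN; i++)
        out[len + i] = (uint8_t)(procid >> (8 * i));
    return len + TAG_LEN;
}

/* Hypothetical receiving side: read the tag back from the packet tail. */
static uint64_t read_tag(const uint8_t *buf, size_t tagged_len)
{
    uint64_t procid = 0;

    assert(tagged_len >= TAG_LEN);
    for (size_t i = 0; i < TAG_LEN; i++)
        procid |= (uint64_t)buf[tagged_len - TAG_LEN + i] << (8 * i);
    return procid;
}
```

Only the first packet on a connection would go through `tag_first_packet`; later packets on the same socket are sent unchanged.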
Another limitation of syzkaller is its inability to use malloc() for dynamic memory allocation, complicating the implementation of authentication in pseudo syscalls. To work around this, we patched the relevant authentication (NTLMv2) and packet signature verification checks, allowing us to bypass negotiation and session setup without valid signatures. This enabled the invocation of additional commands, such as ioctl processing logic.
To create more diverse and valid test cases, we initially extracted communication using strace, or manually crafted packets. For this, we used Kaitai Struct, either through its web interface or visualizer. When a packet was rejected by the kernel, Kaitai allowed us to quickly identify and resolve the issue.
During our research, we identified multiple security issues, three of which are described in this post. These vulnerabilities share a common trait: they can be exploited without authentication during the session setup phase. Exploiting them requires a basic understanding of the communication process.
During ksmbd initialization (whether built into the kernel or as an external module), the startup function create_socket() is called to listen for incoming traffic:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/transport_tcp.c#L484
ret = kernel_listen(ksmbd_socket, KSMBD_SOCKET_BACKLOG);
if (ret) {
pr_err("Port listen() error: %d\n", ret);
goto out_error;
}
The actual data handling occurs in the ksmbd_tcp_new_connection() function and the spawned per-connection threads (ksmbd:%u). This function also allocates the struct ksmbd_conn, representing the connection:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/transport_tcp.c#L203
static int ksmbd_tcp_new_connection(struct socket *client_sk)
{
// ..
handler = kthread_run(ksmbd_conn_handler_loop,
KSMBD_TRANS(t)->conn,
"ksmbd:%u",
ksmbd_tcp_get_port(csin));
// ..
}
The ksmbd_conn_handler_loop function is crucial, as it handles reading, validating, and processing SMB protocol messages (PDUs). If there are no errors, it calls one of the more specific processing functions:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/connection.c#L395
if (default_conn_ops.process_fn(conn)) {
pr_err("Cannot handle request\n");
break;
}
The processing function adds an SMB request to the worker thread queue:
// ksmbd_server_process_request
static int ksmbd_server_process_request(struct ksmbd_conn *conn)
{
return queue_ksmbd_work(conn);
}
This occurs inside queue_ksmbd_work, which allocates the ksmbd_work structure that wraps the session, connection, and all SMB-related data, while also performing early initialization.
In the Linux kernel, adding a work item to a workqueue requires initializing it with the INIT_WORK() macro, which links the item to a callback function to be executed when processed. Here, this is performed as follows:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/server.c#L312
INIT_WORK(&work->work, handle_ksmbd_work);
ksmbd_queue_work(work);
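The callback registered with INIT_WORK() only receives a pointer to the embedded work item, so the handler has to recover the enclosing request structure. The user-space sketch below imitates that pattern with a simplified `container_of`. All `fake_*` names and the immediate-dispatch "queue" are stand-ins invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_work_struct {
    void (*func)(struct fake_work_struct *w);
};

/* Stand-in for struct ksmbd_work: request data wrapping the work item. */
struct fake_ksmbd_work {
    int command;
    int handled;
    struct fake_work_struct work;
};

/* Links the work item to its callback, like INIT_WORK() does. */
static void fake_init_work(struct fake_work_struct *w,
                           void (*fn)(struct fake_work_struct *))
{
    w->func = fn;
}

/* A real workqueue defers to a worker thread; this sketch calls directly. */
static void fake_queue_work(struct fake_work_struct *w)
{
    assert(w->func != NULL);
    w->func(w);
}

/* Callback: recover the enclosing request from the embedded member. */
static void fake_handle_work(struct fake_work_struct *wk)
{
    struct fake_ksmbd_work *work =
        container_of(wk, struct fake_ksmbd_work, work);

    work->handled = 1;
}
```

Embedding the work item inside the request means no separate lookup is needed when the callback fires: the pointer arithmetic alone leads back to the request.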
We are now close to processing SMB PDU operations. The final step is for handle_ksmbd_work to extract the command number from the request
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/server.c#L213
rc = __process_request(work, conn, &command);
and execute the associated command handler.
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/server.c#L108
static int __process_request(struct ksmbd_work *work, struct ksmbd_conn *conn,
u16 *cmd)
{
// ..
command = conn->ops->get_cmd_val(work);
*cmd = command;
// ..
cmds = &conn->cmds[command];
// ..
ret = cmds->proc(work);
Here is the list of the procedures that are invoked:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/smb2ops.c#L171
[SMB2_NEGOTIATE_HE] = { .proc = smb2_negotiate_request, },
[SMB2_SESSION_SETUP_HE] = { .proc = smb2_sess_setup, },
[SMB2_TREE_CONNECT_HE] = { .proc = smb2_tree_connect,},
[SMB2_TREE_DISCONNECT_HE] = { .proc = smb2_tree_disconnect,},
[SMB2_LOGOFF_HE] = { .proc = smb2_session_logoff,},
[SMB2_CREATE_HE] = { .proc = smb2_open},
[SMB2_QUERY_INFO_HE] = { .proc = smb2_query_info},
[SMB2_QUERY_DIRECTORY_HE] = { .proc = smb2_query_dir},
[SMB2_CLOSE_HE] = { .proc = smb2_close},
[SMB2_ECHO_HE] = { .proc = smb2_echo},
[SMB2_SET_INFO_HE] = { .proc = smb2_set_info},
[SMB2_READ_HE] = { .proc = smb2_read},
[SMB2_WRITE_HE] = { .proc = smb2_write},
[SMB2_FLUSH_HE] = { .proc = smb2_flush},
[SMB2_CANCEL_HE] = { .proc = smb2_cancel},
[SMB2_LOCK_HE] = { .proc = smb2_lock},
[SMB2_IOCTL_HE] = { .proc = smb2_ioctl},
[SMB2_OPLOCK_BREAK_HE] = { .proc = smb2_oplock_break},
[SMB2_CHANGE_NOTIFY_HE] = { .proc = smb2_notify},
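The dispatch step can be summarized with a minimal user-space sketch: a table of function pointers indexed by command number, with a bounds check and a check for unimplemented entries before the call. The type names, handler bodies, and return codes are hypothetical and only mirror the shape of the kernel code shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SMB2 command numbers used by this sketch; CMD_MAX bounds the table. */
enum { CMD_NEGOTIATE = 0, CMD_SESSION_SETUP = 1, CMD_ECHO = 13, CMD_MAX = 19 };

/* Hypothetical result codes for the sketch. */
enum { DISPATCH_OK = 0, DISPATCH_BAD_CMD = -1, DISPATCH_NOT_IMPL = -2 };

struct request {
    uint16_t command;
    int result;
};

struct cmd_entry {
    int (*proc)(struct request *req);
};

static int do_negotiate(struct request *req) { req->result = 100; return 0; }
static int do_echo(struct request *req)      { req->result = 200; return 0; }

/* Entries without an initializer stay NULL, i.e. "not implemented". */
static const struct cmd_entry cmd_table[CMD_MAX] = {
    [CMD_NEGOTIATE] = { .proc = do_negotiate },
    [CMD_ECHO]      = { .proc = do_echo },
};

static int dispatch(struct request *req)
{
    assert(req != NULL);

    /* The command number comes from the client, so check it first. */
    if (req->command >= CMD_MAX)
        return DISPATCH_BAD_CMD;

    const struct cmd_entry *cmd = &cmd_table[req->command];
    if (!cmd->proc)
        return DISPATCH_NOT_IMPL;

    return cmd->proc(req);
}
```

Because the index is attacker-controlled, the range check before the table lookup is the part of this pattern that matters most.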
After explaining how the PDU function is reached, we can move on to discussing the resulting bugs.
The first vulnerability stems from improper synchronization in the management of the sessions_table in ksmbd. Specifically, the code lacks a sessions_table_lock to protect concurrent access during both session expiration and session registration. This introduces a race condition in which multiple threads can access and modify the sessions_table simultaneously, leading to a Use-After-Free (UAF) in the kmalloc-512 cache.
The sessions_table is implemented as a hash table and stores all active SMB sessions for a connection, using the session identifier (sess->id) as the key.
During session registration, a new session is first created for the connection. Before the session is registered, ksmbd_expire_session is called to remove expired sessions, which avoids stale sessions consuming resources.
Operations on this table, such as adding (hash_add) and removing sessions (hash_del), lack proper synchronization, creating a race condition.
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/smb2pdu.c#L1663
int smb2_sess_setup(struct ksmbd_work *work)
{
// ..
ksmbd_conn_lock(conn);
if (!req->hdr.SessionId) {
sess = ksmbd_smb2_session_create(); // [1]
if (!sess) {
rc = -ENOMEM;
goto out_err;
}
rsp->hdr.SessionId = cpu_to_le64(sess->id);
rc = ksmbd_session_register(conn, sess); // [2]
if (rc)
goto out_err;
conn->binding = false;
At [1], the session is created by allocating the sess object:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L381
sess = kzalloc(sizeof(struct ksmbd_session), GFP_KERNEL);
if (!sess)
return NULL;
At this point, with a large number of simultaneous connections, some sessions can expire. When ksmbd_session_register is invoked at [2], it calls ksmbd_expire_session ([3]):
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L192
int ksmbd_session_register(struct ksmbd_conn *conn,
struct ksmbd_session *sess)
{
sess->dialect = conn->dialect;
memcpy(sess->ClientGUID, conn->ClientGUID, SMB2_CLIENT_GUID_SIZE);
ksmbd_expire_session(conn); // [3]
return xa_err(xa_store(&conn->sessions, sess->id, sess, GFP_KERNEL));
}
Since there is no table locking implemented, the expired sess object could be removed from the table ([4]) and deallocated ([5]):
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L173
static void ksmbd_expire_session(struct ksmbd_conn *conn)
{
unsigned long id;
struct ksmbd_session *sess;
down_write(&conn->session_lock);
xa_for_each(&conn->sessions, id, sess) {
if (atomic_read(&sess->refcnt) == 0 &&
(sess->state != SMB2_SESSION_VALID ||
time_after(jiffies,
sess->last_active + SMB2_SESSION_TIMEOUT))) {
xa_erase(&conn->sessions, sess->id);
hash_del(&sess->hlist); // [4]
ksmbd_session_destroy(sess); // [5]
continue;
}
}
up_write(&conn->session_lock);
}
However, in another thread, the cleanup could be invoked when the connection is terminated in ksmbd_server_terminate_conn by calling ksmbd_sessions_deregister, operating on the same table and without the appropriate lock ([6]):
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/mgmt/user_session.c#L213
void ksmbd_sessions_deregister(struct ksmbd_conn *conn)
{
struct ksmbd_session *sess;
unsigned long id;
down_write(&sessions_table_lock);
// .. ignored, since the connection is not binding
up_write(&sessions_table_lock);
down_write(&conn->session_lock);
xa_for_each(&conn->sessions, id, sess) {
unsigned long chann_id;
struct channel *chann;
xa_for_each(&sess->ksmbd_chann_list, chann_id, chann) {
if (chann->conn != conn)
ksmbd_conn_set_exiting(chann->conn);
}
ksmbd_chann_del(conn, sess);
if (xa_empty(&sess->ksmbd_chann_list)) {
xa_erase(&conn->sessions, sess->id);
hash_del(&sess->hlist); // [6]
ksmbd_session_destroy(sess);
}
}
up_write(&conn->session_lock);
}
One possible flow is outlined here:
Thread A                         | Thread B
---------------------------------|-----------------------------
ksmbd_session_register           |
ksmbd_expire_session             |
                                 | ksmbd_server_terminate_conn
                                 | ksmbd_sessions_deregister
ksmbd_session_destroy(sess)      |   |
    |                            |   |
    hash_del(&sess->hlist);      |   |
    kfree(sess);                 |   |
                                 |   hash_del(&sess->hlist);
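To make the missing serialization concrete, here is a minimal user-space sketch in which lookup, unlink, and free all happen under one lock, so a second remover finds an empty slot instead of a freed object. It uses a pthread mutex and a toy fixed-size table. These are stand-ins chosen for illustration, not the rw-semaphore and hash table ksmbd uses, and the sketch is not the upstream patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 8

struct session {
    unsigned long id;
};

/* One lock guards every insertion and removal on the shared table. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct session *table[TABLE_SIZE];
static int destroyed; /* counts frees so a double free would be visible */

static int session_add(unsigned long id)
{
    struct session *sess = calloc(1, sizeof(*sess));

    if (!sess)
        return -1;
    sess->id = id;

    pthread_mutex_lock(&table_lock);
    assert(table[id % TABLE_SIZE] == NULL); /* sketch: no collision handling */
    table[id % TABLE_SIZE] = sess;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/*
 * Shared by the "expire" and "deregister" paths in this sketch.
 * Returns 1 if this caller destroyed the session, 0 if it was already gone.
 */
static int session_remove(unsigned long id)
{
    int removed = 0;

    pthread_mutex_lock(&table_lock);
    struct session *sess = table[id % TABLE_SIZE];
    if (sess && sess->id == id) {
        table[id % TABLE_SIZE] = NULL; /* unlink before freeing */
        free(sess);
        destroyed++;
        removed = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return removed;
}
```

Calling `session_remove` for the same id from two threads destroys the session exactly once, whichever thread wins the lock.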
With KASAN enabled, the issue manifested as the following crashes:
BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_init include/linux/list.h:1016 [inline]
BUG: KASAN: slab-use-after-free in hash_del include/linux/hashtable.h:107 [inline]
BUG: KASAN: slab-use-after-free in ksmbd_sessions_deregister+0x569/0x5f0 fs/smb/server/mgmt/user_session.c:247
Write of size 8 at addr ffff888126050c70 by task ksmbd:51780/39072
BUG: KASAN: slab-use-after-free in hlist_add_head include/linux/list.h:1034 [inline]
BUG: KASAN: slab-use-after-free in __session_create fs/smb/server/mgmt/user_session.c:420 [inline]
BUG: KASAN: slab-use-after-free in ksmbd_smb2_session_create+0x74a/0x750 fs/smb/server/mgmt/user_session.c:432
Write of size 8 at addr ffff88816df5d070 by task kworker/5:2/139
Both crashes are 8-byte writes into a freed object at offset 112.
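The offset of 112 can be checked against the two addresses in the KASAN reports. The sketch below assumes the objects live in the kmalloc-512 cache mentioned earlier and are aligned to their 512-byte size; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of `addr` inside a slab object, assuming objects are aligned to
 * `obj_size` and that `obj_size` is a power of two (both are assumptions).
 */
static uint64_t offset_in_object(uint64_t addr, uint64_t obj_size)
{
    assert(obj_size != 0 && (obj_size & (obj_size - 1)) == 0);
    return addr & (obj_size - 1);
}
```

Under those assumptions, both `0xffff888126050c70` and `0xffff88816df5d070` map to offset `0x70`, which is 112 in decimal.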
The second vulnerability was introduced in commit 7aa8804c0b, which added reference counting for sessions to avoid UAF:
// https://github.com/torvalds/linux/blob/7aa8804c0b67b3cb263a472d17f2cb50d7f1a930/fs/smb/server/server.c
send:
if (work->sess)
ksmbd_user_session_put(work->sess);
if (work->tcon)
ksmbd_tree_connect_put(work->tcon);
smb3_preauth_hash_rsp(work); // [8]
if (work->sess && work->sess->enc && work->encrypted &&
conn->ops->encrypt_resp) {
rc = conn->ops->encrypt_resp(work);
if (rc < 0)
conn->ops->set_rsp_status(work, STATUS_DATA_ERROR);
}
ksmbd_conn_write(work);
Here, ksmbd_user_session_put decrements sess->refcnt, and if the value reaches zero, the kernel is permitted to free the sess object ([7]):
// https://github.com/torvalds/linux/blob/7aa8804c0b67b3cb263a472d17f2cb50d7f1a930/fs/smb/server/mgmt/user_session.c#L296
void ksmbd_user_session_put(struct ksmbd_session *sess)
{
if (!sess)
return;
if (atomic_read(&sess->refcnt) <= 0)
WARN_ON(1);
else
atomic_dec(&sess->refcnt); // [7]
}
The smb3_preauth_hash_rsp function ([8]) that follows accesses the sess object without verifying whether it has been freed ([9]):
// https://github.com/torvalds/linux/blob/7aa8804c0b67b3cb263a472d17f2cb50d7f1a930/fs/smb/server/smb2pdu.c#L8859
if (le16_to_cpu(rsp->Command) == SMB2_SESSION_SETUP_HE && sess) {
__u8 *hash_value;
if (conn->binding) {
struct preauth_session *preauth_sess;
preauth_sess = ksmbd_preauth_session_lookup(conn, sess->id);
if (!preauth_sess)
return;
hash_value = preauth_sess->Preauth_HashValue;
} else {
hash_value = sess->Preauth_HashValue; // [9]
if (!hash_value)
return;
}
ksmbd_gen_preauth_integrity_hash(conn, work->response_buf,
hash_value);
}
This can result in a use-after-free (UAF) condition when accessing the freed object, as detected by KASAN:
BUG: KASAN: slab-use-after-free in smb3_preauth_hash_rsp (fs/smb/server/smb2pdu.c:8875)
Read of size 8 at addr ffff88812f5c8c38 by task kworker/0:9/308
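The ordering problem can be modeled deterministically in user space. In the sketch below, a `freed` flag stands in for kfree() and `reaper_pass` stands in for a concurrent expiry pass running at the worst possible moment. All names are hypothetical, and the "use, then put" variant is one possible ordering, not necessarily the upstream fix.

```c
#include <assert.h>
#include <stdbool.h>

struct session {
    int refcnt;
    bool freed; /* models kfree(); a real free would make any access invalid */
};

/* Drops one reference; like the code above, it does not free by itself. */
static void session_put(struct session *sess)
{
    assert(sess->refcnt > 0);
    sess->refcnt--;
}

/* Models a concurrent expiry pass: frees sessions nobody references. */
static void reaper_pass(struct session *sess)
{
    if (sess->refcnt == 0)
        sess->freed = true;
}

/* Stands in for the hash step; true means it touched a live session. */
static bool hash_step(const struct session *sess)
{
    return !sess->freed;
}

/* Order from the vulnerable code: put first, then use. */
static bool send_put_then_use(struct session *sess)
{
    session_put(sess);
    reaper_pass(sess); /* worst-case interleaving */
    return hash_step(sess);
}

/* Safer order: use while the reference is still held, then put. */
static bool send_use_then_put(struct session *sess)
{
    reaper_pass(sess); /* same interleaving, but refcnt is still nonzero */
    bool ok = hash_step(sess);

    session_put(sess);
    return ok;
}
```

With a starting reference count of 1, `send_put_then_use` ends up using a freed session, while `send_use_then_put` does not, because the reaper cannot free an object that is still referenced.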
After reporting the bugs and confirming the fix, we identified another issue when sending a large number of packets. Each time queue_ksmbd_work is invoked during a socket connection, it allocates data through ksmbd_alloc_work_struct:
// https://elixir.bootlin.com/linux/v6.11/source/fs/smb/server/ksmbd_work.c#L21
struct ksmbd_work *ksmbd_alloc_work_struct(void)
{
struct ksmbd_work *work = kmem_cache_zalloc(work_cache, GFP_KERNEL);
// ..
}
In SMB, credits are designed to control the number of requests a client can send. However, the affected code runs before the credit limits are enforced.
After approximately two minutes of sending these packets through a remote socket, the system consistently encountered a kernel panic and restarted:
[ 287.957806] Out of memory and no killable processes...
[ 287.957813] Kernel panic - not syncing: System is deadlocked on memory
[ 287.957824] CPU: 2 UID: 0 PID: 2214 Comm: ksmbd:52086 Tainted: G B 6.12.0-rc5-00181-g6c52d4da1c74-dirty #26
[ 287.957848] Tainted: [B]=BAD_PAGE
[ 287.957854] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 287.957863] Call Trace:
[ 287.957869] <TASK>
[ 287.957876] dump_stack_lvl (lib/dump_stack.c:124 (discriminator 1))
[ 287.957895] panic (kernel/panic.c:354)
[ 287.957913] ? __pfx_panic (kernel/panic.c:288)
[ 287.957932] ? out_of_memory (mm/oom_kill.c:1170)
[ 287.957964] ? out_of_memory (mm/oom_kill.c:1169)
[ 287.957989] out_of_memory (mm/oom_kill.c:74 mm/oom_kill.c:1169)
[ 287.958014] ? mutex_trylock (./arch/x86/include/asm/atomic64_64.h:101 ./include/linux/atomic/atomic-arch-fallback.h:4296 ./include/linux/atomic/atomic-long.h:1482 ./include/linux/atomic/atomic-instrumented.h:4458 kernel/locking/mutex.c:129 kernel/locking/mutex.c:152 kernel/locking/mutex.c:1092)
The reason was that ksmbd kept creating threads, and after forking more than 2000 threads, the ksmbd_work_cache depleted available memory.
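One general way to bound this kind of growth is to reserve a slot in a per-connection counter before allocating any per-request state, and to reject the request when the limit is reached. The sketch below uses C11 atomics. `MAX_INFLIGHT`, the struct, and the function names are hypothetical, and this is an illustration of the idea, not the patch that was applied upstream.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_INFLIGHT 4 /* hypothetical per-connection limit */

struct conn {
    atomic_int inflight; /* requests accepted but not yet completed */
};

/* Try to reserve a slot before allocating anything for the request. */
static bool request_begin(struct conn *c)
{
    int cur = atomic_load(&c->inflight);

    while (cur < MAX_INFLIGHT) {
        /* On failure, `cur` is reloaded and the limit is re-checked. */
        if (atomic_compare_exchange_weak(&c->inflight, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Release the slot once the request has been fully processed. */
static void request_end(struct conn *c)
{
    int prev = atomic_fetch_sub(&c->inflight, 1);

    assert(prev > 0);
    (void)prev;
}
```

A receive loop built this way would call `request_begin` first and only allocate the work structure when it returns true, so a flood of packets cannot grow the cache without limit.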
This could be confirmed by using slabstat or inspecting /proc/slabinfo. The number of active objects steadily increased, eventually exhausting kernel memory and causing the system to restart:
# ps auxww | grep -i ksmbd | wc -l
2069
# head -2 /proc/slabinfo; grep ksmbd_work_cache /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ksmbd_work_cache 16999731 16999731 384 21 2 : tunables 0 0 0 : slabdata 809511 809511 0
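The slabinfo line can be turned into a rough memory figure with simple arithmetic. The helpers below are hypothetical and assume 4 KiB pages, which the output above does not state.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL /* assumption: 4 KiB pages */

/* Bytes occupied by the objects themselves: num_objs * objsize. */
static uint64_t slab_object_bytes(uint64_t num_objs, uint64_t objsize)
{
    assert(objsize == 0 || num_objs <= UINT64_MAX / objsize);
    return num_objs * objsize;
}

/* Bytes of backing pages: num_slabs * pagesperslab * page size. */
static uint64_t slab_page_bytes(uint64_t num_slabs, uint64_t pages_per_slab)
{
    return num_slabs * pages_per_slab * PAGE_SIZE_BYTES;
}
```

With the values shown, 16999731 objects of 384 bytes come to 6,527,896,704 bytes (about 6.5 GB), and 809511 slabs of 2 pages come to 6,631,514,112 bytes (about 6.6 GB) held by this single cache.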
This issue was not identified by syzkaller but was uncovered through manual testing with the triggering code.
Even though syzkaller identified and triggered two of the vulnerabilities, it failed to generate a reproducer, requiring manual analysis of the crash reports. These issues were accessible without authentication, and further improvements in fuzzing are likely to uncover additional bugs, arising either from complex locking mechanisms that are difficult to implement correctly or from other factors. Due to time constraints, we did not attempt to create a fully working exploit for the UAF.