How the MSVC Compiler Generates XFG Function Prototype Hashes
2020-11-12 08:00:00 Author: blog.quarkslab.com

Microsoft is currently working on Xtended Flow Guard (XFG), an evolved version of Control Flow Guard (CFG), their own control flow integrity implementation. XFG works by restricting indirect control flow transfers based on type-based hashes of function prototypes. This blog post is a deep dive into how the MSVC compiler generates those XFG function prototype hashes.

Introduction

In 2014, Microsoft introduced a Control Flow Integrity (CFI) solution called Control Flow Guard (CFG). CFG has been extensively studied in the past. Over time, a number of ways to bypass CFG were devised; some of these bypasses relied on implementation issues (such as the integration with JIT compilers, or the availability of sensitive APIs that were subject to abuse), and those were eventually addressed. One design issue remained, however: CFG didn't offer any granularity over the valid call targets. Any protected indirect call was allowed to call any valid call target. In large binaries, valid call targets could easily number in the thousands, giving attackers plenty of flexibility to bypass CFG by chaining valid C++ virtual functions (see for example the exploitation technique known as Counterfeit Object-oriented Programming (COOP)).

Fast forward a few years. Microsoft has been working on an improved version of CFG, called Xtended Flow Guard (XFG). XFG offers a finer-grained CFI, by restricting indirect calls/jumps through type signature checks. The key concept behind XFG is that a type signature-based hash is assigned at compile time to those functions which can be the destination of an indirect call/jump. Then, at XFG-instrumented indirect call sites, a hash check is performed: only functions with the expected signature hash are allowed.

Some weeks ago, researcher Connor McGarr published a blog post named "Exploit Development: Between a Rock and a (Xtended Flow) Guard Place: Examining XFG", explaining how XFG works, as well as its potential weaknesses. This sparked my curiosity, so I decided to fire up IDA Pro and WinDbg to understand how XFG hashes are generated.

As of this writing, XFG is present in Windows 10 Insider Preview builds, under the Dev Channel. In order to compile programs with XFG support, you need Visual Studio 2019 Preview.

The analysis in this blog post is based on the following versions of the binaries from Visual Studio 2019 Preview, version 16.8.0 Preview 2.1:

  • c1.dll version 19.28.29213.0
  • c2.dll version 19.28.29213.0

This blog post focuses on how XFG hashes are generated for C source code. Although the hashing algorithm for C++ code looks similar at first glance, we haven't looked into its specifics. Since this is a rather long article, the content is divided into several sections: first, we start with a quick primer on XFG hashes. Then, we analyze how functions are hashed, followed by a detailed view of how different C types are hashed. Finally, we inspect some final transformations that are applied to the computed hashes, and we conclude with a hands-on hash calculation exercise.

A short primer on XFG hashes

Let's start with a very simple C program defining a function pointer type named FPTR ([1]), which declares a function taking two float arguments and returning another float. Function main declares a function pointer variable named fptr, of type FPTR, which is set to the address of function foo ([2]), whose prototype matches the FPTR type. Finally, at [3], the function to which fptr points is called, passing values 1.00001 and 2.00002 as parameters.

    #include <stdio.h>

[1] typedef float (* FPTR)(float, float);


    float foo(float val1, float val2){
        printf("I received float values %f and %f\n", val1, val2);
        return (val2 - val1);
    }


    int main(int argc, char **argv){
[2]     FPTR fptr = foo;

        printf("Calling function pointer...\n");
[3]     fptr(1.00001, 2.00002);
        return 0;
    }

We compile the source code above from the x64 Native Tools Command Prompt for VS 2019 Preview with the following command line. Notice that we are using the /guard:xfg flag to enable XFG.

> cl /Zi /guard:xfg example1.c

The disassembly of the resulting main function is shown below:

main      ; int __cdecl main(int argc, const char **argv, const char **envp)
main
main      var_18          = qword ptr -18h
main      var_10          = qword ptr -10h
main      arg_0           = dword ptr  8
main      arg_8           = qword ptr  10h
main
main          mov     [rsp+arg_8], rdx
main+5        mov     [rsp+arg_0], ecx
main+9        sub     rsp, 38h
main+D        lea     rax, foo
main+14       mov     [rsp+38h+var_18], rax
main+19       lea     rcx, aCallingFunctio ; "Calling function pointer...\n"
main+20       call    printf
main+25       mov     rax, [rsp+38h+var_18]
main+2A       mov     [rsp+38h+var_10], rax
main+2F       mov     r10, 99743F3270D52870h
main+39       movss   xmm1, cs:__real@40000054
main+41       movss   xmm0, cs:__real@3f800054
main+49       mov     rax, [rsp+38h+var_10]
main+4E       call    cs:__guard_xfg_dispatch_icall_fptr
main+54       xor     eax, eax
main+56       add     rsp, 38h
main+5A       retn
main+5A   main            endp

We can see at main+0x2F that the R10 register is set to the expected type-based hash (0x99743F3270D52870) for the function pointer call that follows at main+0x4E. The function to be called through the function pointer is foo, whose prototype hash is stored in the 8 bytes preceding the beginning of the function. That stored hash matches the expected one, so foo is a valid target for the indirect call at main+0x4E. To be precise, the hash stored before foo (0x99743F3270D52871) matches the hash loaded into R10 (0x99743F3270D52870) in every bit except bit 0:

.text:0000000140001008                 dq 99743F3270D52871h
foo
foo      ; =============== S U B R O U T I N E ================================
foo      ; float __fastcall foo(float val1, float val2)
foo      foo             proc near               ; DATA XREF: main+D
foo
foo      arg_0           = dword ptr  8
foo      arg_8           = dword ptr  10h
foo
foo          movss   [rsp+arg_8], xmm1
foo+6        movss   [rsp+arg_0], xmm0
foo+C        sub     rsp, 28h
foo+10       cvtss2sd xmm0, [rsp+28h+arg_8]
foo+16       cvtss2sd xmm1, [rsp+28h+arg_0]
foo+1C       movaps  xmm2, xmm0
foo+1F       movq    r8, xmm2
foo+24       movq    rdx, xmm1
foo+29       lea     rcx, _Format    ; "I received float values %f and %f\n"
foo+30       call    printf
foo+35       movss   xmm0, [rsp+28h+arg_8]
foo+3B       subss   xmm0, [rsp+28h+arg_0]
foo+41       add     rsp, 28h
foo+45       retn
foo+45   foo             endp

This discrepancy is harmless: the very first instruction of the XFG dispatch function (ntdll!LdrpDispatchUserCallTargetXFG) sets bit 0 of R10, so the difference in bit 0 between the expected hash and the function hash has no effect on the comparison:

LdrpDispatchUserCallTargetXFG      LdrpDispatchUserCallTargetXFG proc near
LdrpDispatchUserCallTargetXFG      ; __unwind { // LdrpICallHandler
LdrpDispatchUserCallTargetXFG          or      r10, 1
LdrpDispatchUserCallTargetXFG+4        test    al, 0Fh
LdrpDispatchUserCallTargetXFG+6        jnz     short loc_180094337
LdrpDispatchUserCallTargetXFG+8        test    ax, 0FFFh
LdrpDispatchUserCallTargetXFG+C        jz      short loc_180094337
LdrpDispatchUserCallTargetXFG+E        cmp     r10, [rax-8]
LdrpDispatchUserCallTargetXFG+12       jnz     short loc_180094337
LdrpDispatchUserCallTargetXFG+14       jmp     rax
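
The fast path in the listing above can be modeled in a few lines of Python. This is a minimal sketch based only on the instructions shown: the function name xfg_fast_path_allows and the stored_hash parameter (standing for the qword read from [rax-8]) are made up for illustration, and whatever happens at loc_180094337 is not modeled.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def xfg_fast_path_allows(target_addr: int, stored_hash: int, expected_hash: int) -> bool:
    """Model of the fast path of LdrpDispatchUserCallTargetXFG.

    stored_hash stands for the qword located at target_addr - 8.
    True means the listing would reach `jmp rax`; False means it would
    branch to loc_180094337, which this sketch does not model.
    """
    r10 = (expected_hash | 1) & MASK64       # or   r10, 1
    if target_addr & 0xF:                    # test al, 0Fh   / jnz
        return False
    if (target_addr & 0xFFF) == 0:           # test ax, 0FFFh / jz
        return False
    return r10 == stored_hash                # cmp  r10, [rax-8] / jnz


# Values from the example: the hash loaded into R10 at main+0x2F and the
# qword stored right before foo. The qword sits at 0x140001008, so foo
# itself starts 8 bytes later.
EXPECTED = 0x99743F3270D52870
STORED = 0x99743F3270D52871
FOO_ADDR = 0x140001008 + 8

print(xfg_fast_path_allows(FOO_ADDR, STORED, EXPECTED))
```

With these values the comparison succeeds even though the two constants differ in bit 0, because bit 0 is forced to 1 on the expected hash before comparing.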

Hashing function types

The MSVC compiler is composed of two stages: a front end and a back end. The front end is language-specific: it reads in source code, lexes, parses, does semantic analysis and emits an IL (intermediate language). The back end is specific to the target architecture: it reads the IL generated by the front end, it performs optimizations and generates code for a given architecture.

The generation of the function prototype hash is left to the language front end. This means that when compiling C code, the C front end (c1.dll) is in charge of generating the prototype hash, while when compiling C++ code, the C++ front end (c1xx.dll) is charged with this task.

Once the prototype hash has been produced by the corresponding language front end, some final transformations are performed by the compiler back end (the x64 back end in our case, c2.dll). In the following sections we'll detail every step of the creation of the prototype hashes while compiling C code.

When compiling C source code with the /guard:xfg flag, the compiler front end calls the c1!XFGHelper__ComputeHash_1 function in order to calculate the prototype hash of a function being processed.

The c1!XFGHelper__ComputeHash_1 function creates an object of type XFGHelper::XFGHasher, which is in charge of collecting type information for the function being processed, and producing the prototype hash, based on the collected type information. The XFGHelper::XFGHasher uses an instance of std::vector to store all the type information that will be hashed. Together with the related XFGHelper::XFGTypeHasher class, it offers a number of methods that are called throughout the process of building the hash:

  • XFGHelper::XFGHasher::add_function_type()
  • XFGHelper::XFGHasher::add_type()
  • XFGHelper::XFGHasher::get_hash()
  • XFGHelper::XFGTypeHasher::compute_hash()
  • XFGHelper::XFGTypeHasher::hash_indirection()
  • XFGHelper::XFGTypeHasher::hash_tag()
  • XFGHelper::XFGTypeHasher::hash_primitive()

After initializing an instance of XFGHelper::XFGHasher, the XFGHelper__ComputeHash_1 function calls XFGHelper::XFGHasher::add_function_type(), passing as parameters the instance of XFGHelper::XFGHasher and a Type_t object containing the type information about the function being hashed.

XFGHelper__ComputeHash_1      XFGHelper__ComputeHash_1 proc near
XFGHelper__ComputeHash_1
XFGHelper__ComputeHash_1      arg_0           = qword ptr  8
XFGHelper__ComputeHash_1      arg_8           = qword ptr  10h
XFGHelper__ComputeHash_1      arg_10          = qword ptr  18h
[...]
XFGHelper__ComputeHash_1+79        xorps   xmm0, xmm0
XFGHelper__ComputeHash_1+7C        movdqu  cs:xfg_hasher, xmm0 ; zero inits xfg_hasher
[...]
XFGHelper__ComputeHash_1+B1        mov     rdx, rbp        ; rdx = Type_t containing function information
XFGHelper__ComputeHash_1+B4        lea     rbp, xfg_hasher
XFGHelper__ComputeHash_1+BB        mov     rcx, rbp
XFGHelper__ComputeHash_1+BE        call    XFGHelper::XFGHasher::add_function_type(Type_t const *,XFGHelper::VirtualInfoFromDeclspec)
XFGHelper__ComputeHash_1+C3        mov     rdx, rsi        ; rdx = function->return_type (struct Type_t *)
XFGHelper__ComputeHash_1+C6        mov     rcx, rbp        ; this
XFGHelper__ComputeHash_1+C9        call    XFGHelper::XFGHasher::add_type(Type_t const *) ; (step 5)

Function XFGHelper::XFGHasher::add_function_type will retrieve 4 pieces of information about the function being hashed, and after returning from XFGHelper::XFGHasher::add_function_type one more piece of information is added via a call to XFGHelper::XFGHasher::add_type, as we can see at XFGHelper__ComputeHash_1+C9 in the disassembly listing above. These pieces of information are stored in the std::vector owned by the XFGHelper::XFGHasher instance:

  1. 4 bytes indicating the number of parameters of the function;
  2. 8 bytes per function parameter, holding the hash of the type of said parameter;
  3. 1 byte indicating whether the function is variadic or not (i.e. if it takes a variable number of arguments);
  4. 4 bytes specifying the calling convention used by the function;
  5. 8 bytes holding the hash of the return type of the function.
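
To make the layout concrete, here is a minimal Python sketch that serializes the five components in the order listed above. It is an illustration under assumptions, not the compiler's code: the function name serialize_prototype is invented, values are packed little-endian because they are copied from x64 memory byte by byte, and FAKE_FLOAT_HASH is a placeholder rather than the real type hash of float.

```python
import struct


def serialize_prototype(param_type_hashes, is_variadic, calling_convention, return_type_hash):
    """Build the byte buffer that gets hashed for a function prototype."""
    buf = bytearray()
    buf += struct.pack("<I", len(param_type_hashes))     # 1. number of parameters (4 bytes)
    for type_hash in param_type_hashes:                  # 2. one 8-byte type hash per parameter
        buf += struct.pack("<Q", type_hash)
    buf += struct.pack("<B", 1 if is_variadic else 0)    # 3. variadic flag (1 byte)
    buf += struct.pack("<I", calling_convention & 0xF)   # 4. calling convention (4 bytes)
    buf += struct.pack("<Q", return_type_hash)           # 5. hash of the return type (8 bytes)
    return bytes(buf)


# Placeholder value, NOT the real type hash of float.
FAKE_FLOAT_HASH = 0x1122334455667788

# Shape of `float foo(float, float)` with the default x64 calling convention.
data = serialize_prototype([FAKE_FLOAT_HASH, FAKE_FLOAT_HASH], False, 0x201, FAKE_FLOAT_HASH)
print(len(data), data.hex())
```

For a two-parameter function the buffer is 4 + 2*8 + 1 + 4 + 8 = 33 bytes long.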

Component 1: Number of parameters

The XFGHelper::XFGHasher::add_function_type function starts by adding a DWORD to the std::vector indicating the number of parameters of the function. Notice that this number can be influenced by the function accepting a variable number of arguments, or having virtual information from __declspec (I suspect that this may be some reused code from the XFG implementation for C++, and thus it doesn't really apply to C code, although I haven't confirmed it). In short, the number of parameters considered here will be the real number of parameters declared in the function prototype, minus 1 if the function takes a variable number of arguments, minus 1 again if the function has virtual information from __declspec.

XFGHelper::XFGHasher::add_function_type+18        mov     rsi, [rdx+10h]  ; rsi = function_info->FunctionTypeInfo
XFGHelper::XFGHasher::add_function_type+1C        mov     rbx, rcx
XFGHelper::XFGHasher::add_function_type+1F        mov     rcx, rsi        ; this
XFGHelper::XFGHasher::add_function_type+22        movzx   r14d, r8b
XFGHelper::XFGHasher::add_function_type+26        mov     r15, rdx
XFGHelper::XFGHasher::add_function_type+29        call    FunctionTypeInfo_t::RealNumberOfParameters(void)
XFGHelper::XFGHasher::add_function_type+2E        mov     rcx, rsi        ; this
XFGHelper::XFGHasher::add_function_type+31        mov     r9d, eax        ; r9 = real_number_of_params
XFGHelper::XFGHasher::add_function_type+34        call    FunctionTypeInfo_t::IsVarArgsFunction(void)
XFGHelper::XFGHasher::add_function_type+39        mov     rdx, [rbx+8]
XFGHelper::XFGHasher::add_function_type+3D        lea     rbp, [r9-1]     ; rbp = real_number_of_params - 1
XFGHelper::XFGHasher::add_function_type+41        test    al, al          ; is variadic function?
XFGHelper::XFGHasher::add_function_type+43        mov     rcx, rbx
XFGHelper::XFGHasher::add_function_type+46        cmovz   rbp, r9         ; if not variadic, rbp = real_number_of_params
XFGHelper::XFGHasher::add_function_type+4A        test    r8b, r8b        ; does it have virtual info from __declspec?
XFGHelper::XFGHasher::add_function_type+4D        lea     r9, [rsp+48h+arg_14]
XFGHelper::XFGHasher::add_function_type+52        lea     r8, [rsp+48h+arg_10]
XFGHelper::XFGHasher::add_function_type+57        lea     eax, [rbp-1]    ; number of params = rbp - 1
XFGHelper::XFGHasher::add_function_type+5A        cmovz   eax, ebp        ; if no virtual info from __declspec, number of params = rbp
XFGHelper::XFGHasher::add_function_type+5D        mov     [rsp+48h+arg_10], eax ; value to add = number of params (dword)
XFGHelper::XFGHasher::add_function_type+5D                     ; [step 1]
XFGHelper::XFGHasher::add_function_type+61        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
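
The two conditional moves in the listing above boil down to the following arithmetic. This is a sketch with invented names (hashed_param_count and its parameters); it only mirrors the adjustments described in the text.

```python
def hashed_param_count(real_param_count: int, is_variadic: bool,
                       has_declspec_virtual_info: bool) -> int:
    """Number of parameters written to the buffer as a DWORD."""
    count = real_param_count
    if is_variadic:
        count -= 1               # lea rbp, [r9-1]  / cmovz rbp, r9
    if has_declspec_virtual_info:
        count -= 1               # lea eax, [rbp-1] / cmovz eax, ebp
    return count & 0xFFFFFFFF    # only the low 32 bits are stored


print(hashed_param_count(2, False, False))   # the common case: count is unchanged
print(hashed_param_count(2, True, False))    # variadic: one less
```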

Component 2: Type hash of each parameter

Next, XFGHelper::XFGHasher::add_function_type enters a loop in which it computes a hash of the type of each function parameter, adding each type hash (8 bytes) to the std::vector.

There's special handling for a couple of edge cases (type & 0x10f == 0x103, type & 0x103 == 0x101), but most parameter types end up at loc_180105541. At that location, the Type_t object representing the type of the parameter being processed is stripped of qualifiers such as const (0x800) and volatile (0x40) if needed (call to Type_t::clearModifiersAndQualifiers). Then the 8-byte hash of the parameter type is added to the std::vector, via the call to XFGHelper::XFGHasher::add_type that we can see below at XFGHelper::XFGHasher::add_function_type+CC. If you're wondering how exactly XFGHelper::XFGHasher::add_type computes a hash for a given Type_t, you'll find the details later, under the "Hashing types" section.

Finally, if there are more parameters to hash, it jumps back to the beginning of the loop.

XFGHelper::XFGHasher::add_function_type+6E   loc_1801054F6:
XFGHelper::XFGHasher::add_function_type+6E        mov     rax, [rsi]      ; rax = &function_info->params
XFGHelper::XFGHasher::add_function_type+71        mov     rcx, [rax+rdi*8] ; rcx = function_info->params[i] (Type_t)
XFGHelper::XFGHasher::add_function_type+75        mov     edx, [rcx]      ; edx = params[i].type
XFGHelper::XFGHasher::add_function_type+77        mov     eax, edx
XFGHelper::XFGHasher::add_function_type+79        and     eax, 10Fh
XFGHelper::XFGHasher::add_function_type+7E        cmp     eax, 103h       ; params[i].type & 0x10f == 0x103 ?
XFGHelper::XFGHasher::add_function_type+83        jnz     short loc_18010552C
XFGHelper::XFGHasher::add_function_type+85        cmp     edx, 8103h      ; params[i].type == 0x8103 ?
XFGHelper::XFGHasher::add_function_type+8B        jz      short loc_18010554E
XFGHelper::XFGHasher::add_function_type+8D        mov     r8d, [rcx+4]
XFGHelper::XFGHasher::add_function_type+91        lea     edx, [rax-1]
XFGHelper::XFGHasher::add_function_type+94        mov     rcx, [rcx+8]
XFGHelper::XFGHasher::add_function_type+98        btr     r8d, 1Fh
XFGHelper::XFGHasher::add_function_type+9D        call    Type_t::createType(Type_t const *,uint,mod_t,bool)
XFGHelper::XFGHasher::add_function_type+A2        jmp     short loc_18010554B
XFGHelper::XFGHasher::add_function_type+A4   ; --------------------------------------------------------------
XFGHelper::XFGHasher::add_function_type+A4
XFGHelper::XFGHasher::add_function_type+A4   loc_18010552C:
XFGHelper::XFGHasher::add_function_type+A4        and     edx, 103h
XFGHelper::XFGHasher::add_function_type+AA        cmp     edx, 101h       ; params[i].type & 0x103 == 0x101 ?
XFGHelper::XFGHasher::add_function_type+B0        jnz     short loc_180105541
XFGHelper::XFGHasher::add_function_type+B2        call    Type_t::decayFunctionType(void)
XFGHelper::XFGHasher::add_function_type+B7        jmp     short loc_18010554B
XFGHelper::XFGHasher::add_function_type+B9   ; --------------------------------------------------------------
XFGHelper::XFGHasher::add_function_type+B9
XFGHelper::XFGHasher::add_function_type+B9   loc_180105541:
XFGHelper::XFGHasher::add_function_type+B9        mov     edx, 8C0h       ; discards qualifiers 0x800 (const) | 0x80 | 0x40 (volatile)
XFGHelper::XFGHasher::add_function_type+BE        call    Type_t::clearModifiersAndQualifiers(mod_t)
XFGHelper::XFGHasher::add_function_type+C3
XFGHelper::XFGHasher::add_function_type+C3   loc_18010554B:
XFGHelper::XFGHasher::add_function_type+C3                     ; XFGHelper::XFGHasher::add_function_type+B7↑j
XFGHelper::XFGHasher::add_function_type+C3        mov     rcx, rax
XFGHelper::XFGHasher::add_function_type+C6
XFGHelper::XFGHasher::add_function_type+C6   loc_18010554E:
XFGHelper::XFGHasher::add_function_type+C6        mov     rdx, rcx        ; struct Type_t *
XFGHelper::XFGHasher::add_function_type+C9        mov     rcx, rbx        ; this
XFGHelper::XFGHasher::add_function_type+CC        call    XFGHelper::XFGHasher::add_type(Type_t const *) ; adds hash of params[i] type
XFGHelper::XFGHasher::add_function_type+CC                     ; [step 2]
XFGHelper::XFGHasher::add_function_type+D1        inc     rdi
XFGHelper::XFGHasher::add_function_type+D4        cmp     rdi, rbp        ; counter < number_of_params ?
XFGHelper::XFGHasher::add_function_type+D7        jb      short loc_1801054F6 ; if so, loop
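
The branch structure at the top of the loop can be summarized with a small Python sketch. The function name select_param_path and the returned labels are invented; they only name which call in the listing is reached for a given value of params[i].type. What each flag value means at the C level is not established here.

```python
def select_param_path(type_flags: int) -> str:
    """Return a label for the code path taken before add_type is called."""
    if (type_flags & 0x10F) == 0x103:
        if type_flags == 0x8103:
            return "hash_as_is"               # jz loc_18010554E: Type_t is hashed unchanged
        return "createType"                   # Type_t::createType(...) builds the type to hash
    if (type_flags & 0x103) == 0x101:
        return "decayFunctionType"            # Type_t::decayFunctionType()
    # Default path: qualifiers in mask 0x8C0 (0x800 | 0x80 | 0x40) are discarded.
    return "clearModifiersAndQualifiers"


for flags in (0x8103, 0x0103, 0x0101, 0x0002):
    print(hex(flags), select_param_path(flags))
```

Whichever path is taken, the resulting Type_t is then passed to add_type, which appends its 8-byte hash to the buffer.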

Component 3: Variadic function

The next step is adding a single byte to the std::vector, indicating whether the function accepts a variable number of arguments or not. In most cases, when the function does not contain virtual information from __declspec, the following code path is taken:

XFGHelper::XFGHasher::add_function_type+D9        mov     rcx, rsi        ; this = functioninfo
XFGHelper::XFGHasher::add_function_type+DC        call    FunctionTypeInfo_t::IsVarArgsFunction(void)
XFGHelper::XFGHasher::add_function_type+E1        mov     r8b, al         ; r8b = is_var_args_function
XFGHelper::XFGHasher::add_function_type+E4        test    r14b, r14b      ; contains virtual info from __declspec?
XFGHelper::XFGHasher::add_function_type+E7        jz      short loc_1801055EB
[...]
XFGHelper::XFGHasher::add_function_type+163  loc_1801055EB:
XFGHelper::XFGHasher::add_function_type+163        mov     rdx, [rbx+8]
XFGHelper::XFGHasher::add_function_type+167        lea     r9, [rsp+48h+arg_10+1]
XFGHelper::XFGHasher::add_function_type+16C        mov     byte ptr [rsp+48h+arg_10], r8b ; value to add = is_var_args_function (byte)
XFGHelper::XFGHasher::add_function_type+16C        ; [step 3]
XFGHelper::XFGHasher::add_function_type+171        mov     rcx, rbx
XFGHelper::XFGHasher::add_function_type+174        lea     r8, [rsp+48h+arg_10]
XFGHelper::XFGHasher::add_function_type+179        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)

Component 4: Calling convention

Finally, XFGHelper::XFGHasher::add_function_type adds a 4-byte value to the std::vector, indicating the calling convention used by the function. There are not a lot of calling conventions on the Intel x64 architecture (unlike its x86 counterpart): the default x64 calling convention passes the first four integer arguments in registers RCX, RDX, R8, and R9, while the first four floating point arguments are passed through XMM0-XMM3. This default calling convention is internally represented by the value 0x201, but since it is masked with & 0x0F before being saved to the std::vector (see disassembly below), you will most likely see a DWORD with value 0x00000001 for the calling convention.

For the record, although the MSVC x64 compiler typically ignores specifiers such as __cdecl and __stdcall, there's at least one way to obtain a value different from 0x201 for the calling convention: the __vectorcall calling convention is internally represented by value 0x208, meaning that after being masked with & 0x0F, a DWORD with value 0x00000008 will be written to the std::vector.

The code in charge of adding the calling convention data to the std::vector is shown below.

XFGHelper::XFGHasher::add_function_type+17E        mov     eax, [r15+4]    ; eax = function_info->calling_convention
XFGHelper::XFGHasher::add_function_type+182        lea     r9, [rsp+48h+arg_14]
XFGHelper::XFGHasher::add_function_type+187        mov     rdx, [rbx+8]
XFGHelper::XFGHasher::add_function_type+18B        lea     r8, [rsp+48h+arg_10]
XFGHelper::XFGHasher::add_function_type+190        and     eax, 0Fh        ; eax = calling_convention & 0xF
XFGHelper::XFGHasher::add_function_type+193        mov     rcx, rbx
XFGHelper::XFGHasher::add_function_type+196        mov     [rsp+48h+arg_10], eax ; value to add = calling_convention & 0xF (size = dword)
XFGHelper::XFGHasher::add_function_type+196                      ; [step 4]
XFGHelper::XFGHasher::add_function_type+19A        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
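
As a quick check of the masking step, the following Python sketch applies & 0x0F to the two internal values mentioned in the text and shows the four bytes that end up in the buffer. The constant and function names are invented for illustration; little-endian byte order is assumed because the DWORD is copied from x64 memory.

```python
import struct

CC_DEFAULT_X64 = 0x201   # internal value of the default x64 convention (from the text)
CC_VECTORCALL = 0x208    # internal value of __vectorcall (from the text)


def calling_convention_bytes(internal_value: int) -> bytes:
    """Mask the internal value and encode it as a little-endian DWORD."""
    return struct.pack("<I", internal_value & 0x0F)


print(calling_convention_bytes(CC_DEFAULT_X64).hex())   # 01000000
print(calling_convention_bytes(CC_VECTORCALL).hex())    # 08000000
```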

Component 5: Hash of return type

The fifth and final component of the data that will be used to obtain the function prototype hash is not added within XFGHelper::XFGHasher::add_function_type; instead, it is added right after returning from it. As you can see in the code below, XFGHelper__ComputeHash_1 calls XFGHelper::XFGHasher::add_type, which computes an 8-byte hash for the Type_t representing the return type, and adds those 8 bytes to the std::vector.

XFGHelper__ComputeHash_1+BE        call    XFGHelper::XFGHasher::add_function_type(Type_t const *,XFGHelper::VirtualInfoFromDeclspec)
XFGHelper__ComputeHash_1+C3        mov     rdx, rsi        ; rdx = function->return_type (struct Type_t *)
XFGHelper__ComputeHash_1+C6        mov     rcx, rbp        ; this
XFGHelper__ComputeHash_1+C9        call    XFGHelper::XFGHasher::add_type(Type_t const *) ; (step 5)

Final step: hashing the collected prototype data

If the function contains virtual information from __declspec, an additional 8-byte type hash is generated from that information and added to the std::vector. However, I wasn't able to hit this special case during my tests; as stated before, virtual information probably doesn't apply to C code.

Regardless of the presence or absence of virtual information from __declspec, the XFGHelper__ComputeHash_1 function finishes by calling the XFGHelper::XFGHasher::get_hash function:

XFGHelper__ComputeHash_1+CE        test    rbx, rbx        ; contains virtual info from __declspec?
XFGHelper__ComputeHash_1+D1        jz      short loc_1801052EF
[...]
XFGHelper__ComputeHash_1+103  loc_1801052EF:
XFGHelper__ComputeHash_1+103                  mov     rcx, rbp        ; this
XFGHelper__ComputeHash_1+106                  mov     rbx, [rsp+38h+arg_0]
XFGHelper__ComputeHash_1+10B                  mov     rbp, [rsp+38h+arg_8]
XFGHelper__ComputeHash_1+110                  mov     rsi, [rsp+38h+arg_10]
XFGHelper__ComputeHash_1+115                  add     rsp, 30h
XFGHelper__ComputeHash_1+119                  pop     rdi
XFGHelper__ComputeHash_1+11A                  jmp     XFGHelper::XFGHasher::get_hash(void)
XFGHelper__ComputeHash_1+11A  XFGHelper__ComputeHash_1 endp

XFGHelper::XFGHasher::get_hash hashes the type data that has been collected in the std::vector. The hashing algorithm of choice is SHA256, and as we can observe below at XFGHelper::XFGHasher::get_hash+5F, it only returns the first 8 bytes of the resulting SHA256 digest:

XFGHelper::XFGHasher::get_hash(void)      public: unsigned __int64 XFGHelper::XFGHasher::get_hash(void)const proc near
[...]
XFGHelper::XFGHasher::get_hash(void)+18        mov     dl, 3           ; algorithm_ids[3] == CALG_SHA_256
XFGHelper::XFGHasher::get_hash(void)+1A        lea     rcx, [rsp+58h+hHash] ; phHash
XFGHelper::XFGHasher::get_hash(void)+1F        call    HashAPIWrapper::HashAPIWrapper(uchar)
XFGHelper::XFGHasher::get_hash(void)+24        nop
XFGHelper::XFGHasher::get_hash(void)+25        mov     r8, [rbx+8]
XFGHelper::XFGHasher::get_hash(void)+29        sub     r8, [rbx]       ; dwDataLen
XFGHelper::XFGHasher::get_hash(void)+2C        xor     r9d, r9d        ; dwFlags
XFGHelper::XFGHasher::get_hash(void)+2F        mov     rdx, [rbx]      ; pbData
XFGHelper::XFGHasher::get_hash(void)+32        mov     rcx, [rsp+58h+hHash] ; hHash
XFGHelper::XFGHasher::get_hash(void)+37        call    cs:__imp_CryptHashData
XFGHelper::XFGHasher::get_hash(void)+3D        test    eax, eax
XFGHelper::XFGHasher::get_hash(void)+3F        jnz     short loc_180105822
[...]
XFGHelper::XFGHasher::get_hash(void)+4A   loc_180105822:
XFGHelper::XFGHasher::get_hash(void)+4A        mov     r8d, 20h ; ' '  ; unsigned int
XFGHelper::XFGHasher::get_hash(void)+50        lea     rdx, [rsp+58h+sha256_digest] ; unsigned __int8 *
XFGHelper::XFGHasher::get_hash(void)+55        lea     rcx, [rsp+58h+hHash] ; this
XFGHelper::XFGHasher::get_hash(void)+5A        call    HashAPIWrapper::GetHash(uchar *,ulong)
XFGHelper::XFGHasher::get_hash(void)+5F        mov     rbx, qword ptr [rsp+58h+sha256_digest] ; *** only returns first 8 bytes of SHA256 hash
XFGHelper::XFGHasher::get_hash(void)+64        mov     rcx, [rsp+58h+hHash] ; hHash
XFGHelper::XFGHasher::get_hash(void)+69        call    cs:__imp_CryptDestroyHash
XFGHelper::XFGHasher::get_hash(void)+6F        test    eax, eax
XFGHelper::XFGHasher::get_hash(void)+71        jnz     short loc_180105854
[...]
XFGHelper::XFGHasher::get_hash(void)+7C   loc_180105854:
XFGHelper::XFGHasher::get_hash(void)+7C        mov     rax, rbx
XFGHelper::XFGHasher::get_hash(void)+7F        mov     rcx, [rsp+58h+var_10]
XFGHelper::XFGHasher::get_hash(void)+84        xor     rcx, rsp        ; StackCookie
XFGHelper::XFGHasher::get_hash(void)+87        call    __security_check_cookie
XFGHelper::XFGHasher::get_hash(void)+8C        add     rsp, 50h
XFGHelper::XFGHasher::get_hash(void)+90        pop     rbx
XFGHelper::XFGHasher::get_hash(void)+91        retn
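
The behavior of get_hash can be reproduced with Python's hashlib. This is a sketch, with truncated_sha256 being an invented name: it computes the SHA-256 digest of a buffer and reads its first 8 bytes as a little-endian 64-bit integer, which is what the qword load at get_hash+5F does on x64. It does not include the final transformations applied later by the compiler back end.

```python
import hashlib


def truncated_sha256(data: bytes) -> int:
    """First 8 bytes of SHA-256(data), read as a little-endian qword."""
    digest = hashlib.sha256(data).digest()        # 32-byte digest
    return int.from_bytes(digest[:8], "little")   # mov rbx, qword ptr [sha256_digest]


# Hash of an arbitrary test buffer, only to show the truncation.
print(hex(truncated_sha256(b"abc")))
```

Note that the byte order matters: the digest bytes appear reversed when the returned value is printed as a hexadecimal number.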

Hashing types

So far we know that a function prototype hash is built based on 5 pieces of information. Three of them are plain values (number of parameters, a boolean value indicating if the function is variadic, and a number representing the calling convention in use), but the other two components are type hashes themselves (type hash for each function parameter, and hash of the return type). In this section we'll see how types (represented internally by the compiler with a Type_t object) are hashed.

Types are hashed within the XFGHelper::XFGHasher::add_type function. It calls XFGHelper__GetHashForType, which returns an 8-byte hash of the type, and then that 8-byte hash is stored in the std::vector via a call to std::vector::_Insert_range().

.text:00000001801056A0 public: void XFGHelper::XFGHasher::add_type(class Type_t const *) proc near
.text:00000001801056A0 arg_0           = qword ptr  8
.text:00000001801056A0 arg_8           = byte ptr  10h
.text:00000001801056A0
.text:00000001801056A0        push    rbx
.text:00000001801056A2        sub     rsp, 30h
.text:00000001801056A6        mov     rbx, rcx
.text:00000001801056A9        mov     rcx, rdx        ; rcx = Type_t
.text:00000001801056AC        call    XFGHelper__GetHashForType
.text:00000001801056B1        mov     rdx, [rbx+8]
.text:00000001801056B5        lea     r9, [rsp+38h+arg_8]
.text:00000001801056BA        lea     r8, [rsp+38h+arg_0]
.text:00000001801056BF        mov     [rsp+38h+arg_0], rax ; value to add = hash (qword)
.text:00000001801056C4        mov     rcx, rbx
.text:00000001801056C7        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
.text:00000001801056CC        add     rsp, 30h
.text:00000001801056D0        pop     rbx
.text:00000001801056D1        retn

Let's see how XFGHelper__GetHashForType generates an 8-byte hash for a given Type_t. First of all, it checks if the hash for the given type already exists in a cache that it holds, via the call to std::_Tree::_Emplace() that we can observe at XFGHelper__GetHashForType+AF. If that is the case, it simply returns the cached type hash; this way it avoids recomputing the hash of types that have already been processed.

On the other hand, if the type hash is not found in the cache, it proceeds to compute it from scratch by calling XFGHelper::XFGTypeHasher::compute_hash, which builds an std::vector with the type data to be hashed, and finally calls XFGHelper::XFGHasher::get_hash, which as we already know from the previous section, produces a SHA256 digest of the data contained in the std::vector and returns only the first 8 bytes of that digest.

XFGHelper__GetHashForType      XFGHelper__GetHashForType proc near
[...]
XFGHelper__GetHashForType+A3        lea     r9, [rbp+arg_8]
XFGHelper__GetHashForType+A7        lea     r8, [rbp+Type_t]
XFGHelper__GetHashForType+AB        lea     rdx, [rbp+xfg_type_hasher]
XFGHelper__GetHashForType+AF        call    std::_Tree<std::_Tmap_traits<Type_t const *,unsigned __int64,std::less<Type_t const *>,std::allocator<std::pair<Type_t const * const,unsigned __int64>>,0>>::_Emplace<Type_t const * &,int>(Type_t const * &,int &&)
XFGHelper__GetHashForType+B4        mov     rbx, qword ptr [rbp+xfg_type_hasher]
XFGHelper__GetHashForType+B8        cmp     byte ptr [rbp+xfg_type_hasher+8], 0 ; hash for type was found in cache?
XFGHelper__GetHashForType+BC        jz      short loc_18010544D ; if so, just return the cached hash
XFGHelper__GetHashForType+BE        xor     edi, edi        ; otherwise, compute the hash of the type
XFGHelper__GetHashForType+C0        xorps   xmm0, xmm0
XFGHelper__GetHashForType+C3        movdqu  [rbp+xfg_type_hasher], xmm0
XFGHelper__GetHashForType+C8        and     [rbp+var_10], rdi
XFGHelper__GetHashForType+CC        mov     [rbp+var_8], 1
XFGHelper__GetHashForType+D0        mov     rdx, [rbp+Type_t] ; struct Type_t *
XFGHelper__GetHashForType+D4        lea     rcx, [rbp+xfg_type_hasher] ; this
XFGHelper__GetHashForType+D8        call    XFGHelper::XFGTypeHasher::compute_hash(Type_t const *)
XFGHelper__GetHashForType+DD        nop
XFGHelper__GetHashForType+DE        cmp     [rbp+var_8], dil
XFGHelper__GetHashForType+E2        jz      short loc_180105434
XFGHelper__GetHashForType+E4        lea     rcx, [rbp+xfg_type_hasher] ; this
XFGHelper__GetHashForType+E8        call    XFGHelper::XFGHasher::get_hash(void)
[...]
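
The lookup-then-compute flow above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dict stands in for the compiler's internal map, type_key is a hypothetical identifier playing the role of the Type_t pointer, and build_type_data is a hypothetical callable returning the bytes that compute_hash would have collected.

```python
import hashlib

# Stand-in for the compiler's cache (Type_t pointer -> 8-byte hash).
_type_hash_cache = {}

def truncated_sha256(data: bytes) -> bytes:
    """First 8 bytes of the SHA256 digest, mirroring what get_hash returns."""
    return hashlib.sha256(data).digest()[:8]

def get_hash_for_type(type_key, build_type_data) -> bytes:
    """Return the cached hash for type_key, computing it only on a cache miss.

    build_type_data is a hypothetical callable that yields the type data
    to be hashed; it is only invoked when the type is not cached yet.
    """
    if type_key not in _type_hash_cache:
        _type_hash_cache[type_key] = truncated_sha256(build_type_data())
    return _type_hash_cache[type_key]
```

Calling get_hash_for_type twice with the same key runs build_type_data only once, which is the whole point of the cache.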

These are the pieces of information that XFGHelper::XFGTypeHasher::compute_hash collects about a given type:

  1. 1 byte value derived from the type qualifiers (fetched from offset 4 of the Type_t object);
  2. 1 byte indicating what kind of type it is (pointer, union/struct/enum, or primitive type);
  3. some type-specific data, depending on which one of the three type groups mentioned in 2) (pointer, union/struct/enum, or primitive type) the type belongs to.

We'll dig into the details of these three pieces of information in the following sub-sections.

Component 1: Type qualifiers

The first piece of information about a type is its qualifiers, which are stored as a DWORD at offset 4 of a Type_t object. In particular, information about the const (0x800) and volatile (0x40) qualifiers is combined into a single byte that is written to the std::vector. Bit 0 (the least significant bit) of this new byte indicates whether the const qualifier is present, while bit 1 indicates whether the volatile qualifier is present.

XFGHelper::XFGTypeHasher::compute_hash+1B        call    Type_t::getFirstNonArrayType(void)
XFGHelper::XFGTypeHasher::compute_hash+20        mov     rcx, rdi        ; this
XFGHelper::XFGTypeHasher::compute_hash+23        mov     r8d, [rax+4]    ; r8d = Type_t->qualifiers
XFGHelper::XFGTypeHasher::compute_hash+27        shr     r8d, 0Bh
XFGHelper::XFGTypeHasher::compute_hash+2B        and     r8b, 1
XFGHelper::XFGTypeHasher::compute_hash+2F        movzx   r9d, r8b        ; r9d = (Type_t->qualifiers >> 0xB) & 1 (has_const_qualifier)
XFGHelper::XFGTypeHasher::compute_hash+33        call    Type_t::getFirstNonArrayType(void)
XFGHelper::XFGTypeHasher::compute_hash+38        lea     r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+3C        mov     edx, [rax+4]    ; edx = Type_t->qualifiers
XFGHelper::XFGTypeHasher::compute_hash+3F        mov     al, r9b         ; al = has_const_qualifier
XFGHelper::XFGTypeHasher::compute_hash+42        or      al, 2           ; al = has_const_qualifier | 2
XFGHelper::XFGTypeHasher::compute_hash+44        and     dl, 40h         ; dl = Type_t->qualifiers & 0x40 (has_volatile_qualifier)
XFGHelper::XFGTypeHasher::compute_hash+47        movzx   ecx, al         ; qualifiers_info = has_const_qualifier | 2
XFGHelper::XFGTypeHasher::compute_hash+4A        mov     rdx, [rbx+8]
XFGHelper::XFGTypeHasher::compute_hash+4E        cmovz   ecx, r9d        ; if it doesn't have volatile qualifier, then
XFGHelper::XFGTypeHasher::compute_hash+4E                     ; qualifiers_info = has_const_qualifier
XFGHelper::XFGTypeHasher::compute_hash+52        lea     r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+56        mov     [rbp+arg_0], cl ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::compute_hash+59        mov     rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+5C        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
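
The bit manipulation in the listing above boils down to the following Python sketch. The function and constant names are mine, chosen for illustration; only the flag values and the shift come from the disassembly.

```python
CONST_FLAG = 0x800     # const bit in the qualifiers DWORD (bit 11)
VOLATILE_FLAG = 0x40   # volatile bit in the qualifiers DWORD

def qualifiers_byte(qualifiers: int) -> int:
    """Collapse the qualifiers DWORD into the single byte that gets hashed."""
    # (qualifiers >> 0xB) & 1 isolates the const flag into bit 0.
    has_const = (qualifiers >> 0xB) & 1
    # The volatile flag, when present, is reported in bit 1.
    if qualifiers & VOLATILE_FLAG:
        return has_const | 2
    return has_const
```

With this encoding a plain type yields 0, const yields 1, volatile yields 2, and const volatile yields 3; any other bits in the qualifiers DWORD are ignored.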

Component 2: Type group

If the type value stored in Type_t has 0x100 set, then it is a pointer. This is signaled by writing a byte with value 3 to the std::vector.

XFGHelper::XFGTypeHasher::compute_hash+61        test    dword ptr [rdi], 100h ; *Type_t & 0x100 == 0 ?
XFGHelper::XFGTypeHasher::compute_hash+67        jz      short loc_180105762
XFGHelper::XFGTypeHasher::compute_hash+69        mov     rdx, [rbx+8]    ; if not, it's a pointer
XFGHelper::XFGTypeHasher::compute_hash+6D        lea     r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+71        lea     r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+75        mov     [rbp+arg_0], 3  ; value to insert: POINTER_TYPE (3)
XFGHelper::XFGTypeHasher::compute_hash+79        mov     rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+7C        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)

If the type is not a pointer, it then checks if it's a union, a struct or an enum, by testing whether the type value stored in Type_t, masked with 0x600, is nonzero. Note that 0x600 is 0x200 | 0x400, where 0x200 identifies enum types and 0x400 identifies structs and unions. If this is the case, a byte with value 2 is written to the std::vector.

XFGHelper::XFGTypeHasher::compute_hash+8E   loc_180105762:
XFGHelper::XFGTypeHasher::compute_hash+8E        test    dword ptr [rdi], 600h ; *Type_t & (0x400 | 0x200) == 0 ?
XFGHelper::XFGTypeHasher::compute_hash+94        jz      short loc_180105790
XFGHelper::XFGTypeHasher::compute_hash+96        mov     rdx, [rbx+8]    ; if not, it's a union/struct/enum
XFGHelper::XFGTypeHasher::compute_hash+9A        lea     r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+9E        lea     r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+A2        mov     [rbp+arg_0], 2  ; value to insert: UNION_STRUCT_OR_ENUM_TYPE (2)
XFGHelper::XFGTypeHasher::compute_hash+A6        mov     rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+A9        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)

Finally, if the type is neither a pointer nor a union/struct/enum, the default case is taken. If the type is generic, then nothing is written to the std::vector (but this is an edge case, affecting only those types with value 0x1000 set, and the type identified with value 0x8103). Otherwise, for the vast majority of primitive types, a byte with value 1 is added to the std::vector.

XFGHelper::XFGTypeHasher::compute_hash+BC   loc_180105790:
XFGHelper::XFGTypeHasher::compute_hash+BC        mov     rcx, rdi        ; this
XFGHelper::XFGTypeHasher::compute_hash+BF        call    Type_t::isGeneric(void)
XFGHelper::XFGTypeHasher::compute_hash+C4        test    al, al
XFGHelper::XFGTypeHasher::compute_hash+C6        jz      short loc_1801057A2
XFGHelper::XFGTypeHasher::compute_hash+C8        mov     byte ptr [rbx+18h], 0
XFGHelper::XFGTypeHasher::compute_hash+CC        jmp     short epilog
XFGHelper::XFGTypeHasher::compute_hash+CE   loc_1801057A2:
XFGHelper::XFGTypeHasher::compute_hash+CE        mov     rdx, [rbx+8]
XFGHelper::XFGTypeHasher::compute_hash+D2        lea     r9, [rbp+arg_1]
XFGHelper::XFGTypeHasher::compute_hash+D6        lea     r8, [rbp+arg_0]
XFGHelper::XFGTypeHasher::compute_hash+DA        mov     [rbp+arg_0], 1  ; value to insert: PRIMITIVE_TYPE (1)
XFGHelper::XFGTypeHasher::compute_hash+DE        mov     rcx, rbx
XFGHelper::XFGTypeHasher::compute_hash+E1        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
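
The three checks of this component can be summarized with a small Python sketch. The names are hypothetical, and is_generic only follows the prose description given above (the body of Type_t::isGeneric is not shown in this post), so treat it as an assumption.

```python
POINTER_TYPE = 3
UNION_STRUCT_OR_ENUM_TYPE = 2
PRIMITIVE_TYPE = 1

def is_generic(type_value: int) -> bool:
    """Assumption: modelled on the prose only, not on Type_t::isGeneric."""
    return bool(type_value & 0x1000) or type_value == 0x8103

def type_group_byte(type_value: int):
    """Return the type group byte, or None when nothing is written."""
    if type_value & 0x100:            # pointers, function objects, arrays
        return POINTER_TYPE
    if type_value & 0x600:            # 0x200 (enum) | 0x400 (struct/union)
        return UNION_STRUCT_OR_ENUM_TYPE
    if is_generic(type_value):        # edge case: no byte is emitted
        return None
    return PRIMITIVE_TYPE
```

For instance, 0x102 (general pointer) maps to 3, while 0x40 (void) and 0x26 (float) both map to 1.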

Component 3: Type-specific data

Hashing of pointer types

For pointer types, after writing a byte with value 3 to the std::vector, the XFGHelper::XFGTypeHasher::hash_indirection function is called. Keep in mind that the definition of pointer here is a bit broader, since it includes all those Type_t objects whose values have 0x100 set. Besides regular C pointers, that includes a kind of internal function object (referenced by function pointers), and arrays.

XFGHelper::XFGTypeHasher::compute_hash+81        mov     rdx, rdi        ; struct Type_t *
XFGHelper::XFGTypeHasher::compute_hash+84        mov     rcx, rbx        ; this
XFGHelper::XFGTypeHasher::compute_hash+87        call    XFGHelper::XFGTypeHasher::hash_indirection
XFGHelper::XFGTypeHasher::compute_hash+8C        jmp     short epilog

As its name implies, function XFGHelper::XFGTypeHasher::hash_indirection adds the hash of the type referenced by a pointer to the std::vector. Its behavior varies depending on the type of pointer it's dealing with:

  • If it's either a function pointer (Type_t value of 0x106) or a "general" pointer with Type_t value 0x102 (used for pointers of most types, except for function pointers), it adds the hash of the Type_t referenced by the pointer by calling XFGHelper::XFGHasher::add_type, plus a byte with value 2. In the case of function pointers, the Type_t referenced by the pointer is a kind of internal function object with Type_t value of 0x101, which means that it's also handled within XFGHelper::XFGTypeHasher::hash_indirection.
XFGHelper::XFGTypeHasher::hash_indirection+15        mov     ecx, [rdx]      ; ecx = *Type_t
XFGHelper::XFGTypeHasher::hash_indirection+17        mov     eax, ecx
XFGHelper::XFGTypeHasher::hash_indirection+19        and     eax, 10Fh
[...]
XFGHelper::XFGTypeHasher::hash_indirection+25        sub     eax, 1          ; case 0x102 (general pointer):
XFGHelper::XFGTypeHasher::hash_indirection+28        jz      short loc_1801058E3
[...]
XFGHelper::XFGTypeHasher::hash_indirection+2F        cmp     eax, 3          ; case 0x106 (function pointer):
XFGHelper::XFGTypeHasher::hash_indirection+32        jz      short loc_1801058E3
[...]
XFGHelper::XFGTypeHasher::hash_indirection+6B   loc_1801058E3:
XFGHelper::XFGTypeHasher::hash_indirection+6B        mov     dil, 2          ; will be written to std::vector
XFGHelper::XFGTypeHasher::hash_indirection+6E        jmp     short loc_1801058F6
[...]
XFGHelper::XFGTypeHasher::hash_indirection+7E   loc_1801058F6:
XFGHelper::XFGTypeHasher::hash_indirection+7E        mov     rdx, [rsi+8]    ; rdx = ptr to the Type_t referenced by the pointer
XFGHelper::XFGTypeHasher::hash_indirection+7E                     ; (return type in the case of functions)
XFGHelper::XFGTypeHasher::hash_indirection+82        mov     rcx, rbx        ; this
XFGHelper::XFGTypeHasher::hash_indirection+85        call    XFGHelper::XFGHasher::add_type
XFGHelper::XFGTypeHasher::hash_indirection+8A        mov     rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+8E        lea     r9, [rsp+38h+arg_8+1]
XFGHelper::XFGTypeHasher::hash_indirection+93        lea     r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+98        mov     byte ptr [rsp+38h+arg_8], dil ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::hash_indirection+9D        mov     rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+A0        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
  • If it's a function object (identified by a Type_t value of 0x101, typically referenced by a function pointer with Type_t value of 0x106), it adds the hash of the function prototype by calling the XFGHelper::XFGHasher::add_function_type function, whose inner workings we have already dissected, plus the hash of the return type of the function, plus a byte with value 1.
XFGHelper::XFGTypeHasher::hash_indirection+15        mov     ecx, [rdx]      ; ecx = *Type_t
XFGHelper::XFGTypeHasher::hash_indirection+17        mov     eax, ecx
XFGHelper::XFGTypeHasher::hash_indirection+19        and     eax, 10Fh
XFGHelper::XFGTypeHasher::hash_indirection+1E        sub     eax, 101h       ; case 0x101 (function):
XFGHelper::XFGTypeHasher::hash_indirection+23        jz      short loc_1801058E8
[...]
XFGHelper::XFGTypeHasher::hash_indirection+70        xor     r8d, r8d
XFGHelper::XFGTypeHasher::hash_indirection+73        mov     rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+76        mov     dil, 1          ; this is written to std::vector at the end of this function
XFGHelper::XFGTypeHasher::hash_indirection+79        call    XFGHelper::XFGHasher::add_function_type(Type_t const *,XFGHelper::VirtualInfoFromDeclspec)
XFGHelper::XFGTypeHasher::hash_indirection+7E
XFGHelper::XFGTypeHasher::hash_indirection+7E   loc_1801058F6:
XFGHelper::XFGTypeHasher::hash_indirection+7E                     ; XFGHelper::XFGTypeHasher::hash_indirection+6E↑j
XFGHelper::XFGTypeHasher::hash_indirection+7E        mov     rdx, [rsi+8]    ; rdx = ptr to the Type_t referenced by the pointer
XFGHelper::XFGTypeHasher::hash_indirection+7E                     ; (return type in the case of functions)
XFGHelper::XFGTypeHasher::hash_indirection+82        mov     rcx, rbx        ; this
XFGHelper::XFGTypeHasher::hash_indirection+85        call    XFGHelper::XFGHasher::add_type
XFGHelper::XFGTypeHasher::hash_indirection+8A        mov     rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+8E        lea     r9, [rsp+38h+arg_8+1]
XFGHelper::XFGTypeHasher::hash_indirection+93        lea     r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+98        mov     byte ptr [rsp+38h+arg_8], dil ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::hash_indirection+9D        mov     rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+A0        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
  • Finally, if it's an array (identified by Type_t value 0x103), it writes a QWORD with the number of elements in the array, plus the hash of the type of the array elements, plus a single byte with value 6.
XFGHelper::XFGTypeHasher::hash_indirection+15        mov     ecx, [rdx]      ; ecx = *Type_t
XFGHelper::XFGTypeHasher::hash_indirection+17        mov     eax, ecx
XFGHelper::XFGTypeHasher::hash_indirection+19        and     eax, 10Fh
[...]
XFGHelper::XFGTypeHasher::hash_indirection+2A        sub     eax, 1          ; case 0x103 (array passed by pointer):
XFGHelper::XFGTypeHasher::hash_indirection+2D        jz      short loc_1801058B2
[...]
XFGHelper::XFGTypeHasher::hash_indirection+3A   loc_1801058B2:
XFGHelper::XFGTypeHasher::hash_indirection+3A        lea     eax, [rcx-4103h]
XFGHelper::XFGTypeHasher::hash_indirection+40        mov     dil, 6          ; will be written to std::vector
XFGHelper::XFGTypeHasher::hash_indirection+43        test    eax, 0FFFFBFFFh
XFGHelper::XFGTypeHasher::hash_indirection+48        jz      short loc_1801058AC
XFGHelper::XFGTypeHasher::hash_indirection+4A        mov     rax, [rdx+10h]  ; rax = number of elems in array
XFGHelper::XFGTypeHasher::hash_indirection+4E        lea     r9, [rsp+38h+arg_10]
XFGHelper::XFGTypeHasher::hash_indirection+53        mov     rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+57        lea     r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+5C        mov     rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+5F        mov     [rsp+38h+arg_8], rax ; value to insert: number of elems in array (size = qword)
XFGHelper::XFGTypeHasher::hash_indirection+64        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
XFGHelper::XFGTypeHasher::hash_indirection+69        jmp     short loc_1801058F6
[...]
XFGHelper::XFGTypeHasher::hash_indirection+7E   loc_1801058F6:
XFGHelper::XFGTypeHasher::hash_indirection+7E        mov     rdx, [rsi+8]    ; rdx = ptr to the Type_t referenced by the pointer
XFGHelper::XFGTypeHasher::hash_indirection+7E                     ; (return type in the case of functions)
XFGHelper::XFGTypeHasher::hash_indirection+82        mov     rcx, rbx        ; this
XFGHelper::XFGTypeHasher::hash_indirection+85        call    XFGHelper::XFGHasher::add_type
XFGHelper::XFGTypeHasher::hash_indirection+8A        mov     rdx, [rbx+8]
XFGHelper::XFGTypeHasher::hash_indirection+8E        lea     r9, [rsp+38h+arg_8+1]
XFGHelper::XFGTypeHasher::hash_indirection+93        lea     r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_indirection+98        mov     byte ptr [rsp+38h+arg_8], dil ; value to insert (size = byte)
XFGHelper::XFGTypeHasher::hash_indirection+9D        mov     rcx, rbx
XFGHelper::XFGTypeHasher::hash_indirection+A0        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
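
To make the three cases easier to compare, here is a minimal Python sketch of the bytes that hash_indirection appends. It rests on several assumptions: the function and parameter names are hypothetical, referenced_hash stands for the 8-byte hash that add_type contributes, and prototype_data stands for whatever add_function_type contributes. The listing also shows a separate branch for Type_t values 0x4103 and 0x8103 whose target is not reproduced above; the sketch does not model it.

```python
import struct

def indirection_payload(type_value: int, referenced_hash: bytes, *,
                        array_len: int = 0, prototype_data: bytes = b"") -> bytes:
    """Bytes appended for a 'pointer-like' type, following the three bullets."""
    kind = type_value & 0x10F
    if kind in (0x102, 0x106):
        # General pointer or function pointer: referenced type hash + 2.
        return referenced_hash + b"\x02"
    if kind == 0x101:
        # Function object: prototype data + return type hash + 1.
        return prototype_data + referenced_hash + b"\x01"
    if kind == 0x103:
        # Array: element count as a little-endian QWORD + element hash + 6.
        return struct.pack("<Q", array_len) + referenced_hash + b"\x06"
    raise ValueError(f"unhandled pointer-like type 0x{type_value:x}")
```

The trailing byte (2, 1 or 6) is what lets the hasher distinguish a pointer to T from an array of T or a function returning T, even though all three share the same referenced type hash.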

Hashing of union/struct/enum types

When dealing with unions/structs/enums, after writing a byte with value 2 to the std::vector, function XFGHelper::XFGTypeHasher::compute_hash calls XFGHelper::XFGTypeHasher::hash_tag, passing as argument in RDX a pointer to a Symbol_t object containing the human-readable name of the union/struct/enum type.

XFGHelper::XFGTypeHasher::compute_hash+AE        mov     rdx, [rdi+10h]  ; struct Symbol_t *
XFGHelper::XFGTypeHasher::compute_hash+B2        mov     rcx, rbx        ; this
XFGHelper::XFGTypeHasher::compute_hash+B5        call    XFGHelper::XFGTypeHasher::hash_tag(Symbol_t *)

XFGHelper::XFGTypeHasher::hash_tag calls XFGHelper::XFGHasher::add_string, which adds the name of the union/struct/enum to the std::vector (if the union/struct/enum is a named one). Otherwise, if the union/struct/enum is an anonymous one, it adds the string "<unnamed>" to the std::vector. In both cases the terminating NUL byte is not included.

XFGHelper::XFGHasher::add_string      public: void XFGHelper::XFGHasher::add_string(class Symbol_t *) proc near
XFGHelper::XFGHasher::add_string           sub     rsp, 38h
XFGHelper::XFGHasher::add_string+4         cmp     byte ptr [rdx+11h], 4
XFGHelper::XFGHasher::add_string+8         jnz     short loc_18010568B
XFGHelper::XFGHasher::add_string+A         mov     r8, [rdx]
XFGHelper::XFGHasher::add_string+D         mov     eax, [r8+10h]
XFGHelper::XFGHasher::add_string+11        shr     eax, 16h
XFGHelper::XFGHasher::add_string+14        test    al, 1           ; union/struct/enum is named?
XFGHelper::XFGHasher::add_string+16        jz      short loc_180105674
XFGHelper::XFGHasher::add_string+18        lea     r9, aUnnamed+9  ; ""
XFGHelper::XFGHasher::add_string+1F        lea     r8, aUnnamed    ; "<unnamed>"
XFGHelper::XFGHasher::add_string+26
XFGHelper::XFGHasher::add_string+26   loc_180105666:
XFGHelper::XFGHasher::add_string+26        mov     rdx, [rcx+8]
XFGHelper::XFGHasher::add_string+2A        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
XFGHelper::XFGHasher::add_string+2F        add     rsp, 38h
XFGHelper::XFGHasher::add_string+33        retn
XFGHelper::XFGHasher::add_string+34   ; ---------------------------------------------------------------------------
XFGHelper::XFGHasher::add_string+34
XFGHelper::XFGHasher::add_string+34   loc_180105674:
XFGHelper::XFGHasher::add_string+34        mov     r8, [r8+8]      ; r8 = union/struct/enum name
XFGHelper::XFGHasher::add_string+38        or      r9, 0FFFFFFFFFFFFFFFFh
XFGHelper::XFGHasher::add_string+3C
XFGHelper::XFGHasher::add_string+3C   loc_18010567C:
XFGHelper::XFGHasher::add_string+3C        inc     r9
XFGHelper::XFGHasher::add_string+3F        cmp     byte ptr [r8+r9], 0
XFGHelper::XFGHasher::add_string+44        jnz     short loc_18010567C
XFGHelper::XFGHasher::add_string+46        add     r9, r8          ; r9 points to end of string
XFGHelper::XFGHasher::add_string+49        jmp     short loc_180105666

After that, a code branch in function XFGHelper::XFGTypeHasher::hash_tag can add the string "<local>" to the data to be hashed under some condition. I didn't investigate this further, but it probably handles the case of locally-scoped unions/structs/enums.

XFGHelper::XFGTypeHasher::hash_tag+4D        mov     rbx, [rbx+18h]
XFGHelper::XFGTypeHasher::hash_tag+51        test    rbx, rbx
XFGHelper::XFGTypeHasher::hash_tag+54        jnz     short loc_180105A16
XFGHelper::XFGTypeHasher::hash_tag+56        jmp     short loc_180105A76
XFGHelper::XFGTypeHasher::hash_tag+58   ; ---------------------------------------------------------------------------
XFGHelper::XFGTypeHasher::hash_tag+58
XFGHelper::XFGTypeHasher::hash_tag+58   loc_180105A5C:
XFGHelper::XFGTypeHasher::hash_tag+58        mov     rdx, [rdi+8]
XFGHelper::XFGTypeHasher::hash_tag+5C        lea     r9, aLocal+7    ; ""
XFGHelper::XFGTypeHasher::hash_tag+63        lea     r8, aLocal      ; "<local>"
XFGHelper::XFGTypeHasher::hash_tag+6A        mov     rcx, rdi
XFGHelper::XFGTypeHasher::hash_tag+6D        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
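
The name handling done by add_string can be sketched as follows. This is a simplification under assumptions: tag_bytes is a hypothetical name, an anonymous type is represented by None, the name is assumed to be plain ASCII, and the "<local>" branch is left out because its trigger condition is not known.

```python
def tag_bytes(name):
    """Bytes appended for a union/struct/enum name (no NUL terminator).

    name is None for an anonymous type (an assumption of this sketch).
    """
    if name is None:
        return b"<unnamed>"
    return name.encode("ascii")
```

Since the name itself is hashed, two structs with identical layouts but different names end up with different type hashes.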

Hashing of primitive types

When handling primitive types (those that have none of 0x100, 0x200 or 0x400 set in their Type_t value), after writing a byte with value 1 to the std::vector, function XFGHelper::XFGTypeHasher::compute_hash calls XFGHelper::XFGTypeHasher::hash_primitive.

XFGHelper::XFGTypeHasher::hash_primitive is basically a big switch statement, mapping Type_t values to a different set of constants representing primitive types. The resulting constant (a single byte) is then added to the std::vector. For example, for the float type, represented by Type_t 0x26, this function adds a byte with value 0x0B to the std::vector.

XFGHelper::XFGTypeHasher::hash_primitive      private: void XFGHelper::XFGTypeHasher::hash_primitive(class Type_t const *) proc near
XFGHelper::XFGTypeHasher::hash_primitive           sub     rsp, 38h
XFGHelper::XFGTypeHasher::hash_primitive+4         mov     eax, [rdx]
XFGHelper::XFGTypeHasher::hash_primitive+6         mov     r10, rcx
XFGHelper::XFGTypeHasher::hash_primitive+9         and     eax, 1FFFh
XFGHelper::XFGTypeHasher::hash_primitive+E         cmp     eax, 40h ; '@'
XFGHelper::XFGTypeHasher::hash_primitive+11        ja      loc_1801059D4
XFGHelper::XFGTypeHasher::hash_primitive+17        jz      loc_1801059D0   ; case 0x40:
XFGHelper::XFGTypeHasher::hash_primitive+1D        cmp     eax, 1Ah
XFGHelper::XFGTypeHasher::hash_primitive+20        ja      short loc_18010599E
[...]
XFGHelper::XFGTypeHasher::hash_primitive+6E   loc_18010599E:
XFGHelper::XFGTypeHasher::hash_primitive+6E        sub     eax, 1Bh        ; case 0x1B:
XFGHelper::XFGTypeHasher::hash_primitive+71        jz      short loc_1801059CC
XFGHelper::XFGTypeHasher::hash_primitive+73        sub     eax, 1          ; case 0x1C:
XFGHelper::XFGTypeHasher::hash_primitive+76        jz      short loc_1801059C8
XFGHelper::XFGTypeHasher::hash_primitive+78        sub     eax, 2          ; case 0x1E:
XFGHelper::XFGTypeHasher::hash_primitive+7B        jz      short loc_1801059C4
XFGHelper::XFGTypeHasher::hash_primitive+7D        sub     eax, 8          ; case 0x26 (float):
XFGHelper::XFGTypeHasher::hash_primitive+80        jz      short loc_1801059C0
[...]
XFGHelper::XFGTypeHasher::hash_primitive+90   loc_1801059C0:
XFGHelper::XFGTypeHasher::hash_primitive+90        mov     cl, 0Bh         ; primitive_type = 0xB (float)
XFGHelper::XFGTypeHasher::hash_primitive+92        jmp     short loc_1801059DE
[...]
XFGHelper::XFGTypeHasher::hash_primitive+AE   loc_1801059DE:
XFGHelper::XFGTypeHasher::hash_primitive+AE        mov     rdx, [r10+8]
XFGHelper::XFGTypeHasher::hash_primitive+B2        lea     r9, [rsp+38h+arg_9]
XFGHelper::XFGTypeHasher::hash_primitive+B7        mov     [rsp+38h+arg_8], cl ; value to add: primitive_type
XFGHelper::XFGTypeHasher::hash_primitive+BB        lea     r8, [rsp+38h+arg_8]
XFGHelper::XFGTypeHasher::hash_primitive+C0        mov     rcx, r10
XFGHelper::XFGTypeHasher::hash_primitive+C3        call    std::vector<uchar>::_Insert_range<uchar const *>(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<uchar>>>,uchar const *,uchar const *,std::forward_iterator_tag)
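
A lookup table is a natural way to picture that switch statement. The sketch below is deliberately incomplete: it only contains the three mappings that are named in this post, keyed by the Type_t values as quoted in the text, and the helper name is hypothetical.

```python
# Only the mappings mentioned in this post; the real switch covers many more.
PRIMITIVE_CODES = {
    0x26: 0x0B,    # float
    0x40: 0x0E,    # void
    0x4019: 0x88,  # unsigned long long
}

def primitive_type_data(qualifier_byte: int, type_value: int) -> bytes:
    """Full data hashed for a primitive type: qualifiers, group (1), code."""
    return bytes([qualifier_byte, 1, PRIMITIVE_CODES[type_value]])
```

For example, primitive_type_data(0, 0x26) gives the three bytes 00 01 0B for an unqualified float.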

Final transformations to the hash

So far we have described in depth how the C compiler front end calculates the hash of a function prototype for XFG purposes. If we had to summarize it with some Python-like pseudo-code, we could say that the hash of a function is built this way:

hash =  sha256(number_of_params +

              type_hash(params[0]) +
              type_hash(params[...]) +
              type_hash(params[n]) +

              is_variadic +

              calling_convention +

              type_hash(return_type)
        )[0:8]
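
A runnable version of that pseudo-code could look like the sketch below. The function and parameter names are hypothetical, and the field widths (DWORD for the parameter count, one byte for the variadic flag, DWORD for the calling convention) are the ones used in the hands-on exercise of this post.

```python
import hashlib
import struct

def truncated_sha256(data: bytes) -> bytes:
    """First 8 bytes of the SHA256 digest."""
    return hashlib.sha256(data).digest()[:8]

def prototype_data(param_hashes, is_variadic, calling_convention, return_hash) -> bytes:
    """Concatenate the five components of a function prototype."""
    data = struct.pack("<L", len(param_hashes))        # number of params
    data += b"".join(param_hashes)                     # 8-byte type hash each
    data += struct.pack("<B", 1 if is_variadic else 0)
    data += struct.pack("<L", calling_convention)
    data += return_hash                                # 8-byte type hash
    return data

def prototype_hash(param_hashes, is_variadic, calling_convention, return_hash) -> bytes:
    """Hash of the prototype as produced before any later transformation."""
    return truncated_sha256(
        prototype_data(param_hashes, is_variadic, calling_convention, return_hash))
```

With three parameters the data to be hashed is 4 + 3 * 8 + 1 + 4 + 8 = 41 bytes long.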

XFG function hashes are a truncated version of a SHA256 digest (only the first 8 bytes are kept), and so their collision resistance is reduced compared to a full SHA256 hash, but we could expect different XFG hashes to reasonably keep the avalanche effect of hashing functions and look unrelated, right?

However, if you inspect a set of XFG hashes on a given binary (I picked ntdll.dll), you'll notice that they definitely don't seem to have 64 bits of entropy:

function 0x180001a30 -> prototype hash: 0x8d952e0d365aa071
function 0x180001b50 -> prototype hash: 0xe2198f4a3c515871
function 0x180001dc0 -> prototype hash: 0xbeac2e06165fc871
function 0x180001de0 -> prototype hash: 0xfaec0e7f70d92371
function 0x180001fc0 -> prototype hash: 0xc5d11eb750d75871
function 0x180002030 -> prototype hash: 0xe8bcaf9a10586871
function 0x180002040 -> prototype hash: 0xc3110f087e584871
function 0x1800020b0 -> prototype hash: 0xdbc1261858d2f871
function 0x1800023a0 -> prototype hash: 0xda690f3e36531a71

The reason behind this is that the truncated SHA256 hashes produced by the compiler front end (c1.dll) receive a final transformation by the compiler back end (c2.dll) before being actually written to the resulting object file. To be precise, the XfgIlVisitor::visit_I_XFG_HASH function in c2.dll applies two masks to the truncated SHA256 hashes:

XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+5B        mov     rcx, 8000060010500070h
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+65        mov     r13, 0FFFDBFFF7EDFFB70h
[...]
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+E9        mov     rdx, [rax]      ; rdx = 8 bytes of SHA256 hash
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+EC        add     rax, 8
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F0        and     rdx, r13        ; hash &= 0FFFDBFFF7EDFFB70h
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F3        mov     [rbx], rax
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F6        or      rdx, rcx        ; hash |= 8000060010500070h
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+F9        mov     ecx, r9d        ; this
XfgIlVisitor::visit_I_XFG_HASH(tagILMAP *)+FC        call    XFG::TiSetHash(ulong,unsigned __int64,tagMOD *)

That is the reason why XFG hashes don't look completely random, despite being based on SHA256. I don't know why these masks are applied, though.
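
To get a feeling for how much the two masks constrain the hash, we can count the bits they force. The sketch below uses only the two constants from the listing; the helper and variable names are mine.

```python
AND_MASK = 0xFFFDBFFF7EDFFB70   # bits cleared here are forced to 0
OR_MASK = 0x8000060010500070    # bits set here are forced to 1

def apply_backend_masks(value: int) -> int:
    """Apply the two masks in the same order as visit_I_XFG_HASH."""
    return (value & AND_MASK) | OR_MASK

# Count the bit positions fixed by each mask.
forced_zero = bin(~AND_MASK & 0xFFFFFFFFFFFFFFFF).count("1")
forced_one = bin(OR_MASK).count("1")
free_bits = 64 - forced_zero - forced_one

print(forced_zero, forced_one, free_bits)   # prints: 11 9 44
```

The two sets of positions don't overlap, so 20 of the 64 bits are fixed by this step and at most 44 bits of each hash emitted by this function actually depend on the function prototype.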

A hands-on hash calculation exercise

To verify that we have properly understood how XFG hashes are generated, let's try to calculate an XFG hash by hand. Let's say that we want to calculate the hash for a function with the following prototype:

void *memcpy(
   void *dest,
   const void *src,
   size_t count
);

We need to find out the 5 pieces of data that compose a function prototype:

  1. number of parameters;
  2. type hash for each parameter;
  3. is it a variadic function or not?;
  4. calling convention;
  5. type hash of the return type.

Components 1, 3 and 4 are trivial:

  1. number of parameters -> DWORD with value 3;
  3. is it a variadic function? -> byte with value 0;
  4. calling convention -> default (DWORD with value 0x201 & 0xF == 0x1).

So let's compute the more complex parts: the type hash of each parameter, and the type hash of the return type.

Type hash of parameter 1

The type of the first parameter is void *. That type is represented by a Type_t with the following content:

00000102 00000200 [+ pointer to referenced Type_t]

We need to find out the 3 pieces of data to produce a type hash:

  1. type qualifiers -> byte with value 0;
  2. type group: it is a pointer -> byte with value 3;
  3. type-specific data: it's a "general" pointer -> hash of referenced type (we have recursion here) + byte with value 2.

For the recursive calculation of the hash of the referenced type (void), the type is represented by a Type_t with value 0x40. The data we need is built as follows:

  1. type qualifiers -> byte with value 0;
  2. type group: it is a primitive type -> byte with value 1;
  3. type-specific data: for Type_t 0x40 (void), XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x0E.

Type hash of parameter 2

The type of the second parameter is const void *. That type is represented by a Type_t with the following contents:

00000102 00000200 [+ pointer to referenced Type_t]

The data we need is built as follows:

  1. type qualifiers -> byte with value 0;
  2. type group: it is a pointer -> byte with value 3;
  3. type-specific data: it's a "general" pointer -> hash of referenced type (we have recursion here) + byte with value 2.

For the recursive calculation of the hash of the referenced type (const void), the type is represented by a Type_t with value 0x40 and the const qualifier (0x800) set in its qualifiers field. The data we need is built as follows:

  1. type qualifiers: it has the const qualifier -> encoded as a byte with value 1;
  2. type group: it is a primitive type -> byte with value 1;
  3. type-specific data: for Type_t 0x40 (void) -> XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x0E.

Type hash of parameter 3

The type of the third parameter is size_t. That type is represented by a Type_t with value 0x4019 (unsigned long long). The data we need is built as follows:

  1. type qualifiers -> byte with value 0;
  2. type group: it is a primitive type -> byte with value 1;
  3. type-specific data: for Type_t 0x4019 (unsigned long long) -> XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x88.

Type hash of return type

The return type is void *, same as the first parameter of the function, so here we just repeat what we obtained before.

  1. type qualifiers -> byte with value 0;
  2. type group: it is a pointer -> byte with value 3;
  3. type-specific data: it's a "general" pointer -> hash of referenced type (we have recursion here) + byte with value 2.

For the recursive calculation of the hash of the referenced type (void):

  1. type qualifiers -> byte with value 0;
  2. type group: it is a primitive type -> byte with value 1;
  3. type-specific data: for Type_t 0x40 (void), XFGHelper::XFGTypeHasher::hash_primitive writes a byte with value 0x0E.
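
All four type hashes above follow just two recipes, one for primitive types and one for "general" pointers. A minimal sketch with hypothetical helper names, using only the byte values listed in this section:

```python
import hashlib

def truncated_sha256(data: bytes) -> bytes:
    """First 8 bytes of the SHA256 digest."""
    return hashlib.sha256(data).digest()[:8]

def primitive_hash(qualifier_byte: int, primitive_code: int) -> bytes:
    """qualifiers + type group 1 + primitive type code."""
    return truncated_sha256(bytes([qualifier_byte, 1, primitive_code]))

def pointer_hash(qualifier_byte: int, referenced_hash: bytes) -> bytes:
    """qualifiers + type group 3 + hash of referenced type + byte 2."""
    return truncated_sha256(bytes([qualifier_byte, 3]) + referenced_hash + b"\x02")

void_ptr = pointer_hash(0, primitive_hash(0, 0x0E))         # void *
const_void_ptr = pointer_hash(0, primitive_hash(1, 0x0E))   # const void *
size_t_hash = primitive_hash(0, 0x88)                       # size_t
```

Note that the const qualifier of const void * lives on the referenced type, not on the pointer, which is why only the inner qualifier byte differs between the first two hashes.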

Putting everything together

Let's assemble all the data together:

# Number of params
03 00 00 00

# type hash of param 1 (void *)
SHA256(
    00  # qualifiers
    03  # type group: pointer
    # type hash of referenced type (void)
    SHA256(
        00  # qualifiers
        01  # type group: primitive type
        0E  # hash of primitive type: void -> 0x0E
    )[0:8]
    02  # regular pointer
)[0:8]

# type hash of param 2 (const void *)
SHA256(
    00  # qualifiers
    03  # type group: pointer
    # type hash of referenced type (const void)
    SHA256(
        01  # qualifiers: const
        01  # type group: primitive type
        0E  # hash of primitive type: void -> 0x0E
    )[0:8]
    02  # regular pointer
)[0:8]

# type hash of param 3 (size_t)
SHA256(
    00  # qualifiers
    01  # type group: primitive type
    88  # hash of primitive type: unsigned long long -> 0x88
)[0:8]

# is variadic
00

# calling convention
01 00 00 00

# type hash of return value (void *)
SHA256(
    00  # qualifiers
    03  # type group: pointer
    # type hash of referenced type (void)
    SHA256(
        00  # qualifiers
        01  # type group: primitive type
        0E  # hash of primitive type: void -> 0x0E
    )[0:8]
    02  # regular pointer
)[0:8]

The following Python code obtains the SHA256 digest of that data, and truncates it to its first 8 bytes to obtain a hash identical to the one emitted by the compiler front end. Finally, it applies the two masks of the compiler back end to obtain the XFG hash in its ultimate form:

import struct
import hashlib

def truncated_hash(data):
    return hashlib.sha256(data).digest()[0:8]

def apply_backend_masks(value):
    # The AND mask clears some bits; the OR mask then forces others to 1
    value = value & 0xFFFDBFFF7EDFFB70
    value = value | 0x8000060010500070
    return value


def main():
    # number of params
    data  = struct.pack('<L', 3)
    # type hash of first param (void *)
    data += truncated_hash(b'\x00\x03' + truncated_hash(b'\x00\x01\x0e') + b'\x02')
    # type hash of second param (const void *)
    data += truncated_hash(b'\x00\x03' + truncated_hash(b'\x01\x01\x0e') + b'\x02')
    # type hash of third param (size_t)
    data += truncated_hash(b'\x00\x01\x88')
    # is variadic
    data += struct.pack('<B', 0x0)
    # calling convention (default)
    data += struct.pack('<L', 0x201 & 0x0F)
    # type hash of return type (void *)
    data += truncated_hash(b'\x00\x03' + truncated_hash(b'\x00\x01\x0e') + b'\x02')

    print(f'Data to be hashed: {data} ({len(data)} bytes)')
    frontend_hash = struct.unpack('<Q', truncated_hash(data))[0]
    print(f'Hash generated by the frontend: 0x{frontend_hash:x}')

    final_hash = apply_backend_masks(frontend_hash)
    print(f'[*] Final XFG hash: 0x{final_hash:x}')


if __name__ == '__main__':
    main()

The output of that Python code is the following:

> python test.py

Data to be hashed: b'\x03\x00\x00\x00\xf5\x97x>[J`\xb0\x17\x80\xb8\xc0[\x1b\xd0\xd8#\x14\xb4\xba\x91\xc7\xf6j\x00\x01\x00\x00\x00\xf5\x97x>[J`\xb0' (41 bytes)

Hash generated by the frontend: 0x1da7d393d6b63a72

[*] Final XFG hash: 0x9da5979356d63a70
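The last step can be checked with plain integer arithmetic. The sketch below is not taken from the compiler; it only reuses the two mask constants and the front end hash shown above, and the variable names are illustrative. It also counts how many bits of the final value are fixed by the masks regardless of the input hash.

```python
AND_MASK = 0xFFFDBFFF7EDFFB70
OR_MASK = 0x8000060010500070
U64 = 0xFFFFFFFFFFFFFFFF


def apply_backend_masks(value):
    # Clear the bits that are 0 in AND_MASK, then set the bits of OR_MASK
    return (value & AND_MASK) | OR_MASK


# Bits always cleared by the AND mask
cleared_bits = ~AND_MASK & U64
# Bits whose final value does not depend on the front end hash
fixed_bits = cleared_bits | OR_MASK
free_bits = 64 - bin(fixed_bits).count("1")

frontend_hash = 0x1DA7D393D6B63A72
final_hash = apply_backend_masks(frontend_hash)

print(f"cleared bits: 0x{cleared_bits:016x}")
print(f"fixed bits:   0x{fixed_bits:016x}")
print(f"bits left to the front end hash: {free_bits}")
print(f"final hash:   0x{final_hash:016x}")
```

With these two constants, 20 of the 64 bits end up with a fixed value, so 44 bits of the front end hash survive in the final XFG hash. This is just a consequence of the masks; it says nothing about why the back end applies them.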

If we compile some code using a function pointer to call a function whose prototype matches the one that we have been discussing in this section, we can see that the XFG hash we calculated by hand perfectly matches the one generated by MSVC (see the value assigned to register R10 at main+0x8E in the disassembly below):

main+1C        lea     rax, my_memcpy
main+23        mov     [rsp+78h+var_50], rax
[...]
main+6A        lea     rcx, aCallingFunctio ; "Calling function pointer...\n"
main+71        call    printf
main+76        lea     rcx, Str        ; "a test"
main+7D        call    strlen
main+82        cdqe
main+84        mov     rcx, [rsp+78h+var_50]
main+89        mov     [rsp+78h+var_48], rcx
main+8E        mov     r10, 9DA5979356D63A70h
main+98        mov     r8, rax
main+9B        lea     rdx, aATest_0   ; "a test"
main+A2        lea     rcx, [rsp+78h+var_28]
main+A7        mov     rax, [rsp+78h+var_48]
main+AC        call    cs:__guard_xfg_dispatch_icall_fptr
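As noted in the conclusions, the value at the call site has bit 0 unset, while the copy stored before the function start has bit 0 set. A minimal sketch of that relationship, assuming the two values differ only in that bit (the function name is hypothetical):

```python
# Immediate loaded into R10 at main+0x8E in the listing above
CALL_SITE_HASH = 0x9DA5979356D63A70


def stored_form(call_site_hash):
    # Assumption: the value stored before the function start is the
    # call-site value with bit 0 set, and nothing else changes.
    return call_site_hash | 1


print(f"call site: 0x{CALL_SITE_HASH:016X}")
print(f"stored:    0x{stored_form(CALL_SITE_HASH):016X}")
```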

Conclusions

In this blog post I wanted to share all the details of how the MSVC compiler generates XFG hashes for C programs. Besides exploring the details of an upcoming exploit mitigation, the topic is a good opportunity to dig a little into compiler internals.

Please keep in mind that, for now, XFG is only found on Windows Insider Preview builds, so what we have described here may change before this CFI solution makes it into an official release of Windows 10.

Some questions remain unanswered for now, such as why the compiler back end applies two bit masks to the hashes generated by the front end, and why the hash is stored with bit 0 set before the function start, but kept with bit 0 unset at the XFG-instrumented call site.

Finally, it would be interesting to see how the C++ compiler front end (c1xx.dll) differs in the way it computes XFG hashes. A quick look at this binary suggests that the hashing algorithm is quite similar to the one used for the C language, but it is likely adapted to take C++ concepts such as inheritance and C++ type qualifiers and modifiers into account.


Source: http://blog.quarkslab.com/how-the-msvc-compiler-generates-xfg-function-prototype-hashes.html