ELF hash function
2023-4-12 15:0:0 Author: maskray.me(查看原文) 阅读量:24 收藏

This article describes an interesting overflow bug in the ELF hash function.

The System V Application Binary Interface (generic ABI) specifies the ELF object file format. When producing an output executable or shared object needing a dynamic symbol table (.dynsym), a linker generates a .hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH tag is produced to hold the address of .hash.

DT_HASH is used by a dynamic loader to perform symbol lookup (for dynamic relocations and dlsym family functions). A detailed description of the format can be found in ELF: symbol lookup via DT_HASH.

Other use cases

In a Solaris Version Definition Section, vd_hash holds a value generated using the ELF hash function. The GNU symbol versioning scheme inherits this field from Solaris.

Overflow bug

The generic ABI gives the following code fragment in "Figure 5-13: Hashing Function".

1
2
3
4
5
6
7
8
9
10
11
12
13
unsigned long
elf_hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}

The function is supposed to return a value no larger than 0x0fffffff. Unfortunately, there is a bug. When unsigned long consists of more than 32 bits, the return value may be larger than UINT32_MAX. For instance, elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns 0x100000002, which is clearly unintended, as the function should behave the same way regardless of whether long represents a 32-bit integer or a 64-bit integer.

It is possible to use 7-bit ASCII characters to trigger the issue. For instance, elf_hash((const unsigned char *)"iiiiii\ni")) == 100000009.

Most ELF operating systems have switched from DT_HASH to DT_GNU_HASH for many years.

Project survey

glibc fixed the overflow issue while optimizing the function in April 1995. The two XOR operations were optimized to one in Dec 2011.

binutils-gdb fixed the overflow issue in May 2003.

musl has had the elegant and efficient implementation since 2011 (initial check-in). It is worth noting that uint_fast32_t is used, so that an architecture can optimize the implementation if the architecture has slow 32-bit integer arithmetic operations.

1
2
3
4
5
6
7
8
9
10
static uint32_t sysv_hash(const char *s0)
{
const unsigned char *s = (void *)s0;
uint_fast32_t h = 0;
while (*s) {
h = 16*h + *s++;
h ^= h>>24 & 0xf0;
}
return h & 0xfffffff;
}

Nathan Sidwell raised the issue for llvm-project and pointed out a bug about using char instead of unsigned char on 2023-04-09.

I asked What if the result of elf_hash is larger than UINT32_MAX? on 2023-04-11.


文章来源: https://maskray.me/blog/2023-04-12-elf-hash-function
如有侵权请联系:admin#unsafe.sh