Build glibc with LLD 13
2021-09-05 16:00:00 Author: maskray.me(查看原文) 阅读量:72 收藏

LLD is the LLVM linker. It was added to the LLVM repository at the end of 2011 as a work-in-progress rewrite of ld64 for the Mach-O binary format. Today, it is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly).

As a main contributor of LLD's ELF port who has fixed numerous corner cases in recent years, I consider that its x86-64 support has been mature since the 8.0.0 release and is in a great shape since 9.0.0. The AArch64 and PowerPC32/PowerPC64 support are great since the 10.0.0 release. The 11.0.0 release has very solid linker script support. (When people complain that GNU ld's linker script is not immediately usable with LLD, it is almost assuredly the problem of the script itself.) So, what's the next? Build glibc with LLD!

glibc is known for tricks used here and there and tons of GNU extensions which challenge a "foreign" toolchain like llvm-project (Clang, LLD, etc). Read on.

Build

librtld.map

There is a bootstrapping problem between ld.so and libc because they are separate. In a nutshell, elf/Makefile performs the following steps to build elf/ld.so:

  • Create elf/libc_pic.a from libc .os files
  • Create elf/dl-allobjs.os from a relocatable link of rtld .os files
  • Create link map elf/librtld.map from a relocatable link of elf/dl-allobjs.os, elf/libc_pic.a, and -lgcc
  • Get a list of extracted archive members (elf/librtld.mk) from elf/librtld.map and create elf/rtld-libc.a
  • Create elf/librtld.os from a relocatable link of elf/dl-allobjs.os and elf/rtld-libc.a
  • Create elf/ld.so from a -shared link of elf/librtld.os with the version script ld.map

In a link map printed by GNU ld, Archive member included to satisfy reference by file (symbol) is followed by extracted archive members. elf/Makefile made use of sed -n 's@^$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members. LLD doesn't implement Archive member included to satisfy reference by file (symbol). Fortunately, LLD's output has lines like

1
2
3
1f350            1f350       1e    16         /home/maskray/Dev/glibc/out/lld/elf/rtld-libc.a(rtld-access.os):(.text)
1f350 1f350 1e 1 __access
1f350 1f350 1e 1 access

We can use sed -n 's@^[0-9a-f ]*$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members.

scripts/output-format.sed

BZ #26559

libc.so is a linker script. On Debian GNU/Linux, it looks like:

1
2
3
4
5
6
% cat /lib/x86_64-linux-gnu/libc.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )

The idea is that -lc can expand to something like -( libc.so.6 libc_nonshared.a --push-state --as-needed ld-linux-x86-64.so.2 --pop-state -). libc_nonshared.a contains functions which should be statically linked. ld-linux-x86-64.so.2 is mostly for __tls_get_addr used by general-dynamic/local-dynamic TLS models. Commit d3f5f87569398d11756b3dcb7a66926bfd8ee047 (in 2015) added AS_NEEDED with no description of the purpose. Retroactively, this can make a ld.so performance difference when an executable has O(1000) shared object dependencies because the overall shared object uniqueness checks has has quadratic time complexity.

The first non-comment line is an OUTPUT_FORMAT command, which is derived from the output of ld --verbose. In GNU ld, --verbose prints the internal linker script, which is used when an external one (-T) is not used.

1
2
3
4
5
6
7
8
9
10
11
...
using internal linker script:
==================================================
/* Script for -z combreloc -z separate-code */
/* Copyright (C) 2014-2020 Free Software Foundation, Inc.
Copying and distribution of this script, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
"elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)

Makerules extracted the OUTPUT_FORMAT line with a frightening sed script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
/ld.*[  ]-E[BL]/b f
/collect.*[ ]-E[BL]/b f
/OUTPUT_FORMAT[^)]*$/{N
s/\n[ ]*/ /
}
t o
: o
s/^.*OUTPUT_FORMAT(\([^,]*\), \1, \1).*$/OUTPUT_FORMAT(\1)/
t q
s/^.*OUTPUT_FORMAT(\([^,]*\), \([^,]*\), \([^,]*\)).*$/\1,\2,\3/
t s
s/^.*OUTPUT_FORMAT(\([^,)]*\).*$)/OUTPUT_FORMAT(\1)/
t q
d
: s
s/"//g
G
s/\n//
s/^\([^,]*\),\([^,]*\),\([^,]*\),B/OUTPUT_FORMAT(\2)/p
s/^\([^,]*\),\([^,]*\),\([^,]*\),L/OUTPUT_FORMAT(\3)/p
s/^\([^,]*\),\([^,]*\),\([^,]*\)/OUTPUT_FORMAT(\1)/p
/,/s|^|*** BUG in libc/scripts/output-format.sed *** |p
q
: q
s/"//g
p
q
: f
s/^.*[ ]-E\([BL]\)[ ].*$/,\1/
t h
s/^.*[ ]-E\([BL]\)$/,\1/
t h
d
: h
h

LLD does not have an internal linker script so libc.so did not have the OUTPUT_FORMAT line. ( Personally I think an internal linker script is not useful. It would have some exposition value but the language is not powerful enough to encode all built-in logic. If LLD is to support the feature, we would need to emit a lot of conditional code which can add huge amount of maintenance burden. )

Inspired by a Linux kernel usage, I realized that there is a better way to get the output format (bfdname): we can just parse the output of objdump -f.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
% objdump -f elf/ld.so

elf/ld.so: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001050

% llvm-objdump -f elf/ld.so

elf/ld.so: file format elf64-x86-64

architecture: x86_64
start address: 0x0000000000001050

llvm-objdump -f somewhat printed upper-case output formats. I switched the case in D76046.

--defsym

A Makefile specified -Wl,--defsym=malloc=0 and other malloc.os definitions before libc_pic.a so that libc_pic.a(malloc.os) is not extracted. This trick was used to avoid multiple definition errors.

1
2
3
4
5
$(objpfx)librtld.map: $(objpfx)dl-allobjs.os $(common-objpfx)libc_pic.a
@-rm -f $@T
$(reloc-link) -o $@.o $(rtld-stubbed-symbols-args) \
'-Wl,-(' $^ -lgcc '-Wl,-)' -Wl,-Map,$@T
rm -f $@.o

For the interaction between a linker option and an input file, LLD generally chooses the behavior so that their relative order does not matter. Some options are inherently order dependent, e.g. --as-needed and --no-as-needed, --whole-archive and --no-whole-archive. However, reducing order dependence can improve robustness of a build system.

I had a debate with others and finally I noticed one point: --defsym defines a SHN_ABS symbol while a normal definition is relative to the image base. So a normal definition is better regardless.

I sent the patch in April 2020, pinged once in August. Since nobody responded, I sent again in December. Finally, this issue is fixed by elf: Replace a --defsym trick with an object file to be compatible with LLD.

_GLOBAL_OFFSET_TABLE_[0]

In nearly every ELF port of GNU ld, _GLOBAL_OFFSET_TABLE_[0] is the link-time address of _DYNAMIC (the start of .dynamic/PT_DYNAMIC). In glibc, sysdeps/*/dl-machine.h files used this approach to compute the load base (the virtual address of the ELF header).

1
2
runtime_DYNAMIC = PC relative address of _DYNAMIC
load_base = runtime_DYNAMIC - linktime_DYNAMIC = runtime_DYNAMIC - _GLOBAL_OFFSET_TABLE_[0]

So you may ask: why can't glibc extract the p_vaddr field of the PT_DYNAMIC program header. Well, its code has a poor organization and makes this elegant solution difficult...

Due to the glibc requirement, unfortunately _GLOBAL_OFFSET_TABLE_[0] has been a part of i386/x86-64 and PowerPC64 ELFv2 ABIs.

LLD's AArch64 port does not set _GLOBAL_OFFSET_TABLE_[0], so the trick does not work. I figured out an elegant fix without updating LLD:

In 2012, GNU ld and gold (included in binutils 2.23) started to define __ehdr_start which has the link-time address zero. Using a PC relative code sequence to take the runtime address of __ehdr_start gives us a better way to get the load base. I submitted patches to use the approach for aarch64/arm/riscv/x86_64. The aarch64 code looks like the following. I originally intended to use inline assembly to avoid relying on compiler generating PC-relative addressing for hidden symbol access, but Szabolcs Nagy recommended the pure C approach.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17


static inline ElfW(Addr) __attribute__ ((unused))
elf_machine_load_address (void)
{
extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
return (ElfW(Addr)) &__ehdr_start;
}



static inline ElfW(Addr) __attribute__ ((unused))
elf_machine_dynamic (void)
{
extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
return (ElfW(Addr)) _DYNAMIC - elf_machine_load_address ();
}

Tests

AArch64's general-dynamic/local-dynamic TLS models

Complaints

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391 is my go-to example when I want to demonstrate that less amicable to a foreign toolchain. "NO LLD is not implemented the ABI as PIE COPYRELOC is required by ABI these days". The status was updated back and forth between "invalid", "wontfix" and "worksforme" until a maintainer realized GNU ld had indeed one bug (R_X86_64_[REX_]GOTPCRELX cannot be relaxed for SHN_ABS) and one enhancement (PC relative relocations to a non-preemptible symbol should be rejected).


文章来源: http://maskray.me/blog/2021-09-05-build-glibc-with-lld
如有侵权请联系:admin#unsafe.sh