LLD is the LLVM linker. It was added to the LLVM repository at the end of 2011 as a work-in-progress rewrite of ld64 for the Mach-O binary format. Today, it is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly).
As a main contributor to LLD's ELF port who has fixed numerous corner cases in recent years, I consider its x86-64 support to have been mature since the 8.0.0 release and in great shape since 9.0.0. The AArch64 and PowerPC32/PowerPC64 support has been great since the 10.0.0 release, and the 11.0.0 release has very solid linker script support. (When people complain that a linker script written for GNU ld does not immediately work with LLD, the problem almost certainly lies in the script itself.) So, what's next? Build glibc with LLD!
glibc is known for tricks used here and there and for tons of GNU extensions, which challenge a "foreign" toolchain like llvm-project (Clang, LLD, etc.). Read on.
Build
librtld.map
There is a bootstrapping problem between ld.so and libc because they are separate. In a nutshell, elf/Makefile performs the following steps to build elf/ld.so:
- Create elf/libc_pic.a from libc's .os files
- Create elf/dl-allobjs.os from a relocatable link of rtld's .os files
- Create the link map elf/librtld.map from a relocatable link of elf/dl-allobjs.os, elf/libc_pic.a, and -lgcc
- Get a list of extracted archive members (elf/librtld.mk) from elf/librtld.map and create elf/rtld-libc.a
- Create elf/librtld.os from a relocatable link of elf/dl-allobjs.os and elf/rtld-libc.a
- Create elf/ld.so from a -shared link of elf/librtld.os with the version script ld.map
In a link map printed by GNU ld, the section heading "Archive member included to satisfy reference by file (symbol)" is followed by the extracted archive members. elf/Makefile used sed -n 's@^$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract them. LLD's link map has no "Archive member included to satisfy reference by file (symbol)" section. Fortunately, LLD's output has lines like
1f350 1f350 1e 16 /home/maskray/Dev/glibc/out/lld/elf/rtld-libc.a(rtld-access.os):(.text)
We can use sed -n 's@^[0-9a-f ]*$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members.
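For a quick check of that pattern outside make, here is a minimal C sketch using POSIX <regex.h>, whose default (basic) syntax treats \( \) as a group and a bare ( as a literal, just like sed. The helpers extract_member and copy_group are hypothetical, and the prefix /home/maskray/Dev/glibc/out/lld/ is simply the one from the sample line, standing in for $(common-objpfx).

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copy capture group `m` of `line` into `out` as a NUL-terminated string,
   truncating to outsz - 1 characters (outsz must be at least 1). */
static void copy_group(const char *line, regmatch_t m, char *out, size_t outsz) {
  size_t len = (size_t) (m.rm_eo - m.rm_so);
  if (len >= outsz)
    len = outsz - 1;
  memcpy(out, line + m.rm_so, len);
  out[len] = '\0';
}

/* Apply the LLD-compatible sed pattern to one map line. `objpfx` stands in
   for $(common-objpfx) and, as in the Makefile, is pasted into the pattern
   without escaping. On a match, store the archive path (group 1) and the
   member name (group 2) and return 1; otherwise return 0. */
static int extract_member(const char *line, const char *objpfx,
                          char *archive, size_t asz,
                          char *member, size_t msz) {
  char pattern[1024];
  int n = snprintf(pattern, sizeof pattern,
                   "^[0-9a-f ]*%s\\([^(]*\\)(\\([^)]*\\.os\\)) *.*$", objpfx);
  if (n < 0 || (size_t) n >= sizeof pattern)
    return 0; /* prefix too long for this sketch */

  regex_t re;
  if (regcomp(&re, pattern, 0) != 0) /* cflags 0: POSIX basic regex, like sed */
    return 0;
  regmatch_t m[3];
  int ok = regexec(&re, line, 3, m, 0) == 0;
  if (ok) {
    copy_group(line, m[1], archive, asz);
    copy_group(line, m[2], member, msz);
  }
  regfree(&re);
  return ok;
}
```

For the sample line, extract_member yields the archive elf/rtld-libc.a and the member rtld-access.os, the same "\1 \2" pair the sed command prints. A line for a plain object file such as elf/dl-allobjs.os does not match because there is no "(member.os)" part.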
scripts/output-format.sed
libc.so is a linker script. On Debian GNU/Linux, it looks like:
% cat /lib/x86_64-linux-gnu/libc.so
The idea is that -lc can expand to something like -( libc.so.6 libc_nonshared.a --push-state --as-needed ld-linux-x86-64.so.2 --pop-state -). libc_nonshared.a contains functions which should be statically linked. ld-linux-x86-64.so.2 is mostly needed for __tls_get_addr, which is used by the general-dynamic/local-dynamic TLS models. Commit d3f5f87569398d11756b3dcb7a66926bfd8ee047 (in 2015) added AS_NEEDED without describing its purpose. In retrospect, this can make a difference to ld.so performance when an executable has O(1000) shared object dependencies, because the overall shared object uniqueness checks have quadratic time complexity.
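To illustrate why the number of objects matters, here is a hypothetical C model (not glibc's code) of a uniqueness check that compares each requested name against every already-loaded object in a list, stopping at the first match. The struct loaded_list and the request function are made-up names for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 4096

/* Hypothetical model of the loaded-object list: names in load order. */
struct loaded_list {
  const char *names[MAX_OBJECTS];
  size_t count;
};

/* Request the object `name`: scan the loaded objects until a match is found
   and load (append) the object only if it is not already present. Every
   strcmp performed is added to *comparisons. */
static void request(struct loaded_list *l, const char *name, size_t *comparisons) {
  for (size_t i = 0; i < l->count; i++) {
    ++*comparisons;
    if (strcmp(l->names[i], name) == 0)
      return; /* duplicate: already loaded */
  }
  if (l->count < MAX_OBJECTS)
    l->names[l->count++] = name;
}
```

In this model, loading n distinct objects costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons, i.e. 499500 for n = 1000. Each additional request per object, such as a DT_NEEDED entry for ld-linux-x86-64.so.2 in every shared object, can add up to n more comparisons.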
The first non-comment line is an OUTPUT_FORMAT command, which is derived from the output of ld --verbose. In GNU ld, --verbose prints the internal linker script, which is used when no external one is specified with -T.
Makerules extracted the OUTPUT_FORMAT line with a frightening sed script:
/ld.*[ ]-E[BL]/b f
LLD does not have an internal linker script, so libc.so did not have the OUTPUT_FORMAT line. (Personally, I think an internal linker script is not useful. It would have some expository value, but the language is not powerful enough to encode all the built-in logic. If LLD were to support the feature, we would need to emit a lot of conditional code, which could add a huge maintenance burden.)
Inspired by a usage in the Linux kernel, I realized that there is a better way to get the output format (bfdname): we can just parse the output of objdump -f.
% objdump -f elf/ld.so
For some reason, llvm-objdump -f printed output formats in upper case. I switched them to lower case, matching GNU objdump, in D76046.
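A minimal C sketch of the parsing step, assuming the objdump -f output contains a line such as "elf/ld.so:     file format elf64-x86-64". Since spacing differs between tools, the sketch only looks for "file format " and takes the following word. The function name output_format_line is hypothetical; it turns the bfdname into the OUTPUT_FORMAT line needed by libc.so.

```c
#include <stdio.h>
#include <string.h>

/* Scan `objdump -f` output for "file format NAME" and write
   "OUTPUT_FORMAT(NAME)" to `out`. Returns 1 on success, 0 if there is no such
   line or `out` is too small. NAME is copied verbatim, so its case is
   whatever objdump printed. */
static int output_format_line(const char *objdump_output, char *out, size_t outsz) {
  static const char key[] = "file format ";
  const char *p = strstr(objdump_output, key);
  if (p == NULL)
    return 0;
  p += sizeof key - 1;                /* skip past "file format " */
  size_t len = strcspn(p, " \t\r\n"); /* the bfdname ends at whitespace */
  if (len == 0)
    return 0;
  int n = snprintf(out, outsz, "OUTPUT_FORMAT(%.*s)", (int) len, p);
  return n >= 0 && (size_t) n < outsz;
}
```

Because the name is copied verbatim, an llvm-objdump that printed upper-case names would have produced an upper-case OUTPUT_FORMAT argument, which is why the lower-case output matters here.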
--defsym
A Makefile rule specified -Wl,--defsym=malloc=0 (and --defsym options for the other symbols defined by malloc.os) before libc_pic.a, so that libc_pic.a(malloc.os) would not be extracted. The trick was used to avoid multiple definition errors.
$(objpfx)librtld.map: $(objpfx)dl-allobjs.os $(common-objpfx)libc_pic.a
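Why does the position of --defsym matter? Here is a hypothetical, much simplified C model of traditional archive processing in command-line order. The names struct symtab, reference, define and scan_archive are illustrative, and each member is assumed to define exactly one symbol and reference nothing. A member is extracted only if it defines a symbol that is still undefined when the archive is scanned, so a definition appearing earlier suppresses the extraction of libc_pic.a(malloc.os).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 64

/* Hypothetical, heavily simplified global symbol table. */
struct symtab {
  const char *name[MAX_SYMS];
  bool defined[MAX_SYMS];
  size_t n;
};

/* Index of `name`, or t->n if the symbol has never been seen. */
static size_t find(const struct symtab *t, const char *name) {
  size_t i = 0;
  while (i < t->n && strcmp(t->name[i], name) != 0)
    i++;
  return i;
}

/* Index of `name`, inserting it as undefined if it is new. */
static size_t intern(struct symtab *t, const char *name) {
  size_t i = find(t, name);
  if (i == t->n) {
    if (t->n == MAX_SYMS)
      abort(); /* out of room in this sketch */
    t->name[t->n] = name;
    t->defined[t->n] = false;
    t->n++;
  }
  return i;
}

static void reference(struct symtab *t, const char *name) { intern(t, name); }
static void define(struct symtab *t, const char *name) { t->defined[intern(t, name)] = true; }

/* An archive member, reduced to the one symbol it defines. */
struct member {
  const char *file;
  const char *defines;
};

/* Extract a member only if it defines a symbol that is referenced and still
   undefined at the moment the archive is scanned. Returns the number of
   extracted members. */
static size_t scan_archive(struct symtab *t, const struct member *m, size_t count) {
  size_t extracted = 0;
  for (size_t i = 0; i < count; i++) {
    size_t s = find(t, m[i].defines);
    if (s < t->n && !t->defined[s]) {
      t->defined[s] = true; /* the extracted member now defines it */
      extracted++;
    }
  }
  return extracted;
}
```

In this model, calling define(&t, "malloc") before scan_archive leaves malloc.os in the archive, while omitting it extracts the member. A linker that deliberately makes the relative order of --defsym and input files irrelevant cannot provide this effect.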
For the interaction between a linker option and an input file, LLD generally chooses behavior that makes their relative order irrelevant. Some options are inherently order dependent, e.g. --as-needed and --no-as-needed, --whole-archive and --no-whole-archive, but reducing order dependence elsewhere improves the robustness of build systems.
After some debate with others, I finally noticed one point: --defsym defines an SHN_ABS symbol, while a normal definition is relative to the image base. So a normal definition is better regardless of the linker.
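A minimal C model of the difference, following the ELF gABI's description of SHN_ABS (a symbol whose value does not change because of relocation). runtime_value is a hypothetical helper, not a glibc function.

```c
#include <elf.h>
#include <stdint.h>

/* Where a defined symbol ends up at run time when its object is loaded at
   `load_base`, following the gABI meaning of st_shndx: an SHN_ABS symbol keeps
   its absolute value, whereas a symbol defined relative to a section moves
   with the image. */
static uint64_t runtime_value(const Elf64_Sym *sym, uint64_t load_base) {
  if (sym->st_shndx == SHN_ABS)
    return sym->st_value;
  return load_base + sym->st_value;
}
```

An SHN_ABS definition keeps its value wherever the object is loaded, while an ordinary definition in an object file is adjusted by the image base.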
I sent the patch in April 2020 and pinged it once in August. Since nobody responded, I sent it again in December. Finally, the issue was fixed by the commit "elf: Replace a --defsym trick with an object file to be compatible with LLD".
_GLOBAL_OFFSET_TABLE_[0]
In nearly every ELF port of GNU ld, _GLOBAL_OFFSET_TABLE_[0] holds the link-time address of _DYNAMIC (the start of .dynamic/PT_DYNAMIC). In glibc, the sysdeps/*/dl-machine.h files relied on this to compute the load base (the virtual address of the ELF header at run time):
runtime_DYNAMIC = PC relative address of _DYNAMIC
link_time_DYNAMIC = _GLOBAL_OFFSET_TABLE_[0]
load_base = runtime_DYNAMIC - link_time_DYNAMIC
So you may ask: why can't glibc read the p_vaddr field of the PT_DYNAMIC program header instead? Well, the code is poorly organized, which makes this elegant solution difficult...
Unfortunately, due to this glibc requirement, _GLOBAL_OFFSET_TABLE_[0] has become part of the i386/x86-64 and PowerPC64 ELFv2 ABIs.
LLD's AArch64 port does not set _GLOBAL_OFFSET_TABLE_[0], so the trick does not work. I figured out an elegant fix without updating LLD:
In 2012, GNU ld and gold (in binutils 2.23) started to define __ehdr_start, which in ld.so has the link-time address zero. Taking the runtime address of __ehdr_start with a PC-relative code sequence therefore gives a better way to get the load base. I submitted patches to use this approach for aarch64/arm/riscv/x86_64. I originally intended to use inline assembly to avoid relying on the compiler generating PC-relative addressing for hidden symbol access, but Szabolcs Nagy recommended the pure C approach.
Tests
AArch64's general-dynamic/local-dynamic TLS models
Complaints
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391 is my go-to example when I want to demonstrate the less amicable reception a foreign toolchain can get. "NO LLD is not implemented the ABI as PIE COPYRELOC is required by ABI these days". The status was changed back and forth between "invalid", "wontfix" and "worksforme" until a maintainer realized that GNU ld indeed had one bug (R_X86_64_[REX_]GOTPCRELX cannot be relaxed for SHN_ABS) and needed one enhancement (PC relative relocations to a non-preemptible symbol should be rejected).