Copy relocations, canonical PLT entries and protected visibility

Background:

-fno-pic can only be used by executables. On most platforms and architectures, direct access relocations are used to reference external data symbols.
-fpic can be used by both executables and shared objects. Windows has __declspec(dllimport), but most other binary formats allow a default-visibility external data symbol to be resolved to a definition in a shared object, so direct access relocations are generally disallowed.
-fpie was introduced as a mode similar to -fpic for ELF: the compiler can assume that the produced object file will only be linked into an executable, so all definitions are non-preemptible and interprocedural optimizations can apply to them.

For

extern int a;
int *foo() { return &a; }

-fno-pic typically produces an absolute relocation (a PC-relative relocation can be used as well). On ELF x86-64 it is usually R_X86_64_32 in the position-dependent small code model. If a is defined in the executable (by another translation unit), everything works fine. If a turns out to be defined in a shared object, its address is not a link-time constant. One of two actions must then be taken:

Emit a dynamic relocation at every use site. Text sections are usually non-writable. A dynamic relocation applied to a non-writable section is called a text relocation.
Emit a single copy relocation. The linker obtains the size of the symbol, allocates that many bytes in .bss (which may make the object writable; LLD may pick a read-only area instead), and emits an R_*_COPY relocation. All references resolve to the new location.

Multiple text relocations are even less acceptable, so on ELF a copy relocation is generally used. Here is a nice description from Rich Felker: "Copy relocations are not a case of overriding the definition in the abstract machine, but an implementation detail used to support data objects in shared libraries when the main program is non-PIC."

Copy relocations have drawbacks:

Break page sharing.
Make the symbol properties (e.g. size) part of the ABI.
If the shared object is linked with -Bsymbolic or --dynamic-list and defines a data symbol copy relocated by the executable, the address of the symbol may be different in the shared object and in the executable.
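
The last drawback can be made concrete with a toy model in C. This is only a sketch: struct fields stand in for the shared object's data, the executable's .bss copy and the shared object's GOT slot, and every toy_* name is invented here for illustration. No real linker or loader interface is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model: struct fields stand in for linker and loader state. */
struct toy_process {
    int dso_data;      /* the definition inside the shared object */
    int exe_bss_copy;  /* bytes the linker reserved in the executable's .bss */
    int *dso_got_slot; /* GOT entry the shared object uses for the symbol */
};

/* Models startup: R_*_COPY copies the initial bytes, and ordinary symbol
   lookup binds the shared object's GOT slot to the executable's copy. */
void toy_apply_copy_reloc(struct toy_process *p) {
    memcpy(&p->exe_bss_copy, &p->dso_data, sizeof p->exe_bss_copy);
    p->dso_got_slot = &p->exe_bss_copy;
}

/* The executable always refers to its own copy. */
int *toy_exe_address(struct toy_process *p) { return &p->exe_bss_copy; }

/* When the shared object binds the reference to its own definition
   (the -Bsymbolic case), it bypasses the GOT slot. */
int *toy_dso_address(struct toy_process *p, bool bound_locally) {
    return bound_locally ? &p->dso_data : p->dso_got_slot;
}

/* Runs the scenario once and checks both outcomes. */
void toy_check(void) {
    struct toy_process p = { .dso_data = 7 };
    toy_apply_copy_reloc(&p);
    assert(*toy_exe_address(&p) == 7);
    assert(toy_dso_address(&p, false) == toy_exe_address(&p)); /* one object */
    assert(toy_dso_address(&p, true) != toy_exe_address(&p));  /* two objects */
}
```

In the locally bound case a store made by the shared object would land in dso_data while the executable keeps reading exe_bss_copy, which is the address mismatch described in the list above.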

Traditionally copy relocations could only occur in -fno-pic code. A GCC 5 change made this possible for x86-64. Please read on.

x86: copy relocations and `-fpie`

The GOT indirection that -fpic uses for external data symbols has a cost. Making -fpie behave like -fpic in this regard incurs that cost even when the data symbol turns out to be defined in the executable. Having the data symbol defined in another translation unit linked into the executable is very common, especially if the vendor links fully or mostly statically.

In GCC 5, "x86-64: Optimize access to globals in PIE with copy reloc" started to use direct access relocations for external data symbols on x86-64 in -fpie mode.

extern int a;
int foo() { return a; }

GCC<5: movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax (8 bytes)
GCC>=5: movl a(%rip), %eax (6 bytes)
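
As a C-level analogy for the two instruction sequences, the GOT entry can be pictured as a pointer variable that holds the address of a. This is a sketch only: a_storage, got_slot_a and the function names are invented for illustration, and an optimizing compiler is free to fold the indirection away in this single-file form.

```c
#include <assert.h>

static int a_storage = 42;                 /* stands in for `a` defined elsewhere */
static int *const got_slot_a = &a_storage; /* stands in for the GOT entry of `a` */

/* Shape of `movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax`:
   load the address from the GOT, then load the value. */
int load_via_got(void) {
    int *addr = got_slot_a;
    return *addr;
}

/* Shape of `movl a(%rip), %eax`: a single load from a known location. */
int load_direct(void) { return a_storage; }

/* Both shapes read the same value; only the number of loads differs. */
void check_loads_agree(void) { assert(load_via_got() == load_direct()); }
```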

This change would actually be useful for architectures other than x86-64, but it was never implemented for them. What went wrong: the change was implemented as an inflexible configure-time choice (HAVE_LD_PIE_COPYRELOC), defaulting to this behavior if ld supports PIE copy relocations (true for most binutils installations). Keep in mind that such a default breaks -Bsymbolic and --dynamic-list in shared objects.

Clang addressed the inflexible configure-time choice via an opt-in option -mpie-copy-relocations (D19996).

I noticed that:

The option can be used for -fno-pic code as well to prevent copy relocations on ELF. This is occasionally what users want, and they switch from -fno-pic to -fpie just for this purpose.
The option name should describe the code generation behavior, instead of the inferred behavior at the linking stage on a particular binary format.
The option does not need to tie to ELF.
- On COFF, the behavior is like always -fdirect-access-external-data. __declspec(dllimport) is needed to enable indirect access.
- On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic.
The x86-64 psABI introduced R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX as a GOT optimization. With it, the linker can optimize out the GOT indirection, so the incurred cost is very low now.

So I proposed an alternative option -f[no-]direct-access-external-data: https://reviews.llvm.org/D92633 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is to drop HAVE_LD_PIE_COPYRELOC and (x86-64) default to GOT indirection for external data symbols in -fpie mode.

Please keep in mind that -f[no-]semantic-interposition is for definitions while -f[no-]direct-access-external-data is for undefined data symbols.

GCC 5 introduced -fno-semantic-interposition to use local aliases for references to definitions in the same translation unit.
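
As a rough illustration of what a local alias is, the same effect can be written by hand with the alias attribute. This is a sketch that assumes GCC or Clang targeting ELF; answer, answer_local and twice_answer are hypothetical names, and the compiler's own output under -fno-semantic-interposition need not look like this.

```c
#include <assert.h>

/* An ordinary default-visibility definition: preemptible in a shared object. */
int answer(void) { return 42; }

/* A hand-written local alias: same code, but the name has internal linkage,
   so references to it cannot be interposed by another component. */
static int answer_local(void) __attribute__((alias("answer")));

/* Calls through the alias bind to the definition in this translation unit. */
int twice_answer(void) { return 2 * answer_local(); }

void check_twice_answer(void) { assert(twice_answer() == 84); }
```

A call to answer() from another component can still be interposed; only the reference made through the alias is pinned to the local definition.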

`STV_PROTECTED`

Now let's consider how STV_PROTECTED comes into play. Here is the generic ABI definition on STV_PROTECTED:

A symbol defined in the current component is protected if it is visible in other components but not preemptable, meaning that any reference to such a symbol from within the defining component must be resolved to the definition in that component, even if there is a definition in another component that would preempt by the default rules. A symbol with STB_LOCAL binding may not have STV_PROTECTED visibility. If a symbol definition with STV_PROTECTED visibility from a shared object is taken as resolving a reference from an executable or another shared object, the SHN_UNDEF symbol table entry created has STV_DEFAULT visibility.

A non-local STV_DEFAULT defined symbol is by default preemptible in a shared object on ELF. STV_PROTECTED can make the symbol non-preemptible. You may have noticed that I use "preemptible" while the generic ABI uses "preemptable" and LLVM IR uses "dso_preemptable". Both forms work but "preemptible" is more common.

x86: protected data symbols and copy relocations

Many folks consider that copy relocations are best-effort support provided by the toolchain. STV_PROTECTED is intended as an optimization and the optimization can error out if it can't be done for whatever reason. Since copy relocations are already oftentimes unacceptable, it is natural to think that we should just disallow copy relocations on protected data symbols.

However, GNU ld 2.26 made a change which enabled copy relocations on protected data symbols for i386 and x86-64.

x86: protected data symbols and direct accesses

If a protected data symbol in a shared object is copy relocated, allowing direct accesses will cause the shared object to operate on a different copy from the executable. Therefore, direct accesses to protected data symbols have to be disallowed in -fpic code, just in case the symbols may be copy relocated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to use GOT indirection for protected external data.

This caused unneeded pessimization for protected external data. Clang always treats protected similar to hidden/internal.
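
For reference, a protected data definition and an access to it from the same component look like this in C. This is a minimal sketch with hypothetical names (counter, bump_counter). When such a file is compiled with -fpic on x86, the access in bump_counter is the one affected by the difference described above: GOT indirection with GCC 5 and later, a direct access with Clang.

```c
#include <assert.h>

/* Visible to other components, but not preemptible by them. */
__attribute__((visibility("protected"))) int counter = 0;

/* A reference from within the defining component. */
int bump_counter(void) { return ++counter; }

void check_bump_counter(void) {
    int before = counter;
    assert(bump_counter() == before + 1);
}
```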

For older GCC (and all versions of Clang), direct accesses are produced in -fpic code. Mixing such object files can silently break copy relocations on protected data symbols. Therefore, GNU ld made the change https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5 to error in -shared mode.

% cat a.s
leaq foo(%rip), %rax

.data
.global foo
.protected foo
foo:
% gcc -fuse-ld=bfd -shared a.s
/usr/bin/ld.bfd: /tmp/ccchu3Xo.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status

This led to a heated discussion https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks noticed this https://bugs.swift.org/browse/SR-1023 and their reaction was to switch from GNU ld to gold.

The binutils commit "x86: Clear extern_protected_data for GNU_PROPERTY_NO_COPY_ON_PROTECTED" introduced GNU_PROPERTY_NO_COPY_ON_PROTECTED. When this property is present, ld -shared no longer reports the "relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object" error.

The two issues above are the costs of enabling copy relocations on protected data symbols. Personally, I don't think copy relocations on protected data symbols are actually relied upon. GNU ld's x86 port could simply (1) reject such copy relocations and (2) allow direct accesses to protected data symbols in -shared mode. GNU_PROPERTY_NO_COPY_ON_PROTECTED could then be phased out.

Protected function symbols and canonical PLT entries


__attribute__((visibility("protected"))) void *foo () {
  return (void *)foo;
}

GNU ld's aarch64 port somehow rejects this:

% gcc -shared -fuse-ld=bfd -fpic b.c -o b.so
/usr/bin/ld.bfd: /tmp/ccXdBqMf.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `foo' which may bind externally can not be used when making a shared object; recompile with -fPIC
/tmp/ccXdBqMf.o: in function `foo':
a.c:(.text+0x0): dangerous relocation: unsupported relocation
collect2: error: ld returned 1 exit status

The code is supported on many other architectures, including powerpc and x86.

On many architectures, a branch instruction uses a branch specific relocation type (e.g. R_AARCH64_CALL26, R_PPC64_REL24, R_RISCV_CALL_PLT). This is great because the address is insignificant and the linker can arrange for a regular PLT if the symbol turns out to be external.

On i386, a branch in -fno-pic code emits an R_386_PC32 relocation, which is indistinguishable from an address-taking operation. If the symbol turns out to be external, the linker has to employ a trick called a "canonical PLT entry" (st_shndx=0, st_value!=0).
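
The distinction between a branch and an address-taking operation can be seen in source form. This is a single-file sketch with hypothetical helper names (call_foo, address_of_foo); foo is defined locally only so that the snippet links on its own, whereas in the scenario discussed here it would be defined in a shared object.

```c
#include <assert.h>

static int calls;

/* Stand-in for the definition that would live in a shared object. */
void foo(void) { ++calls; }

/* Branch only: the address of foo is insignificant, so routing the call
   through a regular PLT entry would be invisible to the program. */
void call_foo(void) { foo(); }

/* Address taken: the value must compare equal to the address every other
   component sees, which is what a canonical PLT entry has to guarantee. */
void (*address_of_foo(void))(void) { return foo; }

void check_address_significance(void) {
    int before = calls;
    call_foo();
    assert(calls == before + 1);
    assert(address_of_foo() == foo);
}
```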

extern void foo(void);
int main() { foo(); }

% gcc -m32 -shared -fuse-ld=bfd -fpic b.c -o b.so
% gcc -m32 -fno-pic -no-pie -fuse-ld=lld a.c ./b.so

% gcc -m32 -fno-pic a.c ./b.so -fuse-ld=lld
ld.lld: error: cannot preempt symbol: foo
>>> defined in ./b.so
>>> referenced by a.c
>>>               /tmp/ccDGhzEy.o:(main)
collect2: error: ld returned 1 exit status

% gcc -m32 -fno-pic -no-pie a.c ./b.so -fuse-ld=bfd
# canonical PLT entry; foo has different addresses in a.out and b.so.
% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd
/usr/bin/ld.bfd: /tmp/ccZ3Rl8Y.o: warning: relocation against `foo' in read-only section `.text'
/usr/bin/ld.bfd: warning: creating DT_TEXTREL in a PIE
% gcc -m32 -fno-pic -pie a.c ./b.so -fuse-ld=bfd -z text
/usr/bin/ld.bfd: /tmp/ccUv8wXc.o: warning: relocation against `foo' in read-only section `.text'
/usr/bin/ld.bfd: read-only segment has dynamic relocations
collect2: error: ld returned 1 exit status

This used to be a problem for x86-64 as well, until "x86-64: Generate branch with PLT32 relocation" changed call/jmp foo to emit R_X86_64_PLT32 instead of R_X86_64_PC32. Note: (-fpie/-fpic) call/jmp foo@PLT always emits R_X86_64_PLT32.
