All about COMMON symbols
2022-02-06 16:00:00 Author: maskray.me

Programming language behavior

FORTRAN 77 COMMON blocks were compiled to COMMON symbols. A COMMON block could be declared in more than one file, with each declaration specifying the number, types, and sizes of its variables. The linker allocated enough space to satisfy the largest size.

This feature was somehow ported to C. Unix C compilers traditionally permitted a tentative definition of the same variable (e.g. int x;) in multiple compilation units, and the linker would allocate enough space without reporting an error.

This behavior is at odds with both the C and C++ standards, but GCC and Clang traditionally defaulted to -fcommon for C. GCC since 10 and Clang since 11 default to -fno-common.

% echo 'int x;' > a.c
% gcc -S -fcommon a.c -o - | grep -w x
.comm x,4,4
% gcc -S -fno-common a.c -o - | grep -w x
.globl x
.type x, @object
.size x, 4
x:

Assembler behavior

The directive .comm identifier, size[, alignment] instructs the assembler to define a COMMON symbol with the specified size and the optional alignment.

In the ELF object file format, the symbol is represented as a STT_OBJECT STB_GLOBAL symbol whose st_shndx field holds SHN_COMMON. In readelf, the SHN_COMMON value is shown as COM.

typedef struct {
  Elf32_Word st_name;
  Elf32_Addr st_value;
  Elf32_Word st_size;
  unsigned char st_info;
  unsigned char st_other;
  Elf32_Half st_shndx;
} Elf32_Sym;

typedef struct {
  Elf64_Word st_name;
  unsigned char st_info;
  unsigned char st_other;
  Elf64_Half st_shndx;
  Elf64_Addr st_value;
  Elf64_Xword st_size;
} Elf64_Sym;

For a COMMON symbol, the st_value field holds the alignment rather than an address, and st_size holds the size.
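To make the field layout concrete, here is a minimal Python sketch that packs and decodes a little-endian Elf64_Sym entry equivalent to `.comm x,8,4`. The constants come from the generic ABI; the names `ELF64_SYM` and `decode_sym`, and the arbitrary `st_name` offset, are made up for this illustration.

```python
import struct

# Constants from the ELF generic ABI.
SHN_COMMON = 0xFFF2
STB_GLOBAL = 1
STT_OBJECT = 1

# Little-endian Elf64_Sym field order:
# st_name, st_info, st_other, st_shndx, st_value, st_size.
ELF64_SYM = struct.Struct("<IBBHQQ")


def decode_sym(raw):
    """Decode one Elf64_Sym entry into a dict (hypothetical helper)."""
    name, info, other, shndx, value, size = ELF64_SYM.unpack(raw)
    sym = {
        "bind": info >> 4,    # upper nibble of st_info
        "type": info & 0xF,   # lower nibble of st_info
        "shndx": shndx,
        "is_common": shndx == SHN_COMMON,
        "size": size,
    }
    if sym["is_common"]:
        # For SHN_COMMON symbols, st_value is the alignment, not an address.
        sym["alignment"] = value
    else:
        sym["value"] = value
    return sym


# Hand-built entry for `.comm x,8,4`; the st_name offset (1) is arbitrary.
raw = ELF64_SYM.pack(1, (STB_GLOBAL << 4) | STT_OBJECT, 0, SHN_COMMON, 4, 8)
sym = decode_sym(raw)
print(sym)
```

A real tool would read these entries from the .symtab section of an object file; the sketch only shows how the fields are interpreted.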

% cat a.s
.comm x,8,4
% as a.s -o a.o
% readelf -Ws a.o

Symbol table '.symtab' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000004 8 OBJECT GLOBAL DEFAULT COM x

A COMMON symbol cannot have the STB_WEAK binding, and its type cannot be changed to another type such as STT_FUNC:

% >err.s cat <<e
.comm x,4,4
.weak x
e
% as err.s
err.s: Assembler messages:
err.s: Error: symbol `x' can not be both weak and common
% >err.s cat <<e
.comm x,4,4
.type x,@function
e
% as err.s
err.s: Assembler messages:
err.s:2: Error: cannot change type of common symbol 'x'

The generic ABI supports STT_COMMON as another way to label a COMMON symbol. It says:

Symbols with type STT_COMMON label uninitialized common blocks. In relocatable objects, these symbols are not allocated and must have the special section index SHN_COMMON (see below). In shared objects and executables these symbols must be allocated to some section in the defining object.

In relocatable objects, symbols with type STT_COMMON are treated just as other symbols with index SHN_COMMON. If the link-editor allocates space for the SHN_COMMON symbol in an output section of the object it is producing, it must preserve the type of the output symbol as STT_COMMON.

When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size.
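The optional dynamic linker rule quoted above can be sketched in Python as follows. This is only a model of the text, not the code of any real dynamic linker; the function name `resolve_dynamic` and the library names are hypothetical.

```python
# Symbol types from the ELF generic ABI.
STT_OBJECT = 1
STT_COMMON = 5


def resolve_dynamic(candidates):
    """Pick a definition among same-named candidates.

    candidates: list of (module, st_type, st_size) in symbol search order.
    Returns the chosen tuple, or None if there are no candidates.
    """
    if not candidates:
        return None
    first = candidates[0]
    if first[1] != STT_COMMON:
        # Usual rule: the first definition found wins.
        return first
    # Optional rule: prefer the first definition that is not STT_COMMON ...
    for cand in candidates:
        if cand[1] != STT_COMMON:
            return cand
    # ... otherwise take the STT_COMMON definition with the largest size.
    return max(candidates, key=lambda c: c[2])


# Hypothetical search order for one symbol name.
search_order = [
    ("libx.so", STT_COMMON, 4),
    ("liby.so", STT_COMMON, 16),
    ("libz.so", STT_OBJECT, 8),
]
print(resolve_dynamic(search_order))      # the non-COMMON definition
print(resolve_dynamic(search_order[:2]))  # the largest COMMON definition
```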

--elf-stt-common=yes causes the GNU assembler to use STT_COMMON. This is very rare in the wild, though.

% as a.s --elf-stt-common=yes -o a.o
% readelf -Ws a.o

Symbol table '.symtab' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000004 8 COMMON GLOBAL DEFAULT COM x

Linker behavior

The key is: a COMMON symbol does not cause a duplicate definition error, no matter what kind of definition it is combined with. The generic ABI text quoted above describes the behavior when a COMMON symbol has different sizes in relocatable objects: the output symbol gets the largest size.

Platforms differ in how the alignment is selected. GNU ld and ld.lld pick the largest alignment.

as -o a.o <<< '.comm x,8,4'
as -o b.o <<< '.comm x,4,8'
ld a.o b.o

Mach-O ld64 lets the copy with the largest size decide the alignment.
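The two alignment policies can be compared with a small Python model. This is a sketch of the rules as described here, not linker source code; `merge_common` and the policy names are invented for the example, and the tie-breaking for equal sizes under the ld64 policy (first copy wins) is an assumption.

```python
def merge_common(copies, policy="elf"):
    """Merge COMMON copies of one symbol into a single (size, alignment).

    copies: list of (size, alignment) pairs, one per input object.
    policy: "elf"  -> largest size and largest alignment (GNU ld, ld.lld)
            "ld64" -> the copy with the largest size decides the alignment
    """
    size = max(s for s, _ in copies)
    if policy == "elf":
        align = max(a for _, a in copies)
    elif policy == "ld64":
        # max() returns the first copy among equally large ones (assumption).
        align = max(copies, key=lambda c: c[0])[1]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return size, align


# The two copies from the example: `.comm x,8,4` and `.comm x,4,8`.
copies = [(8, 4), (4, 8)]
print(merge_common(copies, "elf"))
print(merge_common(copies, "ld64"))
```

With these inputs the two policies agree on the size but not on the alignment, which is the platform difference in question.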

In ELF, the precedence is STB_GLOBAL > COMMON > STB_WEAK. The generic ABI says:

When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (that is, a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak ones.

as -o a.o <<< '.comm x,8,4'
as -o b.o <<< '.data; .globl x; x: .space 16; .size x, 16'
as -o c.o <<< '.data; .weak x; x: .space 16; .size x, 16'

Linking a.o with each of the other two objects:

% ld.bfd -e 0 a.o b.o  
ld.bfd: warning: alignment 1 of symbol `x' in b.o is smaller than 4 in a.o
ld.bfd: warning: size of symbol `x' changed from 8 in a.o to 16 in b.o
% ld.bfd -e 0 a.o c.o
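The precedence can be modeled with a few lines of Python. This sketch only tracks which definition wins; merging of sizes and alignments between COMMON copies is left out. The names `Kind`, `DuplicateSymbol`, and `resolve` are hypothetical.

```python
from enum import IntEnum


class Kind(IntEnum):
    # Ordered so that a higher value takes precedence.
    WEAK = 1
    COMMON = 2
    GLOBAL = 3


class DuplicateSymbol(Exception):
    pass


def resolve(symbols):
    """Return the winning (file, kind) among same-named definitions.

    symbols: list of (file, Kind) in link order.
    """
    winner = None
    for file, kind in symbols:
        if winner is None or kind > winner[1]:
            winner = (file, kind)
        elif kind == Kind.GLOBAL and winner[1] == Kind.GLOBAL:
            # Only two STB_GLOBAL definitions conflict with each other.
            raise DuplicateSymbol(f"{file} and {winner[0]} both define the symbol")
        # Otherwise the earlier, equal-or-stronger definition is kept.
    return winner


# The two links from the example above.
print(resolve([("a.o", Kind.COMMON), ("b.o", Kind.GLOBAL)]))  # b.o wins
print(resolve([("a.o", Kind.COMMON), ("c.o", Kind.WEAK)]))    # a.o wins
```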

GNU ld ported a strange rule from Sun's linker in December 1999; see "GNU-ld behaviour does not match native linker behaviour".

Here is a table showing when an element is pulled in from an archive with the Solaris 2.6 linker and ar program:

main program \ archive   undefined   common   defined
undefined                no          yes      yes
common                   no          no       yes
defined                  no          no       no

When a symbol is COMMON and ld sees an archive, ld checks whether the archive index provides a STB_GLOBAL definition of the symbol. If so, ld extracts that archive member as well. This is contrary to the usual rule that only an undefined symbol leads to archive member extraction.
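The Solaris table can be written down directly, and it happens to reduce to a simple ranking. The Python sketch below encodes both forms and checks that they agree; `EXTRACT`, `RANK`, and `should_extract` are names made up for this example, and the ranking is just one reading of the table, not a description of any linker's code.

```python
# (state in main program, state in archive member) -> extract the member?
EXTRACT = {
    ("undefined", "undefined"): False,
    ("undefined", "common"): True,
    ("undefined", "defined"): True,
    ("common", "undefined"): False,
    ("common", "common"): False,
    ("common", "defined"): True,
    ("defined", "undefined"): False,
    ("defined", "common"): False,
    ("defined", "defined"): False,
}

# Equivalent reading: extract only when the archive offers something stronger.
RANK = {"undefined": 0, "common": 1, "defined": 2}


def should_extract(main_state, archive_state):
    """Return True if the archive member would be pulled in."""
    return RANK[archive_state] > RANK[main_state]


# The ranking reproduces every cell of the table.
for (main_state, archive_state), expected in EXTRACT.items():
    assert should_extract(main_state, archive_state) == expected

# The unusual case: a COMMON symbol still pulls in a real definition.
print(should_extract("common", "defined"))
```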

ld.lld since 12.0.0 has this behavior (D86142) with the enabled-by-default --fortran-common option.

Say b0.a and b1.a are mostly identical archives, but the objects in b0.a are compiled with -fcommon while the objects in b1.a are compiled with -fno-common. If a.o references a symbol in b0.a, this archive lookup behavior may cause a duplicate definition error for ld a.o b0.a b1.a, whereas without the rule b1.a would simply be shadowed by b0.a.

echo 'extern int ret; int main() { return ret; }' > a.c
echo 'int ret; void foo() {}' > b.c
gcc -c a.c
gcc -c -fcommon b.c -o b0.o && rm -f b0.a && ar rc b0.a b0.o
gcc -c b.c -o b1.o && rm -f b1.a && ar rc b1.a b1.o

Linking with the archives in either order:

# ret in b0.a(b0.o) is COMMON. b1.a(b1.o) is extracted to override the COMMON symbol with a STB_GLOBAL definition.
% gcc a.o b0.a b1.a
ld.lld: error: duplicate symbol: foo
>>> defined at b.c
>>> b0.o:(foo) in archive b0.a
>>> defined at b.c
>>> b1.o:(.text+0x0) in archive b1.a
collect2: error: ld returned 1 exit status
% gcc a.o b1.a b0.a # b1.a shadows b0.a

What I am most concerned with is how to parallelize symbol resolution in the presence of this archive lookup rule.

GNU ld and ld.lld treat COMMON symbols as though they are in an input section named COMMON. *(COMMON) in a linker script can match these symbols.

Error-prone COMMON symbols

With -fcommon, due to the linker symbol resolution rule, a tentative definition int x; may be overridden by a STB_GLOBAL definition in another compilation unit. This is error-prone since the user may assume an initial value of zero, unaware of an int x = 1; elsewhere.

gcc -c -fcommon -xc - -o a.o <<< 'int x;'
gcc -c -xc - -o b.o <<< 'int x = 1;'
gcc -shared a.o b.o

GNU ld and ld.lld support --warn-common, which detects this error-prone overriding.

% gcc -shared -fuse-ld=bfd -Wl,--warn-common a.o b.o
/usr/bin/ld.bfd: b.o: warning: definition of `x' overriding common from a.o

Some legacy code may inadvertently rely on COMMON symbols by having something like int x; in a header file. Such code may not compile with -fno-common.

LLVM IR

In LLVM IR, a COMMON symbol has the "common" linkage. It is an interposable linkage, so some optimizations are suppressed. For example:

  • InstCombine assumes that the addresses of a common global i8 and an external global i32 may be the same.
  • llvm.objectsize intrinsic does not know the size. This may lead to conservative assumptions for some _chk functions.

Source: https://maskray.me/blog/2022-02-06-all-about-common-symbols