Symbol processing
2021-06-20 16:00:00 Author: maskray.me

UNDER CONSTRUCTION (COFF, Mach-O)

After the linker reads an input file (object file, shared object, archive, LLVM bitcode file), the most critical task is to process its symbol table.

There is a global symbol table. Every input symbol table may interact with the global one, and affect archive processing and future steps (LTO, relocation processing, as-needed shared objects, etc).

ELF

Symbol tables

An object file can optionally have symbol tables.

A relocatable object file almost always has a symbol table, which is represented by a section .symtab of type SHT_SYMTAB. The symbol table is sometimes called a "static symbol table".

An executable or shared object almost always has a dynamic symbol table, which is represented by a section .dynsym of type SHT_DYNSYM. The dynamic symbol table specifies defined and undefined symbols, which can be seen as its export and import lists. They are needed by runtime relocation processing and symbol binding. A position-dependent statically linked executable usually has no dynamic symbol table, because (1) it usually does not need dynamic relocations and (2) there is only one component and every needed symbol is defined internally, so there is no need for symbol binding.

An executable or shared object may optionally have a symbol table of type SHT_SYMTAB. ld produces the symbol table (.symtab) by default. strip can remove it along with .strtab. The static symbol table is a superset of the dynamic symbol table and has many entries (local symbols and other non-exported symbols) not needed by runtime. It has value for symbolization without debug information but otherwise is not useful. Therefore an executable or shared object is usually post-processed by strip --strip-all which can remove .symtab along with .strtab and debug sections.

An archive is like a tarball. It almost always contains multiple relocatable object files. Almost all archives have a symbol index which is a collection of (defined_symbol, member_name) pairs. An archive requires special processing. See Dependency related linker options#Archive processing for details.

Symbols

A symbol table holds an array of entries. Each symbol table entry indicates a symbol. Let's look at the representation of a 64-bit ELF object file:

typedef struct {
  uint32_t st_name;
  unsigned char st_info;
  unsigned char st_other;
  uint16_t st_shndx;
  uint64_t st_value;
  uint64_t st_size;
} Elf64_Sym;

Here is the description from the ELF specification:

  • st_name: This member holds an index into the object file's symbol string table, which holds the character representations of the symbol names. If the value is non-zero, it represents a string table index that gives the symbol name. Otherwise, the symbol table entry has no name.
  • st_value: This member gives the value of the associated symbol. Depending on the context, this may be an absolute value, an address, and so on; details appear below.
  • st_size: Many symbols have associated sizes. For example, a data object's size is the number of bytes contained in the object. This member holds 0 if the symbol has no size or an unknown size.
  • st_info: This member specifies the symbol's type and binding attributes. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects.
  • st_other: This member currently specifies a symbol's visibility. A list of the values and meanings appears below. The following code shows how to manipulate the values for both 32 and 64-bit objects. Other bits contain 0 and have no defined meaning.
  • st_shndx: Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index. As the sh_link and sh_info interpretation table and the related text describe, some section indexes indicate special meanings. If this member contains SHN_XINDEX, then the actual section header index is too large to fit in this field. The actual value is contained in the associated section of type SHT_SYMTAB_SHNDX.
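The specification text quoted above mentions helper code for st_info and st_other that is not reproduced here. The following is a minimal sketch of such helpers. The function names (sym_binding, sym_type, make_st_info, sym_visibility) are made up for this example; the bit layout follows the ELF specification's ELF64_ST_BIND, ELF64_ST_TYPE, ELF64_ST_INFO and ELF64_ST_VISIBILITY macros. The constants are redefined locally so that the sketch does not depend on <elf.h>.

```c
#include <stdint.h>

/* Values from the ELF specification, redefined here so the sketch
   does not depend on <elf.h>. Only the common types are listed. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2 };
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Hypothetical helper names. The binding lives in the upper four bits
   of st_info and the type in the lower four bits. */
static inline int sym_binding(unsigned char st_info) { return st_info >> 4; }
static inline int sym_type(unsigned char st_info) { return st_info & 0xf; }

/* Pack a binding and a type back into an st_info byte. */
static inline unsigned char make_st_info(int binding, int type) {
  return (unsigned char)((binding << 4) + (type & 0xf));
}

/* The visibility occupies the low two bits of st_other. */
static inline int sym_visibility(unsigned char st_other) { return st_other & 0x3; }
```

For instance, a weak function symbol has st_info equal to make_st_info(STB_WEAK, STT_FUNC), which is 0x22.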

Explanation:

st_name indicates the name.

st_shndx and st_value indicate whether the symbol is defined or undefined, and the associated section and the offset if defined. If st_shndx==SHN_UNDEF, we say the symbol is undefined. For an undefined symbol foo, we often say the object file references foo. If st_shndx!=SHN_UNDEF, we say the symbol is defined.

Some st_shndx values are special. If st_shndx==SHN_ABS, this is an absolute symbol. If st_shndx==SHN_COMMON, this is a common symbol (FORTRAN COMMON blocks or C tentative definitions). The binding must be STB_GLOBAL. A common symbol can also be represented as having a type of STT_COMMON, but that is uncommon.
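The st_shndx cases described above can be sketched as a small classifier. The SymPlacement enum and the classify_shndx function are hypothetical names used only for illustration; the SHN_* values are those given by the ELF specification. Other reserved indexes (for example processor-specific ones) are not modeled.

```c
#include <stdint.h>

/* Special section indexes from the ELF specification. */
enum { SHN_UNDEF = 0, SHN_ABS = 0xfff1, SHN_COMMON = 0xfff2, SHN_XINDEX = 0xffff };

/* Hypothetical classification used only in this example. */
typedef enum {
  SYM_UNDEFINED,          /* the object file references the symbol */
  SYM_ABSOLUTE,           /* st_value is an absolute value */
  SYM_COMMON,             /* common symbol / tentative definition */
  SYM_DEFINED_IN_SECTION  /* st_value is relative to a section */
} SymPlacement;

SymPlacement classify_shndx(uint16_t st_shndx) {
  switch (st_shndx) {
  case SHN_UNDEF:
    return SYM_UNDEFINED;
  case SHN_ABS:
    return SYM_ABSOLUTE;
  case SHN_COMMON:
    return SYM_COMMON;
  default:
    /* Includes SHN_XINDEX, where the real section index is stored in
       the associated SHT_SYMTAB_SHNDX section. */
    return SYM_DEFINED_IN_SECTION;
  }
}
```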

st_info encodes the type and the binding. Among types, STT_FILE, STT_SECTION and STT_TLS are special. Most symbols are of type STT_NOTYPE, STT_OBJECT, or STT_FUNC. Other types are uncommon. The binding is a very important attribute. All of STB_LOCAL, STB_GLOBAL, and STB_WEAK are important. A symbol of binding STB_LOCAL is often called a local symbol. A local symbol must be defined. It is not visible outside the object file, therefore it does not contribute to the global symbol table. STB_WEAK represents a weak symbol. See Weak symbol for details. STB_GLOBAL represents a regular symbol visible outside the object file. Both weak and global symbols contribute to the global symbol table.

st_other encodes the visibility. The other bits are used by ppc64 ELFv2, AArch64, MIPS, etc. The visibility attribute represents different symbol resolution strategies for a non-local symbol. The linker only uses the information for a relocatable object file, not for a shared object.

An STV_HIDDEN or STV_INTERNAL symbol will be made STB_LOCAL in the linker output. This provides a mechanism to ensure a relocatable object file symbol will not be visible to other components. An STV_PROTECTED symbol provides a way to avoid the performance loss due to symbol interposition for a relocatable object file which will be linked into a shared object. STV_DEFAULT is the default.

If multiple relocatable object files have a non-local symbol, the most constraining visibility will be the visibility in the output. The attributes, ordered from least to most constraining, are: STV_DEFAULT, STV_PROTECTED, STV_HIDDEN, and STV_INTERNAL. For a non-definition declaration in C/C++, we can make it STV_PROTECTED or STV_HIDDEN to ensure the symbol must be defined in the component. Actually, if every undefined symbol were STV_PROTECTED by default, the model would be similar to PE-COFF's non-export by default model.
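The two visibility rules (pick the most constraining visibility, then turn hidden and internal symbols into local ones) can be sketched as follows. The function names merge_visibility and output_binding are hypothetical. One detail worth noting is that the numeric values in the ELF specification are not in constraint order, so the comparison cannot be a plain maximum.

```c
/* Values from the ELF specification. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Return the more constraining of two visibilities.
   STV_DEFAULT (0) is the least constraining, so it is handled first.
   Among the remaining values a smaller number is more constraining:
   STV_INTERNAL (1) > STV_HIDDEN (2) > STV_PROTECTED (3). */
int merge_visibility(int a, int b) {
  if (a == STV_DEFAULT) return b;
  if (b == STV_DEFAULT) return a;
  return a < b ? a : b;
}

/* Binding of the symbol in the linker output: hidden and internal
   symbols become local, everything else keeps its binding. */
int output_binding(int binding, int visibility) {
  if (visibility == STV_HIDDEN || visibility == STV_INTERNAL)
    return STB_LOCAL;
  return binding;
}
```

With these helpers, a symbol that is STV_PROTECTED in one object file and STV_HIDDEN in another ends up STV_HIDDEN, and is therefore STB_LOCAL in the output.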

Symbol resolution

The following pseudocode gives a summary of input file processing in the linker.

for file in input {
  if file is a relocatable object file/bitcode file surrounded by --start-lib {
    for sym in file's non-local symbols {
      ... // Lazy object file extraction may happen
    }
  } else if file is an archive not surrounded by --whole-archive {
    for sym in file's index {
      ... // Archive member extraction may happen
    }
  } else if file is a shared object {
    for sym in file's .dynsym { ... }
  } else if file is a relocatable object file/bitcode file not surrounded by --start-lib {
    for sym in file's non-local symbols { ... }
  } else if file is an archive surrounded by --whole-archive {
    for member in file {
      if member is a relocatable object file/bitcode file {
        handle member as a regular relocatable object file/bitcode file
      }
    }
  } else {
    error
  }
}

The linker maintains a global symbol table for STB_GLOBAL and STB_WEAK symbols. The table can be seen as a collection mapping names to states. Each state encodes the symbol kind. In the following list, I place the LLD internal struct name before the description.

  • Undefined: An undefined symbol only referenced by shared objects
  • Undefined: An undefined symbol referenced by at least one relocatable object file
  • LazyArchive/LazyObjFile: An entry in an archive index or a definition in a relocatable object file inside a pair of --start-lib --end-lib
  • Shared: A definition in a shared object
  • Defined: A definition in a relocatable object file or LLVM bitcode file

The kinds are listed in increasing priority. If the symbol in the current input file has a higher priority than the global symbol table entry, the global symbol table entry will be overwritten.
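As a rough sketch, the priority rule can be expressed as an ordered enum and a strict comparison. The enumerator names and kind_overrides are made up for this example and are not LLD's actual code. The sketch deliberately ignores two things: the extraction side effect when a lazy entry meets an undefined reference, and the binding rules that apply when two definitions from relocatable object files meet.

```c
#include <stdbool.h>

/* Symbol kinds in increasing priority, following the list above.
   The enumerator names are hypothetical. */
typedef enum {
  KIND_UNDEFINED_SHARED_ONLY, /* referenced only by shared objects */
  KIND_UNDEFINED,             /* referenced by a relocatable object file */
  KIND_LAZY,                  /* archive index entry or --start-lib member */
  KIND_SHARED,                /* defined in a shared object */
  KIND_DEFINED                /* defined in a relocatable object/bitcode file */
} SymbolKind;

/* True if the incoming symbol should overwrite the existing entry.
   The comparison is strict, so an incoming symbol of the same kind
   leaves the existing entry in place. */
bool kind_overrides(SymbolKind existing, SymbolKind incoming) {
  return incoming > existing;
}
```

Because the comparison is strict, a second Shared definition does not replace the first one.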

Note the first Undefined kind. Such an undefined symbol, referenced only by shared objects, will not contribute a symbol table entry to the output. It is needed to implement --no-allow-shlib-undefined. In an executable link, it also lets the linker know that the symbol needs to be exported if it ends up defined.

We will use some examples to explain the symbol resolution rules.

Duplicate definitions between relocatable object files

Both a.o and b.o define STB_GLOBAL foo. The linker command line is ld a.o b.o.

  • For a.o, insert foo to the global symbol table.
  • For b.o, notice that foo exists. Both are STB_GLOBAL => duplicate definition error.

STB_GLOBAL overrides STB_WEAK between relocatable object files

a.o defines STB_GLOBAL foo. c.o defines STB_WEAK foo. The linker command line is ld a.o c.o.

  • For a.o, insert foo as a Defined to the global symbol table.
  • For c.o, notice that foo exists. The STB_GLOBAL definition takes precedence.

For ld c.o a.o, the existing STB_WEAK definition will be overridden by the incoming STB_GLOBAL definition.

Note: the STB_GLOBAL overriding STB_WEAK rule is between two relocatable object files.
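The two examples above (duplicate definitions, and STB_GLOBAL overriding STB_WEAK) can be summarized in a small decision function. This is a sketch with hypothetical names (Resolution, resolve_definitions), covering only two definitions from relocatable object files. Common symbols are not modeled, and the weak-versus-weak case, which the examples do not cover, is assumed here to keep the first definition.

```c
/* Binding values from the ELF specification. */
enum { STB_GLOBAL = 1, STB_WEAK = 2 };

typedef enum { KEEP_EXISTING, TAKE_INCOMING, DUPLICATE_ERROR } Resolution;

/* Resolve two definitions of the same name, both coming from
   relocatable object files. */
Resolution resolve_definitions(int existing_binding, int incoming_binding) {
  if (existing_binding == STB_GLOBAL && incoming_binding == STB_GLOBAL)
    return DUPLICATE_ERROR; /* ld a.o b.o */
  if (existing_binding == STB_WEAK && incoming_binding == STB_GLOBAL)
    return TAKE_INCOMING;   /* ld c.o a.o */
  /* ld a.o c.o keeps the STB_GLOBAL definition.
     Weak vs weak: assumed that the first definition is kept. */
  return KEEP_EXISTING;
}
```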

STB_WEAK overrides common between relocatable object files

c.o defines STB_WEAK foo. d.o defines STB_GLOBAL SHN_COMMON foo.

Relocatable object file overriding shared object

a.so defines STB_GLOBAL foo. c.o defines STB_WEAK foo. The linker command line is ld a.so c.o.

  • For a.so, insert foo as a Shared to the global symbol table.
  • For c.o, notice that foo exists. The relocatable object file definition wins.

For ld c.o a.so, the definition in a.so will be ignored.

Note: the binding in a shared object is ignored for symbol resolution. The STB_GLOBAL overriding STB_WEAK rule does not apply, because a shared object is involved.

First shared object wins

a.so defines STB_GLOBAL foo. c.so defines STB_WEAK foo. The linker command line is ld c.so a.so.

  • For c.so, insert foo as a Shared to the global symbol table.
  • For a.so, notice that foo exists. The first shared object wins.

Note: the binding in a shared object is ignored for symbol resolution. The STB_GLOBAL overriding STB_WEAK rule does not apply, because two shared objects are involved.

An undefined symbol in a shared object does not change the binding

w.o references STB_WEAK foo. x.so references STB_GLOBAL foo. The linker command line is ld w.o x.so.

  • For w.o, insert foo as an Undefined to the global symbol table.
  • For x.so, notice that foo exists. The binding is unchanged.

The output binding is STB_WEAK. For an executable link, -z defs is the default. The linker will report an error.
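A minimal sketch of how the binding of an Undefined entry might be updated when another reference is seen. The function name merge_undefined_binding is hypothetical. The shared object branch follows the rule above; the relocatable object file branch (a non-weak reference upgrades a weak one) is an assumption that the text does not state explicitly.

```c
#include <stdbool.h>

/* Binding values from the ELF specification. */
enum { STB_GLOBAL = 1, STB_WEAK = 2 };

/* Binding of an Undefined entry after seeing another reference. */
int merge_undefined_binding(int existing, int incoming, bool incoming_is_shared) {
  /* A reference from a shared object leaves the binding unchanged. */
  if (incoming_is_shared)
    return existing;
  /* Assumption: a non-weak reference from a relocatable object file
     makes the entry non-weak. */
  if (incoming != STB_WEAK)
    return incoming;
  return existing;
}
```

For ld w.o x.so, the call merge_undefined_binding(STB_WEAK, STB_GLOBAL, true) returns STB_WEAK.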

Shared object overriding archive

Both a.so and b.o define foo. b.a contains b.o. The linker command line is ld a.so b.a.

  • For a.so, insert foo as a Shared to the global symbol table.
  • For b.a, try inserting every symbol from the index to the global symbol table. foo is already a shared definition, so a.so wins.

ld a.so --start-lib b.o --end-lib is similar.

0.o references bar. a.so defines foo. b.o defines foo and bar. b.a contains b.o. The linker command line is ld 0.o a.so b.a.

  • For 0.o, insert bar as an Undefined to the global symbol table.
  • For a.so, insert foo as a Shared to the global symbol table.
  • For b.a, try inserting every symbol from the index to the global symbol table. foo is already a shared definition. bar can resolve an Undefined, so the member providing bar (b.o) is extracted.
  • b.a(b.o) is extracted and added like a relocatable object file. Its foo definition overrides the Shared entry in the global symbol table.

With the archive placed before the shared object, the linker command line is ld 0.o b.a a.so.

  • For 0.o, insert bar as an Undefined to the global symbol table.
  • For b.a, b.a(b.o) is extracted.
  • b.a(b.o) is extracted and added like a relocatable object file. Its foo definition overrides the Undefined entry in the global symbol table.
  • For a.so, its shared definition loses to the Defined entry in the global symbol table.

m.o defines memcpy. libc.a(memcpy.o) defines memcpy. The linker command line is ld ... m.o -lc.

  • For m.o, insert memcpy as a Defined to the global symbol table.
  • For libc.a, try inserting every symbol from the index to the global symbol table. Some members are extracted. However, because no symbol defined by memcpy.o is Undefined in the global symbol table, memcpy.o is not extracted.

As a result, m.o succeeds in shadowing libc.a(memcpy.o). In practice, m.o may be an object file providing more optimized mem* routines. As long as m.o is before libc.a and m.o defines all libc.a(memcpy.o) symbols which may be referenced, this interposing scheme will be reliable.
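The extraction condition used in the memcpy example can be sketched with a toy global symbol table and a toy archive index. All type and function names below (Entry, IndexEntry, lookup, member_is_extracted) are made up for illustration. The sketch checks a single member once and does not model how a newly extracted member can add further Undefined entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SYM_UNDEFINED, SYM_DEFINED } State; /* simplified states */

/* One entry of the toy global symbol table. */
typedef struct { const char *name; State state; } Entry;

/* One (defined_symbol, member_name) pair of the archive index. */
typedef struct { const char *symbol; const char *member; } IndexEntry;

/* Look up a name in the toy global symbol table; NULL if absent. */
const Entry *lookup(const Entry *table, size_t n, const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(table[i].name, name) == 0)
      return &table[i];
  return NULL;
}

/* A member is extracted only if the index says it defines some symbol
   that is currently Undefined in the global symbol table. */
bool member_is_extracted(const Entry *table, size_t n,
                         const IndexEntry *archive_index, size_t m,
                         const char *member) {
  for (size_t i = 0; i < m; i++) {
    if (strcmp(archive_index[i].member, member) != 0)
      continue;
    const Entry *e = lookup(table, n, archive_index[i].symbol);
    if (e && e->state == SYM_UNDEFINED)
      return true;
  }
  return false;
}
```

With memcpy already Defined by m.o, a member whose only indexed symbol is memcpy is not extracted, while a member defining a still Undefined symbol is.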

PE-COFF

Mach-O


Source: http://maskray.me/blog/2021-06-20-symbol-processing