I do not use Apple products, but I occasionally enjoy investigating Mach-O as an object file format, and my llvm-project changes sometimes need to work around its quirks.
LLVM has a function call tracing system called XRay. It supports many
architectures on Linux and some BSDs but does not support Apple systems.
If the target triple is `x86_64-apple-darwin*`, you may notice that Clang allows compilation but linking fails. For other architectures, Clang rejects the option outright.
```sh
% clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
```
So I dove down the rabbit hole.
```asm
.section __DATA,xray_instr_map
```
`.quad Lxray_sled_0-Ltmp0` is represented as a pair of relocations (`llvm-readobj -r a.o`):
```text
0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
```
`X86_64_RELOC_SUBTRACTOR` must be an external relocation (`r_extern==1`) whose `r_symbolnum` references a symbol table entry; linkers report an error if `r_extern==0`. Symbols with the "L" prefix are called temporary symbols in LLVMMC and are not present in the symbol table.
The LLVM integrated assembler tries to convert the symbol referenced by the subtractor to an atom, that is, a non-temporary symbol defined in the same section. However, since `xray_instr_map` does not define any non-temporary symbol, the `X86_64_RELOC_SUBTRACTOR` relocation has no associated symbol and its `r_extern` is 0.
To fix this issue, we need to define a non-temporary symbol. We can accomplish this by renaming `Lxray_sleds_start0` to `lxray_sleds_start0`. In LLVMMC, `LinkerPrivateGlobalPrefix` is set to "l" for Apple targets. Unlike "L" symbols, "l" symbols are kept in the object file's symbol table (the linker omits them from its output), so `lxray_sleds_start0` can serve as the atom. We can define an overload `MCContext::createLinkerPrivateTempSymbol(const Twine &Name)` to let LLVMMC select an unused symbol name starting with `lxray_sleds_start`. (One pitfall: "ltmp" names should remain compiler-internal.) For ELF targets, `MCContext::createLinkerPrivateTempSymbol` creates a temporary symbol starting with ".L".
Oleksii Lozovskyi reported that the `-fxray-function-index` option had been broken:

- (default): no function index
- `-fxray-function-index`: no function index
- `-fno-xray-function-index`: the `xray_fn_idx` section is present

`-fxray-function-index` used to be the default. It turns out that a clangDriver refactoring accidentally caused this regression, but the negatively named variable was probably the main reason. XRay tests were not great and there was no driver test to catch this. I fixed it.
Now that `-fxray-function-index` is back, we get the `xray_fn_idx` section by default. The section contains entries like the following:
```asm
.section __DATA,xray_fn_idx
```
BTW, I noticed an old (2015) workaround for ld64 and proposed removing it: https://reviews.llvm.org/D152831.
These absolute addresses require rebase opcodes in the special section `__LINKEDIT,__rebase`. This is not great, and I had wanted to fix it back in 2020 but never got around to doing it. This motivated me to actually fix the issue and create https://reviews.llvm.org/D152661 to change the `[start,end)` representation to the `(pc_relative_start, size)` representation.
My initial attempt emitted something like the following: it took the difference of two labels and right-shifted it by 5 to get the number of sleds (each sled entry is 32 bytes).
```asm
.section __DATA,xray_fn_idx,regular,live_support
```
This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.
```sh
% clang -c --target=x86_64-apple-darwin a.s
```
Assembler issues
When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:
- Parsing time. At this stage, we have an `MCAssembler` object but no `MCAsmLayout` object. Instruction operands and certain directives like `.if` require the ability to evaluate an expression early.
- Object file writing time. At this stage, we have both an `MCAssembler` object and an `MCAsmLayout` object. The `MCAsmLayout` object provides the offset of each fragment.
The first issue is not specific to this case and is also encountered in ELF. The following assembly code should assemble to the bytes `01 00 00 01`, but Clang fails to evaluate `.if .-1b == 3`.
```sh
% cat x.s
```
Jian Cai implemented limited expression folding support in the LLVM integrated assembler to support a Linux kernel use case on arm:
```text
arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
```
I have added support for `MCFillFragment` (`.space` and `.fill`) and for `A-B` where `A` is a pending label (which will be reassigned to a real fragment in `flushPendingLabels()`). Now the LLVM integrated assembler can successfully assemble `x.s` when an `MCAssembler` object is present. However, evaluation still does not work without an `MCAssembler` object, which is expected.
```sh
% llvm-mc x.s -filetype=null
```
Then I noticed a potential pitfall for Mach-O in `MCSection::flushPendingLabels`. When flushing pending labels, it did not ensure that the new fragment inherits the previous atom symbol. I fixed this issue, although I haven't been able to create a test case to verify this behavior.
After this fix, `.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5` can be successfully assembled. However, during "direct object emission", the error `expected relocatable expression` is reported. The issue is quite subtle.
In the case of direct object emission, where LLVM IR is lowered directly to an object file without going through assembly (e.g., `clang -c a.c` instead of `clang -c a.s` or `clang -c --save-temps a.c`), the assembler information is not used for parsing (`MCStreamer::UseAssemblerInfoForParsing`). As a result, `.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5` is transformed into a fixup.
At object writing time, we have an `MCAsmLayout` object and atom information for fragments. However, the label `Lxray_sleds_end0` belongs to the next fragment, causing the condition in `MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl` to fail. In my opinion, it may be necessary to relax the condition in this case.
Linker dead stripping
To support linker dead stripping, also known as linker garbage collection, we need to add the `S_ATTR_LIVE_SUPPORT` attribute to the two sections `xray_instr_map` and `xray_fn_idx`.
Runtime issue
`compiler-rt/lib/xray/xray_trampoline_x86_64.S` used `.Ltmp*` symbols, which are temporary for ELF but non-temporary for Mach-O. On Mach-O, these non-temporary labels become atoms and can lead to incorrect dead stripping. I fixed the problem by using the `LOCAL_LABEL` macro, which generates an "L" symbol specifically for Mach-O.
Driver change
Once AArch64 works, we can make the Clang driver accept `--target=arm64-apple-darwin` for XRay.