use a global offsets table for incremental linking and hot code swapping #5260

andrewrk · 2020-05-03T06:41:43Z

Related issues: #1535 and #68

Here's an example of how to lay out code in the binary:

example.zig

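const msg = "Hello, World!\n";

// SYS, SYS_write, and SYS_exit_group are assumed to be declared elsewhere,
// e.g. imported from std.os.linux.
export fn _start() callconv(.Naked) noreturn {
    // Pass the string's address as an integer to match syscall3's usize parameters.
    _ = syscall3(SYS_write, 1, @ptrToInt(msg), msg.len);
    exit();
}

fn exit() noreturn {
    _ = syscall1(SYS_exit_group, 0);
    unreachable; // exit_group does not return
}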

fn syscall1(number: SYS, arg1: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (@enumToInt(number)),
          [arg1] "{rdi}" (arg1)
        : "rcx", "r11", "memory"
    );
}

fn syscall3(number: SYS, arg1: usize, arg2: usize, arg3: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (@enumToInt(number)),
          [arg1] "{rdi}" (arg1),
          [arg2] "{rsi}" (arg2),
          [arg3] "{rdx}" (arg3)
        : "rcx", "r11", "memory"
    );
}

example.asm

; table of offsets
_exit: offsetof exit
_msg: offsetof msg
_syscall1: offsetof syscall1
_syscall3: offsetof syscall3

msg:
.ascii "Hello, World!\n"

syscall1:
mov rax, rdi
mov rdi, rsi
syscall
ret

syscall3:
mov rax, rdi
mov rdi, rsi
mov rsi, rdx
mov rdx, rcx
syscall
ret

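exit:
mov edi, 0xe7        ; SYS_exit_group (231); 0x3c would be SYS_exit
xor esi, esi         ; exit status 0
call [_syscall1]     ; indirect through the table of offsets

_start:
mov edi, 0x1         ; SYS_write
mov esi, 0x1         ; fd 1 (stdout)
mov rdx, [_msg]      ; location of msg, read from the table of offsets
mov ecx, 0xe         ; msg.len (14 bytes)
call [_syscall3]     ; indirect through the table of offsets
call [_exit]         ; indirect through the table of offsets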

The main idea here is that (1) each global declaration in Zig maps directly to a symbol in the output binary and (2) within each symbol, the code is position independent, not only with respect to itself, but also with respect to the other symbols it references. Each reference to a global declaration is indirect, through the table of offsets. This accomplishes 2 things:

1. A symbol can be relocated within the output binary. For example, imagine that "Hello, World!" was changed to "lllll, World!", and further imagine that the table of offsets needed to grow, so that the msg symbol had to move later in the file, below _start. The only things that would need to change are (1) moving the msg data to the new location and (2) updating the table of offsets. Even if msg were referenced 100 times, those 100 references would be unchanged, despite the fact that msg moved around in the address space.
2. Really powerful hot code swapping. The process could be paused with e.g. ptrace, and updated symbols would then be appended rather than updated in place. The table of offsets would be updated to point to the new symbols, and the process would then be resumed. Function calls that were in progress (all the way up the stack) would complete using the old function code, while new function calls would call the new function. Any instruction addresses captured for debug info purposes would remain valid throughout any number of hot code swaps. There are a lot of different ways this could go, but this demonstrates what a table of offsets accomplishes.
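
Both effects can be modeled in a few lines of C. This is only a sketch of the indirection idea, not how the Zig linker implements it: the names `offsets_table`, `image`, `count_v1`, `count_v2`, `relocate_msg`, and `hot_swap_count` are hypothetical, a struct of pointers stands in for the table of offsets, and a plain byte array stands in for the output binary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by every version of the swappable function. */
typedef size_t (*count_fn)(const char *s);

/* Stand-in for the table of offsets: one slot per global declaration. */
struct offsets_table {
    const char *msg;  /* where the msg data currently lives */
    count_fn count;   /* where the current code for count lives */
};

/* Stand-in for the data region of the output binary. */
static char image[64] = "Hello, World!\n";

/* Two versions of one function. v2 is "appended"; v1 stays where it is. */
static size_t count_v1(const char *s) { return strlen(s); }

static size_t count_v2(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == 'l') n++;
    }
    return n;
}

static struct offsets_table table = { image, count_v1 };

/* A call site names only table slots, never a concrete address. */
static size_t call_site(void) { return table.count(table.msg); }

/* Incremental relink: move msg inside the image, then fix up one slot.
 * Returns 0 on success, -1 if the new location does not fit. */
static int relocate_msg(size_t new_offset) {
    size_t len = strlen(table.msg) + 1;
    if (new_offset > sizeof image || len > sizeof image - new_offset) return -1;
    memmove(image + new_offset, table.msg, len);
    table.msg = image + new_offset;
    return 0;
}

/* Hot swap: publish new code by rewriting one slot. */
static void hot_swap_count(count_fn next) { table.count = next; }
```

Calling `relocate_msg(32)` or `hot_swap_count(count_v2)` changes what `call_site()` reads or runs without touching `call_site` itself, which is the property the table of offsets is meant to provide. The sketch does not model pausing the process or calls that are already in progress.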

I'm pretty sure I just reinvented the Global Offset Table (GOT), so it might make sense to read about how that works and just use that.


andrewrk · 2020-05-17T17:57:38Z

Landed in 16f100b

This is working out nicely. Functions will use .got.plt and use a "trampoline" style, so that the codegen can be a jmp to a hard coded addr in the offset table, which will have a call to a hard coded addr. So the CPU sees only direct jumps to hard coded addresses, but it's still "indirect" in the sense that there is only 1 place for codegen to edit.
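
To illustrate the "only 1 place for codegen to edit" point, here is a minimal C sketch of patching one trampoline slot. It assumes each slot holds a single 5-byte x86-64 `jmp rel32` instruction (opcode `E9` followed by a little-endian 32-bit displacement measured from the end of the instruction). The thread does not specify the exact encoding the linker uses, so the slot layout and the name `write_trampoline` are assumptions for illustration only.

```c
#include <stdint.h>

enum { JMP_REL32_LEN = 5 }; /* opcode E9 + 4-byte displacement */

/* Encode `jmp target_addr` into a trampoline slot that will live at slot_addr.
 * Assumes both addresses fit in the positive int64_t range.
 * Returns 0 on success, -1 if the target is out of rel32 range. */
static int write_trampoline(uint8_t slot[JMP_REL32_LEN],
                            uint64_t slot_addr, uint64_t target_addr) {
    /* rel32 is relative to the address of the instruction after the jmp. */
    int64_t disp = (int64_t)target_addr - (int64_t)(slot_addr + JMP_REL32_LEN);
    if (disp < INT32_MIN || disp > INT32_MAX) return -1;

    uint32_t rel = (uint32_t)(int32_t)disp;
    slot[0] = 0xE9;
    slot[1] = (uint8_t)(rel & 0xFF);
    slot[2] = (uint8_t)((rel >> 8) & 0xFF);
    slot[3] = (uint8_t)((rel >> 16) & 0xFF);
    slot[4] = (uint8_t)((rel >> 24) & 0xFF);
    return 0;
}
```

When a function moves, only its slot is rewritten with this routine; every call site keeps targeting the slot's fixed address.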

tbodt · 2020-08-08T20:11:15Z

Did you measure the performance impact? There's gotta be some; there's now an extra instruction in every function call, and the code size is increased by all the trampolines.

cdunham · 2021-03-12T02:15:52Z

Reflecting @tbodt's concern: using the GOT to get the entry point and then branching to it with an indirect branch is more expensive on the first call, since it resolves in the back end rather than the front end, but it uses only one BTB (branch target buffer) entry. So the trampoline method works the CPU front end a bit harder in steady state. This would be most impactful in code that has many very short functions/methods, where you would pay the two- vs. one-branch overhead more often and stress BTB capacity with the extra entry for each trampoline. As for the cache impact, the trampoline method forces the trampolines to live on the instruction side, whereas with indirect branches the GOT offsets can live on the data side. Obviously these impacts will vary greatly across microarchitectures and workloads.

andrewrk · 2021-03-12T04:39:54Z

This strategy is used for scopes that are configured to be compiled in Debug mode. I expect in real world use cases, applications will compile some, perhaps most, of their dependency packages in one of the release modes, as well as the hot code paths of the application. With per-scope granularity of optimization mode, I expect the entire surface area of hot paths to be fully optimized, so these concerns should not apply there.

andrewrk added the proposal and frontend labels May 3, 2020

andrewrk added this to the 0.7.0 milestone May 3, 2020

andrewrk added the accepted label May 17, 2020

andrewrk closed this as completed May 17, 2020
