use a global offsets table for incremental linking and hot code swapping #5260

Closed
andrewrk opened this issue May 3, 2020 · 4 comments
Labels
accepted This proposal is planned. frontend Tokenization, parsing, AstGen, Sema, and Liveness. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Member

andrewrk commented May 3, 2020

Related issues: #1535 and #68

Here's an example of how to lay out code in the binary:

example.zig

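// x86_64 Linux syscall numbers used by this example.
const SYS = enum(usize) { write = 1, exit_group = 231 };

const msg = "Hello, World!\n";

export fn _start() callconv(.Naked) noreturn {
    // The syscall wrappers take usize arguments, so pass the address of msg.
    _ = syscall3(.write, 1, @ptrToInt(msg), msg.len);
    exit();
}

fn exit() noreturn {
    _ = syscall1(.exit_group, 0);
    unreachable;
}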

fn syscall1(number: SYS, arg1: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (@enumToInt(number)),
          [arg1] "{rdi}" (arg1)
        : "rcx", "r11", "memory"
    );
}

fn syscall3(number: SYS, arg1: usize, arg2: usize, arg3: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (@enumToInt(number)),
          [arg1] "{rdi}" (arg1),
          [arg2] "{rsi}" (arg2),
          [arg3] "{rdx}" (arg3)
        : "rcx", "r11", "memory"
    );
}

example.asm

; table of offsets
_exit: offsetof exit
_msg: offsetof msg
_syscall1: offsetof syscall1
_syscall3: offsetof syscall3

msg:
.ascii "Hello, World!\n"

syscall1:
mov rax, rdi
mov rdi, rsi
syscall
ret

syscall3:
mov rax, rdi
mov rdi, rsi
mov rsi, rdx
mov rdx, rcx
syscall
ret

exit:
mov edi, 0xe7 ; SYS_exit_group (231), matching example.zig
xor esi, esi
call [_syscall1]

_start:
mov edi, 0x1
mov esi, 0x1
mov rdx, [_msg]
mov ecx, 0xe
call [_syscall3]
call [_exit]

The main idea here is that (1) each global declaration in zig directly maps to a symbol in the output binary and (2) within each symbol, the code is Position Independent, not only with respect to itself, but also with respect to the other symbols it references. Each reference to a global declaration is indirect, through the table of offsets. This accomplishes 2 things:

  • A symbol can be relocated within the output binary. For example, imagine that "Hello, World!" was changed to "lllll, World!", and further imagine that the symbol offsets table needed to grow, so that the msg symbol had to move later in the file, below _start. The only things that would need to change are (1) moving the msg data to the new location, and (2) updating the table of offsets. Even if msg was referenced 100 times, those 100 references would be unchanged, despite the fact that msg moved around in the address space.

  • Really powerful hot code swapping. The process could be paused with e.g. ptrace, and then updated symbols would, rather than being updated in place, be appended only. The table of offsets would be updated to point to the new symbols. The process would then be resumed. Function calls which were in-progress (all the way up the stack) would complete using the old function code, however new function calls would call the new function. Any instruction addresses captured for debug info purposes would be valid throughout any number of hot code swaps. There are a lot of different ways this could go, but this demonstrates what a table of offsets accomplishes.
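
The two points above can be modelled in a few lines of C. This is only a sketch of the idea, not the compiler's implementation: the flat `image` buffer stands in for the output binary, and every name in it (`publish`, `symbol`, `step_slot`, and so on) is hypothetical. New versions of a symbol are appended rather than edited in place, and the table entry is the only thing that gets rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot indices: one table entry per global declaration. */
enum { SLOT_MSG, SLOT_COUNT };

/* Stand-in for the output binary: symbols are appended, never edited in place. */
static unsigned char image[256];
static size_t image_end = 0;

/* Table of offsets: the only record of where each symbol currently lives. */
static size_t offsets[SLOT_COUNT];

/* Append a new version of a symbol and repoint its slot at it.
 * The old bytes stay where they were, so old references remain readable. */
static size_t publish(int slot, const void *bytes, size_t len) {
    assert(slot >= 0 && slot < SLOT_COUNT);
    assert(image_end + len <= sizeof image);
    memcpy(image + image_end, bytes, len);
    offsets[slot] = image_end;
    image_end += len;
    return offsets[slot];
}

/* Every reference to a symbol goes through the table. */
static const char *symbol(int slot) {
    assert(slot >= 0 && slot < SLOT_COUNT);
    return (const char *)(image + offsets[slot]);
}

/* The same idea for code: callers jump through one slot. */
typedef int (*step_fn)(int);
static step_fn step_slot;

static int step_v1(int x) { return x + 1; }
static int step_v2(int x) { return x * 2; }

/* A call site never names step_v1 or step_v2 directly. */
static int call_step(int x) { return step_slot(x); }
```

Publishing `msg` a second time moves it to a new offset without touching any caller of `symbol(SLOT_MSG)`, and assigning `step_v2` to `step_slot` redirects new calls while a pointer obtained before the swap still refers to the old bytes.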

I'm pretty sure I just reinvented the Global Offset Table so it might make sense to read about how that works and just use that.

@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. frontend Tokenization, parsing, AstGen, Sema, and Liveness. labels May 3, 2020
@andrewrk andrewrk added this to the 0.7.0 milestone May 3, 2020
@andrewrk andrewrk added the accepted This proposal is planned. label May 17, 2020
@andrewrk
Member Author

Landed in 16f100b

This is working out nicely. Functions will use .got.plt and use a "trampoline" style, so that the codegen can be a jmp to a hard coded addr in the offset table, which will have a call to a hard coded addr. So the CPU sees only direct jumps to hard coded addresses, but it's still "indirect" in the sense that there is only 1 place for codegen to edit.
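
To make "only 1 place for codegen to edit" concrete, here is a hedged C sketch of what patching one trampoline could involve. It assumes the trampoline is a single x86-64 `jmp rel32` instruction (opcode 0xE9 followed by a 32-bit little-endian displacement measured from the end of the instruction). The comment above does not say which encoding the compiler actually emits, and `encode_jmp_rel32` is a hypothetical helper, not part of the Zig codebase.

```c
#include <assert.h>
#include <stdint.h>

enum { JMP_REL32_LEN = 5 }; /* 1 opcode byte + 4 displacement bytes */

/* Encode `jmp target_addr` as it would appear at address tramp_addr.
 * Returns 0 on success, or -1 if the target is outside the signed
 * 32-bit range that rel32 can express. */
static int encode_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                            uint64_t tramp_addr, uint64_t target_addr) {
    /* The displacement is relative to the address of the next instruction. */
    int64_t rel = (int64_t)(target_addr - (tramp_addr + JMP_REL32_LEN));
    if (rel < INT32_MIN || rel > INT32_MAX) {
        return -1;
    }
    uint32_t bits = (uint32_t)(int32_t)rel;
    out[0] = 0xE9; /* JMP rel32 */
    out[1] = (uint8_t)(bits & 0xFF);
    out[2] = (uint8_t)((bits >> 8) & 0xFF);
    out[3] = (uint8_t)((bits >> 16) & 0xFF);
    out[4] = (uint8_t)((bits >> 24) & 0xFF);
    return 0;
}
```

Under that assumption, moving a function means re-encoding these five bytes once. Every call site keeps calling the trampoline's fixed address and is left untouched.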

@tbodt

tbodt commented Aug 8, 2020

Did you measure the performance impact? There's gotta be some; there's now an extra instruction in every function call, and the code size is increased by all the trampolines.

@cdunham

cdunham commented Mar 12, 2021

Reflecting @tbodt's concern: using the GOT to get the entry point and then branching to it with an indirect branch is more expensive on the first call, since it resolves in the backend rather than the front end, but it uses only one BTB entry. So the trampoline method works the CPU front end a bit harder in steady state. This would be most impactful in code that has many very short functions/methods, where you would pay the two- vs. one-branch overhead more often and stress BTB capacity with the extra entry for each trampoline. As for the cache impact of the trampolines, the trampoline method forces them to live on the instruction side, whereas with indirect branches the GOT offsets can live on the data side. Obviously these impacts will vary greatly across microarchitectures and workloads.

@andrewrk
Member Author

This strategy is used for scopes that are configured to be compiled in Debug mode. I expect in real world use cases, applications will compile some, perhaps most, of their dependency packages in one of the release modes, as well as the hot code paths of the application. With per-scope granularity of optimization mode, I expect the entire surface area of hot paths to be fully optimized, so these concerns should not apply there.


3 participants