use a global offsets table for incremental linking and hot code swapping #5260
Landed in 16f100b. This is working out nicely. Functions will use .got.plt with a "trampoline" style, so that the codegen can be a jmp to a hard-coded address in the offsets table, which will have a call to a hard-coded address. So the CPU sees only direct jumps to hard-coded addresses, but it's still "indirect" in the sense that there is only one place for codegen to edit.
Did you measure the performance impact? There's gotta be some; there's now an extra instruction in every function call, and the code size is increased by all the trampolines.
Echoing @tbodt's concern: using the GOT to load the entry point and then reaching it with an indirect branch is more expensive on the first call, since the target is resolved in the back end rather than the front end, but it uses only one BTB entry. The trampoline method therefore works the CPU front end a bit harder in steady state. This would matter most in code with many very short functions/methods, where you would pay the two-branch vs. one-branch overhead more often and stress BTB capacity with the extra entry for each trampoline. As for the cache impact, the trampoline method forces the trampolines to live on the instruction side, whereas with indirect branches the GOT offsets can live on the data side. Obviously these impacts will vary greatly across microarchitectures and workloads.
This strategy is used for scopes that are configured to be compiled in Debug mode. I expect that in real-world use cases, applications will compile some, perhaps most, of their dependency packages in one of the release modes, as well as the hot code paths of the application. With per-scope granularity of optimization mode, I expect the entire surface area of hot paths to be fully optimized, so these concerns should not apply there.
Related issues: #1535 and #68
Here's an example of how to lay out code in the binary:
example.zig
example.asm
The main idea here is that (1) each global declaration in Zig maps directly to a symbol in the output binary, and (2) within each symbol, the code is position-independent, not only with respect to itself but also with respect to the other symbols it references. Each reference to a global declaration is indirect, through the table of offsets. This accomplishes two things:
1. A symbol can be relocated within the output binary. For example, imagine that "Hello, world!" was changed to "lllll, world!", and further imagine that the symbol offsets table needed to grow, so that the `msg` symbol had to move later in the file, below `_start`. The only things that would need to change are (1) moving the `msg` data to the new location and (2) updating the table of offsets. Even if `msg` was referenced 100 times, those 100 references would be unchanged, despite the fact that it moved around in the address space.
2. Really powerful hot code swapping. The process could be paused with e.g. `ptrace`, and then updated symbols would be appended rather than updated in place. The table of offsets would be updated to point to the new symbols, and the process would then be resumed. Function calls that were in progress (all the way up the stack) would complete using the old function code, but new function calls would call the new function. Any instruction addresses captured for debug info purposes would remain valid across any number of hot code swaps.
There are a lot of different ways this could go, but this demonstrates what a table of offsets accomplishes. I'm pretty sure I just reinvented the Global Offset Table, so it might make sense to read about how that works and just use that.