-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proper tail calls #1888
Proper tail calls #1888
Conversation
It should be possible to implement tail calls as some sort of transformation in rustc itself, predicated on the backend allowing manipulation of the stack and supporting some form of goto. I assume that WebAssembly at least would allow for us to write our own method calls, but haven't looked at it. C is a problem, save in the case that the become keyword is used on the function we are in (there is a name for this that I'm forgetting). In that case, you may be able to just reassign to the arguments and reuse variables. More advanced constructions might be possible by wrapping the tail calls in some sort of outer running loop and returning codes as to which to call next, but this doesn't adequately address how to go about passing arguments around. I wouldn't rule out being able to support this in ANSI C, it's just incredibly tricky. |
Okay, an outline for a specific scheme in C:
I believe this scheme works in all cases. We can deal with the issue of things that have Unless I'm missing something obvious, anyway. |
@camlorn Does that work for function pointers? I don't immediately see any reason it wouldn't, just the "Build a list of the possible tail calls that may be reached from each of these functions." snippet which sticks out otherwise, because in the interesting cases it's presumably "all of them"? (Also, this feels very like defunctionalization? Is it?) |
@glaebhoerl I don't know what defunctionalization is. Is this defunctionalization? I'll get back to you once I learn a new word. |
Is there any benchmarking data regarding the callee-pops calling convention? |
@ranma42 Also, why does it matter here? |
@glaebhoerl Instead of making the outer loop be inside a function that declares all the needed variables, make the outer loop something that expects a struct morally equivalent to the tuple I don't think this is defunctionalization, based off a quick google of that term. |
@camlorn That is my opinion, too, but it is mentioned as "one major drawback of proper tail calls" in the current RFC |
0000-template.md
Outdated
Later phases in the compiler assert that these requirements are met. | ||
|
||
New nodes are added in HIR and HAIR to correspond to `become`. In MIR, however, | ||
a new flag is added to the `TerminatorKind::Call` varient. This flag is only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: varient -> variant
@camlorn @ranma42 The drawback of a callee-pops calling convention is that for caller-pops calling conventions, much of the stack pointer motion can be eliminated by the optimizer, since it is all in one function. However, with a callee-pops calling convention, you might be able to do the same thing in the callee – but I don't think you gain anything except on Windows, due to the red zone which Windows doesn't have. I really don't know what I am talking about on the performance front though. Easy way to find out would be to patch the LLVM bindings that Rust uses to always enable tail calls at the LLVM level, then build the compiler, and finally see if the modified compiler is faster or slower than the original. |
@camlorn My intent was that one can |
@DemiMarie I think my latest idea works out, but I'm not quite sure where you put the supporting structs without heap allocation. I do agree that not being able to do it in all backends might sadly be a deal breaker. Is there a reason that Rustc doesn't already always enable tail calls in release mode? |
0000-template.md
Outdated
[implementation]: #implementation | ||
|
||
A current, mostly-functioning implementation can be found at | ||
[DemiMarie/rust/tree/explicit-tailcalls](/DemiMarie/rust/tree/explicit-tailcalls). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 404s for me.
Is there any particular reason this RFC specifies that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note:
You shouldn't be changing the template file, but rather copying the template to a new file (0000-proper-tail-calls.md
) and changing that!
I wonder if one can simulate the behavior of computed goto dispatch using these tail calls. That would be pretty neat indeed! |
@archshift There is a better way to do that (get rustc to emit the appropriate LLVM IR for a loop wrapped around a match when told to do so, perhaps by an attribute). |
@archshift done. |
As a non-FP/non-PL person, it would be really nice to see some concrete examples of where |
@Stebalien a case where they are typically nicer than a loop is when they are used to encode (the states of a) state machine. That is because instead of explicitly looping and changing the state, it is sufficient to call the appropriate function (i.e. the state is implicitly encoded by the function being run at that time). Note that this often makes it easier for the compiler to detect optimisation opportunities, as in some cases a state can trivially be inlined. |
@ranma42 I see. Usually, I'd just put the state in an enum and use a |
Should this RFC include at least one example of what this syntax looks like in use? (e.g. an entire function body) |
A good example snippet would go a long way. 👍 overall as the surface is fairly small and really helps rust functional mojo. |
Pinging @thepowersgang because they are the only person working on an alternative Rust compiler to the best of my knowledge, and because since their compiler (mrustc) compiles via C they would need to implement one of the above solutions. |
As an aside, are there any directives that make LLVM leave as little as possible on the stack? Ideally profiler guided optimizations would attempt to detect where such places belong. |
Elsewhere they are discussing the https://releases.llvm.org/13.0.0/tools/clang/docs/ReleaseNotes.html#major-new-features https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html https://reviews.llvm.org/D107872 Perhaps we could introduce in Nightly a similar Rust attribute like |
There is already a keyword reserved for tail-calls Note that rustc itself is unlikely to perform the replacements, it would typically leave this up to LLVM. What rustc should do, however, is to check the feasibility; typically, such tail calls require that no destructor be executed "around" the call, and it seems from the original article that the function to be called must have the same arguments as the function calling it. |
One advantage of having |
If |
@timthelion what is meant is that rustc will tell LLVM to guarantee it, since for a function that isn't inlined this out of rustc's hands. The guarantee is still made, however. |
Should this issue still be closed considering it seems there's at least some movement happening on it? Or has another issue number taken over (and if so could the OP be updated)? |
Is there anything that can be done/contributed to help move this feature forward, even if it's just on nightly or gated behind some In this PR to optimise rust-decimal's string parsing the final optimisation involves transforming to a tail-call based parser which cuts runtime by 30-50% (130ns -> 70ns) depending on platform, and would be quite ugly if not impossible to cleanly/efficiently implement with a loop/switch based construct. We also have some internal code that sees a ~30% improvement from tail-call based dispatch and is fundamentally impossible to transform into a fast loop/switch since the targets aren't known at runtime. Trusting the optimiser works for our internal code since it's not general purpose, we can track via benchmarks, and have a clean way to partially switch to looping in debug. However, for something like rust-decimal which is intended to be a general purpose library, the tradeoffs here are harder to evaluate if there's no guarantee. |
@TradingTomatillo I think that rust is open source, so if you want this internally than you can just edit the source code. If you release your patches publicly, then they'll be at worst like linux-rt, a long living patch set that those who need it can apply. Fork it and fix it, please! |
Time to reopen this? |
@DemiMarie Oh yea |
So I was thinking some about the calling convention problems, and my current thought is that this is really a problem for the backend code generator. Ideally, any function could be tail-called without restriction, but given that this will be difficult to implement, I think there could be a few steps:
|
If you have an explicit group of functions all in one crate, rustc should be able to relatively easily transform that to a trampoline function that does a switch in a loop, avoiding needing llvm support for tail-calls on all platforms in all scenarios. something like: #[tail_call_group = "my_tail_group"]
pub fn func1(state: &mut State) -> io::Result<String> {
if state.call_func2() {
become func2(state);
}
Ok("example".to_string())
}
#[tail_call_group = "my_tail_group"]
pub fn func2(state: &mut State) -> io::Result<String> {
let buf = fs::read("file.txt")?;
if state.call_func1(buf) {
become func1(state);
}
become func3(state);
}
#[tail_call_group = "my_tail_group"]
pub fn func3(state: &mut State) -> io::Result<String> {
state.in_func3()?;
become func1(state);
} The compiler could transform it when necessary into: enum MyTailGroup<'a> {
CallFunc1(&'a mut State),
CallFunc2(&'a mut State),
CallFunc3(&'a mut State),
Return(io::Result<String>),
}
impl MyTailGroup<'_> {
fn run(self) -> io::Result<String> {
let mut state = self;
loop {
state = match state {
Self::CallFunc1(arg1) => func1_internal(arg1),
Self::CallFunc2(arg1) => func2_internal(arg1),
Self::CallFunc3(arg1) => func3_internal(arg1),
Self::Return(v) => return v,
};
}
}
}
pub fn func1(state: &mut State) -> io::Result<String> {
MyTailGroup::CallFunc1(state).run()
}
pub fn func2(state: &mut State) -> io::Result<String> {
MyTailGroup::CallFunc2(state).run()
}
pub fn func3(state: &mut State) -> io::Result<String> {
MyTailGroup::CallFunc3(state).run()
}
fn func1_internal(state: &mut State) -> MyTailGroup<'_> {
if state.call_func2() {
return MyTailGroup::CallFunc2(state);
}
MyTailGroup::Return(Ok("example".to_string()))
}
fn func2_internal(state: &mut State) -> io::Result<String> {
let buf = match fs::read("file.txt") {
Ok(v) => v,
Err(v) => return MyTailGroup::Return(Err(v)),
};
if state.call_func1(buf) {
return MyTailGroup::CallFunc1(state);
}
return MyTailGroup::CallFunc3(state);
}
fn func3_internal(state: &mut State) -> io::Result<String> {
match state.in_func3() {
Ok(_) => {},
Err(v) => return MyTailGroup::Return(Err(v)),
}
return MyTailGroup::CallFunc1(state);
} |
I wanted to share this c23 proposal to make sure it's in this thread: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf. I'm not adding anything constructive to this discussion, but I would like to see rust become more functional in the long run, and I believe that proper tail-call optimization would help rust achieve that goal. Hopefully, we will start to see more progress as I'm very excited about this feature -- thank you! |
@DemiMarie Thank you for revisiting this issue! I too am very keen on getting guaranteed tail calls in Rust. Maybe it would make sense to crop the scope of this issue to just step 1 and once that is done create another issue for extensions:
For sake of simplicity let us not focus on the generalized tail calls for now since platform availability is a real concern. @programmerjake First, thank you for the clarifications with respect to targets that do not support tail calls. I know what follows is pure bike shedding but I think
|
grouping functions across crates is much more complex than within a single crate, since rustc would have to collect all crates and generate code for them all as part of the last invocation of rustc where it can actually see all tail-callable functions. This also makes compilation much slower since codegen can't usually be reused for all your dependencies for each incremental compilation. that's why I wanted to limit groups to not go between crates. I'd be fine having each crate be a group for initial implementation though, since that avoids cross-crate tail calls while not needing special annotation of which group tail calls are in.
|
Within a single crate, cant tail groups be determined by static analysis by looking for connected components in the call graph with the same return type? |
not completely, calls through function pointers/ |
though you would also want to require identical calling convention and argument types, not just return types. |
Fair enough, I've never needed that personally; Therefore I wonder if such an analysis wouldnt provide quite a reasonable default. |
I actually want to support tail calls to functions not known at compile time. That could be a later stage, though. |
In case the WebAssembly target is a blocker for this feature there very recently was progress towards standardization of
@DemiMarie Sounds like this would require the let ops: &[#[may_tail] fn()] = &[ ... ]; In order to be able to call any of those functions in |
As a general approach, we might start from the most basic case and grow from there. This most basic case is a function, which contains no droppable values, and recuresively calls other functions of same arguments. Here, the rustc implementation is only about threading the required information through ast-> hir -> mir -> llvm, and performing necessary checks. Then we can introduce the capability to run in scopes with droppable values, but requiring the programmer to drop the values explicitly beforehand. And only after we have some experience with explicitly dropping values, we might introduce the changes to drop order with full implementation. All of these constraints would be a hard error, and would allow us to get the MVP. Interaction with async, closures and other features is an unknown area, so explicit tail calls should be disabled in those scopes. Anyway I had some time yesterday, so I started on the frontend work. Feel free to take it or contribute. |
Closures should just work. Async is trickier because the generated state machine needs to be aware that the tail call took place, but should be possible to implement.
I actually wonder if |
@semtexzv I implemented tail-calls myself in rust-lang/rust@master...DemiMarie:rust:explicit-tailcalls. That implementation is over five years old, so I do not know if it would actually help. |
This commit is a port of the changes done by @semtexzv, as described in this comment: rust-lang/rfcs#1888 (comment) All locations that need further work are marked with: FIXME(explicit_tail_calls)
Rendered