
Proper tail calls #1888

Closed
wants to merge 8 commits into from

Conversation

DemiMarie

@DemiMarie DemiMarie commented Feb 7, 2017

@ahicks92

ahicks92 commented Feb 7, 2017

It should be possible to implement tail calls as some sort of transformation in rustc itself, predicated on the backend allowing manipulation of the stack and supporting some form of goto. I assume that WebAssembly at least would allow for us to write our own method calls, but haven't looked at it.

C is a problem, except in the case where the become keyword is used on the function we are already in (direct tail recursion). In that case, you may be able to just reassign the arguments and reuse the variables. More advanced constructions might be possible by wrapping the tail calls in an outer driver loop and returning codes indicating which function to call next, but this doesn't adequately address how to pass arguments around. I wouldn't rule out being able to support this in ANSI C; it's just incredibly tricky.

@ahicks92

ahicks92 commented Feb 7, 2017

Okay, an outline for a specific scheme in C:

  • Post-monomorphization, find all functions that use become. Build a list of the possible tail calls that may be reached from each of these functions.

  • Declare a C variable for every variable in all of the functions making up the set. Add a variable, code, that says which function we're in; a code of 0 means we're done. Add another variable, ret, to hold the return value.

  • Give each function a nonzero code and copy the body into a while loop that dispatches based on the code variable. When the while loop exits, return ret. Any instances of return are translated into an assignment to ret and setting code to 0.

  • When entering a tailcall function, redirect the call to the special version instead.

I believe this scheme works in all cases. We can deal with things that have Drop impls by dropping them explicitly: the variable can stay around without a problem, as long as the impl gets called. The key point is that we're declaring the slots up front so that the stack doesn't keep growing. The biggest disadvantage is that we have to declare all the slots for all the functions, so the combined stack frame is potentially (much?) larger than under the approach the RFC currently proposes. If performance parity across backends is a concern, this could be the scheme used in all backends. In any case, it works for any backend that compiles via C.

Unless I'm missing something obvious, anyway.
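The scheme above targets C output, but a minimal Rust sketch of the same transformation shows the shape (the functions and names here are hypothetical, chosen only to illustrate the lowering):

```rust
// Hypothetical lowering of two mutually tail-recursive functions,
// is_even / is_odd, into a single dispatch loop, as in the scheme
// above: all locals become slots declared up front, and `code`
// selects which body runs next, so the stack never grows.
fn is_even_entry(n: u64) -> bool {
    let mut code: u8 = 1; // 1 = is_even, 2 = is_odd, 0 = done
    let mut n = n;        // shared argument slot
    let mut ret = false;  // return-value slot
    while code != 0 {
        match code {
            // body of is_even(n): `become is_odd(n - 1)` unless n == 0
            1 => {
                if n == 0 { ret = true; code = 0; }
                else { n -= 1; code = 2; }
            }
            // body of is_odd(n): `become is_even(n - 1)` unless n == 0
            2 => {
                if n == 0 { ret = false; code = 0; }
                else { n -= 1; code = 1; }
            }
            _ => unreachable!(),
        }
    }
    ret
}
```

Call depth is unbounded here: `is_even_entry(1_000_000)` runs in constant stack space, which is exactly the property the transformation is meant to guarantee.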

@glaebhoerl
Contributor

@camlorn Does that work for function pointers? I don't immediately see any reason it wouldn't; it's just that the "Build a list of the possible tail calls that may be reached from each of these functions" step sticks out, because in the interesting cases the list is presumably "all of them"?

(Also, this feels very like defunctionalization? Is it?)

@ahicks92

ahicks92 commented Feb 7, 2017

@glaebhoerl
Can you become a function pointer? This was not how I read the RFC, though it would make sense if this were the case. Nonetheless, you are correct: if function pointers are allowed, this probably does indeed break my scheme. It might be possible to get around it, somehow.

I don't know what defunctionalization is. Is this defunctionalization? I'll get back to you once I learn a new word.

@ranma42
Contributor

ranma42 commented Feb 7, 2017

Is there any benchmarking data regarding the callee-pops calling convention?
AFAICT Windows uses such a calling convention (stdcall) for most APIs.
I have repeatedly looked for benchmarks comparing stdcall to cdecl, but I have only found minor differences (in either direction, possibly related to the interaction with optimisations) and I was unable to find something providing a conclusive answer on which one results in better performance.

@ahicks92

ahicks92 commented Feb 7, 2017

@ranma42
I'm not sure why there would be a difference: either you do your jmp for return and then pop or you pop and then do your jmp for return, but in either case someone is popping the same amount of stuff?

Also, why does it matter here?

@ahicks92

ahicks92 commented Feb 7, 2017

@glaebhoerl
Apparently today is idea day:

Instead of making the outer loop be inside a function that declares all the needed variables, make the outer loop something that expects a struct morally equivalent to the tuple (int code, void* ptr, void* args). It would cast ptr to the appropriate function pointer type by switching on code, cast args to a function-pointer-specific argument structure, and then call the function pointer. It should be possible to make the args struct inline rather than an additional level of indirection somehow, but I'm not sure how to do that without violating strict aliasing. This has the advantage of making the stack frame roughly the same size as it would be in the LLVM backend, but the disadvantage of being slower (though maybe we can sometimes use the faster while-loop-with-switch approach).

I don't think this is defunctionalization, based off a quick google of that term.
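A rough Rust rendering of that trampoline idea may help; the C version would use void pointers and casts, while safe Rust can use a function-pointer type directly (all names here are illustrative, not from the RFC):

```rust
// Trampoline sketch: each step returns the next function to call
// (plus its argument) instead of calling it directly, so the stack
// stays flat no matter how many tail calls are chained.
type Step = fn(u64) -> Next;

enum Next {
    Call(Step, u64), // tail call: next function pointer plus its argument
    Done(bool),      // final return value
}

fn is_even(n: u64) -> Next {
    if n == 0 { Next::Done(true) } else { Next::Call(is_odd, n - 1) }
}

fn is_odd(n: u64) -> Next {
    if n == 0 { Next::Done(false) } else { Next::Call(is_even, n - 1) }
}

fn trampoline(mut f: Step, mut arg: u64) -> bool {
    loop {
        match f(arg) {
            Next::Call(next, next_arg) => { f = next; arg = next_arg; }
            Next::Done(v) => return v,
        }
    }
}
```

Unlike the while-loop-with-switch scheme, this version handles function pointers naturally, at the cost of an indirect call per step.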

@ranma42
Contributor

ranma42 commented Feb 7, 2017

@camlorn That is my opinion, too, but it is mentioned as "one major drawback of proper tail calls" in the current RFC.

0000-template.md Outdated
Later phases in the compiler assert that these requirements are met.

New nodes are added in HIR and HAIR to correspond to `become`. In MIR, however,
a new flag is added to the `TerminatorKind::Call` varient. This flag is only

Typo: varient -> variant

@DemiMarie
Author

DemiMarie commented Feb 7, 2017

@camlorn @ranma42 The drawback of a callee-pops calling convention is that for caller-pops calling conventions, much of the stack pointer motion can be eliminated by the optimizer, since it is all in one function. However, with a callee-pops calling convention, you might be able to do the same thing in the callee – but I don't think you gain anything except on Windows, due to the red zone which Windows doesn't have.

I really don't know what I am talking about on the performance front though. Easy way to find out would be to patch the LLVM bindings that Rust uses to always enable tail calls at the LLVM level, then build the compiler, and finally see if the modified compiler is faster or slower than the original.

@DemiMarie
Author

@camlorn My intent was that one can become any function or method that uses the Rust ABI or the rust-call ABI (both of which lower to LLVM fastcc), provided that the return types match. Haven't thought about function pointers, but I believe that tail calls on trait object methods are an equivalent problem.

@ahicks92

ahicks92 commented Feb 7, 2017

@DemiMarie
Good point. They are.

I think my latest idea works out, but I'm not quite sure where you put the supporting structs without heap allocation. I do agree that not being able to do it in all backends might sadly be a deal breaker.

Is there a reason that Rustc doesn't already always enable tail calls in release mode?

0000-template.md Outdated
[implementation]: #implementation

A current, mostly-functioning implementation can be found at
[DemiMarie/rust/tree/explicit-tailcalls](/DemiMarie/rust/tree/explicit-tailcalls).

This 404s for me.

@cramertj
Member

cramertj commented Feb 8, 2017

Is there any particular reason this RFC specifies that become should be implemented at an LLVM level rather than through some sort of MIR transformation? I don't know how they work, but it seems like maybe StorageLive and StorageDead could be used to mark the caller's stack slots as expired prior to the function call.

Contributor

@archshift archshift left a comment


Just a note:

You shouldn't be changing the template file, but rather copying the template to a new file (0000-proper-tail-calls.md) and changing that!

@archshift
Contributor

archshift commented Feb 8, 2017

I wonder if one can simulate the behavior of computed goto dispatch using these tail calls. That would be pretty neat indeed!

@DemiMarie
Author

@archshift There is a better way to do that (get rustc to emit the appropriate LLVM IR for a loop wrapped around a match when told to do so, perhaps by an attribute).

@DemiMarie
Author

@archshift done.

@Stebalien
Contributor

As a non-FP/non-PL person, it would be really nice to see some concrete examples of where become is nicer than a simple while loop. Personally, I only ever use recursion when I want a stack.

@ranma42
Contributor

ranma42 commented Feb 8, 2017

@Stebalien a case where they are typically nicer than a loop is when they are used to encode (the states of) a state machine. That is because instead of explicitly looping and changing the state, it is sufficient to call the appropriate function (i.e. the state is implicitly encoded by the function currently running). Note that this often makes it easier for the compiler to detect optimisation opportunities, as in some cases a state can trivially be inlined.

@Stebalien
Contributor

Stebalien commented Feb 8, 2017

@ranma42 I see. Usually, I'd just put the state in an enum and loop over a match, but I can see how become with a bunch of individual functions could be cleaner. Thanks!
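For reference, that enum-plus-loop encoding could look like this for a toy recognizer (the pattern and all names here are made up purely for illustration):

```rust
// State machine checking that a string matches the pattern a+b+
// (one or more 'a's followed by one or more 'b's), with the state
// held in an enum and advanced inside a loop rather than via
// mutually tail-calling functions.
enum State {
    SeenA, // still reading the run of 'a's
    SeenB, // now reading the run of 'b's
}

fn matches_a_plus_b_plus(s: &str) -> bool {
    let mut chars = s.chars();
    // must start with at least one 'a'
    if chars.next() != Some('a') {
        return false;
    }
    let mut state = State::SeenA;
    for c in chars {
        state = match (state, c) {
            (State::SeenA, 'a') => State::SeenA,
            (State::SeenA, 'b') => State::SeenB,
            (State::SeenB, 'b') => State::SeenB,
            _ => return false, // any other transition rejects
        };
    }
    matches!(state, State::SeenB) // must have seen at least one 'b'
}
```

With become, each enum variant would instead be its own function, and each transition a tail call.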

@sgrif
Contributor

sgrif commented Feb 8, 2017

Should this RFC include at least one example of what this syntax looks like in use? (e.g. an entire function body)

@arthurprs

A good example snippet would go a long way. 👍 overall, as the surface is fairly small and it really helps Rust's functional mojo.

@DemiMarie
Author

Pinging @thepowersgang because they are, to the best of my knowledge, the only person working on an alternative Rust compiler, and because their compiler (mrustc) compiles via C, so they would need to implement one of the above solutions.

@burdges

burdges commented Apr 25, 2021

As an aside, are there any directives that make LLVM leave as little as possible on the stack? Ideally, profile-guided optimization would attempt to detect where such places belong.

@leonardo-m

Elsewhere they are discussing the [[clang::musttail]] attribute (in LLVM 13):

https://releases.llvm.org/13.0.0/tools/clang/docs/ReleaseNotes.html#major-new-features

https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html

https://reviews.llvm.org/D107872

Perhaps we could introduce in Nightly a similar Rust attribute like #[tailcall] that stops the compilation of a function (or group of mutually recursive ones) if rustc can't replace all its recursive calls with loops.

@matthieu-m

Elsewhere they are discussing the [[clang::musttail]] attribute (in LLVM 13):
[...]
Perhaps we could introduce in Nightly a similar Rust attribute like #[tailcall] that stops the compilation of a function (or group of mutually recursive ones) if rustc can't replace all its recursive calls with loops.

There is already a keyword reserved for tail-calls become. Is there a reason to use an attribute rather than a keyword when the keyword already exists?

Note that rustc itself is unlikely to perform the replacements, it would typically leave this up to LLVM. What rustc should do, however, is to check the feasibility; typically, such tail calls require that no destructor be executed "around" the call, and it seems from the original article that the function to be called must have the same arguments as the function calling it.

@DemiMarie
Author

Note that rustc itself is unlikely to perform the replacements, it would typically leave this up to LLVM. What rustc should do, however, is to check the feasibility; typically, such tail calls require that no destructor be executed "around" the call, and it seems from the original article that the function to be called must have the same arguments as the function calling it.

One advantage of having become be explicit is that it can change when destructors run, so that destructors run before the tail call instead of afterwards.
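That ordering difference can be simulated in today's Rust with an explicit `drop`; a sketch (all names hypothetical, logging to a list so the ordering is observable):

```rust
use std::cell::RefCell;

thread_local! {
    // event log so the two orderings can be compared
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Guard;
impl Drop for Guard {
    fn drop(&mut self) {
        LOG.with(|l| l.borrow_mut().push("drop"));
    }
}

fn callee() {
    LOG.with(|l| l.borrow_mut().push("callee"));
}

// With a plain tail-position call, the caller's local is dropped
// *after* the callee returns.
fn via_return() {
    let _guard = Guard;
    callee();
}

// `become` would instead run destructors *before* the call; an
// explicit `drop` reproduces that ordering today.
fn via_become_like() {
    let guard = Guard;
    drop(guard);
    callee();
}
```

Running `via_return` logs "callee" then "drop"; `via_become_like` logs "drop" then "callee", which is the ordering `become` would guarantee (and which is what makes reusing the caller's frame sound).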

@timthelion

Note that rustc itself is unlikely to perform the replacements, it would typically leave this up to LLVM.

If rustc doesn't do the replacements itself then can it guarantee that functions with become will be TCO'd? It would be kind of nightmarish if you specified TCOing with become and then had the stack explode and had no idea why.

@Ericson2314
Contributor

Ericson2314 commented Oct 7, 2021

@timthelion what is meant is that rustc will tell LLVM to guarantee it, since for a function that isn't inlined this is out of rustc's hands. The guarantee is still made, however.

@OvermindDL1

Should this issue still be closed considering it seems there's at least some movement happening on it? Or has another issue number taken over (and if so could the OP be updated)?

@TradingTomatillo

TradingTomatillo commented Jan 1, 2022

Is there anything that can be done/contributed to help move this feature forward, even if it's just on nightly or gated behind some #[cfg(target_feature ...)]?

In this PR to optimise rust-decimal's string parsing the final optimisation involves transforming to a tail-call based parser which cuts runtime by 30-50% (130ns -> 70ns) depending on platform, and would be quite ugly if not impossible to cleanly/efficiently implement with a loop/switch based construct.

We also have some internal code that sees a ~30% improvement from tail-call based dispatch and is fundamentally impossible to transform into a fast loop/switch, since the targets aren't known until runtime.

Trusting the optimiser works for our internal code since it's not general purpose, we can track via benchmarks, and have a clean way to partially switch to looping in debug. However, for something like rust-decimal which is intended to be a general purpose library, the tradeoffs here are harder to evaluate if there's no guarantee.

@timthelion

@TradingTomatillo I think that Rust is open source, so if you want this internally then you can just edit the source code. If you release your patches publicly, then they'll be, at worst, like linux-rt: a long-lived patch set that those who need it can apply.

Fork it and fix it, please!

@DemiMarie
Author

Time to reopen this?

@ignamartinoli

@DemiMarie Oh yea

@DemiMarie
Author

So I was thinking some about the calling convention problems, and my current thought is that this is really a problem for the backend code generator. Ideally, any function could be tail-called without restriction, but given that this will be difficult to implement, I think there could be a few steps:

  1. Tail recursion only. This is obviously very limiting, but it should be enough to get some of the tricky cases (such as destructor ordering) correct, and it is easy to lower to a loop or musttail so there is no portability problem.
  2. Tail calls to functions marked as being tail-callable, perhaps via an attribute. This should be enough for parsers, and only requires a local transformation (as opposed to a global one). WebAssembly will be tricky to lower to, but it should be doable for non-public functions.
  3. Fully general tail calls. This requires the most work from the backend, and is what would require callee-pops calling conventions everywhere. It also seems to be much less important than 2.
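Step 1 describes what can already be written by hand today; a sketch of the correspondence, using gcd as an arbitrary example:

```rust
// Hypothetical `become` version (step 1: self tail recursion only):
//
//     fn gcd(a: u64, b: u64) -> u64 {
//         if b == 0 { a } else { become gcd(b, a % b) }
//     }
//
// Loop lowering: reassign the arguments and jump back to the top.
// This is what a backend without native tail-call support would emit.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}
```

Because the callee is the function itself, the argument slots already exist in the current frame, which is why this case has no portability problem.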

@programmerjake
Member

  1. Tail calls to functions marked as being tail-callable, perhaps via an attribute. This should be enough for parsers, and only requires a local transformation (as opposed to a global one).

If you have an explicit group of functions all in one crate, rustc should be able to relatively easily transform that to a trampoline function that does a switch in a loop, avoiding needing llvm support for tail-calls on all platforms in all scenarios.

something like:

#[tail_call_group = "my_tail_group"]
pub fn func1(state: &mut State) -> io::Result<String> {
    if state.call_func2() {
        become func2(state);
    }
    Ok("example".to_string())
}

#[tail_call_group = "my_tail_group"]
pub fn func2(state: &mut State) -> io::Result<String> {
    let buf = fs::read("file.txt")?;
    if state.call_func1(buf) {
        become func1(state);
    }
    become func3(state);
}

#[tail_call_group = "my_tail_group"]
pub fn func3(state: &mut State) -> io::Result<String> {
    state.in_func3()?;
    become func1(state);
}

The compiler could transform it when necessary into:

enum MyTailGroup<'a> {
    CallFunc1(&'a mut State),
    CallFunc2(&'a mut State),
    CallFunc3(&'a mut State),
    Return(io::Result<String>),
}

impl MyTailGroup<'_> {
    fn run(self) -> io::Result<String> {
        let mut state = self;
        loop {
            state = match state {
                Self::CallFunc1(arg1) => func1_internal(arg1),
                Self::CallFunc2(arg1) => func2_internal(arg1),
                Self::CallFunc3(arg1) => func3_internal(arg1),
                Self::Return(v) => return v,
            };
        }
    }
}

pub fn func1(state: &mut State) -> io::Result<String> {
    MyTailGroup::CallFunc1(state).run()
}

pub fn func2(state: &mut State) -> io::Result<String> {
    MyTailGroup::CallFunc2(state).run()
}

pub fn func3(state: &mut State) -> io::Result<String> {
    MyTailGroup::CallFunc3(state).run()
}

fn func1_internal(state: &mut State) -> MyTailGroup<'_> {
    if state.call_func2() {
        return MyTailGroup::CallFunc2(state);
    }
    MyTailGroup::Return(Ok("example".to_string()))
}

fn func2_internal(state: &mut State) -> MyTailGroup<'_> {
    let buf = match fs::read("file.txt") {
        Ok(v) => v,
        Err(v) => return MyTailGroup::Return(Err(v)),
    };
    if state.call_func1(buf) {
        return MyTailGroup::CallFunc1(state);
    }
    return MyTailGroup::CallFunc3(state);
}

fn func3_internal(state: &mut State) -> MyTailGroup<'_> {
    match state.in_func3() {
        Ok(_) => {},
        Err(v) => return MyTailGroup::Return(Err(v)),
    }
    return MyTailGroup::CallFunc1(state);
}

@SeniorMars

SeniorMars commented Jul 11, 2022

I wanted to share this c23 proposal to make sure it's in this thread: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf. I'm not adding anything constructive to this discussion, but I would like to see rust become more functional in the long run, and I believe that proper tail-call optimization would help rust achieve that goal. Hopefully, we will start to see more progress as I'm very excited about this feature -- thank you!

@Robbepop
Contributor

Robbepop commented Jul 15, 2022

@DemiMarie Thank you for revisiting this issue! I too am very keen on getting guaranteed tail calls in Rust. Maybe it would make sense to narrow the scope of this issue to just step 1 and, once that is done, create another issue for extensions:

  • This issue: Allow recursive tail calls using the become keyword as stated above.
  • Follow-up: Allow functions marked as #[may_tail] to be called using the become keyword.

For the sake of simplicity, let us not focus on generalized tail calls for now, since platform availability is a real concern.

@programmerjake First, thank you for the clarifications with respect to targets that do not support tail calls. I know what follows is pure bike shedding but I think #[may_tail] is to be preferred over a solution using groups such as the above #[tail_call_group = "my_tail_group"] for the following reasons:

  • The Rust compiler can group all #[may_tail] annotated functions with the same return type.
  • Being able to create tail groups is just an optimization for targets that do not support tail calls natively but we should not really focus on performance for those targets but more on the semantics and guarantees of tail calls.
  • If we find that we still want to introduce tail groups we can introduce them later as a feature extension, e.g. #[may_tail(group = "my_group")].

@programmerjake
Member

@programmerjake First, thank you for the clarifications with respect to targets that do not support tail calls. I know what follows is pure bike shedding but I think #[may_tail] is to be preferred over a solution using groups such as the above #[tail_call_group = "my_tail_group"] for the following reasons:

  • The Rust compiler can group all #[may_tail] annotated functions with the same return type.

grouping functions across crates is much more complex than within a single crate, since rustc would have to collect all crates and generate code for them all as part of the last invocation of rustc where it can actually see all tail-callable functions. This also makes compilation much slower since codegen can't usually be reused for all your dependencies for each incremental compilation.

that's why I wanted to limit groups to not go between crates. I'd be fine having each crate be a group for initial implementation though, since that avoids cross-crate tail calls while not needing special annotation of which group tail calls are in.

  • Being able to create tail groups is just an optimization for targets that do not support tail calls natively but we should not really focus on performance for those targets but more on the semantics and guarantees of tail calls.
  • If we find that we still want to introduce tail groups we can introduce them later as a feature extension, e.g. #[may_tail(group = "my_group")].

@boggle

boggle commented Jul 15, 2022

Within a single crate, can't tail groups be determined by static analysis, by looking for connected components in the call graph with the same return type?

@programmerjake
Member

Within a single crate, cant tail groups be determined by static analysis by looking for connected components in the call graph with the same return type?

Not completely: calls through function pointers/dyn methods can be basically impossible to analyze. Ruling those out, yes.

@programmerjake
Member

though you would also want to require identical calling convention and argument types, not just return types.

@boggle

boggle commented Jul 15, 2022

Fair enough; I've never needed that personally, so I wonder whether such an analysis wouldn't provide quite a reasonable default.

@DemiMarie
Author

I actually want to support tail calls to functions not known at compile time. That could be a later stage, though.

@Robbepop
Contributor

Robbepop commented Jul 15, 2022

In case the WebAssembly target is a blocker for this feature: there has very recently been progress towards standardization of tail calls in WebAssembly, since the last remaining blocker (support in a second major browser engine) is being resolved as that engine finalizes its implementation. With that out of the way, Wasm tail calls may soon enter the next phase, standardization of the proposal. The following link provides further information: WebAssembly/tail-call#15 (comment)

I actually want to support tail calls to functions not known at compile time. That could be a later stage, though.

@DemiMarie Sounds like this would require the may_tail information to be part of the function type, right?
For example, to create an array of tail callable functions this information is required to be part of that array's element type, like:

let ops: &[#[may_tail] fn()] = &[ ... ];

in order to be able to tail-call any of the functions in ops. Or am I missing something? This way the Rust compiler can guarantee that only tail-callable functions are stored inside the ops slice.

@semtexzv

As a general approach, we might start from the most basic case and grow from there. This most basic case is a function which contains no droppable values and recursively calls other functions with the same arguments. Here, the rustc implementation is only about threading the required information through AST -> HIR -> MIR -> LLVM, and performing the necessary checks.

Then we can introduce the capability to run in scopes with droppable values, but requiring the programmer to drop the values explicitly beforehand.

And only after we have some experience with explicitly dropping values, we might introduce the changes to drop order with full implementation.

Violating any of these constraints would be a hard error, which would allow us to get to the MVP.

Interaction with async, closures and other features is an unknown area, so explicit tail calls should be disabled in those scopes.

Anyway I had some time yesterday, so I started on the frontend work. Feel free to take it or contribute.
semtexzv/rust

@DemiMarie
Author

As a general approach, we might start from the most basic case and grow from there. This most basic case is a function which contains no droppable values and recursively calls other functions with the same arguments. Here, the rustc implementation is only about threading the required information through AST -> HIR -> MIR -> LLVM, and performing the necessary checks.

Then we can introduce the capability to run in scopes with droppable values, but requiring the programmer to drop the values explicitly beforehand.

And only after we have some experience with explicitly dropping values, we might introduce the changes to drop order with full implementation.

Violating any of these constraints would be a hard error, which would allow us to get to the MVP.

Interaction with async, closures and other features is an unknown area, so explicit tail calls should be disabled in those scopes.

Closures should just work. Async is trickier because the generated state machine needs to be aware that the tail call took place, but should be possible to implement.

Anyway I had some time yesterday, so I started on the frontend work. Feel free to take it or contribute. semtexzv/rust

I actually wonder whether become on an operator should work. Rust operators do desugar to function calls, after all, so I see no fundamental problem.
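The desugaring in question is standard Rust: `a + b` is sugar for `Add::add(a, b)`, so a hypothetical `become a + b` would simply be a tail call to the trait method. A small sketch of the existing desugaring:

```rust
use std::ops::Add;

// `a + b` desugars to the trait method call below, so tail-calling
// an operator would mean tail-calling this method (a hypothetical
// `become Add::add(a, b)`).
fn sum(a: u32, b: u32) -> u32 {
    Add::add(a, b) // equivalent to `a + b`
}
```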

@DemiMarie
Author

@semtexzv I implemented tail-calls myself in rust-lang/rust@master...DemiMarie:rust:explicit-tailcalls. That implementation is over five years old, so I do not know if it would actually help.

@phi-go phi-go mentioned this pull request Apr 6, 2023
phi-go added a commit to phi-go/rust that referenced this pull request Apr 19, 2023
This commit is a port of the changes done by @semtexzv, as described
in this comment:
rust-lang/rfcs#1888 (comment)

All locations that need further work are marked with:
FIXME(explicit_tail_calls)
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC.