-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: closure func name changes when inlining #60324
Comments
not only k8s, but for instance leaky go routine tests may fail b/c whitelisting doesn't work correctly anymore. |
Hi Austin, since you are the original author of CL 479095, assigning this bug to you. If you would prefer that I work on this particular cleanup, please assign back to me. Thanks, Than. |
This was noticed before: #55980 |
Is it okay if we just arrange that the |
I don't think I understand the suggestion, but we want the inlining-disabled name to appear in profiles, stack traces, and symbol tables. I don't think changing just runtime.Func will do that. |
For what it's worth, we have the name of the lexically enclosing function at the point of the closure creation, because if the program crashed at that PC we would show that frame in the stack trace marked [inlined]. |
Yes, it's easy to track the original closure symbol name. That's not a problem. What I'm trying to understand is what you want done with that. We currently duplicate the closure and underlying function text, because they can be optimized differently due to different escape analysis and variable capture. So the duplicated closure needs its own linker symbol. |
Does it? I thought the linker addressed all the symbols by table index now, so if we made it a symbol internal to the package being compiled, it would not collide with any others with the same name. |
|
I am interpreting the previous comment as my not having answered your questions, so I will elaborate a bit here. The function names reported in profiles, stack traces, symbol tables, disassembly, and so on should not depend on how aggressively we inline. That is, the names that we get with no inlining at all are the "official" names, and inlining should preserve them. I understand that, in terms of the example at the top, if we do:
that there will be three copies of the actual text symbol generated when f is inlined, and that the texts may actually differ across the 3 calls. But I think the new linker's ability to address symbols by table index instead of name should mean that it's completely fine to have those three text symbols all use the same name: the original name they'd have used without inlining. |
@cherrymui How should the compiler create multiple obj.LSyms that are kept separate but all end up with the same linkname in the symbol tables? |
As background, we originally named closures opaque names like 'func•1', but that was extremely unhelpful in all the places I mentioned where function names appear, not to mention in all tools that didn't support UTF-8. Issue #8291 was to give the closures more useful names, which Dmitri did in CL 3964. Inlining of functions containing closure creation is essentially a regression of that issue. It may seem like they just need a name, any name, but that's not true. The names carry useful information that has been lost since |
Please retract that accusation. The stack trace for your test program is the same in Go 1.19 as in Go 1.20: https://go.dev/play/p/yjP9hrxZ_K0?v=goprev |
It was not an accusation, but I apologize nonetheless. My memory was that one of unified IR's important inlining advances was to enable inlining of closure-containing functions. I misremembered - we started inlining those in Go 1.17, so obviously that must have been with the original IR. In any event, as unified IR has made it possible to do more aggressive inlining, we lose more and more of this information. This is not a criticism of unified IR. My point is only to explain the context of how we got here. All the changes individually make sense, but this is a rough edge where they aren't quite working well enough together. |
Thank you.
Ack. That's what I'm trying to understand: how should this work? As I've said already, we need multiple linker symbols (i.e., obj.LSyms), because in general the inlined closure can get optimized differently than the original closure. You suggest the multiple obj.LSyms should just have identical names in the symbol tables, etc (i.e., use the same linknames). That's not something we do anywhere else in the compiler to my knowledge, so I don't know how to invoke cmd/internal/obj to make that happen. Hence why I asked @cherrymui how to do that. However, I remember she's on vacation, so if any other linker experts can advise on this, that would be great. @golang/compiler |
I think it would be good enough for Go 1.21 if we embed the inlining stack in the name (eliding package names), so for example if F inlines f inlines g inlines h creates a closure, the function would be named F.f.g.h.func1. As long as the same counter is used for all the funcs, it won't matter if that sequences appears again: it would get a different trailing number. So if you had
The four text symbols created in main would be named |
Thanks, that seems doable for 1.21. |
I agree that we should improve the names of inlined closures and the suggestion in #60324 (comment) seems like a reasonable short-term compromise. If that is easy to do at the end of the cycle, great, but I'd push back a bit on this being a release-blocking issue that needs an immediate fix right before the freeze. We know that this is an existing issue that gets more annoying as we are more aggressive with inlining closures, but are we actually that much aggressive than before? CL 479095 states that it increases inlinability by 0.6%. It doesn't say how much it increases inlinability of closures specifically, which is probably the more important metric. But at first glance, it looks like we got unlucky with one specific test rather than us getting sharply more aggressive and causing widespread issues. |
Change https://go.dev/cl/497137 mentions this issue: |
A few other suggestions I want to make, as long as we're discussing it:
The exact numbers used (i.e., for Looking through Google's internal Go code base, there's some code that looks for the current Note: gccgo already uses a completely different naming scheme for anonymous functions. |
not exactly thrilled when closure func names change (again), because in the Gomega go routine leak checker (https://github.com/onsi/gomega/blob/55a33f3ff7b97c9f999d641bb6f92621d1c2edcf/gleak/have_leaked_matcher.go#L44) we need to maintain an ignore list of known background go routines from stdlib and testing. I think that we luckily have avoided hardcoding closure funcs using |
@mdempsky , just to clarify, the I always worry about what assumptions linkers and tools make about what characters are allowed in symbol names. 😅
@thediveo , thanks for pointing that out. If we implement @mdempsky's ideas, it looks like the one way Gomega's matcher will break is that |
I'm concerned about the user experience for non-expert Go programmers. This will add churn to the overall user experience, as every change here does. Do we have evidence that the changes made for Go 1.21 aren't good enough? It seems like a change here is only worthwhile if it is obviously dramatically better or it addresses something in the status quo that is demonstrably not good enough. I don't see either one of those here. The go/ssa names seem strictly worse than what we have today. From the point of view of someone not familiar with the ins and outs of the compiler, it seems guessable that F.func1 is the first func in F. In contrast, F$1 seems like cryptic line noise (or Perl syntax) to me. If the func is assigned to a unique variable g, then F.g seems very clear too. I don't see any reason to introduce a new delimiter $ like in F$g. I also don't think I understand what [X] means but The transitive inlining "breadcrumbs" within a package seem important to me to preserve as well. They're much more readable than an arbitrary numbering. The choice of delimiter matters too. If p.F.g is inlined into H and from there inlined into J and we want a symbol of the form p.F.g(delim)H(delim)J, then we need a delimiter that suggests "in". @ is pretty good for that: p.F.g at the use in H at the use in J. Are we sure that C toolchains will break if we use @ here? The symbols would not be exported in any way, so I would hope a C linker would leave them alone. If @ is off the table, then > might be good, as it suggests injection; p.F.g>H>K. |
|
One thing to keep in mind is that, I think, the
One thing is that the It would be nice if a delimiter other than Another thing is that the inlining breadcrumbs aren't all that user friendly either, I think. If I didn't know about this thread |
unless you name all your packages pkg, A, B, etc., it usually becomes quite obvious where the package in the name is, more so as it is an import path with github.com, or some other TLD. |
In an ELF object file, the |
ad 1. should work as we would add the new pattern next to the existing one, both coexisting without interference. ad 2. I lived halfway happily with the existing scheme but hadn't an urge for longer symbol names. |
I understand that, but I also noted "The symbols would not be exported in any way, so I would hope a C linker would leave them alone." That is, I think that the vast majority of our symbols are or can be generated as STB_LOCAL, and in particular any closure functions can definitely be set STB_LOCAL. If we use the @ syntax in STB_LOCAL names, will that still affect ELF linkers? |
Technically you're right that an ELF linker shouldn't care about @ in a local symbol name. But it seems like a bad idea to count on such symbols being handled cleanly by the various ELF linkers and by the various ELF tools in general. I would only try that if it were fairly important to use @. None of the @ support is standardized or seriously documented (Sun introduced symbol versioning, and they documented it well, but the @ support is a GNU extension that lets people skip writing actual version definition files). If you really want the @ then an option would be to only use the @ in the debug info, and use a mangled name in the ELF symbol table. |
Sounds good, thanks for the extra details. |
That's not ambiguity, because pkg.A can only be one of those two things.
I agree with Matthew's observation that it would be more helpful to reverse the breadcrumbs. My point was that if we have F inlined into G inlined into H as well as F inlined into J inlined into K, it is much more helpful to name them with some representation that includes (F,G,H) and (F,J,K) than it is to name them F$1 and F$2. |
Ok, but for example debug/gosym doesn't know that and gets them wrong. |
@aarzilli Do you mean just debug/gosym.Sym.BaseName, or other code in debug/gosym too? What do you use BaseName for? |
If we were to make the inlining breadcrumbs unambiguously parseable from the symbol name, then in a traceback, we could print only the "original" name by default, and include the full name if and only if we're showing full PCs in the traceback (I'm thinking as additional information at the end of the line, not replacing the original name). That seems to be the exact case where the actual symbol name may matter. Perhaps we suppress the PC offset if we're only printing the original name to avoid confusion. This is a more refined version of the point I made in #60324 (comment). |
BaseName and ReceiverName. Delve doesn't use debug/gosym, exactly, but it has its own equivalent reimplementation. It uses it to find functions with a partial name match (for example to set a breakpoint). It has never needed to work correctly with closures, so far, but it would be nice to know that it could. Looking around with sourcegraph I can't find many users of debug/gosym, but for example GoRE could be affected by this problem as well. Hard to say if there are other half-reimplementations floating around that are affected. Personally, as a user of go, I agree that |
It occurs to me that this case of inlining is just one example of function multicompilation/multiversioning. In this case, we're compiling multiple versions of the function because they may be optimized differently in different inlining contexts. But this is a technique we've considered for other things, like multiversioning for escape analysis and for access to CPU features that are detected at runtime. In fact, there's another case where we do this already: GC shape stenciling. We had to disambiguate the symbols in that case, too. But in that case we decided to hide the disambiguator in tracebacks and only show the "user" name. I think that was the right choice (though we still show PCs, which might be a mistake 😅). In that case, the disambiguator was pretty meaningless to users, and I could see an argument that the inlining breadcrumbs are more meaningful, but I'm not sure I buy that these cases are so different. |
Aren't generics the same? So maybe it would be neat to include in stack traces also the concrete type for which a generic function got compiled? |
Sorry, that's exactly what I meant by "GC shape stenciling" (I can see how that wouldn't be obvious!) The thing is, the symbol name doesn't include the concrete type for which a generic function got instantiated. What it includes is much more abstract and not particularly helpful to the user. |
Hi just to weigh in on this comment by @mdempsky #60324 (comment).
I agree to the whole comment because I'm writing such tool within the IDE, and it is sometimes not possible to correctly resolve the symbol there. Stack traces in particular have the file name, but this might be transformed (trimmed, or else, e.g. Bazel does that with the default ruleset). Having a marker to identify inlines so it's possible to handle them (maybe skip them) could help. |
Without inlining, the stack shows
main.f.func1
panicking, which is approximately correct: it's the 1st closure inside main.f.With inlining, the stack instead shows
main.main.func1
, because now it's the 1st closure inside main.main due to inlining of f. However, that's a much less useful name, since the closure is still lexically inside main.f. If there is some way to preserve the old name even when inlining, that would be a good idea.This is the source of a Kubernetes test failure using Go 1.21, because a more complicated function was not being inlined in Go 1.20 but is being inlined in Go 1.21 after CL 492017. The test expected to see "newHandler" on the stack when a closure lexically inside newHandler panicked.
If there's something we can do here, we probably should, both to keep the closure names helpful and to keep tests like the Kubernetes one passing.
The text was updated successfully, but these errors were encountered: