-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: cmd/compile: add a go:inline directive #21536
Comments
This proposal has basically no chance of being accepted. Go eschews knobs. Compiler directives are widely disliked. I don't think it can be implemented reasonably, if at all. Functions are not eligible for inlining for many reasons. These include conceptual implementation difficulty: I have no idea how to inline a function containing defer. And practical implementation difficulty: It took a dedicated person a summer to get non-leaf inlining working correctly, even with part of the implementation already written. And it's still not released because of the binary size and compilation time aspects. Supporting such a directive would mean solving a host of such problems. I personally have long wanted a much smaller directive that causes compilation to fail if a function is deemed ineligible for inlining. This would allow me to refactor without fear that future inlining changes will cause my no-op inlining to become a silent performance regression. Even this much more limited proposal has been rejected forcefully. All that said, if you'd like to work on improving inlining, that'd be fabulous. There are many opportunities to improve inlining decisions, to improve how we do inlining (e.g. partial inlining), and to expand the set of functions that can be unlined. |
@josharian The flag |
@neelance yes, or alternatively parse the output of |
@josharian I already suspected that you want it for the stdlib. May I ask why you didn't build it yet? I get the point of not adding such a feature to the Go compiler itself, but a standalone tool doesn't do any harm if only used by individuals voluntarily. I just cringe when I hear "fear of refactoring". ;-) |
@neelance just haven't gotten to it yet; I have little contribution time at the moment and lots I want to do. I'd be delighted if someone else wanted to jump in and do this. The recently introduce runtime function |
Change https://golang.org/cl/57410 mentions this issue: |
For the reasons Josh outlined above, we're not going to do this. Note also that even in C/C++, the "inline" directive does not really mean "this must be inlined". It is more a guideline than a rule. |
The intent is to allow more aggressive refactoring in the runtime without silent performance changes. The test would be useful for many functions. I've seeded it with the runtime functions tophash and add; it will grow organically (or wither!) from here. Updates #21536 and #17566 Change-Id: Ib26d9cfd395e7a8844150224da0856add7bedc42 Reviewed-on: https://go-review.googlesource.com/57410 Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Any reason why inlining is more difficult in go than other languages like java or c++. I was trying to find the rationale behind no branches / 40 expressions or less restrictions for inlined functions. While trying to write performant code (dynamic proto3 marshaling), hand inlining a function brought down marshal time from 5ms to 4ms. That can be significant depending on your performance goals. |
@mandarjog A closed issue is not the place for this discussion. General discussions about inlining in Go should use a forum; see https://golang.org/wiki/Questions . Specific discussions about the gc toolchain should take place on the golang-dev mailing list. |
I'm going to re-open this for further, given the recent discussion at #17566. In an attempt to migrate discussion to here, to keep the threads distinct, I will also paste in some other folks' comments from there. |
|
Agreed. Any request for inlining will fundamentally need to be a request, not a promise. I imagine the initial implementation would be a relaxation or removal of the budget, but not the other constraints. For callsite inlining requests, does that propagate to only the immediate function being called (possibly using mid-stack inlining), or to as much of the call tree from that point onwards as is possible? What (if anything) does this do in the case of recursive calls which can be partially mid-stack inlined? There is discussion of this case in a few scattered issues and CLs, because it turns out to be important in a few places in the runtime. |
My rationale for asking for this is at #17566 (comment). In particular, an argument that I believe does not get weaker with better (but not perfect) heuristics is that having the inliner decisions be opaque and subtle means that unrelated changes by other developers can significantly impact performance, and the only way to get robustness is currently writing the caller in assembly. My proposal, instead of
Like It's per-function instead of per-callsite, but I never needed the latter myself. |
A totally different option is to have a “flatten” directive that forces a function to be a leaf, that is all calls within the function body are recursively inlined. This is usually simpler to reason about because people usually know which functions should go fast, while it’s sometimes harder to foresee whether a function should always be inlined or not. Moreover the flatten directive doesn’t propagate transitively, so a module author that misuses flatten shouldn’t cause exponential compile time behavior like a force-inline directive might cause. |
A small collection of examples where this would have been useful. https://speakerdeck.com/gtank/micro-optimizing-go-code?slide=21 I also had half a dozen people complain about it to me or to my immediate circle in the last year or so. I'll ask them to chime in with examples. |
I've had my share of fights with the inliner. Some representative examples are in some private code: a column-oriented database which naturally has some very important tight loops.
In all these cases, it would be very helpful to have an assertion that can simply fail compilation if something cannot be inlined. Even without any fundamentally new inlining capabilities, it would greatly increase my confidence in making code adjustments and changing Go versions if I could tell the compiler my inlining wishes rather than just leaving scary comments for programmers. |
A function call isn't a great fit for this use case. What should happen here? func f() {
if false {
runtime.Inline()
}
} What if it was
Please do. They are helpful not only in making the case, but in seeing opportunities for immediate improvement. George's problematic inlining case turned out to be a compiler bug around inlining intrinsics.
Someone or other I proposed
Very interesting. There are some tricky cases that I don't quite see how to express with this directive, though. Consider: func f() {
for {
if rare {
impossibleToInline()
}
definitelyNeedsToBeInlined()
}
} You can't flatten I suppose you could also fix this by tweaks the semantics of There's another problem with flattening across package lines: you can't request recursive inlining of a function that has already been compiled without recursive inlining--it breaks package-at-a-time compilation. I suppose you could specify that flatten only works within a package. That might suffice, and doing better inter-package inlining might take up the slack. But it is another complication. |
It would be neat to find a way to enable something like
but I can't think of anything that doesn't involve complex closure edge-cases.
I think his benchmarks are still useful to show the performance impact, even if they are not meaningful to discuss the heuristic. My point is that any imperfect heuristic is a balancing game and a ticking bomb. As soon as someone comes along and changes something innocent, performance might drop. (We should still fix bugs in the heuristic, of course.) |
One example of a function that could really use some sort of inlining guarantee I have is the If we did the closure technique then I don't think we should use the func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
try.Inline(func() {
a += b
d ^= a
...
b = (b << 7) | (b >> 25)
})()
return a, b, c, d
} Still a bit ugly and hard to read though. |
I'll notice that this is one of those cases where the goal of the author is better described by simply saying: func (s *Cipher) XORKeyStream(dst, src []byte) {
compiler.Flatten()
[...]
} as the idea is that |
I wonder to what extent we can separate the “what” from the “why”.
The thing that you need to communicate to the compiler is “this function is called frequently enough that it's worth spending the extra compile time and binary size to optimize heavily”. If you can't express that through profile-guided optimization, then perhaps an annotation like |
|
Anyway, this is a nice idea. It's a good way to dip our toes in the FGO water (and ultimately perhaps decide to ignore the hint in favor of actual FGO?) and it is not hard to imagine finding other places in the compiler where we could choose to spend a few extra cycles on such functions. |
For what it's worth, GCC supports both GCC also supports function attributes |
Re #17566, package-runtime-limited annotations are fine to experiment with, without a proposal. We have plenty of those already. But for the broader space of all Go packages, micromanaging optimizations at the source code level like Closing again. |
A few people pointed out that I didn't respond directly to the various examples of automatic inlining not being optimal for various performance results. To address that explicitly, the answer needs to be to improve the inlining rules. Look at what the compiler does, understand why it's wrong, and fix it. An |
@rsc I did not reopen the conversation because I believe the issue is automatic inlining not being optimal. What I tried to communicate at #21536 (comment) and #17566 (comment) is that:
I believe 2. in particular is not in line with Go's approach to team collaboration and explicit behavior, and while annotations are also not in line with Go's approach, so is assembly (3.). I'd love to be wrong about 1. but no one from the compiler team has expressed confidence that I am. |
I tend to agree with your first point, it's going to be hard to make inlining happen at all the points which are performance critical, and not blow up the binary by inlining too much everywhere else. So I think we're going to need to be somewhat conservative in our rules. I'm ok with that. I'm a fan of simplicity even if it costs a little bit of performance. I don't really agree with your second point. If you're modifying the code, you should expect possible performance changes, and you should run benchmarks as part of validating those changes. Inlining is only one of many ways in which you could break the performance characteristics of a function by modifying its code. |
I suspect a heuristic could be good enough, but we have not put adequate effort into finding such a heuristic, and are proceeding only gradually. I did an experiment last night where I exposed all the inliner knobs and turned them up well past 11%. I'm working on the spreadsheet and how to share it but TLDR is:
Ridiculous inlining tends to tickle build bugs in the runtime, which limited how far I could take this. |
I suspect the compiler is much better than I am at deciding what to inline roughly 95% of the time. It's true that you should probably be benchmarking performance-sensitive code when changing it, but it's still useful to get more direct feedback on what specifically changed -- if a change is part of a larger change, it can be very non-obvious that the actual substantive performance change is a single function no longer being inlined. (There's also the risk that the change actually comes about because of a new compiler, or better yet, a new compiler and a code change, where either alone doesn't change the behavior.) I do note that "inline", and "register" for that matter, can have additional compiler-hinting semantics that are sometimes non-obvious, like "also this implies you can't take the thing's address". I am not sure whether I like that or not -- I dislike the additional complexity, but I see the appeal of being able to give the compiler hints about a thing's behavior or assumptions. I do sort of like an annotation for "be sure to let me know if this can't be inlined", but that's something I think we can already do with existing tools and a bit of shell scripting. |
You are claiming that, without This is untrue. At most, people must copy and paste Go code. No assembly required. A little copying is better than a new, subtle source code annotation. |
A few other, much more minor points:
It depends on what "good enough" means. For one person, today's compiler might be "good enough". For someone else, no compiler would ever be "good enough". That needs to be defined better. And without filed bugs for the specific cases that aren't good enough today, how would we know? The first step has to be filing those bugs.
The performance cliff you are describing is true of almost every compiler optimization, but most notably escape analysis. We avoid making escape analysis "opaque" by giving developers tools to understand the compiler's decisions, specifically
As Keith pointed out, tests and benchmarks exist precisely to enable this kind of coordination. It is not some impossibility but instead a reality of all our development.
Regarding annotations more generally, there is a significant learning, maintenance, and semantic cost to annotations and they are historically avoided for this kind of thing in Go. Unlike modern C, Go is not meant to be a replacement for an assembler in all cases. If you really need very precise control over details of code generations, assembly can be an entirely appropriate answer. And that is OK! But again the proper phrasing of (3) would be "forcing users to copy and paste code". And that makes the argument for micromanagement of the compiler's decision here even weaker. |
The compiler (gc) has come a long way, however the inliner is sometimes lacking, for example any function with a
for-loop
no matter how simple it is, won't get inlined so we either have to use agoto
loop instead (which is used few times in the runtime) or just have a one big ugly function.My proposal is to add a
go:inline
directive that will force inlining, even if it's just for non-exported functions.This will help make a lot of code much cleaner and allows separation of big functions that can't be currently inlined.
If the proposal is accepted, I'd like to work on it.Another semi-common example:
Related: #17566 #14768
@dr2chase @josharian @randall77 @mvdan
The text was updated successfully, but these errors were encountered: