-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: fluent interfaces doing new allocations in go 1.19 #57434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I just did the obvious experiment, and it is not unified IR. Still slow:
|
Good find. |
It looks like a bug in escape analysis.
|
It's possible this was already fixed? A redditor cannot repro in latest 1.20-
It occurs to me that when I was asked about "latest release" in the PR template, they might have meant this version and not 1.19.4- my apologies. |
Quite the opposite. Unified IR actually fixes it. At tip, unified IR doesn't allocate, non-unified allocates. The IR between the two looks slightly different. Non-unified generates an autotmp for the result of the first call, which is the The escape analysis definitely looks weird. |
|
This is introduced in https://go-review.googlesource.com/c/go/+/392834, and Unified IR also has this problem until https://go-review.googlesource.com/c/go/+/421821.
|
Change https://go.dev/cl/459295 mentions this issue: |
@dr2chase @cherrymui Is performance regression legal for backport? |
What do you mean by a "non-Name call"? A call to a func value that is not a named function? In this case it doesn't have a func value. Or the method selection expression is one? That would be usually considered as a named function call, though. I'm not sure I understand how this connects with escape analysis. |
I think a significant performance regression would qualify a backport. But I'm not sure this one is significant. It would need to see an impact on real code, instead of synthetic benchmarks. |
"non-Name call" is a function call where the call expression is not a I think if the call have no side-effect argument, then we should not apply
|
Sorry, I must still have missed something. When would a call expression be an ONAME node? I'd think a call expression would be an OCALL or OCALLXXX node. Or you mean the function being called? Back off a level. The AST is not malformed. But the escape analysis behave weirdly, at least it looks like so. I'd think we want to understand why the escape analysis doesn't work as expected. Is it fundamentally not able to handle the current AST? (Then changing the AST would probably be the right answer.) Or it is something else? Thanks.
That looks like a regression from 1.19 to 1.20, though. We may want to address it before 1.20 release. But no backport involved. |
Sorry, I mean the callee expression. In
Yeah, I am talking at AST level. (*Object).Initialize(obj) is the callee expression of the call expression (*Object).Initialize(obj).Update()
I think it's reasonable for escape analysis to behave like that, because we now introduce new temp var for (*Object).Initialize(obj) call. Before CL 392834, escape analysis would see (*Object).Initialize(obj).Update() call. But now it would see:
We have a special case for ONEW inside RewriteNonNameCall, which will help a better escape analysis if we not rewriting at all. |
Sorry, I must still have missed something... If by "callee expression" you mean the function being called, in this case it would be
I don't think so, sorry. In fact, if I rewrite the source code of
it actually does not allocate. Also not allocate with
|
It is! In case of
Hmm, sounds like there's actual mismatch between emitting autotmp vs hand written one. In case of hand written one, there's a
So you're right that fixing the AST (emitting the ODCL for autotmp) will fix the problem. |
I still think this is incorrect behavior by escape analysis, but it is taking me a while to figure out exactly where it goes wrong. I've been stepping through the compiler in a debugger since yesterday (as time/focused-attention permit). But thanks for all the investigation, this will tell me where to be especially attentive. Also, everything gets inlined. This is in code that no longer contains real calls, though there are INLCALLs. Here's the pruned test case:
|
Yeah, but all discussion above happens during old typecheck pass, so before any inlining can happen. |
The escape analysis runs on inlined code. That's where the decision is made wrong, in my opinion. If this AST is actually malformed, we need some way of rejecting it. Are autotmps required to have ODCL nodes, or would it just solve this problem for us? I just now went looking at uses of |
No, autotmps do not require to ave ODCL node, but emitting
What do you mean "50"? Is this the diff in CL 459295 or something else? If you refer to the diff, you may want to look at latest patchset, which is just one line change. |
The problem is loopdepth, but I think this is a bug in escape analysis -- anytime there is an autotmp used in a loop, its loopdepth will be wrong unless it is declared, and so far, none of them are declared. "50" refers to places in the compiler source code where we create temporaries; any of these with a pointer type is conceivably a source of such an escape analysis hiccup. And, also, adding all these declarations for temporaries where none existed before might cause other problems. I think the heuristic to try is that the loop level of an autotmp is the minimum level at which it is assigned; that is where it would have been declared. |
I'm AFK, but there's a place in compiler where we emit ODCL for autotmp, I cant find it offhand, though. |
Hmm, I think it can apply not just for autotmp, but any local ONAME that appear in the LHS of the assignment. If |
ONAMEs can have assignments at any depth. It's important to record the least-nested visibility because that is a form of escape (many-to-one). For autotmps, we do seem to declare them, and we do seem to declare them at varying depth (i.e., I logged this stuff complete with stack traces) and it "seems" to get them right for 1.20. I think for 1.19, this is just a minor performance regression and we live with it. Arguably, as the keeper of many benchmarks, I should add this microbenchmark, since it would be nice not to regress again. |
Sorry for the late reply. I agree with @dr2chase that targeting the escape analysis is the right thing to do here. And the heuristic of using the minimum level where an autotmp is assigned is reasonable to me (unless it is otherwise declared). |
Agree, but if we're going to backport, then emitting ODCL seems safer. Otherwise, we can fix escape analysis in 1.21 cycle. |
Change https://go.dev/cl/459496 mentions this issue: |
Besides the escape analysis change, I think the escape analysis diagnostics also needs some improvement. If it is the assignment |
With
|
Updates #57434 Change-Id: Ib90c228f95c3d61204e60f63d7de55884d839e05 Reviewed-on: https://go-review.googlesource.com/c/go/+/459496 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Ran the following benchmarks:
What did you expect to see?
I expected neither benchmark to perform any allocations.
What did you see instead?
BenchmarkAllocs
allocated obj onto the heap, butBenchmarkNoAllocs
does not:However, it does not do this in 1.18.9:
The text was updated successfully, but these errors were encountered: