cmd/compile: unexpected difference in compiled code for returned array #31198
Comments
Probably a dup of #24416.
I think this is caused by:

Most of the store forwarding rules have special cases to allow them to work despite the presence of these ops. In general this is all quite fragile, sorry. These rules were added before that.
The last example (the "interesting case") is actually a regression from commit 8607b2e.

Before:

```
xorps	X0, X0
movups	X0, "".~r0+8(SP)
movups	X0, "".~r0+8(SP)
movb	$1, "".~r0+8(SP)
ret
```

After:

```
subq	$24, SP
movq	BP, 16(SP)
leaq	16(SP), BP
xorps	X0, X0
movups	X0, "".~r0+32(SP)
movups	X0, "".x(SP)
movb	$1, "".x(SP)
movups	"".x(SP), X0
movups	X0, "".~r0+32(SP)
movq	16(SP), BP
addq	$24, SP
ret
```

Should I open a separate issue for this?
Thanks, that makes sense. Any thoughts, @josharian? I guess the rule added in that commit creates a 'dead end' for the optimization somehow.
No, this issue is fine.
I think what is happening is that the rule is interfering with dead auto elimination. Previously we had 'x -> y -> ret', and both x and y were recognized as dead autos. This rule rewrites 'x -> y -> ret' into 'x -> ret', ironically in the hope that y can be dead-auto-eliminated, but this interferes with eliminating x. I'm not sure exactly why yet; maybe dead auto elimination relies on the store forwarding rules that aren't triggering here in order to be effective? From spending a few minutes on this, I suspect that the fix won't be straightforward. So part of the question is whether this is a rare case or not, and thus whether it is worth spending the time to restore this optimization.
The challenge here is that it is hard to tell, at the moment that rule is applied, whether 'x' can be eliminated or not. I don't see a good approach here, although I'd be open to suggestions. On balance, the optimization does appear to help more than it hurts. One question:
Is it worth it (and sound!) to allow these rules to count the number of non-LocalAddr uses of the VarDef memory state? The only important thing for LocalAddr's memory arg is that it be correctly ordered in the memory chain for liveness purposes; we should be able to preserve that property.

All in all, I'm inclined to wait for Keith's work on SSAing all structs and arrays. That should handle this and much more.
Not sure, sounds unpleasant in SSA. I'd rather we found a way to make these sorts of annotations less intrusive when optimizing. Not that I have any ideas how to do that...
Yes, that sounds interesting. Avoiding touching memory entirely, at least early on in the compilation pipeline, seems like the nicest approach to take for these sorts of issues.
It's possible that we could fix this by adding arch-specific variants of these generic rules:
We'd be able to detect writing any set of duplicate bits twice (not just zeros), although we'd have to write out all the size-specific variants. During lowering, the LocalAddrs become LEAQs (on amd64) and drop out of the memory chain. |
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes.
What did you do?
I compiled this code:
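The snippet itself was not preserved in this copy of the issue. Based on the issue title ("returned array") and the stores in the assembly quoted earlier in the thread (16 bytes zeroed via `movups`, then a single `movb $1` into a named local `"".x` that is copied to the return slot `"".~r0`), a hypothetical reconstruction might look like the following; the function name `f` and the array size and element type are assumptions:

```go
package main

import "fmt"

// f declares a zeroed local array, sets one element, and returns it.
// This is the shape of program whose compiled output is discussed above;
// the exact size and element type are guesses, not the original code.
func f() [17]byte {
	var x [17]byte // the zeroed local ("".x in the assembly)
	x[0] = 1       // the movb $1 store
	return x       // copied out to the return slot "".~r0
}

func main() {
	a := f()
	fmt.Println(a[0], a[1]) // prints "1 0"
}
```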
What did you expect to see?
I expected the generated assembly code to be similar to this program's code:
What did you see instead?
Instead, the generated code is much more bloated than I anticipated, even involving stack movement.
As pointed out by @vcabbage, this issue is probably related to the assignment to "x", given that the compiler produces better code for the following program:
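That program was also lost in this copy of the issue. A plausible variant that avoids the assignment to a named "x" is one that returns the composite literal directly; again, the name `g` and the array size and element type are assumptions, not the original code:

```go
package main

import "fmt"

// g returns the composite literal directly, with no intermediate
// named local, so the compiler can write the result straight into
// the return slot. Size and element type are guesses.
func g() [17]byte {
	return [17]byte{0: 1}
}

func main() {
	a := g()
	fmt.Println(a[0], a[16]) // prints "1 0"
}
```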
Another interesting case is the one below, where the "good" case above becomes as bad as the "bad" one: