cmd/compile: generated assembly is larger than in previous versions #30229
Comments
Is this about performance, binary size, or just correctness?
As for correctness, I believe both versions are correct: they both produce the desired behavior.
Fair enough; I was just asking so the issue could be labeled appropriately :) If you'd like to help get this issue fixed faster, you could try bisecting to find which Go commit introduced the regression. For example, you could bisect between the
Following your advice, I found that the commit that started generating this code was 837ed98, which makes sense.
/cc @aclements @randall77 @dr2chase; please see the comments above.
It looks like the phi tighten pass introduces the duplicate XORs. If I turn that pass off, the register allocator introduces duplicate XORs instead, and I can't turn off the register allocator :( Both behaviors have the same cause: to avoid excess register pressure, we load constants into registers as late as possible. For an example of why, see #16407. It sounds like we're being a bit too aggressive in that regard, but I'm not sure how to design a knob to adjust it, or how to decide where to set it.
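The reporter's original source was not preserved on this page, so the function below is a hypothetical stand-in, not the code from the report. It is a minimal sketch of the kind of function where this matters: on amd64, zero constants (here the initial accumulator and the loop index) are commonly materialized with XOR instructions, so compiling it and reading the output is one way to see which blocks the compiler places those XORs in. The function name `sum` is an assumption made for illustration.

```go
package main

import "fmt"

// sum is a hypothetical example, not the function from the original report.
// Its zero constants (total and i) are values amd64 commonly materializes
// with XOR instructions, which makes it a convenient target for checking
// where the compiler places them.
//
// One way to print the generated assembly:
//
//	go build -gcflags=-S main.go
func sum(slc []int) int {
	total := 0
	for i := 0; i < len(slc); i++ {
		total += slc[i]
	}
	return total
}

func main() {
	fmt.Println(sum(nil))            // empty slice: the loop body never runs
	fmt.Println(sum([]int{1, 2, 3})) // non-empty slice
}
```

Comparing the `-S` output across Go releases for a function like this shows whether the zeroing XORs are emitted once before the branch or duplicated into each successor block; the exact output depends on the compiler version.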
Fixed in Go 1.20.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes. I tried on tip and the same happens.
What did you do?
I compiled the following code and inspected the assembly instructions generated by the compiler:
Assembly code generated:
What did you expect to see?
I expected the compiler to generate code similar to the 1.10.1 version, because on tip it generates unnecessary jumps and an extra block of XORs.
What did you see instead?
Instead, the compiler generated more code than necessary.
In the 1.10.1 version, the only thing I think could be improved is on line 5: the slice address is moved into CX, which is unnecessary when len(slc) is 0. Tip handles that case well.
Summing up, I believe the code should look something like:
The test_pc21 block could be eliminated, as it is in the 1.10.1 version.