Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: Dgeev/Circulant100-8 sec/op 1.1% regression, 1.1%point change between 3d92205 and da6042e #56532

Closed
mknyszek opened this issue Nov 2, 2022 · 3 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge Performance
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Nov 2, 2022

Regression range: https://go.googlesource.com/go/+log/3d92205ef5ed42147376d929e0f59c765974e345..da6042e82eb7fe5ba40a5c17959a31e19a44c3e8

Next step is to bisect.

@mknyszek mknyszek added Performance compiler/runtime Issues related to the Go compiler and/or runtime. labels Nov 2, 2022
@mknyszek mknyszek added this to the Go1.20 milestone Nov 2, 2022
@randall77
Copy link
Contributor

Bisected to https://go-review.googlesource.com/c/go/+/440035

This looks kind of unfortunate, but not sure there's anything to fix. The inner loop in this code is right on the edge of fitting in registers and tweaks from this CL cause regalloc to not find as good a solution as before.

Previously we kept 3-operand LEAs around to the final assembly code generation step, then broke them into two 2-operand LEAs. For instance what was originally a LEAQ 2(AX)(R13*1), R13 gets split into

      10ms       10ms    11fb1cc: LEAQ 0(AX)(R13*1), R13
      10ms       10ms    11fb1d0: LEAQ 0x2(R13), R13

Because we do the breaking very late (and after regalloc), the two parts are always adjacent. The intermediate value isn't seen by the register allocator.

After the CL, we break these LEAs up earlier. As a consequence, they can get scheduled farther away from each other. That's generally a good thing, but in this case it tweaks the register allocator just enough that a value that was in a register throughout the inner loop (the thing in AX) now gets spilled outside the loop and restored at a few places inside the loop.

In unrelated news, I did notice that in the following CL (440036) the addressingmodes pass in now not quite right after this CL, as it matches against the X versions of shifts. Maybe something to fix, but should only affect v3 builds so it isn't this issue. @wdvxdr1123

@randall77 randall77 self-assigned this Jan 11, 2023
@randall77
Copy link
Contributor

I take that back about the addressingmodes issue, we don't generate the X versions in the regular rules any more, but we do generate the Xload versions, which is what addressingmodes cares about.

@randall77
Copy link
Contributor

Closing, as I don't think there's anything actionable here.

@golang golang locked and limited conversation to collaborators Jan 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge Performance
Projects
None yet
Development

No branches or pull requests

4 participants