-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: slower bit operations (regression in go 1.16.3) #45790
Comments
I can reproduce. Looks like it was introduced from 1.16.2 -> 1.16.3. That makes me think it was probably https://go-review.googlesource.com/c/go/+/305069 The regression remains at tip. Looking at the assembly, 1.16.2
1.16.3
The latter code looks better, so I suspect |
One observation: the ANDQ is not required, it is done by BTSQ itself. Another: BTSQ is probably multiple micro-ops including a write of the CF flag. The upper sequence is probably translated 1-to-1 into micro-ops and will have a smaller micro-op count. |
https://www.agner.org/optimize/optimizing_assembly.pdf states that BTS with a memory operand should be avoided on Intel. So moving the result into a register and writing it might be faster. |
@ulikunitz This is only true if the destination operand is not memory. From the instruction reference manual, page 3-11:
The lack of masking was the cause of the issue which prompted the fix.
Given this, I agree with @randall77:
|
Agree with both points, I have been wrong regarding the AND. |
Change https://golang.org/cl/318149 mentions this issue: |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
We are upgrading from Go 1.15.8 to Go 1.16.3 and found a noticeable benchmark regression. Here is a bit set setting i-th bit
Here is the benchmark result comparing 1.15.8 and 1.16.3.
I bisected it, the benchmark regressions started at 96139f2
What did you expect to see?
Less performance regression.
What did you see instead?
+76.54% performance regression.
The text was updated successfully, but these errors were encountered: