-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: arm comparisons with 0 incorrect on overflow #39303
Comments
Should this block the upcoming beta? |
@andybons No. This is a pretty esoteric corner case. It's also been broken for a while (at least 1.12?). |
Thanks! |
@shawn-xdji |
Thanks @randall77 @andybons, I'm working on it. |
Can we simply delete the problematic rules? By carefully designing to save an add or sub instruction, introducing rarely used MI and PL flags and continuous branch jumps will not bring significant performance improvements, but will make the code look complicated and difficult to maintain, It is also not conducive to further optimization. |
That's the crux of the question. We need to do that experiment. Assertions one way or the other without data aren't convincing.
I'm not sure what you mean by this. Not conducive how? What further optimization? |
To the best of my knowledge, we may benefit from:
|
This is a test did on an AWS linux/arm64 cloud server with patch set1 and patch set10 of https://go-review.googlesource.com/c/go/+/233097 Benchmarks in the above CL:
Benchmarks in go1:
At least I didn't see obvious performance improvement on this machine, of course, it is not impossible to have performance gain on other machines, as shown in the commit message of the CL, but we can also notice that it does not have performance improvements on all machines.
I don’t know what further optimizations can be done, but if there is an optimization opportunity in the future involving code like if y-c > 0, I would prefer to process the instruction sequence of sub / add + cmp + bgt instead of cmp + bmi + beq. |
I'm also curious as to what's happening here. The commit message for CL 223097 mentions different machines, but not really at the level of detail that would tease out why some machines do better and some don't. I'm happy if some machines do better and the others at least do no worse.
Any optimization would see cmp + bgtnoov. The split into bmi+beq happens late, after all optimizations are done. In any case, arguing against an optimization now so that we can do some future optimization (which may or may not happen) doesn't seem productive. We can always back out this optimization if future optimizations require that. (And this issue is about fixing a bug, not really optimizations per se. I guess there is just a more performant and less performant way to fix it.) |
Yes, I agree with you. Thanks. |
Change https://golang.org/cl/236637 mentions this issue: |
Enclose a follow-up to toolstash-check reports, no real issue was identified, just serving as a memo. The following function in 'runtime/sema.go' was lowered to CMP+BLT previously, thus vulnerable to the wrong branching bug, it's used by 'sync.(*Cond).Wait' and affects goroutine's wakeup (seemingly, please correct me if I'm making any mistake). I checked unit tests and benchmarks of the 'sync' package, and it turns out both 'a' and 'b', indicating ticket number of a notification, are not large enough to trigger the issue in those cases. I guess the runtime must have been paralyzed before it reaches that many notifications, so there should be no weird scheduling issues caused by this bug in the past. It's specific to 32bit arm, arm64 doesn't rewriting 'x-y' comparing to 0.
|
Change https://golang.org/cl/237860 mentions this issue: |
Change https://golang.org/cl/238677 mentions this issue: |
Flag constant Ops on arm and arm64 are under refactoring, this change adds a couple of testcases that verify the behavior of 'noov' branches. Updates #39505 Updates #38740 Updates #39303 Change-Id: I493344b52276900cd296c32da494d72932dfc9be Reviewed-on: https://go-review.googlesource.com/c/go/+/238677 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
Comparisons like
if x+10 < 0
branch incorrectly if thex+10
overflows.This was found and fixed for arm64, see #38740, CL 233097. The test in that CL also fails on arm, so the same fix needs to happen for arm.
The text was updated successfully, but these errors were encountered: