Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: arm comparisons with 0 incorrect on overflow #39303

Closed
randall77 opened this issue May 28, 2020 · 15 comments
Closed

cmd/compile: arm comparisons with 0 incorrect on overflow #39303

randall77 opened this issue May 28, 2020 · 15 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Milestone

Comments

@randall77
Copy link
Contributor

randall77 commented May 28, 2020

Comparisons like if x+10 < 0 branch incorrectly if the x+10 overflows.
This was found and fixed for arm64, see #38740, CL 233097. The test in that CL also fails on arm, so the same fix needs to happen for arm.

--- FAIL: TestCondRewrite (0.01s)
    --- FAIL: TestCondRewrite/AddConst32 (0.00s)
        rewriteCond_test.go:120: '0x7ffffffd + 11 < 0' failed
        rewriteCond_test.go:125: '0x7ffffffd + 13 <= 0' failed
        rewriteCond_test.go:130: '-0x7fffffff - 11 > 0' failed
        rewriteCond_test.go:135: '-0x7fffffff - 13 >= 0' failed
        rewriteCond_test.go:139: '0x7ffffffd + 19 > 0' failed
        rewriteCond_test.go:143: '0x7ffffffd + 23 >= 0' failed
        rewriteCond_test.go:147: '-0x7fffffff - 19 < 0' failed
        rewriteCond_test.go:151: '-0x7fffffff - 23 <= 0' failed
    --- FAIL: TestCondRewrite/AddVar32 (0.00s)
        rewriteCond_test.go:198: '0x7ffffffd + 0xb < 0' failed
        rewriteCond_test.go:203: '0x7ffffffd + 0xb <= 0' failed
        rewriteCond_test.go:208: '-0x7fffffff + -0xb > 0' failed
        rewriteCond_test.go:213: '-0x7fffffff + -0xb >= 0' failed
        rewriteCond_test.go:217: '0x7ffffffd + 0xb > 0' failed
        rewriteCond_test.go:221: '0x7ffffffd + 0xb >= 0' failed
        rewriteCond_test.go:225: '-0x7fffffff + -0xb < 0' failed
        rewriteCond_test.go:229: '-0x7fffffff + -0xb <= 0' failed
    --- FAIL: TestCondRewrite/MAddVar32 (0.00s)
        rewriteCond_test.go:276: '0x7ffffffd + 0xb*1 < 0' failed
        rewriteCond_test.go:281: '0x7ffffffd + 0xb*1 <= 0' failed
        rewriteCond_test.go:286: '-0x7fffffff + -0xb*1 > 0' failed
        rewriteCond_test.go:291: '-0x7fffffff + -0xb*1 >= 0' failed
        rewriteCond_test.go:295: '0x7ffffffd + 0xb*1 > 0' failed
        rewriteCond_test.go:299: '0x7ffffffd + 0xb*1 >= 0' failed
        rewriteCond_test.go:303: '-0x7fffffff + -0xb*1 < 0' failed
        rewriteCond_test.go:307: '-0x7fffffff + -0xb*1 <= 0' failed
    --- FAIL: TestCondRewrite/MSubVar32 (0.00s)
        rewriteCond_test.go:363: '0x7ffffffd - -0xb*1 < 0' failed
        rewriteCond_test.go:368: '0x7ffffffd - -0xb*1 <= 0' failed
        rewriteCond_test.go:373: '-0x7fffffff - 0xb*1 > 0' failed
        rewriteCond_test.go:378: '-0x7fffffff - 0xb*1 >= 0' failed
        rewriteCond_test.go:382: '0x7ffffffd - -0xb*1 > 0' failed
        rewriteCond_test.go:386: '0x7ffffffd - -0xb*1 >= 0' failed
        rewriteCond_test.go:390: '-0x7fffffff - 0xb*1 < 0' failed
        rewriteCond_test.go:394: '-0x7fffffff - 0xb*1 <= 0' failed
@randall77 randall77 added this to the Go1.15 milestone May 28, 2020
@andybons
Copy link
Member

Should this block the upcoming beta?

@andybons andybons added the NeedsFix The path to resolution is known, but the work has not been done. label May 28, 2020
@randall77
Copy link
Contributor Author

@andybons No. This is a pretty esoteric corner case. It's also been broken for a while (at least 1.12?).

@andybons andybons added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label May 28, 2020
@andybons
Copy link
Member

Thanks!

@randall77
Copy link
Contributor Author

@shawn-xdji

@shawndx
Copy link
Contributor

shawndx commented May 29, 2020

Thanks @randall77 @andybons, I'm working on it.

@erifan
Copy link

erifan commented May 29, 2020

Can we simply delete the problematic rules? By carefully designing to save an add or sub instruction, introducing rarely used MI and PL flags and continuous branch jumps will not bring significant performance improvements, but will make the code look complicated and difficult to maintain, It is also not conducive to further optimization.

@randall77
Copy link
Contributor Author

will not bring significant performance improvements

That's the crux of the question. We need to do that experiment. Assertions one way or the other without data aren't convincing.
If the rules don't make anything faster (or smaller) then yes, they can just be deleted.

It is also not conducive to further optimization

I'm not sure what you mean by this. Not conducive how? What further optimization?

@shawndx
Copy link
Contributor

shawndx commented May 29, 2020

To the best of my knowledge, we may benefit from:

  1. saving one register if the expression is of 'add' or 'sub' originally.
  2. 64-bit madd/msub costs 2 more cycles and achieves one third throughput, compared with mul (roughly).

@erifan
Copy link

erifan commented May 29, 2020

That's the crux of the question. We need to do that experiment. Assertions one way or the other without data aren't convincing.

This is a test did on an AWS linux/arm64 cloud server with patch set1 and patch set10 of https://go-review.googlesource.com/c/go/+/233097

Benchmarks in the above CL:

benchstat patchset10 patchset1
name                     old time/op  new time/op  delta
CondRewrite/SoloJump-48  7.61ns ± 0%  7.62ns ± 0%  +0.13%  (p=0.000 n=10+10)
CondRewrite/CombJump-48  7.65ns ± 0%  7.69ns ± 0%  +0.52%  (p=0.000 n=10+10)

Benchmarks in go1:

benchstat patchset10 patchset1
name                      old time/op    new time/op    delta
BinaryTree17-48              3.05s ± 0%     3.05s ± 1%    ~     (p=0.400 n=9+10)
Fannkuch11-48                2.72s ± 0%     2.73s ± 1%  +0.35%  (p=0.043 n=9+10)
FmtFprintfEmpty-48          48.9ns ± 1%    49.0ns ± 1%    ~     (p=0.392 n=10+10)
FmtFprintfString-48         86.7ns ± 0%    86.8ns ± 0%    ~     (p=0.178 n=10+8)
FmtFprintfInt-48            97.4ns ± 0%    97.4ns ± 0%    ~     (p=0.421 n=10+10)
FmtFprintfIntInt-48          160ns ± 0%     160ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt-48     200ns ± 0%     201ns ± 1%    ~     (p=0.261 n=10+10)
FmtFprintfFloat-48           223ns ± 0%     222ns ± 0%    ~     (p=0.656 n=10+10)
FmtManyArgs-48               633ns ± 0%     633ns ± 0%    ~     (p=0.192 n=10+10)
GobDecode-48                6.84ms ± 0%    6.84ms ± 0%    ~     (p=0.549 n=9+10)
GobEncode-48                5.31ms ± 0%    5.33ms ± 0%  +0.30%  (p=0.000 n=10+10)
Gzip-48                      257ms ± 0%     255ms ± 0%  -0.78%  (p=0.000 n=10+10)
Gunzip-48                   41.4ms ± 0%    41.4ms ± 0%    ~     (p=0.739 n=10+10)
HTTPClientServer-48         53.6µs ± 3%    53.2µs ± 2%    ~     (p=0.165 n=10+10)
JSONEncode-48               10.9ms ± 0%    10.9ms ± 0%    ~     (p=0.684 n=10+10)
JSONDecode-48               47.2ms ± 0%    47.1ms ± 0%  -0.10%  (p=0.025 n=10+7)
Mandelbrot200-48            3.43ms ± 0%    3.43ms ± 0%    ~     (p=0.684 n=10+10)
GoParse-48                  3.29ms ± 0%    3.29ms ± 0%  +0.24%  (p=0.001 n=10+9)
RegexpMatchEasy0_32-48      78.4ns ± 0%    78.4ns ± 1%    ~     (p=0.797 n=10+9)
RegexpMatchEasy0_1K-48       229ns ± 1%     229ns ± 1%    ~     (p=0.802 n=9+10)
RegexpMatchEasy1_32-48      73.8ns ± 0%    74.0ns ± 1%    ~     (p=0.307 n=9+9)
RegexpMatchEasy1_1K-48       407ns ± 1%     408ns ± 1%    ~     (p=0.772 n=9+10)
RegexpMatchMedium_32-48     6.72ns ± 0%    6.72ns ± 0%    ~     (p=0.440 n=9+10)
RegexpMatchMedium_1K-48     37.3µs ± 0%    38.0µs ± 3%    ~     (p=0.616 n=10+10)
RegexpMatchHard_32-48       2.06µs ± 0%    2.08µs ± 1%    ~     (p=0.111 n=10+10)
RegexpMatchHard_1K-48       62.3µs ± 1%    63.5µs ± 3%  +1.98%  (p=0.023 n=10+10)
Revcomp-48                   500ms ± 3%     506ms ± 3%    ~     (p=0.165 n=10+10)
Template-48                 59.5ms ± 2%    59.3ms ± 1%    ~     (p=0.182 n=10+9)
TimeParse-48                 382ns ± 0%     382ns ± 0%    ~     (p=0.805 n=10+10)
TimeFormat-48                358ns ± 0%     358ns ± 0%  -0.11%  (p=0.046 n=10+8)

name                      old speed      new speed      delta
GobDecode-48               112MB/s ± 0%   112MB/s ± 0%    ~     (p=0.535 n=9+10)
GobEncode-48               144MB/s ± 0%   144MB/s ± 0%  -0.30%  (p=0.000 n=10+10)
Gzip-48                   75.6MB/s ± 0%  76.2MB/s ± 0%  +0.79%  (p=0.000 n=10+10)
Gunzip-48                  469MB/s ± 0%   469MB/s ± 0%    ~     (p=0.782 n=10+10)
JSONEncode-48              179MB/s ± 0%   179MB/s ± 0%    ~     (p=0.754 n=10+10)
JSONDecode-48             41.1MB/s ± 0%  41.2MB/s ± 0%  +0.10%  (p=0.021 n=10+7)
GoParse-48                17.6MB/s ± 0%  17.6MB/s ± 0%  -0.23%  (p=0.003 n=10+9)
RegexpMatchEasy0_32-48     408MB/s ± 0%   408MB/s ± 1%    ~     (p=0.905 n=10+9)
RegexpMatchEasy0_1K-48    4.47GB/s ± 1%  4.47GB/s ± 1%    ~     (p=0.905 n=9+10)
RegexpMatchEasy1_32-48     433MB/s ± 0%   433MB/s ± 1%    ~     (p=0.356 n=10+9)
RegexpMatchEasy1_1K-48    2.52GB/s ± 1%  2.51GB/s ± 1%    ~     (p=1.000 n=9+8)
RegexpMatchMedium_32-48    149MB/s ± 0%   149MB/s ± 0%    ~     (p=0.324 n=9+10)
RegexpMatchMedium_1K-48   27.5MB/s ± 0%  26.9MB/s ± 3%    ~     (p=0.724 n=10+10)
RegexpMatchHard_32-48     15.6MB/s ± 0%  15.4MB/s ± 1%  -1.25%  (p=0.040 n=10+10)
RegexpMatchHard_1K-48     16.4MB/s ± 1%  16.1MB/s ± 3%  -1.93%  (p=0.019 n=10+10)
Revcomp-48                 509MB/s ± 3%   503MB/s ± 3%    ~     (p=0.165 n=10+10)
Template-48               32.6MB/s ± 2%  32.7MB/s ± 1%    ~     (p=0.175 n=10+9)

At least I didn't see obvious performance improvement on this machine, of course, it is not impossible to have performance gain on other machines, as shown in the commit message of the CL, but we can also notice that it does not have performance improvements on all machines.

I'm not sure what you mean by this. Not conducive how? What further optimization?

I don’t know what further optimizations can be done, but if there is an optimization opportunity in the future involving code like if y-c > 0, I would prefer to process the instruction sequence of sub / add + cmp + bgt instead of cmp + bmi + beq.
This may be just a personal preference. I really appreciate this fix, thanks @shawn-xdji .

@randall77
Copy link
Contributor Author

but we can also notice that it does not have performance improvements on all machines.

I'm also curious as to what's happening here. The commit message for CL 223097 mentions different machines, but not really at the level of detail that would tease out why some machines do better and some don't. I'm happy if some machines do better and the others at least do no worse.

if there is an optimization opportunity in the future involving code like if y-c > 0, I would prefer to process the instruction sequence of sub / add + cmp + bgt instead of cmp + bmi + beq.

Any optimization would see cmp + bgtnoov. The split into bmi+beq happens late, after all optimizations are done.

In any case, arguing against an optimization now so that we can do some future optimization (which may or may not happen) doesn't seem productive. We can always back out this optimization if future optimizations require that.

(And this issue is about fixing a bug, not really optimizations per se. I guess there is just a more performant and less performant way to fix it.)

@erifan
Copy link

erifan commented May 29, 2020

(And this issue is about fixing a bug, not really optimizations per se. I guess there is just a more performant and less performant way to fix it.)

Yes, I agree with you. Thanks.

@gopherbot
Copy link

Change https://golang.org/cl/236637 mentions this issue: cmd/compile: ARM comparisons with 0 incorrect on overflow

@shawndx
Copy link
Contributor

shawndx commented Jun 9, 2020

Enclose a follow-up to toolstash-check reports, no real issue was identified, just serving as a memo.

The following function in 'runtime/sema.go' was lowered to CMP+BLT previously, thus vulnerable to the wrong branching bug, it's used by 'sync.(*Cond).Wait' and affects goroutine's wakeup (seemingly, please correct me if I'm making any mistake). I checked unit tests and benchmarks of the 'sync' package, and it turns out both 'a' and 'b', indicating ticket number of a notification, are not large enough to trigger the issue in those cases. I guess the runtime must have been paralyzed before it reaches that many notifications, so there should be no weird scheduling issues caused by this bug in the past.

It's specific to 32bit arm, arm64 doesn't rewriting 'x-y' comparing to 0.

// less checks if a < b, considering a & b running counts that may overflow the
// 32-bit range, and that their "unwrapped" difference is always less than 2^31.
func less(a, b uint32) bool {
	return int32(a-b) < 0
}

@gopherbot
Copy link

Change https://golang.org/cl/237860 mentions this issue: cmd/compile: handle constant flags for 'noov' Ops

@gopherbot
Copy link

Change https://golang.org/cl/238677 mentions this issue: cmd/compile: Install testcases for flag constant Ops

gopherbot pushed a commit that referenced this issue Aug 28, 2020
Flag constant Ops on arm and arm64 are under refactoring, this change adds
a couple of testcases that verify the behavior of 'noov' branches.

Updates #39505
Updates #38740
Updates #39303
Change-Id: I493344b52276900cd296c32da494d72932dfc9be
Reviewed-on: https://go-review.googlesource.com/c/go/+/238677
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Jun 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants