cmd/compile: arm comparisons with 0 incorrect on overflow #39303

randall77 · 2020-05-28T16:12:54Z

Comparisons like if x+10 < 0 branch incorrectly if the x+10 overflows.
This was found and fixed for arm64, see #38740, CL 233097. The test in that CL also fails on arm, so the same fix needs to happen for arm.

--- FAIL: TestCondRewrite (0.01s)
    --- FAIL: TestCondRewrite/AddConst32 (0.00s)
        rewriteCond_test.go:120: '0x7ffffffd + 11 < 0' failed
        rewriteCond_test.go:125: '0x7ffffffd + 13 <= 0' failed
        rewriteCond_test.go:130: '-0x7fffffff - 11 > 0' failed
        rewriteCond_test.go:135: '-0x7fffffff - 13 >= 0' failed
        rewriteCond_test.go:139: '0x7ffffffd + 19 > 0' failed
        rewriteCond_test.go:143: '0x7ffffffd + 23 >= 0' failed
        rewriteCond_test.go:147: '-0x7fffffff - 19 < 0' failed
        rewriteCond_test.go:151: '-0x7fffffff - 23 <= 0' failed
    --- FAIL: TestCondRewrite/AddVar32 (0.00s)
        rewriteCond_test.go:198: '0x7ffffffd + 0xb < 0' failed
        rewriteCond_test.go:203: '0x7ffffffd + 0xb <= 0' failed
        rewriteCond_test.go:208: '-0x7fffffff + -0xb > 0' failed
        rewriteCond_test.go:213: '-0x7fffffff + -0xb >= 0' failed
        rewriteCond_test.go:217: '0x7ffffffd + 0xb > 0' failed
        rewriteCond_test.go:221: '0x7ffffffd + 0xb >= 0' failed
        rewriteCond_test.go:225: '-0x7fffffff + -0xb < 0' failed
        rewriteCond_test.go:229: '-0x7fffffff + -0xb <= 0' failed
    --- FAIL: TestCondRewrite/MAddVar32 (0.00s)
        rewriteCond_test.go:276: '0x7ffffffd + 0xb*1 < 0' failed
        rewriteCond_test.go:281: '0x7ffffffd + 0xb*1 <= 0' failed
        rewriteCond_test.go:286: '-0x7fffffff + -0xb*1 > 0' failed
        rewriteCond_test.go:291: '-0x7fffffff + -0xb*1 >= 0' failed
        rewriteCond_test.go:295: '0x7ffffffd + 0xb*1 > 0' failed
        rewriteCond_test.go:299: '0x7ffffffd + 0xb*1 >= 0' failed
        rewriteCond_test.go:303: '-0x7fffffff + -0xb*1 < 0' failed
        rewriteCond_test.go:307: '-0x7fffffff + -0xb*1 <= 0' failed
    --- FAIL: TestCondRewrite/MSubVar32 (0.00s)
        rewriteCond_test.go:363: '0x7ffffffd - -0xb*1 < 0' failed
        rewriteCond_test.go:368: '0x7ffffffd - -0xb*1 <= 0' failed
        rewriteCond_test.go:373: '-0x7fffffff - 0xb*1 > 0' failed
        rewriteCond_test.go:378: '-0x7fffffff - 0xb*1 >= 0' failed
        rewriteCond_test.go:382: '0x7ffffffd - -0xb*1 > 0' failed
        rewriteCond_test.go:386: '0x7ffffffd - -0xb*1 >= 0' failed
        rewriteCond_test.go:390: '-0x7fffffff - 0xb*1 < 0' failed
        rewriteCond_test.go:394: '-0x7fffffff - 0xb*1 <= 0' failed

The text was updated successfully, but these errors were encountered:

andybons · 2020-05-28T16:17:21Z

Should this block the upcoming beta?

randall77 · 2020-05-28T16:19:43Z

@andybons No. This is a pretty esoteric corner case. It's also been broken for a while (at least 1.12?).

andybons · 2020-05-28T16:21:03Z

Thanks!

randall77 · 2020-05-28T16:33:50Z

@shawn-xdji

shawndx · 2020-05-29T02:10:29Z

Thanks @randall77 @andybons, I'm working on it.

erifan · 2020-05-29T02:44:49Z

Can we simply delete the problematic rules? By carefully designing to save an add or sub instruction, introducing rarely used MI and PL flags and continuous branch jumps will not bring significant performance improvements, but will make the code look complicated and difficult to maintain, It is also not conducive to further optimization.

randall77 · 2020-05-29T03:00:38Z

will not bring significant performance improvements

That's the crux of the question. We need to do that experiment. Assertions one way or the other without data aren't convincing.
If the rules don't make anything faster (or smaller) then yes, they can just be deleted.

It is also not conducive to further optimization

I'm not sure what you mean by this. Not conducive how? What further optimization?

shawndx · 2020-05-29T04:56:35Z

To the best of my knowledge, we may benefit from:

saving one register if the expression is of 'add' or 'sub' originally.
64-bit madd/msub costs 2 more cycles and achieves one third throughput, compared with mul (roughly).

erifan · 2020-05-29T05:47:10Z

That's the crux of the question. We need to do that experiment. Assertions one way or the other without data aren't convincing.

This is a test did on an AWS linux/arm64 cloud server with patch set1 and patch set10 of https://go-review.googlesource.com/c/go/+/233097

Benchmarks in the above CL:

benchstat patchset10 patchset1
name                     old time/op  new time/op  delta
CondRewrite/SoloJump-48  7.61ns ± 0%  7.62ns ± 0%  +0.13%  (p=0.000 n=10+10)
CondRewrite/CombJump-48  7.65ns ± 0%  7.69ns ± 0%  +0.52%  (p=0.000 n=10+10)

Benchmarks in go1:

benchstat patchset10 patchset1
name                      old time/op    new time/op    delta
BinaryTree17-48              3.05s ± 0%     3.05s ± 1%    ~     (p=0.400 n=9+10)
Fannkuch11-48                2.72s ± 0%     2.73s ± 1%  +0.35%  (p=0.043 n=9+10)
FmtFprintfEmpty-48          48.9ns ± 1%    49.0ns ± 1%    ~     (p=0.392 n=10+10)
FmtFprintfString-48         86.7ns ± 0%    86.8ns ± 0%    ~     (p=0.178 n=10+8)
FmtFprintfInt-48            97.4ns ± 0%    97.4ns ± 0%    ~     (p=0.421 n=10+10)
FmtFprintfIntInt-48          160ns ± 0%     160ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt-48     200ns ± 0%     201ns ± 1%    ~     (p=0.261 n=10+10)
FmtFprintfFloat-48           223ns ± 0%     222ns ± 0%    ~     (p=0.656 n=10+10)
FmtManyArgs-48               633ns ± 0%     633ns ± 0%    ~     (p=0.192 n=10+10)
GobDecode-48                6.84ms ± 0%    6.84ms ± 0%    ~     (p=0.549 n=9+10)
GobEncode-48                5.31ms ± 0%    5.33ms ± 0%  +0.30%  (p=0.000 n=10+10)
Gzip-48                      257ms ± 0%     255ms ± 0%  -0.78%  (p=0.000 n=10+10)
Gunzip-48                   41.4ms ± 0%    41.4ms ± 0%    ~     (p=0.739 n=10+10)
HTTPClientServer-48         53.6µs ± 3%    53.2µs ± 2%    ~     (p=0.165 n=10+10)
JSONEncode-48               10.9ms ± 0%    10.9ms ± 0%    ~     (p=0.684 n=10+10)
JSONDecode-48               47.2ms ± 0%    47.1ms ± 0%  -0.10%  (p=0.025 n=10+7)
Mandelbrot200-48            3.43ms ± 0%    3.43ms ± 0%    ~     (p=0.684 n=10+10)
GoParse-48                  3.29ms ± 0%    3.29ms ± 0%  +0.24%  (p=0.001 n=10+9)
RegexpMatchEasy0_32-48      78.4ns ± 0%    78.4ns ± 1%    ~     (p=0.797 n=10+9)
RegexpMatchEasy0_1K-48       229ns ± 1%     229ns ± 1%    ~     (p=0.802 n=9+10)
RegexpMatchEasy1_32-48      73.8ns ± 0%    74.0ns ± 1%    ~     (p=0.307 n=9+9)
RegexpMatchEasy1_1K-48       407ns ± 1%     408ns ± 1%    ~     (p=0.772 n=9+10)
RegexpMatchMedium_32-48     6.72ns ± 0%    6.72ns ± 0%    ~     (p=0.440 n=9+10)
RegexpMatchMedium_1K-48     37.3µs ± 0%    38.0µs ± 3%    ~     (p=0.616 n=10+10)
RegexpMatchHard_32-48       2.06µs ± 0%    2.08µs ± 1%    ~     (p=0.111 n=10+10)
RegexpMatchHard_1K-48       62.3µs ± 1%    63.5µs ± 3%  +1.98%  (p=0.023 n=10+10)
Revcomp-48                   500ms ± 3%     506ms ± 3%    ~     (p=0.165 n=10+10)
Template-48                 59.5ms ± 2%    59.3ms ± 1%    ~     (p=0.182 n=10+9)
TimeParse-48                 382ns ± 0%     382ns ± 0%    ~     (p=0.805 n=10+10)
TimeFormat-48                358ns ± 0%     358ns ± 0%  -0.11%  (p=0.046 n=10+8)

name                      old speed      new speed      delta
GobDecode-48               112MB/s ± 0%   112MB/s ± 0%    ~     (p=0.535 n=9+10)
GobEncode-48               144MB/s ± 0%   144MB/s ± 0%  -0.30%  (p=0.000 n=10+10)
Gzip-48                   75.6MB/s ± 0%  76.2MB/s ± 0%  +0.79%  (p=0.000 n=10+10)
Gunzip-48                  469MB/s ± 0%   469MB/s ± 0%    ~     (p=0.782 n=10+10)
JSONEncode-48              179MB/s ± 0%   179MB/s ± 0%    ~     (p=0.754 n=10+10)
JSONDecode-48             41.1MB/s ± 0%  41.2MB/s ± 0%  +0.10%  (p=0.021 n=10+7)
GoParse-48                17.6MB/s ± 0%  17.6MB/s ± 0%  -0.23%  (p=0.003 n=10+9)
RegexpMatchEasy0_32-48     408MB/s ± 0%   408MB/s ± 1%    ~     (p=0.905 n=10+9)
RegexpMatchEasy0_1K-48    4.47GB/s ± 1%  4.47GB/s ± 1%    ~     (p=0.905 n=9+10)
RegexpMatchEasy1_32-48     433MB/s ± 0%   433MB/s ± 1%    ~     (p=0.356 n=10+9)
RegexpMatchEasy1_1K-48    2.52GB/s ± 1%  2.51GB/s ± 1%    ~     (p=1.000 n=9+8)
RegexpMatchMedium_32-48    149MB/s ± 0%   149MB/s ± 0%    ~     (p=0.324 n=9+10)
RegexpMatchMedium_1K-48   27.5MB/s ± 0%  26.9MB/s ± 3%    ~     (p=0.724 n=10+10)
RegexpMatchHard_32-48     15.6MB/s ± 0%  15.4MB/s ± 1%  -1.25%  (p=0.040 n=10+10)
RegexpMatchHard_1K-48     16.4MB/s ± 1%  16.1MB/s ± 3%  -1.93%  (p=0.019 n=10+10)
Revcomp-48                 509MB/s ± 3%   503MB/s ± 3%    ~     (p=0.165 n=10+10)
Template-48               32.6MB/s ± 2%  32.7MB/s ± 1%    ~     (p=0.175 n=10+9)

At least I didn't see obvious performance improvement on this machine, of course, it is not impossible to have performance gain on other machines, as shown in the commit message of the CL, but we can also notice that it does not have performance improvements on all machines.

I'm not sure what you mean by this. Not conducive how? What further optimization?

I don’t know what further optimizations can be done, but if there is an optimization opportunity in the future involving code like if y-c > 0, I would prefer to process the instruction sequence of sub / add + cmp + bgt instead of cmp + bmi + beq.
This may be just a personal preference. I really appreciate this fix, thanks @shawn-xdji .

randall77 · 2020-05-29T05:58:02Z

but we can also notice that it does not have performance improvements on all machines.

I'm also curious as to what's happening here. The commit message for CL 223097 mentions different machines, but not really at the level of detail that would tease out why some machines do better and some don't. I'm happy if some machines do better and the others at least do no worse.

if there is an optimization opportunity in the future involving code like if y-c > 0, I would prefer to process the instruction sequence of sub / add + cmp + bgt instead of cmp + bmi + beq.

Any optimization would see cmp + bgtnoov. The split into bmi+beq happens late, after all optimizations are done.

In any case, arguing against an optimization now so that we can do some future optimization (which may or may not happen) doesn't seem productive. We can always back out this optimization if future optimizations require that.

(And this issue is about fixing a bug, not really optimizations per se. I guess there is just a more performant and less performant way to fix it.)

erifan · 2020-05-29T06:02:45Z

(And this issue is about fixing a bug, not really optimizations per se. I guess there is just a more performant and less performant way to fix it.)

Yes, I agree with you. Thanks.

gopherbot · 2020-06-05T04:39:44Z

Change https://golang.org/cl/236637 mentions this issue: cmd/compile: ARM comparisons with 0 incorrect on overflow

shawndx · 2020-06-09T15:44:38Z

Enclose a follow-up to toolstash-check reports, no real issue was identified, just serving as a memo.

The following function in 'runtime/sema.go' was lowered to CMP+BLT previously, thus vulnerable to the wrong branching bug, it's used by 'sync.(*Cond).Wait' and affects goroutine's wakeup (seemingly, please correct me if I'm making any mistake). I checked unit tests and benchmarks of the 'sync' package, and it turns out both 'a' and 'b', indicating ticket number of a notification, are not large enough to trigger the issue in those cases. I guess the runtime must have been paralyzed before it reaches that many notifications, so there should be no weird scheduling issues caused by this bug in the past.

It's specific to 32bit arm, arm64 doesn't rewriting 'x-y' comparing to 0.

// less checks if a < b, considering a & b running counts that may overflow the
// 32-bit range, and that their "unwrapped" difference is always less than 2^31.
func less(a, b uint32) bool {
	return int32(a-b) < 0
}

gopherbot · 2020-06-15T10:53:32Z

Change https://golang.org/cl/237860 mentions this issue: cmd/compile: handle constant flags for 'noov' Ops

gopherbot · 2020-06-18T08:00:05Z

Change https://golang.org/cl/238677 mentions this issue: cmd/compile: Install testcases for flag constant Ops

Flag constant Ops on arm and arm64 are under refactoring, this change adds a couple of testcases that verify the behavior of 'noov' branches. Updates #39505 Updates #38740 Updates #39303 Change-Id: I493344b52276900cd296c32da494d72932dfc9be Reviewed-on: https://go-review.googlesource.com/c/go/+/238677 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>

randall77 added the release-blocker label May 28, 2020

randall77 added this to the Go1.15 milestone May 28, 2020

andybons added the NeedsFix label May 28, 2020

andybons added the okay-after-beta1 label May 28, 2020

gopherbot closed this as completed in e031318 Jun 9, 2020

golang locked and limited conversation to collaborators Jun 18, 2021

gopherbot added the FrozenDueToAge label Jun 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: arm comparisons with 0 incorrect on overflow #39303

cmd/compile: arm comparisons with 0 incorrect on overflow #39303

randall77 commented May 28, 2020 •

edited

Loading

andybons commented May 28, 2020

randall77 commented May 28, 2020

andybons commented May 28, 2020

randall77 commented May 28, 2020

shawndx commented May 29, 2020

erifan commented May 29, 2020

randall77 commented May 29, 2020

shawndx commented May 29, 2020

erifan commented May 29, 2020 •

edited

Loading

randall77 commented May 29, 2020

erifan commented May 29, 2020

gopherbot commented Jun 5, 2020

shawndx commented Jun 9, 2020

gopherbot commented Jun 15, 2020

gopherbot commented Jun 18, 2020

cmd/compile: arm comparisons with 0 incorrect on overflow #39303

cmd/compile: arm comparisons with 0 incorrect on overflow #39303

Comments

randall77 commented May 28, 2020 • edited Loading

andybons commented May 28, 2020

randall77 commented May 28, 2020

andybons commented May 28, 2020

randall77 commented May 28, 2020

shawndx commented May 29, 2020

erifan commented May 29, 2020

randall77 commented May 29, 2020

shawndx commented May 29, 2020

erifan commented May 29, 2020 • edited Loading

randall77 commented May 29, 2020

erifan commented May 29, 2020

gopherbot commented Jun 5, 2020

shawndx commented Jun 9, 2020

gopherbot commented Jun 15, 2020

gopherbot commented Jun 18, 2020

randall77 commented May 28, 2020 •

edited

Loading

erifan commented May 29, 2020 •

edited

Loading