Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/big: arm64 assembly code for shlVU is incorrect #31084

Closed
ALTree opened this issue Mar 27, 2019 · 36 comments
Closed

math/big: arm64 assembly code for shlVU is incorrect #31084

ALTree opened this issue Mar 27, 2019 · 36 comments
Labels
ExpertNeeded FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ALTree
Copy link
Member

ALTree commented Mar 27, 2019

The TestFloat64SpecialCases test in math/big (ratconv_test.go) is currently broken on the android/arm64 builder, with log:

https://build.golang.org/log/fbc581637a0375532f0d819ae67bc9b9abbe8ba9

Breakage started at e4ba400 (math/big: accept non-decimal floats with Rat.SetString).

cc @griesemer

@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 27, 2019
@ALTree ALTree added this to the Go1.13 milestone Mar 27, 2019
@griesemer griesemer self-assigned this Mar 27, 2019
@griesemer
Copy link
Contributor

Hm. It's only broken on android-arm-wikofever. @eliasnaur, it looks like you're the owner of that builder. Is there anything special about that arm device? It's a Cortex-A53 CPU?

@eliasnaur
Copy link
Contributor

I don't think it's a special device, other than it being an Android phone.

The test also broke on darwin/arm64 (iPhone):

https://build.golang.org/log/0b86dd35833a34f28785d335153946ed9a5a3691

which I assume uses an entirely different CPU.

@griesemer
Copy link
Contributor

Indeed, and that appears to be an iPhone 7+. Interesting.

@griesemer
Copy link
Contributor

Here's a smaller test case: https://play.golang.org/p/7OUzcEgoVGk

In the playground, and elsewhere, running this code produces the correct result:

r = 1/100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (true)
len(r) = 308
f = 1e-305 (false)
e = 1e-305

On android-arm-wikofever (and presumably darwin/arm64) we get:

r = 1/96541187637860552075819821540903887880024095043179244007497829804714413865809870901655967120022719342752763287637146850362464306041721929996152029995888654682648990460561537640800770715572834310810731676381392326036599721803873644105423265719091992037347983408132957025607386110373485836090285769528130906264626714751533056 (true)
len(r) = 325
f = 1e-323 (false)
e = 1e-305

That is the fraction is completely wrong.

@griesemer
Copy link
Contributor

The fraction is correct for 1e-164, and then wrong for 1e165.

@gopherbot
Copy link

Change https://golang.org/cl/169721 mentions this issue: math/big: temporarily disable buggy shlVU assembly for arm64

gopherbot pushed a commit that referenced this issue Mar 27, 2019
This addresses the failures we have seen in #31084. The correct
fix is to find the actual bug in the assembly code.

Updates #31084.

Change-Id: I437780c53d0c4423d742e2e3b650b899ce845372
Reviewed-on: https://go-review.googlesource.com/c/go/+/169721
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@griesemer griesemer changed the title math/big: ratconv tests broken on android arm64 builder math/big: arm64 assembly code for shlVU is incorrect Mar 28, 2019
@griesemer
Copy link
Contributor

I've tracked this down a bit more. Here is a stand-alone test file (to be added to the math/big test files) that exposes the problem:

package big

import (
	"strings"
	"testing"
)

func TestSHLVU(t *testing.T) {
	// compute 10^n via 5^n << n
	const n = 165
	p := nat(nil).expNN(nat{5}, nat{n}, nil)
	p = p.shl(p, uint(n)) // <<< this doesn't work using ·shlVU
	// p = nat(nil).shl(p, uint(n)) // <<< this works

	got := string(p.utoa(10))
	want := "1" + strings.Repeat("0", n)
	if got != want {
		t.Errorf("got  %s\nwant %s\n", got, want)
	}
}

When run with the ·shlVU assembly routine (in arith_arm64.s) enabled, this test fails on arm:

$ gomote run -path '$PATH,$WORKDIR/go/bin' $MOTE go/bin/go test -run SHLVU ../src/math/big
go_android_exec: adb wait-for-device exec-out while [[ -z $(getprop sys.boot_completed) ]]; do sleep 1; done;
go_android_exec: adb push /var/folders/f6/d2bhfqss2716nxm8gkv1fmb80000gn/T/workdir-host-darwin-amd64-eliasnaur-android/tmp/go-build825267701/b001/big.test /data/local/tmp/go_android_exec/big.test-89906/big.test
[  1%] /data/local/tmp/go_android_exec/big.test-89906/big.test
...
[100%] /data/local/tmp/go_android_exec/big.test-89906/big.test
/var/folders/f6/d2bhfqss2716nxm8gkv1fmb80000gn/T/workdir-host-darwin-amd64-eliasnaur-android/tmp/go-build825267701/b001/big.test: 1 file pushed. 10.1 MB/s (6329512 bytes in 0.597s)
go_android_exec: adb exec-out export TMPDIR="/data/local/tmp/go_android_exec/big.test-89906"; export GOROOT="/data/local/tmp/go_android_exec/goroot"; export GOPATH="/data/local/tmp/go_android_exec/big.test-89906/gopath"; export CGO_ENABLED=0; export GOPROXY=; export GOCACHE="/data/local/tmp/go_android_exec/gocache"; export PATH=$PATH:"/data/local/tmp/go_android_exec/goroot/bin"; cd "/data/local/tmp/go_android_exec/goroot/src/math/big"; '/data/local/tmp/go_android_exec/big.test-89906/big.test' -test.run=SHLVU; echo -n exitcode=$?
--- FAIL: TestSHLVU (0.00s)
    shl_test.go:18: got  999999999999999999999999999999999999997461262869164377797676543286813257059020658007941683889923711150197932805888983316150450477342246492361833965695145908050591744
        want 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
FAIL
exitcode=1go_android_exec: adb exec-out rm -rf /data/local/tmp/go_android_exec/big.test-89906
FAIL	math/big	0.988s
Error running run: exit status 1

When the assembly routine is disabled, the code works fine.

When the assembly routine is enabled but the test case is tweaked slightly such that it uses a nat(nil) receiver rather than p, then the code also seems to work. This is likely an indicator as to what might be wrong with the assembly code.

When fixing this, one should also look into ·shrVU and verify that it doesn't suffer from the same problem.

cc: @erifan @josharian who seem to have written some of this code

@josharian
Copy link
Contributor

Looks like this routine was written by @erifan. The problem is definitely not obvious at a first read through the code. Let me know if you'd like me to take a closer look; otherwise, I will (tentatively) leave this for @erifan.

@josharian
Copy link
Contributor

Also, the shift==0 case is missing an optimization. If z.ptr == x.ptr, there's no need to do a copy at all; you can simply return.

The pure Go code uses the copy built-in to do the copy, which does this optimization under the hood. amd64 is missing the shift==0 optimization entirely. I'll file a separate issue for that.

@griesemer
Copy link
Contributor

It's definitively subtle. The problem doesn't appear for n < 165, at least for https://play.golang.org/p/7OUzcEgoVGk code.

@erifan
Copy link

erifan commented Mar 28, 2019

Sorry, I'll fix this bug.

@ianlancetaylor
Copy link
Contributor

The problem is that the loop in the assembler code goes forward through the arrays. It has to go backward. Otherwise the output overwrites the input.

The same is true of shrVU.

@ianlancetaylor
Copy link
Contributor

Of course I'm wrong about shrVU, which has to go forward. Only shlVU has to go backward.

@eliasnaur
Copy link
Contributor

@bradfitz I can't imagine this doesn't fail on all arm64 CPUs, yet I see no arm64 builders on build.golang.org apart from the mobiles. Are they gone or am I blind?
With generic arm64 builders, gri wouldn't have to go through that painful debugging session.

@erifan
Copy link

erifan commented Mar 28, 2019

The code has been written and I will submit it immediately.

@gopherbot
Copy link

Change https://golang.org/cl/169780 mentions this issue: math/big: fix the bug in assembly implementation of shlVU on arm64

@erifan
Copy link

erifan commented Mar 28, 2019

I found that the implementation of shlVU on amd64 does not consider the case where the high byte of z overlaps with the low byte of x, so I also did not consider this point. Do we need to consider this situation?

@ianlancetaylor
Copy link
Contributor

We only have to consider word level overlap, not byte level overlap, if that is what you mean. But we do have to consider the case where the high word of z overlaps with the low word of x. I don't see any obvious problem in the amd64 implementation but maybe there is one.

@griesemer
Copy link
Contributor

Byte-overlap should never happen, it's either one or more words, or none - otherwise we'd have a serious bug elsewhere. You can safely assume that all pointers are word-aligned.

The assembly code should behave essentially the same as shlVU_g in arith.go.

@erifan
Copy link

erifan commented Mar 29, 2019

Yes I mean Word level overlap. And I can construct an negative case for this situation, which can prove that the current pure Go implementation are defective.

func TestShlVUOverlap(t *testing.T) {
        a := argVW{nat{2, 2, 2, 2, 2, _M << 1 &_M, 1}, nat(nil).set(nat{1, 1, 1, 1, 1, 1, 1, 1, 1, _M}), 1, 1}
        // save a.x for error message because it will be overrided.
        b := nat(nil).make(10)
        copy(b, a.x)
        got := a.x[0:6].shl(a.x[4:10], uint(a.y))
        for i, zi := range got {
                if zi != a.z[i] {
                        t.Errorf("special shl(%v, %v)\n\tgot z[%d] = %#x; want %#x", b[4:10], a.y, i, zi, a.z[i])
                        break
                }
        }
}

Add this test to file math/big/arith_test.go, and run
$ ../../../bin/go test -run TestShlVU

With the pure Go implementation, we get:

--- FAIL: TestShlVUOverlap (0.00s)
arith_test.go:260: special shl([1 1 1 1 1 18446744073709551615], 1)
got z[0] = 0x4; want 0x2
FAIL
exit status 1
FAIL math/big 0.072s

The current use cases of function shl didn't expose this bug, and as function nat.shl is not a public function, so I wonder if we need to consider this situation when implementing this function or is there some mechanisms that can guarantee this situation never happen?

@ianlancetaylor
Copy link
Contributor

shlVU is an internal function, so we don't need to consider all possible ways to call it. We only need to consider uses of the exported methods. We can trust that the code in the math/big package understands the limitations of shlVU.

@josharian
Copy link
Contributor

We can trust that the code in the math/big package understands the limitations of shlVU.

The code might, but I couldn't tell you those limitations offhand. It'd be good to document them.

Once they're documented, we can start fuzzing them using those documented limitations. Since there are multiple implementations, these are good fuzz candidates. (I have started on this.)

@gopherbot
Copy link

Change https://golang.org/cl/168257 mentions this issue: math/big: add comprehensive aliasing tests (and minor fixes to Exp, Rand)

josharian added a commit to josharian/go that referenced this issue May 7, 2019
DO NOT MAIL

TODO: shrVU too
TODO: benchmarks
TODO: fuzz for confidence
TODO: better commit message

When shift == 0, shlVU and shrVU reduce to a memcopy. When z.ptr == x.ptr, it further reduces to a no-op. The pure Go implementation has these optimizations, as of https://go-review.googlesource.com/c/go/+/164967. The arm64 implementation has one of them (see golang#31084 (comment)). We should add both to the amd64 implementation.

cc @griesemer

Fixes golang#31097

Change-Id: I3979d7c82a63e1840c8191636a8947e8f440af3b
@erifan
Copy link

erifan commented Jul 4, 2019

@gopherbot Please open backport issue to 1.13

@gopherbot
Copy link

Backport issue(s) opened: #32940 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@dmitshur
Copy link
Contributor

dmitshur commented Jul 4, 2019

@erifan Why do you think this should be backported to Go 1.13? Go 1.13 hasn't been released yet, it can't have backports. Did you mean another release?

@erifan
Copy link

erifan commented Jul 5, 2019

Hi @dmitshur , I just hope that this fix can go into Go1.13. This fix was merged before the release date of Go1.12.6, but it did not enter the Go1.12.6. I'm not sure if it will go into Go.1.13, so I raised this request.
It would be better if it could get into an earlier release, because some programs are now affected by this bug, such as https://github.com/shopspring/decimal.

@dmitshur
Copy link
Contributor

dmitshur commented Jul 5, 2019

@erifan Since the fix is in master now, it will be a part of Go 1.13 when we release that. We create new release branches and the cherry-pick process begins at the time of the first release candidate, but we haven't cut that yet (only the first beta has been made by now). (See https://golang.org/s/release for more info about the Go release cycle.)

I'll close the 1.13 cherry-pick issue since it's not necessary.

@erifan
Copy link

erifan commented Jul 5, 2019

@dmitshur Ok, thanks.

@erifan
Copy link

erifan commented Jul 8, 2019

Excuse me @dmitshur , There's another question. Will there be a Go1.12.7 version before Go1.13? If so, I hope this fix can be ported in Go1.12.7. Or, if there is Go1.12.7, no backport is needed and this fix will be included? Thanks.

@dmitshur
Copy link
Contributor

dmitshur commented Jul 8, 2019

There will be a Go 1.12.7 release soon (we aim to make minor releases monthly). We can consider cherry-picking this fix to 1.12 if it meets the backporting criteria (described here). If it doesn't, then it'll be available starting with 1.13.

@dmitshur
Copy link
Contributor

dmitshur commented Jul 8, 2019

@erifan According to the original issue, this breakage started with commit e4ba400. That commit isn't a part of 1.12. So does this problem exist in 1.12?

@erifan
Copy link

erifan commented Jul 8, 2019

@dmitshur In Go, the breakage started from commit e4ba400, but for user code, the affected version may be earlier than 1.12, and there is currently no release version that fixes this issue. The user of Project https://github.com/shopspring/decimal complained to me about this issue. Since it is a library function, users will use it frequently, so I think this is a security issue. If possible, please port it to Go1.12.7, thank you.

@dmitshur
Copy link
Contributor

dmitshur commented Jul 8, 2019

Ok, thanks @erifan. I've repurposed #32940 as the 1.12 backport issue.

I'll discuss whether it meets the backporting criteria with the team tomorrow. If it gets approved for a cherry-pick, we'll need to make a cherry-pick CL for it as described here; would you like to work on that?

@erifan
Copy link

erifan commented Jul 8, 2019

@dmitshur Yeah, I'd like to, thanks.

@gopherbot
Copy link

Change https://golang.org/cl/185041 mentions this issue: [release-branch.go1.12] math/big: fix the bug in assembly implementation of shlVU on arm64

gopherbot pushed a commit that referenced this issue Aug 2, 2019
…ion of shlVU on arm64

For the case where the addresses of parameter z and x of the function
shlVU overlap and the address of z is greater than x, x (input value)
can be polluted during the calculation when the high words of x are
overlapped with the low words of z (output value).

Updates #31084
Fixes #32940

Change-Id: I9bb0266a1d7856b8faa9a9b1975d6f57dece0479
Reviewed-on: https://go-review.googlesource.com/c/go/+/169780
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
(cherry picked from commit 503e6cc)
Reviewed-on: https://go-review.googlesource.com/c/go/+/185041
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
apparentlymart added a commit to hashicorp/terraform that referenced this issue Aug 16, 2019
This is a minor release of Go that does not include any changes that
affect Terraform's behavior.

This does include a fix for golang/go#31084 that could potentially affect
HCL arithmetic (via math/big) on aarch64, but we do not currently build
Terraform for aarch64 so it cannot have affected any previous releases.
apparentlymart added a commit to hashicorp/terraform that referenced this issue Aug 16, 2019
This is a minor release of Go that does not include any changes that
affect Terraform's behavior.

This does include a fix for golang/go#31084 that could potentially affect
HCL arithmetic (via math/big) on aarch64, but we do not currently build
Terraform for aarch64 so it cannot have affected any previous releases.
@golang golang locked and limited conversation to collaborators Jul 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
ExpertNeeded FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants