New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: testing/quick misbehaves on Nexus 9 linux/arm64 #19809
Labels
Milestone
Comments
CL https://golang.org/cl/39152 mentions this issue. |
The following C program misbehaves on the problematic system.
After 20k-50k iterations the loop starts printing i on every iteration. Not Go's fault. We'll see about reporting this elsewhere. |
gopherbot
pushed a commit
that referenced
this issue
Apr 3, 2017
When generating a random int8, uint8, int16, uint16, int32, uint32, quick.Value chooses among all possible values. But when generating a random int64 or uint64, it only chooses values in the range [-2⁶², 2⁶²) (even for uint64). It should, like for all the other integers, use the full range. If it had, this would have caught #19807 earlier. Instead it let us discover the presence of #19809. While we are here, also make the default source of randomness not completely deterministic. Fixes #19808. Change-Id: I070f852531c92b3670bd76523326c9132bfc9416 Reviewed-on: https://go-review.googlesource.com/39152 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>
CL https://golang.org/cl/39310 mentions this issue. |
gopherbot
pushed a commit
that referenced
this issue
Apr 3, 2017
Old buggy hardware incorrectly executes the shift-left-K then shift-right-K idiom for clearing K leftmost bits. Use a right rotate instead of shift to avoid triggering the bug. Fixes #19809. Change-Id: I6dc646b183c29e9d01aef944729f34388dcc687d Reviewed-on: https://go-review.googlesource.com/39310 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
lparth
pushed a commit
to lparth/go
that referenced
this issue
Apr 13, 2017
When generating a random int8, uint8, int16, uint16, int32, uint32, quick.Value chooses among all possible values. But when generating a random int64 or uint64, it only chooses values in the range [-2⁶², 2⁶²) (even for uint64). It should, like for all the other integers, use the full range. If it had, this would have caught golang#19807 earlier. Instead it let us discover the presence of golang#19809. While we are here, also make the default source of randomness not completely deterministic. Fixes golang#19808. Change-Id: I070f852531c92b3670bd76523326c9132bfc9416 Reviewed-on: https://go-review.googlesource.com/39152 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>
lparth
pushed a commit
to lparth/go
that referenced
this issue
Apr 13, 2017
Old buggy hardware incorrectly executes the shift-left-K then shift-right-K idiom for clearing K leftmost bits. Use a right rotate instead of shift to avoid triggering the bug. Fixes golang#19809. Change-Id: I6dc646b183c29e9d01aef944729f34388dcc687d Reviewed-on: https://go-review.googlesource.com/39310 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
lewurm
added a commit
to lewurm/mono
that referenced
this issue
May 16, 2017
we got a couple of bug reports, all with the same failure: https://bugzilla.xamarin.com/show_bug.cgi?id=44907 https://bugzilla.xamarin.com/show_bug.cgi?id=46482 https://bugzilla.xamarin.com/show_bug.cgi?id=51791 The bugs are private, therefore here what I wrote: `` Thank you Matthias, this was very helpful. The crash happens on this assignment: https://github.com/mono/mono/blob/de1865dad5c0350f391fedcaa08f02f610530d3f/mono/mini/mini-generic-sharing.c#L418 Here the according disassembly: https://gist.github.com/lewurm/b1094749027c9e5ea19fdc4fac7905a7 The crash happens in the last loop iteration (`r9=i`, `r7=slot`). Looking at the machine code it just _cannot_ happen, which is confirmed by the C code as well: `*oti` successfully happens in the if check, but after returning from `alloc_oti()`, `oti` doesn't contain a valid address anymore. I suspect some weird hardware issue that fails to restore all registers properly from the stack. ``` [...] ``` Thanks again Matthias. Unfortunately, I'm out of ideas, and I can't blame anything but the hardware. The situation we see is too weird. We segfault at offset `0xdda60: str sl, [r4]`, but really we should already segfault at offset `0xdda2c: ldrge sl, [r4, mono#8]!"`. So I suspect two things why this could happen: (1) The instruction at `dda2c` fails to do the post-increment correctly for *whatever* reason. (2) Something along the execution path corrupts the stackslot, where `r4` is saved, in such a way that it *exactly* masks it with `0xf`. Everything else on the stack looks fine, so this is sort of very unlikely to be honest. This only happens on a very specific device: The Nexus 9 is the only device that was ever shipped with the Tegra K1 T132. I suspect an issue in the binary translation layer of `armv7` instruction set to the internal micro-ops of the CPU. I tried to stress test the instruction in question (see https://github.com/lewurm/ldrinsntest), however I was not able to trigger a crash. So either, I'm missing some context in order to trigger the bug or I'm on a completely wrong track. That said, even if we could proof that it is indeed a hardware issue, the workaround is also non-trivial (it would then either need a fix in gcc or require a microcode update by the chip vendor). ``` And then we saw: golang/go#19809 (comment) Another hint that this device is buggy.
lewurm
added a commit
to lewurm/mono
that referenced
this issue
May 16, 2017
we got a couple of bug reports, all with the same failure: https://bugzilla.xamarin.com/show_bug.cgi?id=44907 https://bugzilla.xamarin.com/show_bug.cgi?id=46482 https://bugzilla.xamarin.com/show_bug.cgi?id=51791 The bugs are private, therefore here what I wrote: `` Thank you Matthias, this was very helpful. The crash happens on this assignment: https://github.com/mono/mono/blob/de1865dad5c0350f391fedcaa08f02f610530d3f/mono/mini/mini-generic-sharing.c#L418 Here the according disassembly: https://gist.github.com/lewurm/b1094749027c9e5ea19fdc4fac7905a7 The crash happens in the last loop iteration (`r9=i`, `r7=slot`). Looking at the machine code it just _cannot_ happen, which is confirmed by the C code as well: `*oti` successfully happens in the if check, but after returning from `alloc_oti()`, `oti` doesn't contain a valid address anymore. I suspect some weird hardware issue that fails to restore all registers properly from the stack. ``` [...] ``` Thanks again Matthias. Unfortunately, I'm out of ideas, and I can't blame anything but the hardware. The situation we see is too weird. We segfault at offset `0xdda60: str sl, [r4]`, but really we should already segfault at offset `0xdda2c: ldrge sl, [r4, mono#8]!"`. So I suspect two things why this could happen: (1) The instruction at `dda2c` fails to do the post-increment correctly for *whatever* reason. (2) Something along the execution path corrupts the stackslot, where `r4` is saved, in such a way that it *exactly* masks it with `0xf`. Everything else on the stack looks fine, so this is sort of very unlikely to be honest. This only happens on a very specific device: The Nexus 9 is the only device that was ever shipped with the Tegra K1 T132. I suspect an issue in the binary translation layer of `armv7` instruction set to the internal micro-ops of the CPU. I tried to stress test the instruction in question (see https://github.com/lewurm/ldrinsntest), however I was not able to trigger a crash. So either, I'm missing some context in order to trigger the bug or I'm on a completely wrong track. That said, even if we could proof that it is indeed a hardware issue, the workaround is also non-trivial (it would then either need a fix in gcc or require a microcode update by the chip vendor). ``` And then we saw: golang/go#19809 (comment) Another hint that this device is buggy.
lewurm
added a commit
to lewurm/mono
that referenced
this issue
May 16, 2017
we got a couple of bug reports, all with the same failure: https://bugzilla.xamarin.com/show_bug.cgi?id=44907 https://bugzilla.xamarin.com/show_bug.cgi?id=46482 https://bugzilla.xamarin.com/show_bug.cgi?id=51791 The bugs are private, therefore here what I wrote: ``` Thank you Matthias, this was very helpful. >>> 09-30 12:24:50.347: A/DEBUG(6211): pid: 6191, tid: 6209, name: Thread-4 >>> com.distech.x50.ui.droid <<< >>> 09-30 12:24:50.347: A/DEBUG(6211): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8 >>> 09-30 12:24:50.347: A/DEBUG(6211): r0 d6b58728 r1 00004001 r2 00000000 r3 d6b58738 >>> 09-30 12:24:50.347: A/DEBUG(6211): r4 00000008 r5 ea97427c r6 00001f3c r7 00000015 >>> 09-30 12:24:50.347: A/DEBUG(6211): r8 00001f38 r9 00000015 sl d6b58728 fp d67fed90 >>> 09-30 12:24:50.347: A/DEBUG(6211): ip ea9742c0 sp d67fed60 lr ea7a4504 pc ea716a58 cpsr 800e0010 The crash happens on this assignment: https://github.com/mono/mono/blob/de1865dad5c0350f391fedcaa08f02f610530d3f/mono/mini/mini-generic-sharing.c#L418 Here the according disassembly: https://gist.github.com/lewurm/b1094749027c9e5ea19fdc4fac7905a7 The crash happens in the last loop iteration (`r9=i`, `r7=slot`). Looking at the machine code it just _cannot_ happen, which is confirmed by the C code as well: `*oti` successfully happens in the if check, but after returning from `alloc_oti()`, `oti` doesn't contain a valid address anymore. I suspect some weird hardware issue that fails to restore all registers properly from the stack. ``` [...] ``` Thanks again Matthias. Unfortunately, I'm out of ideas, and I can't blame anything but the hardware. The situation we see is too weird. We segfault at offset `0xdda60: str sl, [r4]`, but really we should already segfault at offset `0xdda2c: ldrge sl, [r4, mono#8]!"`. So I suspect two things why this could happen: (1) The instruction at `dda2c` fails to do the post-increment correctly for *whatever* reason. (2) Something along the execution path corrupts the stackslot, where `r4` is saved, in such a way that it *exactly* masks it with `0xf`. Everything else on the stack looks fine, so this is sort of very unlikely to be honest. This only happens on a very specific device: The Nexus 9 is the only device that was ever shipped with the Tegra K1 T132. I suspect an issue in the binary translation layer of `armv7` instruction set to the internal micro-ops of the CPU. I tried to stress test the instruction in question (see https://github.com/lewurm/ldrinsntest), however I was not able to trigger a crash. So either, I'm missing some context in order to trigger the bug or I'm on a completely wrong track. That said, even if we could proof that it is indeed a hardware issue, the workaround is also non-trivial (it would then either need a fix in gcc or require a microcode update by the chip vendor). ``` And then we saw: golang/go#19809 (comment) Another hint that this device is buggy.
lewurm
added a commit
to mono/mono
that referenced
this issue
May 29, 2017
we got a couple of bug reports, all with the same failure: https://bugzilla.xamarin.com/show_bug.cgi?id=44907 https://bugzilla.xamarin.com/show_bug.cgi?id=46482 https://bugzilla.xamarin.com/show_bug.cgi?id=51791 The bugs are private, therefore here what I wrote: ``` Thank you Matthias, this was very helpful. >>> 09-30 12:24:50.347: A/DEBUG(6211): pid: 6191, tid: 6209, name: Thread-4 >>> com.distech.x50.ui.droid <<< >>> 09-30 12:24:50.347: A/DEBUG(6211): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8 >>> 09-30 12:24:50.347: A/DEBUG(6211): r0 d6b58728 r1 00004001 r2 00000000 r3 d6b58738 >>> 09-30 12:24:50.347: A/DEBUG(6211): r4 00000008 r5 ea97427c r6 00001f3c r7 00000015 >>> 09-30 12:24:50.347: A/DEBUG(6211): r8 00001f38 r9 00000015 sl d6b58728 fp d67fed90 >>> 09-30 12:24:50.347: A/DEBUG(6211): ip ea9742c0 sp d67fed60 lr ea7a4504 pc ea716a58 cpsr 800e0010 The crash happens on this assignment: https://github.com/mono/mono/blob/de1865dad5c0350f391fedcaa08f02f610530d3f/mono/mini/mini-generic-sharing.c#L418 Here the according disassembly: https://gist.github.com/lewurm/b1094749027c9e5ea19fdc4fac7905a7 The crash happens in the last loop iteration (`r9=i`, `r7=slot`). Looking at the machine code it just _cannot_ happen, which is confirmed by the C code as well: `*oti` successfully happens in the if check, but after returning from `alloc_oti()`, `oti` doesn't contain a valid address anymore. I suspect some weird hardware issue that fails to restore all registers properly from the stack. ``` [...] ``` Thanks again Matthias. Unfortunately, I'm out of ideas, and I can't blame anything but the hardware. The situation we see is too weird. We segfault at offset `0xdda60: str sl, [r4]`, but really we should already segfault at offset `0xdda2c: ldrge sl, [r4, #8]!"`. So I suspect two things why this could happen: (1) The instruction at `dda2c` fails to do the post-increment correctly for *whatever* reason. (2) Something along the execution path corrupts the stackslot, where `r4` is saved, in such a way that it *exactly* masks it with `0xf`. Everything else on the stack looks fine, so this is sort of very unlikely to be honest. This only happens on a very specific device: The Nexus 9 is the only device that was ever shipped with the Tegra K1 T132. I suspect an issue in the binary translation layer of `armv7` instruction set to the internal micro-ops of the CPU. I tried to stress test the instruction in question (see https://github.com/lewurm/ldrinsntest), however I was not able to trigger a crash. So either, I'm missing some context in order to trigger the bug or I'm on a completely wrong track. That said, even if we could proof that it is indeed a hardware issue, the workaround is also non-trivial (it would then either need a fix in gcc or require a microcode update by the chip vendor). ``` And then we saw: golang/go#19809 (comment) Another hint that this device is buggy.
jonpryor
pushed a commit
to jonpryor/mono
that referenced
this issue
Jun 1, 2017
we got a couple of bug reports, all with the same failure: https://bugzilla.xamarin.com/show_bug.cgi?id=44907 https://bugzilla.xamarin.com/show_bug.cgi?id=46482 https://bugzilla.xamarin.com/show_bug.cgi?id=51791 The bugs are private, therefore here what I wrote: ``` Thank you Matthias, this was very helpful. >>> 09-30 12:24:50.347: A/DEBUG(6211): pid: 6191, tid: 6209, name: Thread-4 >>> com.distech.x50.ui.droid <<< >>> 09-30 12:24:50.347: A/DEBUG(6211): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8 >>> 09-30 12:24:50.347: A/DEBUG(6211): r0 d6b58728 r1 00004001 r2 00000000 r3 d6b58738 >>> 09-30 12:24:50.347: A/DEBUG(6211): r4 00000008 r5 ea97427c r6 00001f3c r7 00000015 >>> 09-30 12:24:50.347: A/DEBUG(6211): r8 00001f38 r9 00000015 sl d6b58728 fp d67fed90 >>> 09-30 12:24:50.347: A/DEBUG(6211): ip ea9742c0 sp d67fed60 lr ea7a4504 pc ea716a58 cpsr 800e0010 The crash happens on this assignment: https://github.com/mono/mono/blob/de1865dad5c0350f391fedcaa08f02f610530d3f/mono/mini/mini-generic-sharing.c#L418 Here the according disassembly: https://gist.github.com/lewurm/b1094749027c9e5ea19fdc4fac7905a7 The crash happens in the last loop iteration (`r9=i`, `r7=slot`). Looking at the machine code it just _cannot_ happen, which is confirmed by the C code as well: `*oti` successfully happens in the if check, but after returning from `alloc_oti()`, `oti` doesn't contain a valid address anymore. I suspect some weird hardware issue that fails to restore all registers properly from the stack. ``` [...] ``` Thanks again Matthias. Unfortunately, I'm out of ideas, and I can't blame anything but the hardware. The situation we see is too weird. We segfault at offset `0xdda60: str sl, [r4]`, but really we should already segfault at offset `0xdda2c: ldrge sl, [r4, mono#8]!"`. So I suspect two things why this could happen: (1) The instruction at `dda2c` fails to do the post-increment correctly for *whatever* reason. (2) Something along the execution path corrupts the stackslot, where `r4` is saved, in such a way that it *exactly* masks it with `0xf`. Everything else on the stack looks fine, so this is sort of very unlikely to be honest. This only happens on a very specific device: The Nexus 9 is the only device that was ever shipped with the Tegra K1 T132. I suspect an issue in the binary translation layer of `armv7` instruction set to the internal micro-ops of the CPU. I tried to stress test the instruction in question (see https://github.com/lewurm/ldrinsntest), however I was not able to trigger a crash. So either, I'm missing some context in order to trigger the bug or I'm on a completely wrong track. That said, even if we could proof that it is indeed a hardware issue, the workaround is also non-trivial (it would then either need a fix in gcc or require a microcode update by the chip vendor). ``` And then we saw: golang/go#19809 (comment) Another hint that this device is buggy.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
testing/quick's int64 chooser is written to return values in the range [-2⁶²,2⁶²).
That's a mistake; fixing that is #19808.
But the code should run as written, and yet on @dr2chase's Nexus 9
running a linux/arm64 toolchain built from a android/arm64 toolchain
cross-compiled from elsewhere, empirically it generates values outside
that range. (That helped find #19807.)
Using a linux/arm64 toolchain built from a linux/arm64 toolchain
cross-compiled from elsewhere on an Odroid works correctly.
Before fixing testing/quick to generate the full range, we should figure
out why the current code generates values outside the narrower range
in this configuration.
math/rand's (*Rand).Int63 says:
and (*rngSource).Int63 says:
where:
and then testing/quick's randInt64 function does:
So either the
& _MASK
or the- 1<<62
is not doing its job.It could be that the Nexus 9 ARM64 hardware is buggy.
It could be that the android/arm64 toolchain being used for
bootstrap on the Nexus 9 is buggy, causing the natively compiled
linux/arm64 toolchain to be buggy, causing the testing/quick
code to generate unexpected values. We don't know.
The text was updated successfully, but these errors were encountered: