runtime: SIGSEGV in misc/cgo/testshared on s390x since CL 358674 #49386
This is a regression with a fairly clear starting point, so marking as release-blocker for Go 1.18. |
@mknyszek, I started debugging this yesterday to see if I can help in any way. I'm still trying to make sense of it but it seems to be related to the writeBarrier checks that are enabled when the GODEBUG flag I can reproduce the same failure on go1.17.3 on s390x in the misc/cgo/testshared TestGopathShlib test by running:
go test -c
GODEBUG=cgocheck=2 ./testshared.test -test.run TestGopathShlib
I ran
(gdb) x/40ni $pc
=> 0x2aa00093f48 <os.newFile+24>: lg %r0,80(%r15)
<... skip some lines ...>
0x2aa00093fca <os.newFile+154>: lgrl %r11,0x2aa00112bf8
0x2aa00093fd0 <os.newFile+160>: llgf %r2,0(%r11)
0x2aa00093fd6 <os.newFile+166>: cijne %r2,0,0x2aa00093fec <os.newFile+188>
0x2aa00093fdc <os.newFile+172>: lg %r4,88(%r15)
0x2aa00093fe2 <os.newFile+178>: stg %r4,56(%r1)
0x2aa00093fe8 <os.newFile+184>: j 0x2aa00093ffe <os.newFile+206>
0x2aa00093fec <os.newFile+188>: aghik %r2,%r1,56
0x2aa00093ff2 <os.newFile+194>: lg %r3,88(%r15)
0x2aa00093ff8 <os.newFile+200>: brasl %r14,0x2aa00066610 <runtime.gcWriteBarrier@plt>
0x2aa00093ffe <os.newFile+206>: llgc %r4,63(%r15)
(gdb) b *0x2aa00093fd0
Breakpoint 2 at 0x2aa00093fd0
(gdb) c
Continuing.
Thread 1 "exe" hit Breakpoint 2, 0x000002aa00093fd0 in os.newFile ()
(gdb) info reg $r11
r11 0x3fffdf32220 4398012113440
(gdb) x/16xb 0x3fffdf32220
0x3fffdf32220 <runtime.writeBarrier>: 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00
0x3fffdf32228 <runtime.writeBarrier+8>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
and then continuing will generate the segfault. IIUC, the two If I do the same with
(gdb) x/16xb 0x3fffdf32220
0x3fffdf32220 <runtime.writeBarrier>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x3fffdf32228 <runtime.writeBarrier+8>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
and no segfault. But if I do the same with a build from the source checked out at 961aab2 where the s390x builder started failing using
(gdb) x/16xb 0x3fffdf66220
0x3fffdf66220 <runtime.writeBarrier>: 0x01 0x00 0x00 0x00 0x01 0x00 0x00 0x00
0x3fffdf66228 <runtime.writeBarrier+8>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
If I break at the beginning of the So I think there is a bug on s390x with |
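The three byte dumps above are easier to compare once they are decoded into fields. The following is a minimal sketch in Go, assuming the Go 1.17 layout of runtime.writeBarrier (enabled at offset 0, three padding bytes, needed at offset 4, cgo at offset 5). That layout is an assumption taken from the runtime source of that era, not something stated in this thread. The helper decodeWriteBarrier and the dump labels are hypothetical names used only for illustration. The sketch also reproduces the big-endian 32-bit word that the `llgf %r2,0(%r11)` instruction loads and that `cijne %r2,0,...` compares against zero.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wbFlags holds the fields assumed to lead runtime.writeBarrier in Go 1.17:
// enabled (offset 0), 3 padding bytes, needed (offset 4), cgo (offset 5).
type wbFlags struct {
	Enabled, Needed, Cgo bool
}

// decodeWriteBarrier is a hypothetical helper, not part of the runtime.
// It returns the decoded flags plus the big-endian 32-bit word at offset 0,
// which is what a 32-bit load on s390x would see.
func decodeWriteBarrier(b []byte) (wbFlags, uint32) {
	f := wbFlags{Enabled: b[0] != 0, Needed: b[4] != 0, Cgo: b[5] != 0}
	return f, binary.BigEndian.Uint32(b[:4])
}

func main() {
	// First 8 bytes of each gdb dump quoted in the comment above.
	dumps := []struct {
		name  string
		bytes []byte
	}{
		{"go1.17.3, segfault", []byte{0x01, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00}},
		{"no segfault", []byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}},
		{"build at 961aab2", []byte{0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00}},
	}
	for _, d := range dumps {
		f, word := decodeWriteBarrier(d.bytes)
		fmt.Printf("%-20s enabled=%t needed=%t cgo=%t word=0x%08x takesBarrierPath=%t\n",
			d.name, f.Enabled, f.Needed, f.Cgo, word, word != 0)
	}
}
```

Under that layout assumption, both failing dumps have a nonzero first word, so the code takes the branch that calls runtime.gcWriteBarrier@plt. They differ only in which of the two later flags is set.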
@jonathan-albrecht-ibm Thanks for looking into it and for the detailed analysis! It confirms my suspicions. The I suspect the |
Thanks for the explanation @mknyszek. That helps clear things up a bit. I'll continue looking at it but will likely be slow going. Note that s390x trial vms are available at https://linuxone.cloud.marist.edu/ if anyone is interested. |
From stepping through the code, I think the PLT symbol code (not sure what to call it) might be clobbering some registers. I found the source at |
This builder seems to have gone away. @cherrymui has some ideas on quick tests to help diagnose. Moving this to OK after Beta1. |
I had a look at the builders and they look ok. Would they have been disabled after some number of failures? Glad to hear @cherrymui has some ideas on testing. Let me know if I can help with that. My guess is that register R1 and maybe others are being clobbered by the branch to
where inside
|
@jonathan-albrecht-ibm Could you test if CL https://go-review.googlesource.com/c/go/+/363698 helps? Thanks! (I cannot test it myself as the builder disappears.) |
Change https://golang.org/cl/363698 mentions this issue: |
Thanks @cherrymui, it looks good. I ran
and that also passes. |
@jonathan-albrecht-ibm thanks! |
If the call to gcWriteBarrier is via PLT, the PLT stub will clobber R1. Mark R1 clobbered.
For #49386.
Change-Id: I72df5bb3b8d10381fec5c567b15749aaf7d2ad70
Reviewed-on: https://go-review.googlesource.com/c/go/+/363698
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Fixed by the CL above. |
greplogs --dashboard -md -l -e '(?ms)\Alinux-s390x.*FAIL: TestGopathShlib.*signal SIGSEGV'
2021-11-05T07:00:05-6fefb7f/linux-s390x-ibm
2021-11-05T05:30:39-b68c02e/linux-s390x-ibm
2021-11-05T05:29:10-3b5add5/linux-s390x-ibm
2021-11-05T04:20:33-0a5ca24/linux-s390x-ibm
2021-11-05T00:52:09-bd580a0/linux-s390x-ibm
2021-11-05T00:52:08-35c7234/linux-s390x-ibm
2021-11-05T00:52:06-3839b60/linux-s390x-ibm
2021-11-05T00:52:04-1c4cfd8/linux-s390x-ibm
2021-11-04T23:56:29-0e5f287/linux-s390x-ibm
2021-11-04T23:35:26-256a8fc/linux-s390x-ibm
2021-11-04T21:53:05-76c48e9/linux-s390x-ibm
2021-11-04T21:52:51-1e0c3b2/linux-s390x-ibm
2021-11-04T21:52:36-8ad0a7e/linux-s390x-ibm
2021-11-04T21:52:06-37634ee/linux-s390x-ibm
2021-11-04T21:50:21-bfd74fd/linux-s390x-ibm
2021-11-04T21:41:49-156abe5/linux-s390x-ibm
2021-11-04T21:40:51-2c32f29/linux-s390x-ibm
2021-11-04T21:33:23-71fc881/linux-s390x-ibm
2021-11-04T20:43:07-8248152/linux-s390x-ibm
2021-11-04T20:42:35-1f9dce7/linux-s390x-ibm
2021-11-04T20:31:02-978e39e/linux-s390x-ibm
2021-11-04T20:24:01-99699d1/linux-s390x-ibm
2021-11-04T20:01:22-6d1fffa/linux-s390x-ibm
2021-11-04T20:01:11-fc5e8cd/linux-s390x-ibm
2021-11-04T20:01:10-9b2dd1f/linux-s390x-ibm
2021-11-04T20:00:54-961aab2/linux-s390x-ibm
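For readers unfamiliar with the pattern passed to greplogs above, here is a small sketch of how it behaves under Go's regexp package. The sample log strings and the helper matchesFailure are made up for illustration and are not real builder output. With the `s` flag, `.` also matches newlines, and `\A` anchors the match to the very start of the text, so only logs that begin with `linux-s390x` qualify.

```go
package main

import (
	"fmt"
	"regexp"
)

// failurePattern is the expression from the greplogs command above.
const failurePattern = `(?ms)\Alinux-s390x.*FAIL: TestGopathShlib.*signal SIGSEGV`

var failureRE = regexp.MustCompile(failurePattern)

// matchesFailure is a hypothetical helper that reports whether a log text
// matches the pattern.
func matchesFailure(log string) bool {
	return failureRE.MatchString(log)
}

func main() {
	// Hypothetical log texts; real builder logs contain much more output.
	s390xLog := "linux-s390x-ibm\n--- FAIL: TestGopathShlib (0.50s)\nsignal SIGSEGV: segmentation violation\n"
	otherLog := "linux-amd64\n--- FAIL: TestGopathShlib (0.50s)\nsignal SIGSEGV: segmentation violation\n"

	fmt.Println("s390x log matches:", matchesFailure(s390xLog))
	fmt.Println("other log matches:", matchesFailure(otherLog))
}
```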