runtime: "signal: segmentation fault (core dumped)" on several builders #35248

Closed
eliasnaur opened this issue Oct 30, 2019 · 12 comments
Labels: FrozenDueToAge, NeedsFix (The path to resolution is known, but the work has not been done.), release-blocker
Milestone: Go1.14

Comments

eliasnaur changed the title from 'runtime: "signal: segmentation fault (core dumped)" on several builder' to 'runtime: "signal: segmentation fault (core dumped)" on several builders' Oct 30, 2019
mvdan added this to the Go1.14 milestone Oct 30, 2019
mvdan (Member) commented Oct 30, 2019

I was able to reproduce on the first try on my laptop:

$ go version
go version devel +f4e32aeed1 Wed Oct 30 08:17:29 2019 +0000 linux/amd64
$ GOMAXPROCS=2 go test runtime -cpu=1,2,4 -quick
signal: segmentation fault (core dumped)
FAIL    runtime 36.302s
FAIL

I'll continue digging, and I'll post again when I find something useful.

Edit: Second and third attempts also resulted in a segfault, happening at 190s and 120s respectively.

thanm (Contributor) commented Oct 30, 2019

I've done a little experimenting with this as well. My suspicion is that the problem is the runtime's "TestSignalM" test, which was added relatively recently -- when I do repeated parallel runs of

GOMAXPROCS=2 go test -test.v runtime -cpu=1,2,4 -quick

that's the last testpoint mentioned before the crash. On the other hand, when I run

go test -i -o runtime.test
stress ./runtime.test -test.run=TestSignalM -test.cpu=10

I don't see any failures, so it's possible that my theory isn't valid.
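
(A note for anyone following along: stress here is presumably golang.org/x/tools/cmd/stress, which at the time could be installed with go get.)

$ go get golang.org/x/tools/cmd/stress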

bcmills (Contributor) commented Oct 30, 2019

danscales (Contributor) commented:

I tried, but I haven't been able to reproduce at all on my workstation (using the commands above).

@mvdan Did you actually get a core that might have a stack trace? (I see '(core dumped)' above).

mvdan (Member) commented Oct 30, 2019

Yes, systemd has several of these core dumps, but I don't know what to do with them. If anyone has the magic coredumpctl command I can run to get a stack trace, I'm happy to run it. coredumpctl info just gives a stack trace with no source information:

Stack trace of thread 50334:
#0  0x000000000046898c n/a (/tmp/go-build494200099/b001/runtime.test)
#1  0x0000000000466311 n/a (/tmp/go-build494200099/b001/runtime.test)
#2  0x00000000004482f0 n/a (/tmp/go-build494200099/b001/runtime.test)
#3  0x0000000000447923 n/a (/tmp/go-build494200099/b001/runtime.test)
#4  0x000000000046a673 n/a (/tmp/go-build494200099/b001/runtime.test)
#5  0x00007fba8808f930 __restore_rt (libpthread.so.0)
#6  0x000000000046a8d3 n/a (/tmp/go-build494200099/b001/runtime.test)
#7  0x000000000040d17e n/a (/tmp/go-build494200099/b001/runtime.test)
#8  0x00000000008221a0 n/a (/tmp/go-build494200099/b001/runtime.test)
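
One candidate for that invocation (a sketch, assuming the dump is still in the systemd-coredump journal and the test binary still exists at the path shown above, which it may not, since go test builds it in a temporary directory) is to let coredumpctl open the dump in gdb:

$ coredumpctl list runtime.test                               # locate the matching dump
$ coredumpctl gdb /tmp/go-build494200099/b001/runtime.test    # open the newest match in gdb
(gdb) bt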

aclements (Member) commented:

I haven't been able to reproduce any TestSignalM failures locally either, but the theory seems fairly likely. If you get a SIGSEGV while in a signal handler, this is exactly what you would see.

@mvdan, you should be able to build exactly that runtime.test binary with go test -c runtime and then resolve those PCs yourself by pasting them into addr2line -Cfipe runtime.test.
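
Concretely, that would look something like the following (a sketch using a few of the PCs from the coredumpctl output above; the addresses only resolve correctly if the rebuilt binary matches the one that crashed):

$ go test -c runtime
$ addr2line -Cfipe runtime.test 0x46898c 0x466311 0x4482f0 0x447923 0x46a673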

bcmills (Contributor) commented Oct 31, 2019

Doesn't seem to depend on the architecture:

freebsd-386-11_2:
https://build.golang.org/log/6d35287bc8891f3c66d906819bdc78d357de82f2

freebsd-arm64-dmgk:
https://build.golang.org/log/4028750640de86113d7081d6579ca68506561eec

cuonglm (Member) commented Oct 31, 2019

cuonglm (Member) commented Oct 31, 2019

bcmills (Contributor) commented Nov 1, 2019

@cuonglm, I suspect that one is related to #35272.

aclements (Member) commented:

Reproduced. The problem is the write barrier in the testSigusr1 callback, which runs on the signal stack and thus may not have a P. If GC is active, and the signal arrives without a P, it crashes. The fix is easy, since all we really care about is the M's ID.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  runtime.gcWriteBarrier ()
    at /usr/local/google/home/austin/go.dev/src/runtime/asm_amd64.s:1407
1407            MOVQ    (p_wbBuf+wbBuf_next)(R13), R14
[Current thread is 1 (Thread 0x7fd9a90c6700 (LWP 193547))]
Loading Go Runtime support.
(gdb) bt
#0  runtime.gcWriteBarrier ()
    at /usr/local/google/home/austin/go.dev/src/runtime/asm_amd64.s:1407
#1  0x0000000000466e31 in runtime.WaitForSigusr1.func1 (gp=0xc0015fb680, 
    ~r1=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/export_unix_test.go:44
#2  0x0000000000448e55 in runtime.sighandler (sig=10, info=0xc0000f7bf0, 
    ctxt=0xc0000f7ac0, gp=0xc0015fb680)
    at /usr/local/google/home/austin/go.dev/src/runtime/signal_unix.go:508
#3  0x0000000000448453 in runtime.sigtrampgo (sig=10, info=0xc0000f7bf0, 
    ctx=0xc0000f7ac0)
    at /usr/local/google/home/austin/go.dev/src/runtime/signal_unix.go:421
#4  0x000000000046b343 in runtime.sigtramp ()
    at /usr/local/google/home/austin/go.dev/src/runtime/sys_linux_amd64.s:384
#5  <signal handler called>
#6  runtime.futex ()
    at /usr/local/google/home/austin/go.dev/src/runtime/sys_linux_amd64.s:563
#7  0x000000000042fc24 in runtime.futexsleep (
    addr=0x828200 <runtime.waitForSigusr1>, val=0, ns=1000000000)
    at /usr/local/google/home/austin/go.dev/src/runtime/os_linux.go:50
#8  0x000000000040d0ae in runtime.notetsleep_internal (
    n=0x828200 <runtime.waitForSigusr1>, ns=1000000000, ~r2=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/lock_futex.go:193
#9  0x000000000040d20c in runtime.notetsleepg (
    n=0x828200 <runtime.waitForSigusr1>, ns=1000000000, ~r2=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/lock_futex.go:228
#10 0x00000000004621e9 in runtime.WaitForSigusr1 (
    ready={void (runtime.m *)} 0xc001603780, timeoutNS=1000000000, 
    ~r2=<optimized out>, ~r3=<optimized out>)
    at /usr/local/google/home/austin/go.dev/src/runtime/export_unix_test.go:47
#11 0x00000000005c983b in runtime_test.TestSignalM.func1 (ready=0xc0003188a0, 
    &want=0xc000016a68, &got=0xc000016a70, &wg=0xc000016a80)
    at /usr/local/google/home/austin/go.dev/src/runtime/crash_unix_test.go:321
#12 0x0000000000469471 in runtime.goexit ()
    at /usr/local/google/home/austin/go.dev/src/runtime/asm_amd64.s:1375
#13 0x000000c0003188a0 in ?? ()
#14 0x000000c000016a68 in ?? ()
#15 0x000000c000016a70 in ?? ()
#16 0x000000c000016a80 in ?? ()
#17 0x0000000000000000 in ?? ()
(gdb) print $r13
$1 = 0
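
To illustrate the shape of the fix described above (a sketch with invented names, not the actual runtime patch in the CL below): assigning an *m pointer to a heap-visible location makes the compiler emit the write barrier that shows up as runtime.gcWriteBarrier in the backtrace, while storing a plain integer ID does not, so recording only the M's ID is safe from a context that may have no P.

package main

// Sketch only: type and variable names are invented for illustration.
// Pointer stores to globals are protected by a write barrier; integer
// stores are not, so an integer ID can be recorded safely even from a
// context (like a signal handler) that has no P.

import (
	"fmt"
	"sync/atomic"
)

type m struct{ id int64 } // stand-in for the runtime's M structure

var (
	waitingM   *m    // pointer slot: stores here need a write barrier
	waitingMID int64 // integer slot: stores here do not
)

// recordMByPointer resembles the crashing path: this assignment is the
// kind of store that required runtime.gcWriteBarrier above.
func recordMByPointer(mp *m) { waitingM = mp }

// recordMByID resembles the fixed path: only the numeric ID is recorded,
// atomically, with no write barrier involved.
func recordMByID(mp *m) { atomic.StoreInt64(&waitingMID, mp.id) }

func main() {
	recordMByID(&m{id: 7})
	fmt.Println("recorded M id:", atomic.LoadInt64(&waitingMID))
}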

gopherbot commented:

Change https://golang.org/cl/204620 mentions this issue: runtime: remove write barrier in WaitForSigusr1

FiloSottile added the NeedsFix label Nov 4, 2019
golang locked and limited conversation to collaborators Nov 5, 2020