os/signal: "morestack on gsignal" on NetBSD #19652

Closed
bradfitz opened this issue Mar 22, 2017 · 14 comments

Comments

@bradfitz
Contributor

While debugging NetBSD failures, I got:

$ gomote run user-bradfitz-netbsd-amd64-70-0 go/bin/go test -short -v os/signal  
=== RUN   TestSignal  
--- PASS: TestSignal (0.00s)  
        signal_test.go:40: sighup...  
        signal_test.go:49: sigwinch...  
        signal_test.go:56: sighup...  
        signal_test.go:59: sighup...  
=== RUN   TestStress  
fatal: morestack on gsignal  
SIGTRAP: trace trap  
PC=0x453ff8 m=4 sigcode=1  
  
goroutine 11 [running]:  
runtime.morestack()  
        /tmp/workdir/go/src/runtime/asm_amd64.s:381 +0x38 fp=0xc420023f98 sp=0xc420023f90  
created by os/signal.TestStress  
        /tmp/workdir/go/src/os/signal/signal_test.go:89 +0x10f  
  
goroutine 1 [chan receive]:  
testing.(*T).Run(0xc420068410, 0x541530, 0xa, 0x549100, 0x468f01)  
        /tmp/workdir/go/src/testing/testing.go:702 +0x2f3  
testing.runTests.func1(0xc420068410)  
        /tmp/workdir/go/src/testing/testing.go:888 +0x67  
testing.tRunner(0xc420068410, 0xc420035e10)  
        /tmp/workdir/go/src/testing/testing.go:659 +0x98  
testing.runTests(0xc42000a100, 0x5f7a40, 0x7, 0x7, 0x5e4c00)  
        /tmp/workdir/go/src/testing/testing.go:886 +0x2b5  
testing.(*M).Run(0xc420035f20, 0xc420035f78)  
        /tmp/workdir/go/src/testing/testing.go:828 +0xf7  
main.main()  
        os/signal/_test/_testmain.go:58 +0xdb  
  
goroutine 5 [syscall]:  
os/signal.signal_recv(0x5e4d80)  
        /tmp/workdir/go/src/runtime/sigqueue.go:116 +0xa8  
os/signal.loop()  
        /tmp/workdir/go/src/os/signal/signal_unix.go:22 +0x22  
created by os/signal.init.1  
        /tmp/workdir/go/src/os/signal/signal_unix.go:28 +0x41  
  
goroutine 9 [sleep]:  
time.Sleep(0x5f5e100)  
        /tmp/workdir/go/src/runtime/time.go:64 +0x12d  
os/signal.TestStress(0xc4200685b0)  
        /tmp/workdir/go/src/os/signal/signal_test.go:102 +0x11d  
testing.tRunner(0xc4200685b0, 0x549100)  
        /tmp/workdir/go/src/testing/testing.go:659 +0x98  
created by testing.(*T).Run  
        /tmp/workdir/go/src/testing/testing.go:701 +0x2d5  
  
goroutine 7 [select, locked to thread]:  
runtime.gopark(0x5494c0, 0x0, 0x540b75, 0x6, 0x18, 0x1)  
        /tmp/workdir/go/src/runtime/proc.go:267 +0x12c  
runtime.selectgo(0xc42003bf50, 0xc42004a420)  
        /tmp/workdir/go/src/runtime/select.go:395 +0xc68  
runtime.ensureSigM.func1()  
        /tmp/workdir/go/src/runtime/signal_unix.go:492 +0x1f4  
runtime.goexit()  
        /tmp/workdir/go/src/runtime/asm_amd64.s:2152 +0x1  
  
goroutine 10 [select]:  
os/signal.TestStress.func1(0xc42004a960, 0xc42004a9c0)  
        /tmp/workdir/go/src/os/signal/signal_test.go:81 +0x16a  
created by os/signal.TestStress  
        /tmp/workdir/go/src/os/signal/signal_test.go:75 +0xe3  
  
rax    0x1c  
rbx    0xc420025000  
rcx    0x457935  
rdx    0x0  
rdi    0x2  
rsi    0x544f60  
rbp    0xc420023fc0  
rsp    0xc420023f90  
r8     0x0  
r9     0x0  
r10    0x0  
r11    0x206  
r12    0x0  
r13    0x0  
r14    0x11  
r15    0x0  
rip    0x453ff8  
rflags 0x206  
cs     0x47  
fs     0x0  
gs     0x0  
exit status 2  
FAIL    os/signal       0.102s  
Error running run: exit status 1  

Sometimes it passes, though.

(I meant to run a different package, but ended up testing os/signal instead.)

/cc @bcmills @ianlancetaylor

@bradfitz bradfitz added this to the Go1.9Maybe milestone Mar 22, 2017
@bradfitz
Contributor Author

And also on the build.golang.org dashboard:
https://build.golang.org/log/259809ea5dd7341d71eea8f4e7d3c5f31c284321

And:
https://build.golang.org/log/da3b3b8e2d46c3f2bb98f7e3cab3d3be7790debf -- "fatal: morestack on g0" first

@bcmills
Contributor

bcmills commented Mar 22, 2017

I don't have a lot of context on how morestack is intended to work. Does the check that invokes it just check the stack size, or does it also check for GC preemption?

If it only checks for stack size, then we're looking for a path in the signal handler that fails to restore g properly before returning from the handler, or perhaps a path that unblocks signals while still on the signal stack. Each signal we receive on the signal stack consumes some of the available space, so if we've received many reentrant signals (which shouldn't be possible!) that could run us out of stack space.

If it also checks for GC preemption, then I suspect we just need more //go:nosplit annotations on the functions (transitively) called by sigtrampgo: sigtrampgo should not be preemptible.

@ianlancetaylor
Contributor

GC (and non-GC) preemption is implemented by storing an impossible value in the stack guard field, so morestack is invoked for both stack space and for preemption.

I think the first thing to do to try to figure this out is to try to recreate the problem with GOTRACEBACK=system. I haven't had time to try that.
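The mechanism described above can be modeled in a few lines. This is a toy model, not runtime code: the type `g`, the field `stackguard`, the constant `preemptSentinel`, and the helper names are all assumptions for illustration. It only shows why a single prologue comparison serves both stack growth and preemption, and why the same slow path is fatal on a stack that cannot grow.

```go
package main

import "fmt"

// preemptSentinel is a hypothetical "impossible" guard value: it is
// larger than any real stack pointer, so the prologue check always
// takes the slow path once it is stored.
const preemptSentinel = ^uintptr(0)

// g is a toy stand-in for a goroutine descriptor.
type g struct {
	name       string
	stackguard uintptr // lowest usable stack address, or the sentinel
	system     bool    // true for g0/gsignal-style stacks that cannot grow
}

// needsMorestack models the function-prologue comparison.
func needsMorestack(sp uintptr, gp *g) bool {
	return sp <= gp.stackguard
}

// slowPath reports what the slow path would do in this model.
func slowPath(sp uintptr, gp *g) string {
	switch {
	case !needsMorestack(sp, gp):
		return "fast path"
	case gp.system:
		return "fatal: morestack on " + gp.name
	case gp.stackguard == preemptSentinel:
		return "preempt"
	default:
		return "grow stack"
	}
}

func main() {
	user := &g{name: "g", stackguard: 0x1000}
	fmt.Println(slowPath(0x2000, user)) // plenty of room
	fmt.Println(slowPath(0x0800, user)) // below the guard

	user.stackguard = preemptSentinel // preemption requested
	fmt.Println(slowPath(0x2000, user))

	sig := &g{name: "gsignal", stackguard: 0x1000, system: true}
	fmt.Println(slowPath(0x0800, sig)) // overflow on the signal stack
}
```

In this model, a signal-stack goroutine reaches the fatal branch whenever its stack pointer falls below the guard, which is consistent with both hypotheses raised earlier in the thread: a genuinely exhausted signal stack, or a stack pointer that does not belong to the stack the runtime thinks it is on.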

@bradfitz
Contributor Author

bradfitz commented Apr 6, 2017

I updated the NetBSD builders to NetBSD 7.1.

Maybe something was fixed. We'll see.

/cc @bsiegert

@bradfitz
Contributor Author

bradfitz commented Apr 6, 2017

Nope. Still fails:
https://build.golang.org/log/44106a642966f58c3e72b3e65afb7d4e67cc4fe9

I'm going to disable the NetBSD builders (but keep them available via gomote) until builds work. They're eating into resources too much right now.

@zoulasc
Contributor

zoulasc commented Apr 19, 2017

$ ./signal.test --test.parallel 1 -test.v
=== RUN TestSignal
--- PASS: TestSignal (0.00s)
signal_test.go:40: sighup...
signal_test.go:49: sigwinch...
signal_test.go:56: sighup...
signal_test.go:59: sighup...
=== RUN TestStress
fatal: morestack on g0
fatal: morestack on gsignal
panic during panic
[signal SIGSEGV: segmentation violation code=0x2 addr=0xffff80011f8cd000 pc=0x42ac0b]

goroutine 0 [idle]:
runtime.startpanic_m()
/usr/pkg/go/src/runtime/panic.go:653 +0x19a fp=0xc4200d5aa0 sp=0xc4200d5a78
runtime.systemstack(0x54e2b0)
/usr/pkg/go/src/runtime/asm_amd64.s:343 +0xab fp=0xc4200d5aa8 sp=0xc4200d5aa0
runtime.startpanic()
/usr/pkg/go/src/runtime/panic.go:569 +0x1e fp=0xc4200d5ac0 sp=0xc4200d5aa8
runtime.sighandler(0xc400000005, 0xc4200d5c60, 0xc4200d5ce0, 0xc4200cc000)
/usr/pkg/go/src/runtime/signal_sighandler.go:81 +0x594 fp=0xc4200d5b40 sp=0xc4200d5ac0
runtime.sigtrampgo(0x5, 0xc4200d5c60, 0xc4200d5ce0)
/usr/pkg/go/src/runtime/signal_unix.go:257 +0x1e3 fp=0xc4200d5c08 sp=0xc4200d5b40
runtime.sigtramp(0x100000005, 0x0, 0x0, 0x1, 0x0, 0xfffffe8100d31a40, 0xffffffff8023cb93, 0xffffffff80c00710, 0x0, 0xfffffe8100d31aa0, ...)
/usr/pkg/go/src/runtime/sys_netbsd_amd64.s:267 +0x3f fp=0xc4200d5c60 sp=0xc4200d5c08
runtime.sigreturn_tramp(0x0, 0x0, 0x1, 0x0, 0xfffffe8100d31a40, 0xffffffff8023cb93, 0xffffffff80c00710, 0x0, 0xfffffe8100d31aa0, 0xffffffff8052b170, ...)
/usr/pkg/go/src/runtime/sys_netbsd_amd64.s:220 fp=0xc4200d5c68 sp=0xc4200d5c60

goroutine 19 [syscall]:
syscall.Syscall(0x25, 0x27e8, 0x1e, 0x0, 0x0, 0x0, 0x0)
/usr/pkg/go/src/syscall/asm_unix_amd64.s:19 +0x5 fp=0xc420027748 sp=0xc420027740
syscall.Kill(0x27e8, 0x1e, 0xc4200c81e0, 0x0)
/usr/pkg/go/src/syscall/zsyscall_netbsd_amd64.go:658 +0x4b fp=0xc420027798 sp=0xc420027748
os/signal.TestStress.func2(0xc4200c81e0, 0xc4200c8240)
/usr/pkg/go/src/os/signal/signal_test.go:96 +0x68 fp=0xc4200277d0 sp=0xc420027798
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc4200277d8 sp=0xc4200277d0
created by os/signal.TestStress
/usr/pkg/go/src/os/signal/signal_test.go:101 +0x10f

goroutine 1 [chan receive]:
runtime.gopark(0x54e240, 0xc4200c81d8, 0x5462f7, 0xc, 0xc4200f2017, 0x3)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc42003fb20 sp=0xc42003faf0
runtime.goparkunlock(0xc4200c81d8, 0x5462f7, 0xc, 0x17, 0x3)
/usr/pkg/go/src/runtime/proc.go:277 +0x5e fp=0xc42003fb60 sp=0xc42003fb20
runtime.chanrecv(0x5137c0, 0xc4200c8180, 0x0, 0xc42003fc01, 0x4b375a)
/usr/pkg/go/src/runtime/chan.go:513 +0x371 fp=0xc42003fc00 sp=0xc42003fb60
runtime.chanrecv1(0x5137c0, 0xc4200c8180, 0x0)
/usr/pkg/go/src/runtime/chan.go:395 +0x35 fp=0xc42003fc38 sp=0xc42003fc00
testing.(*T).Run(0xc4200704e0, 0x545be2, 0xa, 0x54deb0, 0xc42003fd01)
/usr/pkg/go/src/testing/testing.go:698 +0x2f4 fp=0xc42003fce0 sp=0xc42003fc38
testing.runTests.func1(0xc4200704e0)
/usr/pkg/go/src/testing/testing.go:882 +0x67 fp=0xc42003fd30 sp=0xc42003fce0
testing.tRunner(0xc4200704e0, 0xc42003fde0)
/usr/pkg/go/src/testing/testing.go:657 +0x96 fp=0xc42003fd58 sp=0xc42003fd30
testing.runTests(0xc42000ab80, 0x5e2c40, 0x7, 0x7, 0x7f7ff7fbf000)
/usr/pkg/go/src/testing/testing.go:888 +0x2c1 fp=0xc42003fe10 sp=0xc42003fd58
testing.(*M).Run(0xc42003ff20, 0xc42003ff20)
/usr/pkg/go/src/testing/testing.go:822 +0xfc fp=0xc42003ff00 sp=0xc42003fe10
main.main()
os/signal/_test/_testmain.go:56 +0xf7 fp=0xc42003ff88 sp=0xc42003ff00
runtime.main()
/usr/pkg/go/src/runtime/proc.go:185 +0x20a fp=0xc42003ffe0 sp=0xc42003ff88
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42003ffe8 sp=0xc42003ffe0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x54e240, 0x5e4ab0, 0x546ba8, 0xf, 0x54e114, 0x1)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc42002a768 sp=0xc42002a738
runtime.goparkunlock(0x5e4ab0, 0x546ba8, 0xf, 0xc420000314, 0x1)
/usr/pkg/go/src/runtime/proc.go:277 +0x5e fp=0xc42002a7a8 sp=0xc42002a768
runtime.forcegchelper()
/usr/pkg/go/src/runtime/proc.go:226 +0x9e fp=0xc42002a7e0 sp=0xc42002a7a8
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42002a7e8 sp=0xc42002a7e0
created by runtime.init.4
/usr/pkg/go/src/runtime/proc.go:215 +0x35

goroutine 3 [GC sweep wait]:
runtime.gopark(0x54e240, 0x5e4c60, 0x546551, 0xd, 0x41ca14, 0x1)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc42002af58 sp=0xc42002af28
runtime.goparkunlock(0x5e4c60, 0x546551, 0xd, 0x14, 0x1)
/usr/pkg/go/src/runtime/proc.go:277 +0x5e fp=0xc42002af98 sp=0xc42002af58
runtime.bgsweep(0xc420016070)
/usr/pkg/go/src/runtime/mgcsweep.go:56 +0xb6 fp=0xc42002afd8 sp=0xc42002af98
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42002afe0 sp=0xc42002afd8
created by runtime.gcenable
/usr/pkg/go/src/runtime/mgc.go:212 +0x61

goroutine 4 [finalizer wait]:
runtime.gopark(0x54e240, 0x5ff528, 0x5468cb, 0xe, 0x14, 0x1)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc42002b718 sp=0xc42002b6e8
runtime.goparkunlock(0x5ff528, 0x5468cb, 0xe, 0x14, 0x1)
/usr/pkg/go/src/runtime/proc.go:277 +0x5e fp=0xc42002b758 sp=0xc42002b718
runtime.runfinq()
/usr/pkg/go/src/runtime/mfinal.go:161 +0xb2 fp=0xc42002b7e0 sp=0xc42002b758
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42002b7e8 sp=0xc42002b7e0
created by runtime.createfing
/usr/pkg/go/src/runtime/mfinal.go:142 +0x62

goroutine 5 [syscall]:
runtime.notetsleepg(0x5ff740, 0xffffffffffffffff, 0xffffffffffffffff)
/usr/pkg/go/src/runtime/lock_sema.go:257 +0x42 fp=0xc42002bf80 sp=0xc42002bf40
os/signal.signal_recv(0x5cfe60)
/usr/pkg/go/src/runtime/sigqueue.go:116 +0x104 fp=0xc42002bfa8 sp=0xc42002bf80
os/signal.loop()
/usr/pkg/go/src/os/signal/signal_unix.go:22 +0x22 fp=0xc42002bfe0 sp=0xc42002bfa8
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc42002bfe8 sp=0xc42002bfe0
created by os/signal.init.1
/usr/pkg/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 17 [sleep]:
runtime.gopark(0x54e240, 0x5e4d00, 0x544f38, 0x5, 0x1821d913, 0x2)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc4200266e0 sp=0xc4200266b0
runtime.goparkunlock(0x5e4d00, 0x544f38, 0x5, 0xc420026713, 0x2)
/usr/pkg/go/src/runtime/proc.go:277 +0x5e fp=0xc420026720 sp=0xc4200266e0
time.Sleep(0xb2d05e00)
/usr/pkg/go/src/runtime/time.go:59 +0xf9 fp=0xc420026760 sp=0xc420026720
os/signal.TestStress(0xc4200f6000)
/usr/pkg/go/src/os/signal/signal_test.go:102 +0x11d fp=0xc4200267a8 sp=0xc420026760
testing.tRunner(0xc4200f6000, 0x54deb0)
/usr/pkg/go/src/testing/testing.go:657 +0x96 fp=0xc4200267d0 sp=0xc4200267a8
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc4200267d8 sp=0xc4200267d0
created by testing.(*T).Run
/usr/pkg/go/src/testing/testing.go:697 +0x2ca

goroutine 7 [select, locked to thread]:
runtime.gopark(0x54e278, 0x0, 0x54510b, 0x6, 0x18, 0x2)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc420045c70 sp=0xc420045c40
runtime.selectgoImpl(0xc420045f50, 0x0, 0x18)
/usr/pkg/go/src/runtime/select.go:423 +0x1364 fp=0xc420045ee8 sp=0xc420045c70
runtime.selectgo(0xc420045f50)
/usr/pkg/go/src/runtime/select.go:238 +0x1c fp=0xc420045f10 sp=0xc420045ee8
runtime.ensureSigM.func1()
/usr/pkg/go/src/runtime/signal_unix.go:434 +0x2ee fp=0xc420045fe0 sp=0xc420045f10
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420045fe8 sp=0xc420045fe0
created by runtime.ensureSigM
/usr/pkg/go/src/runtime/signal_unix.go:447 +0xda

goroutine 18 [select]:
runtime.gopark(0x54e278, 0x0, 0x54510b, 0x6, 0x18, 0x2)
/usr/pkg/go/src/runtime/proc.go:271 +0x13a fp=0xc420046c58 sp=0xc420046c28
runtime.selectgoImpl(0xc420046f40, 0x0, 0x18)
/usr/pkg/go/src/runtime/select.go:423 +0x1364 fp=0xc420046ed0 sp=0xc420046c58
runtime.selectgo(0xc420046f40)
/usr/pkg/go/src/runtime/select.go:238 +0x1c fp=0xc420046ef8 sp=0xc420046ed0
os/signal.TestStress.func1(0xc4200c81e0, 0xc4200c8240)
/usr/pkg/go/src/os/signal/signal_test.go:81 +0x1b5 fp=0xc420046fd0 sp=0xc420046ef8
runtime.goexit()
/usr/pkg/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420046fd8 sp=0xc420046fd0
created by os/signal.TestStress
/usr/pkg/go/src/os/signal/signal_test.go:88 +0xe3

@zoulasc
Contributor

zoulasc commented Apr 21, 2017

I fixed it in the pkgsrc version of go. It was a NetBSD kernel bug.

@bradfitz
Contributor Author

@zoulasc, those statements are not consistent.

If it was a NetBSD kernel bug, you should've fixed it in NetBSD.

If it was a Go bug, then we should fix it in Go.

The pkgsrc version of Go should have zero NetBSD-specific patches or workarounds ideally.

I assume you mean you worked around the kernel bug in the pkgsrc patches? Where? Got a URL?

@zoulasc
Contributor

zoulasc commented Apr 21, 2017 via email

@bradfitz
Contributor Author

@ianlancetaylor, you might be interested in the patch and mailing list discussion.

@ianlancetaylor
Contributor

I definitely do not understand the argument in tech-kern that the new thread should inherit the signal stack. It cannot possibly work to have two threads using the same signal stack. It seems pointless to force any thread that has a signal stack and creates a new thread to do a little dance to allocate a new signal stack for the new thread. Not only pointless, but unlike every other Unix system.

@zoulasc
Contributor

zoulasc commented Apr 21, 2017

I agree with you @ianlancetaylor -- sharing the signal stack can't possibly work. My proposed patch had the necessary functionality in the kernel so that the C library's _lwp_makecontext could automatically DTRT for the signal stack, but I ended up committing a patch in the kernel that will just give the new thread an SS_DISABLE stack_t. That does not prevent future improvements :-)
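The difference between the two behaviors discussed above can be sketched with a toy model. Nothing here calls the kernel: `sigStack`, `thread`, `spawnInherit`, `spawnDisabled`, and the addresses are hypothetical names and values chosen for illustration, and the `disabled` field merely stands in for an `SS_DISABLE` `stack_t`.

```go
package main

import "fmt"

// sigStack is a toy stand-in for the kernel's stack_t.
type sigStack struct {
	base, size uintptr
	disabled   bool // models the SS_DISABLE flag
}

// thread is a toy stand-in for a kernel thread (LWP).
type thread struct {
	id  int
	alt sigStack // alternate signal stack
}

// spawnInherit models the problematic behavior: the child starts
// with a copy of the parent's signal stack settings.
func spawnInherit(parent thread, id int) thread {
	return thread{id: id, alt: parent.alt}
}

// spawnDisabled models the committed fix: the child starts with no
// alternate signal stack and must install its own.
func spawnDisabled(parent thread, id int) thread {
	return thread{id: id, alt: sigStack{disabled: true}}
}

// sharesSigStack reports whether two threads would run their signal
// handlers on overlapping memory.
func sharesSigStack(a, b thread) bool {
	if a.alt.disabled || b.alt.disabled {
		return false
	}
	return a.alt.base < b.alt.base+b.alt.size &&
		b.alt.base < a.alt.base+a.alt.size
}

func main() {
	parent := thread{id: 1, alt: sigStack{base: 0x1000, size: 0x8000}}

	inherited := spawnInherit(parent, 2)
	fresh := spawnDisabled(parent, 3)

	fmt.Println("inherit shares stack:", sharesSigStack(parent, inherited))
	fmt.Println("disabled shares stack:", sharesSigStack(parent, fresh))
}
```

In the inherit case, a signal delivered to each thread at the same time would have both handlers writing frames into the same region, which matches the kind of stack corruption seen in the traces earlier in this issue.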

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jun 28, 2017
@gopherbot

CL https://golang.org/cl/47036 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 28, 2017
Updates #20836
Updates #19339
Updates #19652
Updates #20835
Updates #16511
Updates #10166
Updates #8574

Change-Id: If9a7f560489f1a8d628dafab227925bd8989326e
Reviewed-on: https://go-review.googlesource.com/47036
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bradfitz
Contributor Author

Fixed by requiring NetBSD 8+ for Go 1.10.

Documentation bug is #22911

@golang golang locked and limited conversation to collaborators Nov 28, 2018