runtime: "morestack on gsignal" on linux-arm64-packet builder #35235

Closed
bcmills opened this issue Oct 29, 2019 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Contributor

bcmills commented Oct 29, 2019

A mysterious error on the linux-arm64-packet builder https://build.golang.org/log/7b87e729fd62b071ed8bd6b8c709bd41a7d13e23:

fatal: morestack on gsignal
FAIL	os	0.111s

I've only seen the one failure so far, so I'm not sure whether it's related to the various changes in 1.14.

CC @aclements @ianlancetaylor @danscales

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 29, 2019
@bcmills bcmills added this to the Go1.14 milestone Oct 29, 2019
@ianlancetaylor
Contributor

Well that is an unhelpful error message.

One way to tackle this might be to add a go:nosplitrec annotation. Even a version of that that reported any callees that were not go:nosplit would help.

@cherrymui
Member

The signal stack has large stack bounds, and it should be ok to call split-stack functions, as it will pass the stack bound check and not call morestack.

My experience is that this is probably some kind of mismatch between the G and the SP. The stack bound check fails because we're actually not on that stack.
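To make the reasoning above concrete, here is a toy model of the stack bound check, not the runtime's actual prologue. The type names, the addresses, and the `guardOffset` value are all hypothetical and chosen only for illustration. It shows that the check passes when SP really is on the gsignal stack, and fails (which would lead to morestack) when the current G says gsignal but SP points somewhere else.

```go
package main

import "fmt"

// stack is a toy model of a stack's bounds: [lo, hi).
type stack struct {
	lo, hi uintptr
}

// g is a toy model of a goroutine descriptor, keeping only the fields
// that matter for the bound check.
type g struct {
	stack      stack
	stackguard uintptr // the check compares SP against this value
}

// guardOffset is a hypothetical guard size used only in this sketch.
const guardOffset = 928

// needsMorestack models the split-stack check: a function would call
// morestack when SP is at or below the guard of the current G.
func needsMorestack(gp *g, sp uintptr) bool {
	return sp <= gp.stackguard
}

func main() {
	// Hypothetical signal stack bounds.
	gsignal := &g{stack: stack{lo: 0x10000, hi: 0x18000}}
	gsignal.stackguard = gsignal.stack.lo + guardOffset

	// SP is really on the signal stack: the check passes, no morestack.
	fmt.Println(needsMorestack(gsignal, 0x17f00)) // false

	// G says gsignal, but SP is on some other stack at a lower address:
	// the check fails, which would end in "morestack on gsignal".
	fmt.Println(needsMorestack(gsignal, 0x8000)) // true
}
```

In this model the failure has nothing to do with the signal stack being too small; it only appears when the G and the SP disagree about which stack is in use.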

@danscales
Contributor

I just tried reproducing by running go test -test.count=1 os repeatedly (currently about 100 times) on the linux-arm64-packet builder, and I wasn't able to reproduce. I did get the same SEGV twice:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6ed80]

runtime stack:
runtime: unexpected return pc for runtime.mcommoninit called from 0x40000259e0
stack: frame={sp:0x40002e9df0, fp:0x40002e9e30} stack=[0x40002e8000,0x40002ea000)
00000040002e9cf0:  00000040002e9d50  000000000003d914 <runtime.throw+84> 
00000040002e9d00:  0000004000130180  00000040002e9d48 
00000040002e9d10:  000000000003d914 <runtime.throw+84>  00000040002e9d28 
00000040002e9d20:  000000000003d8fc <runtime.throw+60>  000000000006a410 <runtime.fatalthrow.func1+0> 
00000040002e9d30:  0000004000130180  000000000003d914 <runtime.throw+84> 
00000040002e9d40:  00000040002e9d50  00000040002e9d78 
00000040002e9d50:  0000000000053df8 <runtime.sigpanic+1096>  00000040002e9d60 
00000040002e9d60:  000000000006a390 <runtime.throw.func1+0>  00000000001a7c01 
00000040002e9d70:  000000000000002a  00000040002e9dd8 
00000040002e9d80:  000000000006ed80 <runtime.nanotime1+96>  00000000001a7c01 
00000040002e9d90:  000000000000002a  0000000000000000 
00000040002e9da0:  0000000000000001  00000040002e9d00 
00000040002e9db0:  0000000000040cb8 <runtime.mcommoninit+152>  00000040002e9e08 
00000040002e9dc0:  0000000000040c5c <runtime.mcommoninit+60>  00000040002e9e18 
00000040002e9dd0:  0000000000044e10 <runtime.findrunnable+2144>  00000040002e9e08 
00000040002e9de0:  0000000000040cb8 <runtime.mcommoninit+152>  0000000000000000 
00000040002e9df0: <00000040000259e0  0000000000000000 
00000040002e9e00:  0000000000000001  00000040002e9e48 
00000040002e9e10:  0000000000042c88 <runtime.allocm+328>  00000000002dc2d8 
00000040002e9e20:  0000000000042c60 <runtime.allocm+288>  0000000000000380 
00000040002e9e30: >000000000019c0c0  0000000000000001 
00000040002e9e40:  0000004000308000  00000040002e9e98 
00000040002e9e50:  0000000000043530 <runtime.newm+48>  0000004000308000 
00000040002e9e60:  0000004000308000  0000000000000000 
00000040002e9e70:  0000000000000000  00000040002e9ec8 
00000040002e9e80:  0000004000308000  0000000000000000 
00000040002e9e90:  0000004000130180  00000040002e9ec8 
00000040002e9ea0:  0000000000043b1c <runtime.startm+284>  0000004000020a00 
00000040002e9eb0:  00000000001aaf20  0000000000000000 
00000040002e9ec0:  0000000000000000  00000040002e9ef8 
00000040002e9ed0:  0000000000045698 <runtime.resetspinning+184>  00000000001aaf20 
00000040002e9ee0:  0000004000020a00  0000004000020a00 
00000040002e9ef0:  0000000000000000  00000040002e9f18 
00000040002e9f00:  0000000000045b60 <runtime.schedule+704>  0000000000000000 
00000040002e9f10:  0000004000025401  00000040002e9f98 
00000040002e9f20:  00000000000422a0 <runtime.mstart1+128>  0000004000001680 
runtime.throw(0x1a7c01, 0x2a)
        /workdir/go/src/runtime/panic.go:1045 +0x54
runtime.sigpanic()
        /workdir/go/src/runtime/signal_unix.go:578 +0x448
runtime.nanotime1(0x0)
        /workdir/go/src/runtime/sys_linux_arm64.s:300 +0x60
runtime: unexpected return pc for runtime.mcommoninit called from 0x40000259e0
...

Unrelated, I assume? I also ran the same test on linux-amd64 60 times, no problems at all, as expected.

@ianlancetaylor
Contributor

Ah, right, thanks.

@ianlancetaylor
Contributor

@danscales That looks like #34391. Are you synced past https://golang.org/cl/202759 (758eb02)?

@cherrymui
Member

Hmmm. The stack trace looks like this is failing in code that was added in CL 202759. It looks like g.m.gsignal is not nil but its stack is 0? I'll look into it.

@danscales
Contributor

Unfortunately, I am synced past 758eb02 (I'm at 59a6847, which is at least Oct 26th). On the other hand, #34391 looks like a hang, whereas what I'm running into is a SEGV.

@cherrymui
Member

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 60a15c1e9c..2c50a08b1d 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -1190,6 +1190,7 @@ func mexit(osStack bool) {
 	// Free the gsignal stack.
 	if m.gsignal != nil {
 		stackfree(m.gsignal.stack)
+		m.gsignal = nil
 	}
 
 	// Remove m from allm.

Does this patch fix it? I need to leave for a while now. I'll send a CL when I come back, if this works. Thanks!
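As a rough illustration of why the one-line change might matter, here is a toy model, not runtime code. It assumes some caller guards only on `m.gsignal != nil` before touching the gsignal stack. The types, the `freed` flag, and the functions `exitM` and `useSignalStack` are hypothetical stand-ins for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// stack is a toy model of stack bounds. The freed flag exists only in
// this sketch to make a stale use observable.
type stack struct {
	lo, hi uintptr
	freed  bool
}

type g struct{ stack stack }

type m struct{ gsignal *g }

var errFreedStack = errors.New("use of freed gsignal stack")

// useSignalStack models a hypothetical caller that checks only whether
// m.gsignal is non-nil before using its stack.
func useSignalStack(mp *m) error {
	if mp.gsignal == nil {
		return nil // no signal stack, nothing to do
	}
	if mp.gsignal.stack.freed {
		return errFreedStack // the nil check did not protect us
	}
	return nil
}

// exitM models the cleanup step in the patch. clearPointer selects
// between the old behavior (false) and the patched behavior (true).
func exitM(mp *m, clearPointer bool) {
	if mp.gsignal != nil {
		mp.gsignal.stack.freed = true // stand-in for stackfree
		if clearPointer {
			mp.gsignal = nil
		}
	}
}

func main() {
	old := &m{gsignal: &g{stack: stack{lo: 0x10000, hi: 0x18000}}}
	exitM(old, false)
	fmt.Println(useSignalStack(old)) // use of freed gsignal stack

	patched := &m{gsignal: &g{stack: stack{lo: 0x10000, hi: 0x18000}}}
	exitM(patched, true)
	fmt.Println(useSignalStack(patched)) // <nil>
}
```

The point of the sketch is only that a non-nil pointer to a freed stack defeats a nil check; whether this is exactly what happens in the runtime is not established here.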

@danscales
Contributor

@cherrymui Yes, that change fixed the SEGV. I just ran the test 100 times on linux-arm64-packet with that change, with no SEGV and no repro of the "morestack on gsignal" issue.

@cherrymui
Member

Thanks, @danscales ! Sent CL http://golang.org/cl/204158 .

@gopherbot

Change https://golang.org/cl/204158 mentions this issue: runtime: clear m.gsignal when the M exits
