
runtime: "morestack on g0" in TestSegv on darwin-amd64 builders #39457

Open
bcmills opened this issue Jun 8, 2020 · 13 comments
Labels: compiler/runtime, NeedsInvestigation, OS-Darwin

Comments

@bcmills
Contributor

bcmills commented Jun 8, 2020

2020-06-08T17:59:37-2603d9a/darwin-amd64-race

--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/Segv (0.02s)
        crash_test.go:105: /var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/tmp/go-build172134279/testprogcgo.exe SegvInCgo exit status: exit status 2
        crash_cgo_test.go:569: fatal: morestack on g0
            SIGTRAP: trace trap
            PC=0x406b702 m=0 sigcode=1
            
            goroutine 0 [idle]:
            runtime.abort()
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/asm_amd64.s:860 +0x2
            runtime.morestack()
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/asm_amd64.s:416 +0x25
            
            goroutine 19 [syscall]:
            runtime.cgocall(0x4123600, 0xc00003a7c0, 0x4123600)
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/cgocall.go:133 +0x5b fp=0xc00003a790 sp=0xc00003a758 pc=0x400503b
            main._Cfunc_nop()
            	_cgo_gotypes.go:329 +0x45 fp=0xc00003a7c0 sp=0xc00003a790 pc=0x411a2a5
            main.SegvInCgo.func1(0xc00008e120)
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/segv.go:46 +0x30 fp=0xc00003a7d8 sp=0xc00003a7c0 pc=0x41224b0
            runtime.goexit()
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00003a7e0 sp=0xc00003a7d8 pc=0x406b8e1
            created by main.SegvInCgo
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/segv.go:43 +0x5c
            
            goroutine 1 [sleep]:
            time.Sleep(0x3b9aca00)
            	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/time.go:188 +0xbf
            main.SegvInCgo()
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/segv.go:55 +0x9c
            main.main()
            	/private/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/testdata/testprogcgo/main.go:34 +0x1da
            
            rax    0x17
            rbx    0xc00003a710
            rcx    0x4265d40
            rdx    0x0
            rdi    0x2
            rsi    0xc00003a6b0
            rbp    0xc00003a780
            rsp    0xc00003a738
            r8     0x4265d40
            r9     0x0
            r10    0xc00003a710
            r11    0x202
            r12    0xf1
            r13    0x0
            r14    0x418de44
            r15    0x0
            rip    0x406b702
            rflags 0x202
            cs     0x2b
            fs     0x0
            gs     0x0
            
        crash_cgo_test.go:571: expected crash from signal
FAIL
FAIL	runtime	69.144s

CC @aclements @ianlancetaylor @cherrymui

@bcmills bcmills added the OS-Darwin and NeedsInvestigation labels Jun 8, 2020
@bcmills bcmills added this to the Backlog milestone Jun 8, 2020
@golang golang deleted a comment Jun 10, 2020
@cherrymui
Member

On darwin/amd64, to work around a kernel issue, we rewrite SI_USER SIGSEGV signals so that they look kernel-generated: https://tip.golang.org/src/runtime/signal_darwin_amd64.go#L72 . So in this case, an actual user-sent SIGSEGV is treated as a kernel-generated signal, which causes the runtime to inject a sigpanic. If the signal lands at a bad time, e.g. right in the middle of a stack switch where the g and the stack don't match, bad things will happen.
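To make the consequence concrete, here is a toy model of the relabeling described above. It is a sketch only: `sigInfo`, `fixupDarwin`, `handle`, and `bogusAddr` are hypothetical names, not runtime identifiers, and the real logic lives in the runtime's signal handling code. It shows that once every SI_USER SIGSEGV is relabeled, a `kill -SEGV` takes the sigpanic path instead of crashing directly from the handler.

```go
package main

import "fmt"

// Hypothetical model of the darwin/amd64 SIGSEGV workaround; none of these
// identifiers exist in the runtime.

// sigInfo is a simplified view of what the kernel reports for a SIGSEGV.
type sigInfo struct {
	fromUser bool   // kernel reports the signal as user-sent (SI_USER)
	addr     uint64 // faulting address reported with the signal
}

// bogusAddr is an arbitrary non-canonical address standing in for the
// placeholder the workaround substitutes (hypothetical value).
const bogusAddr = 0xdead000000000000

// fixupDarwin relabels every SI_USER SIGSEGV as kernel-generated, because a
// fault on a malformed address is also reported as SI_USER on darwin/amd64.
func fixupDarwin(si sigInfo) sigInfo {
	if si.fromUser {
		si.fromUser = false
		si.addr = bogusAddr
	}
	return si
}

// handle reports what the signal handler does with a SIGSEGV.
func handle(si sigInfo) string {
	if si.fromUser {
		return "crash" // crash from inside the signal handler
	}
	return "inject sigpanic" // run sigpanic on the interrupted goroutine
}

func main() {
	killed := sigInfo{fromUser: true} // e.g. `kill -SEGV <pid>`
	fmt.Println("without workaround:", handle(killed))
	fmt.Println("with workaround:   ", handle(fixupDarwin(killed)))
}
```

Because the relabeling cannot tell the two cases apart, the injected sigpanic runs on whatever g and stack happen to be current when the signal lands.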

I'm not sure what the best solution is. A few possibilities:

  • do nothing (maybe skip/relax the test). It isn't too bad in that it will crash the program anyway (sigpanic will throw for this particular bad address), unless PanicOnFault is set.
  • remove the workaround (at least the sigcode part, we could still change the faulting address). A malformed address will be treated as user-sent SIGSEGV, which will crash the program now. PanicOnFault is still a problem.

Not sure what to do with PanicOnFault. Due to the kernel issue, it seems we cannot distinguish a malformed-address fault from a user-sent SIGSEGV, so we have to make both recoverable or both non-recoverable...

(The workaround was added for OS X 10.9. The kernel issue seems to still be present in macOS 10.15...)
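"Malformed address" here means a non-canonical x86-64 address. A minimal check, assuming 48-bit virtual addresses (4-level paging; CPUs with 5-level paging use 57 bits), is that bits 63..48 must all equal bit 47. The hardware raises a different kind of fault for such an address than for an unmapped canonical one, and that is the case the kernel reports as if it were user-sent. The function name `isCanonical` is illustrative.

```go
package main

import "fmt"

// isCanonical reports whether addr is a canonical x86-64 virtual address,
// assuming 48-bit virtual addresses: bits 63..48 must all equal bit 47.
func isCanonical(addr uint64) bool {
	// Shift bit 47 into the sign bit, then arithmetic-shift back so the
	// upper 16 bits become copies of bit 47; a canonical address survives.
	return uint64(int64(addr<<16)>>16) == addr
}

func main() {
	for _, a := range []uint64{
		0x00007fffffffffff, // top of the lower half: canonical
		0xffff800000000000, // bottom of the upper half: canonical
		0x0000800000000000, // bit 47 set, upper bits clear: malformed
		0xdead000000000000, // hypothetical garbage pointer: malformed
	} {
		fmt.Printf("%#018x canonical=%v\n", a, isCanonical(a))
	}
}
```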

@cherrymui
Member

Another possibility: when switching from the user stack to the system stack (e.g. in systemstack, asmcgocall, etc.), we could always (1) set the user g's g.throwsplit to true, (2) change SP, and (3) change the g register to g0, and do the same steps in the opposite order when switching back. This might solve the immediate problem of a SIGSEGV landing in the middle of a stack switch. Not sure if there are any other problems. Seems pretty complicated, though.
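The ordering argument can be checked with a small state model. This is a hypothetical sketch, not runtime code: it treats a state as safe for an arriving signal if the g and SP agree, or if the user g's throwsplit flag is set so the handler knows not to inject sigpanic. The proposed order keeps every intermediate state safe, while an unguarded order that changes g before SP (used here only for contrast) does not.

```go
package main

import "fmt"

// Toy model (hypothetical, not runtime code) of what a signal handler could
// observe while switching from a user goroutine to g0 and back.
type state struct {
	gIsG0      bool // the g register points at g0
	spOnG0     bool // SP points into g0's stack
	throwsplit bool // the user g's throwsplit flag
}

// safe reports whether a signal arriving in s can be handled: either g and
// SP agree, or throwsplit tells the handler not to inject sigpanic.
func safe(s state) bool {
	return s.gIsG0 == s.spOnG0 || s.throwsplit
}

type step func(*state)

var (
	setThrowsplit   step = func(s *state) { s.throwsplit = true }
	clearThrowsplit step = func(s *state) { s.throwsplit = false }
	spToG0          step = func(s *state) { s.spOnG0 = true }
	spToUser        step = func(s *state) { s.spOnG0 = false }
	gToG0           step = func(s *state) { s.gIsG0 = true }
	gToUser         step = func(s *state) { s.gIsG0 = false }
)

// allSafe applies steps in order, starting on the user g and its stack,
// and reports whether every intermediate state is safe.
func allSafe(steps []step) bool {
	var s state
	for _, f := range steps {
		f(&s)
		if !safe(s) {
			return false
		}
	}
	return true
}

func main() {
	proposed := []step{
		setThrowsplit, spToG0, gToG0, // switch to the system stack
		gToUser, spToUser, clearThrowsplit, // switch back in reverse order
	}
	unguarded := []step{gToG0, spToG0, spToUser, gToUser}
	fmt.Println("proposed order safe: ", allSafe(proposed))
	fmt.Println("unguarded order safe:", allSafe(unguarded))
}
```

The model only covers the window the comment describes; it says nothing about the "any other problem" the proposal leaves open.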

@bcmills bcmills changed the title runtime: "morestack on g0" in TestSegv on darwin-amd64-race builder runtime: "morestack on g0" in TestSegv on darwin-amd64 builders Aug 31, 2020
@bcmills
Contributor Author

bcmills commented Aug 31, 2020

Hmm... Why do we ignore user-generated SIGSEGV signals in the first place? If I explicitly send a program SIGSEGV from the command line, I would generally expect to get a core dump (since that is the SIG_DFL behavior of the signal to begin with).

@ianlancetaylor
Contributor

We don't ignore user-generated SIGSEGV signals. That's the point of the test. I'm not sure what you are saying here.

The test failure logs suggest that the problem is that we somehow think that we are out of stack space while handling a signal. I'm not sure how that could happen.

@cherrymui
Member

In my experience, "morestack on g0" usually does not mean that we are actually running out of stack space on g0, but that somehow the SP and the G (and so the stack bounds) don't match. My comment above mentioned some possibilities, e.g. a signal landing right in the middle of a stack switch.

As @ianlancetaylor said, we don't ignore user-generated SIGSEGV (we did in the past, but not now). What is special about darwin is that we treat a user-generated SIGSEGV (which should crash the runtime) as kernel-generated (which causes a panic), due to a kernel bug ( https://tip.golang.org/src/runtime/signal_darwin_amd64.go#L72 ). Because of that, we inject a sigpanic call instead of just throwing, and somewhere down the panic path there are non-nosplit functions that check stack bounds. If the G and the stack bounds don't match, it can crash like this.
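A rough model of why a G/stack mismatch produces exactly this message: a non-nosplit function's prologue compares SP against the guard of whichever g the g register names, and morestack refuses to grow g0. In the register dump above, rsp is 0xc00003a738, right next to goroutine 19's frames (sp=0xc00003a758), while the crashing context is goroutine 0. The sketch below is hypothetical: the type `g`, `prologue`, and all stack bounds except that rsp value are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical model (not runtime code) of a function prologue's stack
// check. It uses the bounds of whichever g the g register names, not the
// stack that SP actually points into.
type g struct {
	lo, hi      uint64 // stack bounds [lo, hi)
	stackguard0 uint64 // the prologue calls morestack if SP < stackguard0
}

// Illustrative bounds: g0's are made up; goroutine 19's are chosen to
// contain the frames shown in the trace.
var (
	g0  = &g{lo: 0x7ffeefb00000, hi: 0x7ffeefc00000, stackguard0: 0x7ffeefb01000}
	g19 = &g{lo: 0xc00003a000, hi: 0xc00003a800, stackguard0: 0xc00003a100}
)

var errMorestackOnG0 = errors.New("fatal: morestack on g0")

// prologue mimics the check: if SP is below the current g's guard,
// morestack runs, and morestack refuses to grow g0's stack.
func prologue(cur *g, sp uint64) error {
	if sp < cur.stackguard0 {
		if cur == g0 {
			return errMorestackOnG0
		}
		// For an ordinary goroutine the stack would be grown here.
	}
	return nil
}

func main() {
	const rsp = 0xc00003a738 // rsp from the register dump in this issue
	fmt.Println("g = goroutine 19:", prologue(g19, rsp))
	fmt.Println("g = g0:          ", prologue(g0, rsp))
}
```

The same SP passes the check when paired with its own goroutine and fails when paired with g0, which is the mismatch described above rather than genuine stack exhaustion.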

@bcmills
Contributor Author

bcmills commented Feb 2, 2022

darwin/amd64 is a first-class port, and this test has been failing intermittently on the builder for over a year and a half.

If the behavior covered by this test is important then we really ought to find a solution for it; otherwise, the test should be skipped to reduce noise on the builders.

@gopherbot

Change https://golang.org/cl/382395 mentions this issue: runtime: skip TestSegv failures with "morestack on g0" on darwin/amd64

gopherbot pushed a commit that referenced this issue Feb 3, 2022
This failure mode has been present since at least 2020-06-08. We have
enough information to diagnose it, and further failures don't seem to
be adding any new information at this point: they can only add noise,
both on the Go project's builders and in users' own modules (for
example, when run as part of 'go test all').

For #39457

Change-Id: I2379631da0c8af69598fa61c0cc5ac0ea6ba8267
Reviewed-on: https://go-review.googlesource.com/c/go/+/382395
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
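A skip of this kind usually keys on both the platform and the specific failure text, so other failures still surface. The helper below is a hypothetical sketch of that pattern, not the code from CL 382395; the name `knownMorestackFlake` is made up.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// knownMorestackFlake is a hypothetical helper (not the code from CL 382395)
// that reports whether crash output matches the known darwin/amd64 failure.
func knownMorestackFlake(goos, goarch, output string) bool {
	return goos == "darwin" && goarch == "amd64" &&
		strings.Contains(output, "fatal: morestack on g0")
}

// In a test, it would gate a skip, for example:
//
//	if knownMorestackFlake(runtime.GOOS, runtime.GOARCH, got) {
//		t.Skipf("skipping known failure (issue 39457):\n%s", got)
//	}

func main() {
	out := "fatal: morestack on g0\nSIGTRAP: trace trap"
	fmt.Println(knownMorestackFlake("darwin", "amd64", out))
	fmt.Println(knownMorestackFlake(runtime.GOOS, runtime.GOARCH, out))
}
```

Matching on the message rather than skipping the whole test keeps the other TestSegv checks active on darwin/amd64.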
@evanw

evanw commented Feb 18, 2022

I have a user report about this error on a Windows machine with esbuild, which is written in Go but doesn't use cgo: evanw/esbuild#2031. I searched and found this issue and that report seemed potentially related, so I'm posting about it here in case it helps.

@cherrymui
Member

@evanw that looks like a different issue. Could you open a new issue? Thanks.

This issue is very specific to darwin (macOS) when a program receives an external SIGSEGV signal (e.g. by kill command or syscall).

Projects
Status: Triage Backlog
Development

No branches or pull requests

5 participants