runtime: mexit leads to SIGSEGV due to g.m becoming nil #52394

Open
4a6f656c opened this issue Apr 17, 2022 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@4a6f656c
Contributor

This issue is reproducible on openbsd/amd64 with both go1.17.7 and tip.

While debugging test failures (for another software package - delve), the go test run will regularly hit a SIGSEGV like the following:

Thread 5 received signal SIGSEGV, Segmentation fault.
[Switching to thread 129486]
0x00000000007a6f2e in runtime.lock2 (l=0xec5e98 <runtime.sched+24>) at /usr/local/go/src/runtime/lock_sema.go:42
42              if gp.m.locks < 0 {
(gdb) bt
#0  0x00000000007a6f2e in runtime.lock2 (l=0xec5e98 <runtime.sched+24>) at /usr/local/go/src/runtime/lock_sema.go:42
#1  0x00000000007dac0d in runtime.lockWithRank (l=<optimized out>, rank=<optimized out>) at /usr/local/go/src/runtime/lockrank_off.go:23
#2  runtime.lock (l=<optimized out>) at /usr/local/go/src/runtime/lock_sema.go:37
#3  runtime.mexit (osStack=true) at /usr/local/go/src/runtime/proc.go:1532
#4  0x00000000007da86f in runtime.mstart0 () at /usr/local/go/src/runtime/proc.go:1372
#5  0x000000000080b425 in runtime.mstart () at /usr/local/go/src/runtime/asm_amd64.s:248
#6  0x0000000000e420f4 in crosscall_amd64 () at gcc_amd64.S:40
#7  0x000000000080b420 in ?? ()
#8  0x000000c0000011e0 in ?? ()
#9  0x00007f7ffffc7b68 in ?? ()
#10 0x0000000000e41ff0 in ?? () at gcc_openbsd_amd64.c:51
#11 0x0000000297c8ee70 in ?? ()
#12 0x000000027d9ca8a8 in ?? ()
#13 0x0000000000e42024 in threadentry (v=<optimized out>) at gcc_openbsd_amd64.c:63

This occurs because gp.m becomes nil during the following call in mexit (it is non-nil before this line and nil immediately afterwards):

        // Release the P.
        handoffp(releasep())

The next piece of code is:

        lock(&sched.lock)

lock2 then dereferences gp.m.locks through the now-nil gp.m, hence the SIGSEGV.
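To make the failure mode concrete, here is a toy model of the check that faults. It is only a sketch: the `m` and `g` types, `lockModel`, and `tryLock` are hypothetical stand-ins, not the runtime's own definitions, and the step that clears `gp.m` is simulated by hand because the actual cause is not yet known.

```go
package main

import "fmt"

// Toy stand-ins for the runtime's m and g structures. These are NOT the
// real runtime types; only the fields needed for the illustration exist.
type m struct {
	locks int32
}

type g struct {
	m *m
}

// lockModel mimics the check at the faulting line in the backtrace:
// it reads gp.m.locks before doing anything else.
func lockModel(gp *g) {
	if gp.m.locks < 0 {
		panic("lock count")
	}
	gp.m.locks++
}

// tryLock calls lockModel and reports whether it faulted. In a user
// program a nil dereference is a recoverable panic; inside the runtime
// it surfaces as the SIGSEGV shown above.
func tryLock(gp *g) (faulted bool) {
	defer func() {
		if r := recover(); r != nil {
			faulted = true
		}
	}()
	lockModel(gp)
	return false
}

func main() {
	gp := &g{m: &m{}}
	fmt.Println("before handoff, faulted:", tryLock(gp))

	// Assumption for illustration: something clears gp.m while the
	// handoffp(releasep()) step runs, as observed in the debugger.
	gp.m = nil
	fmt.Println("after handoff, faulted:", tryLock(gp))
}
```

The model only shows why a nil gp.m is fatal at that line; it says nothing about what clears gp.m in the first place.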

This seems to be triggered by the fact that these tests run code that calls runtime.LockOSThread - adding a matching runtime.UnlockOSThread call makes the issue disappear, as does removing the runtime.LockOSThread call. Additionally, the openbsd/amd64 port uses system threads, which may play a part.

So far I've been unsuccessful in reducing the test suite to provide a minimal reproducer. However, it is easy to trigger and I can provide further information as needed.

@4a6f656c
Contributor Author

This is also reproducible on FreeBSD 13 with Go 1.18:

[Switching to LWP 106157 of process 3253]
runtime.lock2 (l=0xe29178 <runtime.sched+24>) at /usr/local/go/src/runtime/lock_futex.go:53
53		if gp.m.locks < 0 {
(gdb) bt
#0  runtime.lock2 (l=0xe29178 <runtime.sched+24>) at /usr/local/go/src/runtime/lock_futex.go:53
#1  0x00000000007e7be5 in runtime.lockWithRank (l=<optimized out>, rank=<optimized out>) at /usr/local/go/src/runtime/lockrank_off.go:22
#2  runtime.lock (l=<optimized out>) at /usr/local/go/src/runtime/lock_futex.go:47
#3  runtime.mexit (osStack=true) at /usr/local/go/src/runtime/proc.go:1535
#4  0x00000000007e78a9 in runtime.mstart0 () at /usr/local/go/src/runtime/proc.go:1385
#5  0x0000000000812b85 in runtime.mstart () at /usr/local/go/src/runtime/asm_amd64.s:367
#6  0x0000000000dbe1e0 in crosscall_amd64 () at gcc_amd64.S:40
#7  0x0000000801612500 in ?? ()
#8  0x000000c000131d40 in ?? ()
#9  0x0000000000000001 in ?? ()
#10 0x0000000000000001 in ?? ()
#11 0x00007fffdf7f9fc0 in ?? ()
#12 0x0000000000812b80 in ?? ()
#13 0x0000000000dbdd95 in threadentry (v=<optimized out>) at gcc_freebsd_amd64.c:72
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Of interest, this is also a system/pthread created thread (entry is via threadentry in gcc_freebsd_amd64.c). In this case gp.m is not nil, but holds a small, clearly invalid pointer value:

(gdb) print gp.m
$1 = (runtime.m *) 0x137

@4a6f656c
Contributor Author

The reproducer is:

$ git clone https://github.com/go-delve/delve
$ cd delve/pkg/terminal
$ go test

4a6f656c added a commit to 4a6f656c/delve that referenced this issue Apr 18, 2022
On FreeBSD and OpenBSD, the use of runtime.LockOSThread is resulting in segfaults
within the Go runtime (see golang/go#52394) - while it
should not be necessary, calling runtime.UnlockOSThread upon exit from
handlePtraceFuncs avoids this issue and allows the tests to run correctly.
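The workaround described in that commit message can be sketched roughly as follows. This is not Delve's code: the `worker` type and its `loop`, `do`, and `stop` methods are hypothetical names used only to show a goroutine that pins itself to one OS thread, serves requests, and unlocks the thread before it returns.

```go
package main

import (
	"fmt"
	"runtime"
)

// worker is a hypothetical stand-in for a type that must run every
// request on one OS thread (as ptrace requires); it is not Delve's type.
type worker struct {
	reqs chan func()
	done chan struct{}
}

func newWorker() *worker {
	w := &worker{reqs: make(chan func()), done: make(chan struct{})}
	go w.loop()
	return w
}

// loop pins itself to one OS thread for as long as requests arrive.
func (w *worker) loop() {
	defer close(w.done)
	runtime.LockOSThread()
	// The workaround: unlock before the goroutine exits, so the runtime
	// keeps the thread instead of terminating it.
	defer runtime.UnlockOSThread()

	for fn := range w.reqs {
		fn()
	}
}

// do runs fn on the worker's thread and waits for it to finish.
func (w *worker) do(fn func()) {
	finished := make(chan struct{})
	w.reqs <- func() { fn(); close(finished) }
	<-finished
}

// stop ends the loop and waits until it has unlocked and returned.
func (w *worker) stop() {
	close(w.reqs)
	<-w.done
}

func main() {
	w := newWorker()
	sum := 0
	for i := 1; i <= 3; i++ {
		w.do(func() { sum += i })
	}
	w.stop()
	fmt.Println("sum:", sum)
}
```

Without the deferred runtime.UnlockOSThread, the goroutine would exit while still locked, and the documented behaviour of the runtime in that case is to terminate the thread, which is the mexit path seen in the backtraces.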
@thanm thanm added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 18, 2022
@thanm
Contributor

thanm commented Apr 18, 2022

Thanks for the report. Could you please attach a more complete "go env" with hardware, OS version, etc.? Thanks. It would also help to understand which testpoint is the one that triggers the error.

@4a6f656c
Contributor Author

Thanks for the report. Could you please attach a more complete "go env" with hardware, OS version, etc.?

On openbsd/amd64:

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/joel/.cache/go-build"
GOENV="/home/joel/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="openbsd"
GOINSECURE=""
GOMODCACHE="/home/joel/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="openbsd"
GOPATH="/home/joel/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/openbsd_amd64"
GOVCS=""
GOVERSION="go1.17.7"
GCCGO="gccgo"
AR="ar"
CC="cc"
CXX="c++"
CGO_ENABLED="1"
GOMOD="/home/joel/src/delve/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3281329351=/tmp/go-build -gno-record-gcc-switches"

On freebsd/amd64:

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/joel/.cache/go-build"
GOENV="/home/joel/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOINSECURE=""
GOMODCACHE="/home/joel/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="freebsd"
GOPATH="/home/joel/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="cc"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/usr/home/joel/src/delve/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1993072459=/tmp/go-build -gno-record-gcc-switches"

Another data point is that both of these are virtual machines that have a single vCPU.

It would also help to understand which testpoint is the one that triggers the error.

As far as I can determine, there is no single test case that triggers it - I suspect it relates to the cycling of goroutines and threads that have called runtime.LockOSThread.
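The suspected pattern, goroutines that call runtime.LockOSThread and then exit, can be sketched as below. This is not a confirmed reproducer: `cycle` is a hypothetical helper, and since the backtraces show threads started through cgo's threadentry, a pure-Go program like this one may well not reach the same path.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// cycle starts n goroutines one after another. Each locks itself to its
// OS thread and then exits. When unlock is false the goroutine exits
// while still locked, so the runtime terminates that thread.
func cycle(n int, unlock bool) int {
	completed := 0
	for i := 0; i < n; i++ {
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			runtime.LockOSThread()
			if unlock {
				defer runtime.UnlockOSThread()
			}
			completed++
		}()
		// Wait before starting the next goroutine, so threads are
		// created and torn down repeatedly rather than all at once.
		wg.Wait()
	}
	return completed
}

func main() {
	fmt.Println("exited while locked:", cycle(50, false))
	fmt.Println("unlocked before exit:", cycle(50, true))
}
```

Comparing the two calls mirrors the observation in the original report: the crash goes away once a matching runtime.UnlockOSThread is added.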

@thanm thanm added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Apr 18, 2022
4a6f656c added a commit to 4a6f656c/delve that referenced this issue Apr 21, 2022
derekparker pushed a commit to go-delve/delve that referenced this issue Apr 26, 2022
@seankhliao seankhliao added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 20, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022