Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/trace: segfault in runtime.fpTracebackPCs during deferred call after recovering from panic #61766

Closed
nsrip-dd opened this issue Aug 4, 2023 · 10 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nsrip-dd
Copy link
Contributor

nsrip-dd commented Aug 4, 2023

What version of Go are you using (go version)?

$ go version
go version go1.21rc4 darwin/arm64

Does this issue reproduce with the latest release?

No, it does not reproduce with Go 1.20.7.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/nick.ripley/Library/Caches/go-build'
GOENV='/Users/nick.ripley/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/nick.ripley/go/pkg/mod'
GONOPROXY='REDACTED'
GONOSUMDB='REDACTED'
GOOS='darwin'
GOPATH='/Users/nick.ripley/go'
GOPRIVATE='REDACTED'
GOPROXY='proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21rc4'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/nick.ripley/sandbox/go/tracepanic/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/f3/g91d13pd6kd3vdxts_gsgd1r0000gn/T/go-build3821548427=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Ran the following program, which records an execution trace event during a deferred call after recovering from a panic:

package main

import (
	"context"
	"io"
	"runtime/trace"
)

func main() {
	if err := trace.Start(io.Discard); err != nil {
		panic(err)
	}
	defer trace.Stop()
	recoverTrace()
}

func recoverTrace() {
	defer func() {
		_, t := trace.NewTask(context.Background(), "foobar")
		t.End()
	}()
	defer func() {
		recover()
	}()
	panic(42)
}

This is inspired by TestCallersAfterRecovery, which failed for me while I was attempting to use frame pointer unwinding in runtime.Callers.

What did you expect to see?

I expected the program to exit successfully.

What did you see instead?

The program crashed due to a segfault during frame pointer unwinding. It does not crash if I disable frame pointer unwinding for tracing by setting GODEBUG=tracefpunwindoff=1.

Here is an example output:

Crash output
% go run .
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x202c7261626f6f6e pc=0x1049fc790]

goroutine 1 [running]:
runtime.throw({0x104a22717?, 0x1049bbcc0?})
  /usr/local/go/src/runtime/panic.go:1077 +0x40 fp=0x14000070be0 sp=0x14000070bb0 pc=0x1049da050
runtime.sigpanic()
  /usr/local/go/src/runtime/signal_unix.go:845 +0x378 fp=0x14000070c40 sp=0x14000070be0 pc=0x1049f0128
runtime.fpTracebackPCs(...)
  /usr/local/go/src/runtime/trace.go:1014
runtime.traceStackID(0xf0eb25d4241?, {0x14b7c4018, 0x80, 0x80}, 0x104a1ce8e?)
  /usr/local/go/src/runtime/trace.go:979 +0x1e0 fp=0x14000070ca0 sp=0x14000070c50 pc=0x1049fc790
runtime.traceEventLocked(0x140000231d8?, 0x0?, 0x4a1ce8e?, 0x140000231d8, 0x2d, 0x0, 0x3, {0x14000070d78, 0x3, 0x14000070db8?})
  /usr/local/go/src/runtime/trace.go:830 +0x240 fp=0x14000070d20 sp=0x14000070ca0 pc=0x1049fbe30
runtime/trace.userTaskCreate(0x1, 0x0, {0x104a1ce8e, 0x6})
  /usr/local/go/src/runtime/trace.go:1715 +0xb0 fp=0x14000070da0 sp=0x14000070d20 pc=0x104a06c20
runtime/trace.NewTask({0x104a52d68, 0x104ae50e0}, {0x104a1ce8e, 0x6})
  /usr/local/go/src/runtime/trace/annotation.go:41 +0xb8 fp=0x14000070df0 sp=0x14000070da0 pc=0x104a1c388
main.recoverTrace.func1()
  /Users/nick.ripley/sandbox/go/tracepanic/main.go:19 +0x38 fp=0x14000070e20 sp=0x14000070df0 pc=0x104a1c958
runtime.deferCallSave(0x0, 0x104b9c108?)
  /usr/local/go/src/runtime/panic.go:798 +0x90 fp=0x14000070e30 sp=0x14000070e20 pc=0x1049d9890
runtime.runOpenDeferFrame(0x140000200a0)
  /usr/local/go/src/runtime/panic.go:771 +0x1b4 fp=0x14000070e80 sp=0x14000070e30 pc=0x1049d96c4
runtime.deferreturn()
  /usr/local/go/src/runtime/panic.go:461 +0xac fp=0x14000070ec0 sp=0x14000070e80 pc=0x1049d8efc
main.recoverTrace()
  /Users/nick.ripley/sandbox/go/tracepanic/main.go:25 +0x68 fp=0x14000070f00 sp=0x14000070ec0 pc=0x104a1c908
main.main()
  /Users/nick.ripley/sandbox/go/tracepanic/main.go:14 +0x50 fp=0x14000070f30 sp=0x14000070f00 pc=0x104a1c860
runtime.main()
  /usr/local/go/src/runtime/proc.go:267 +0x2bc fp=0x14000070fd0 sp=0x14000070f30 pc=0x1049dc8bc
runtime.goexit()
  /usr/local/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x14000070fd0 sp=0x14000070fd0 pc=0x104a09e74

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
  /usr/local/go/src/runtime/proc.go:398 +0xc8 fp=0x14000042f90 sp=0x14000042f70 pc=0x1049dcce8
runtime.goparkunlock(...)
  /usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
  /usr/local/go/src/runtime/proc.go:322 +0xb8 fp=0x14000042fd0 sp=0x14000042f90 pc=0x1049dcb78
runtime.goexit()
  /usr/local/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x14000042fd0 sp=0x14000042fd0 pc=0x104a09e74
created by runtime.init.6 in goroutine 1
  /usr/local/go/src/runtime/proc.go:310 +0x24

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
  /usr/local/go/src/runtime/proc.go:398 +0xc8 fp=0x14000043760 sp=0x14000043740 pc=0x1049dcce8
runtime.goparkunlock(...)
  /usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
  /usr/local/go/src/runtime/mgcsweep.go:280 +0xa0 fp=0x140000437b0 sp=0x14000043760 pc=0x1049ca170
runtime.gcenable.func1()
  /usr/local/go/src/runtime/mgc.go:200 +0x28 fp=0x140000437d0 sp=0x140000437b0 pc=0x1049beec8
runtime.goexit()
  /usr/local/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x140000437d0 sp=0x140000437d0 pc=0x104a09e74
created by runtime.gcenable in goroutine 1
  /usr/local/go/src/runtime/mgc.go:200 +0x6c

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x1400005a000?, 0x104a38348?, 0x1?, 0x0?, 0x14000002d00?)
  /usr/local/go/src/runtime/proc.go:398 +0xc8 fp=0x14000043f50 sp=0x14000043f30 pc=0x1049dcce8
runtime.goparkunlock(...)
  /usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x104ab1680)
  /usr/local/go/src/runtime/mgcscavenge.go:425 +0x5c fp=0x14000043f80 sp=0x14000043f50 pc=0x1049c79dc
runtime.bgscavenge(0x0?)
  /usr/local/go/src/runtime/mgcscavenge.go:653 +0x44 fp=0x14000043fb0 sp=0x14000043f80 pc=0x1049c7f34
runtime.gcenable.func2()
  /usr/local/go/src/runtime/mgc.go:201 +0x28 fp=0x14000043fd0 sp=0x14000043fb0 pc=0x1049bee68
runtime.goexit()
  /usr/local/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x14000043fd0 sp=0x14000043fd0 pc=0x104a09e74
created by runtime.gcenable in goroutine 1
  /usr/local/go/src/runtime/mgc.go:201 +0xac

goroutine 5 [finalizer wait]:
runtime.gopark(0x140000425c8?, 0x104a1bf70?, 0x0?, 0x0?, 0x104a1d6a7?)
  /usr/local/go/src/runtime/proc.go:398 +0xc8 fp=0x14000042580 sp=0x14000042560 pc=0x1049dcce8
runtime.runfinq()
  /usr/local/go/src/runtime/mfinal.go:193 +0x108 fp=0x140000427d0 sp=0x14000042580 pc=0x1049bdfb8
runtime.goexit()
  /usr/local/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x140000427d0 sp=0x140000427d0 pc=0x104a09e74
created by runtime.createfing in goroutine 1
  /usr/local/go/src/runtime/mfinal.go:163 +0x80

goroutine 6 [trace reader (blocked)]:
runtime.gopark(0x14000044770?, 0x0?, 0x0?, 0x0?, 0x0?)
  /usr/local/go/src/runtime/proc.go:398 +0xc8 fp=0x14000044720 sp=0x14000044700 pc=0x1049dcce8
runtime.ReadTrace()
  /usr/local/go/src/runtime/trace.go:529 +0x38 fp=0x14000044790 sp=0x14000044720 pc=0x1049fb398
runtime/trace.Start.func1()
  /usr/local/go/src/runtime/trace/trace.go:130 +0x4c fp=0x140000447d0 sp=0x14000044790 pc=0x104a1c61c
runtime.goexit()
  /usr/local/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x140000447d0 sp=0x140000447d0 pc=0x104a09e74
created by runtime/trace.Start in goroutine 1
  /usr/local/go/src/runtime/trace/trace.go:128 +0x11c
exit status 2

cc @felixge @mknyszek

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 4, 2023
@nsrip-dd
Copy link
Contributor Author

nsrip-dd commented Aug 4, 2023

Here's what I have so far. Some lldb output:

Process 72738 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x202c7261626f6f6e)
    frame #0: 0x0000000100050790 tracepanic`runtime.traceStackID + 480
tracepanic`runtime.traceStackID:
->  0x100050790 <+480>: ldr    x6, [x0, #0x8] #
    0x100050794 <+484>: str    x6, [x1, x5, lsl #3]
    0x100050798 <+488>: ldr    x0, [x0]
    0x10005079c <+492>: add    x5, x5, #0x1
Target 0: (tracepanic) stopped.

The crash is from this line:

pcBuf[i] = *(*uintptr)(unsafe.Pointer(uintptr(fp) + goarch.PtrSize))

Register x0 is fp, x1 points to the pcBuf array, and x5 is i. Here's how far unwinding gets:

(lldb) register read x0
      x0 = 0x202c7261626f6f66 <---- definitely not a valid frame pointer. It's "foobar, " if you assume it's little endian and ASCII-encoded
(lldb) register read x1
      x1 = 0x00000001007d8020
(lldb) register read x5
      x5 = 0x000000000000000a
(lldb) memory read --gdb-format a 0x00000001007d8020
0x1007d8020: 0x000000010004fe30 tracepanic`runtime.traceEventLocked + 576
(lldb) memory read --gdb-format a `0x00000001007d8020 + 8`
0x1007d8028: 0x000000010005ac20 tracepanic`runtime/trace.userTaskCreate + 176
(lldb) memory read --gdb-format a `0x00000001007d8020 + 16`
0x1007d8030: 0x0000000100070388 tracepanic`runtime/trace.NewTask + 184
(lldb) memory read --gdb-format a `0x00000001007d8020 + 24`
0x1007d8038: 0x0000000100070958 tracepanic`main.recoverTrace.func1 + 56
(lldb) memory read --gdb-format a `0x00000001007d8020 + 32`
0x1007d8040: 0x000000010002d890 tracepanic`runtime.deferCallSave + 144
(lldb) memory read --gdb-format a `0x00000001007d8020 + 40`
0x1007d8048: 0x000000010002d6c4 tracepanic`runtime.runOpenDeferFrame + 436
(lldb) memory read --gdb-format a `0x00000001007d8020 + 48`
0x1007d8050: 0x000000010002cefc tracepanic`runtime.deferreturn + 172
(lldb) memory read --gdb-format a `0x00000001007d8020 + 56`
0x1007d8058: 0x0000000100070908 tracepanic`main.recoverTrace + 104
(lldb) memory read --gdb-format a `0x00000001007d8020 + 64`
0x1007d8060: 0x0000000000000006

For what it's worth, the exact return address from runtime.deferreturn is pointing here:

tracepanic`main.recoverTrace:
    0x1000708a0 <+0>:   ldr    x16, [x28, #0x10]
    0x1000708a4 <+4>:   cmp    sp, x16
    0x1000708a8 <+8>:   b.ls   0x100070914               ; <+116>
<----- snip ------>
    0x1000708f8 <+88>:  add    x1, x1, #0x38             ; runtime.egcbss + 3
    0x1000708fc <+92>:  bl     0x10002d8f0               ; runtime.gopanic
    0x100070900 <+96>:  nop    
    0x100070904 <+100>: bl     0x10002ce50               ; runtime.deferreturn
->  0x100070908 <+104>: ldp    x29, x30, [sp, #-0x8]
    0x10007090c <+108>: add    sp, sp, #0x40

Since the last good PC points to main.recoverTrace, it looks like runtime.deferreturn has a frame pointer pointing to the wrong place? Could recovering from the panic be putting something incorrect in the frame pointer register before runtime.deferreturn is called? My current hunch is that we might need to set sched.bp here?

@gopherbot
Copy link

Change https://go.dev/cl/516157 mentions this issue: runtime: restore caller's frame pointer when recovering from panic

@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Aug 7, 2023
@mknyszek mknyszek modified the milestones: Backlog, Go1.22 Aug 7, 2023
@nsrip-dd
Copy link
Contributor Author

nsrip-dd commented Aug 7, 2023

My hunch was right. When a goroutine is recovering from a panic, the runtime.recovery function doesn't restore the expected frame pointer to gp.sched.bp before returning control to the goroutine. The value in gp.sched.bp will point to a stack frame from one of the functions that was handling the panic (saved when using mcall to run runtime.recovery), which can point lower on the stack than the frame of the deferred functions running after the recovery. So that frame pointer can point to whatever gets put on the stack later, like function locals. Hence the crash. https://go.dev/cl/516157 fixes the issue.

@mknyszek
Copy link
Contributor

@gopherbot Please open up a backport issue for Go 1.21.

This causes crashes that while rare (a fairly specific situation is required) are difficult to work around. The only option is turning off FP unwinding for the tracer, but that means a fairly severe regression in overheads, which can be a problem if you've already committed to tracing more frequently at scale. The fix is also small and fairly safe.

(This issue also does not apply to Go 1.20.)

@gopherbot
Copy link

gopherbot commented Aug 15, 2023

Backport issue(s) opened: #62046 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link

Change https://go.dev/cl/519656 mentions this issue: [release-branch.go1.21] runtime: restore caller's frame pointer when recovering from panic

@gopherbot
Copy link

Change https://go.dev/cl/523697 mentions this issue: runtime: fix bp restoration in panic recovery for arm64

@gopherbot
Copy link

Change https://go.dev/cl/523698 mentions this issue: [release-branch.go1.21] runtime: restore caller's frame pointer when recovering from panic

gopherbot pushed a commit that referenced this issue Aug 28, 2023
Previously, the frame pointer wouldn't be restored at all, which could
cause panics during frame pointer unwinding. As of CL 516157, the frame
pointer is restored, but it's restored incorrectly on arm64: on arm64,
the frame pointer points one word below SP, but here it's one below
panic.fp which is the stack pointer of the caller's frame (nothing to do
with the architectural bp).

For #61766.

Change-Id: I86504b85a4d741df5939b51c914d9e7c8d6edaad
Reviewed-on: https://go-review.googlesource.com/c/go/+/523697
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Change https://go.dev/cl/524315 mentions this issue: runtime: give TestFPUnwindAfterRecovery more opportunities to fail

gopherbot pushed a commit that referenced this issue Aug 30, 2023
…recovering from panic

When recovering from a panic, restore the caller's frame pointer before
returning control to the caller. Otherwise, if the function proceeds to
run more deferred calls before returning, the deferred functions will
get invalid frame pointers pointing to an address lower in the stack.
This can cause frame pointer unwinding to crash, such as if an execution
trace event is recorded during the deferred call on architectures which
support frame pointer unwinding.

Original CL by Nick Ripley, includes fix from CL 523697, and includes a
test update from CL 524315.

This CL also deviates from the original fix by doing some extra
computation to figure out the fp from the sp, since we don't have the fp
immediately available to us in `recovery` on the Go 1.21 branch, and it
would probably be complicated to plumb that through its caller.

For #61766
Fixes #62046

Change-Id: I5a99ca4f909f6b6e209a330d595d1c99987d4359
Reviewed-on: https://go-review.googlesource.com/c/go/+/523698
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/526256 mentions this issue: runtime: check that frame pointers go up the stack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants