Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/trace: frame pointer unwinding crash on arm64 during async preemption #63830

Closed
nsrip-dd opened this issue Oct 30, 2023 · 8 comments
Closed
Assignees
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nsrip-dd
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.21.3 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/ec2-user/.cache/go-build'
GOENV='/home/ec2-user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/ec2-user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/ec2-user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/ec2-user/sdk/go1.21.3'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/ec2-user/sdk/go1.21.3/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/ec2-user/fpcrash/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1020512040=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Used the runtime execution tracer with frame pointer unwinding enabled.

What did you expect to see?

No crashes.

What did you see instead?

Crashes when recording an event for async preemption. Example:

SIGSEGV: segmentation violation
PC=0x471c64 m=38 sigcode=1

goroutine 0 [idle]:
runtime.fpTracebackPCs(...)
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:1018
runtime.traceStackID(0x57c27cf78c4c3?, {0xfffe8dfd0018, 0x57c27ce950808?, 0x80}, 0x1?)
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:991 +0x224 fp=0xffff10bcb4f0 sp=0xffff10bcb4a0 pc=0x471c64
runtime.traceEventLocked(0xffff10bcb5e8?, 0x4739e8?, 0x10bcb5c8?, 0x40000c65d8, 0x12, 0x0, 0x1, {0x0, 0x0, 0x0?})
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:834 +0x240 fp=0xffff10bcb570 sp=0xffff10bcb4f0 pc=0x4712d0
runtime.traceEvent(0x0?, 0x1, {0x0, 0x0, 0x0})
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:770 +0x90 fp=0xffff10bcb5e0 sp=0xffff10bcb570 pc=0x471030
runtime.traceGoPreempt(...)
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:1609
runtime.gopreempt_m(0x100ab9d6?)
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/proc.go:3786 +0x50 fp=0xffff10bcb620 sp=0xffff10bcb5e0 pc=0x457270
traceback: unexpected SPWRITE function runtime.mcall // <--- I don't believe this is actually related to the issue?
runtime.mcall()
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/asm_arm64.s:192 +0x54 fp=0xffff10bcb630 sp=0xffff10bcb620 pc=0x482224

goroutine 549321 [running]:
runtime.asyncPreempt2()
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/preempt.go:307 +0x3c fp=0x4004013a40 sp=0x4004013a20 pc=0x44e26c
runtime.asyncPreempt()
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/preempt_arm64.s:47 +0x9c fp=0x4004013c30 sp=0x4004013a40 pc=0x4853ec
golang.org/x/net/http2.(*PingFrame).Header(0x4004013c78?)
    <autogenerated>:1 +0x54 fp=0x4004013c40 sp=0x4004013c40 pc=0x9ab404
golang.org/x/net/http2.(*Framer).checkFrameOrder(0x40d7cf0d20, {0x31947f0?, 0x4128ebc498?})
    /go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:547 +0x74 fp=0x4004013d60 sp=0x4004013c40 pc=0x97f854
golang.org/x/net/http2.(*Framer).ReadFrame(0x40d7cf0d20)
    /go/pkg/mod/golang.org/x/net@v0.17.0/http2/frame.go:516 +0x258 fp=0x4004013e10 sp=0x4004013d60 pc=0x97f5f8
google.golang.org/grpc/internal/transport.(*http2Client).reader(0x4003b71d40, 0x40d95567a8?)
    /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_client.go:1594 +0x1b8 fp=0x4004013fb0 sp=0x4004013e10 pc=0x9cb6f8
google.golang.org/grpc/internal/transport.newHTTP2Client.func11()
    /go/pkg/mod/google.golang.org/grpc@v1.58.2/internal/transport/http2_client.go:397 +0x2c fp=0x4004013fd0 sp=0x4004013fb0 pc=0x9c231c
runtime.goexit()
    /root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/asm_arm64.s:1197 +0x4 fp=0x4004013fd0 sp=0x4004013fd0 pc=0x484834
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 549519

This happens infrequently (observed ~20 times over the course of a week, across a significant number of processes). I believe I have identified the cause. Essentially:

Code is generated for a method like golang.org/x/net/http2.(FrameHeader).Header (example generated code here) which does this:

  1. Decrement SP to allocate a stack frame
  2. Save X29 (the frame pointer register) into the frame and update X29 to point to where the previous one is saved
  3. Do the function body
  4. Increment SP to free the stack frame
  5. Restore X29 and return

If the async preemption signal is delivered after 4, but before 5 (a single instruction) then the async preemption signal handler will allocate a new stack frame, overwriting where the interrupted function's stack frame was. It will save all of the registers into that frame. But X29 will point somewhere in the old, overwritten frame, meaning it will point to an address containing the random contents of a register saved on the stack. In particular, it'll point to where the floating point registers are saved (ref).

I can reproduce this with the test program here: https://gist.github.com/nsrip-dd/d85ff0d05d2afa6ca0c12796e992ea91

I'm not sure what the right fix here is. Make sure the frame pointer is restored before incrementing the stack pointer at the end of the function, perhaps?

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Oct 30, 2023
@randall77
Copy link
Contributor

That generated epilogue looks wrong to me even beyond this issue.

	0x0000 00000 (tmp1.go:25)	MOVD.W	R30, -48(RSP)
	0x0004 00004 (tmp1.go:25)	MOVD	R29, -8(RSP)
	0x0008 00008 (tmp1.go:25)	SUB	$8, RSP, R29
...
	0x004c 00076 (tmp1.go:25)	ADD	$48, RSP
	0x0050 00080 (tmp1.go:25)	SUB	$8, RSP, R29
	0x0054 00084 (tmp1.go:25)	RET	(R30)

That second instruction should be MOVD -56(RSP), R29, as that's where we saved the caller's R29. Otherwise we're not restoring the frame pointer properly at all, let alone in the right order.

This is for functions that are leaves but have a nonzero frame size. I think the other variants (nonleaf, and leaf+zeroframe) are correct. The nonleaf variant restores R29 before messing with the stack pointer, and leaf+zeroframe leaves R29 untouched.

Here's a simple function that has the problem:

func f(x int) int {
	var a [4]int
	return a[x&3]
}

@gopherbot
Copy link

Change https://go.dev/cl/538635 mentions this issue: cmd/internal/obj/arm64: fix frame pointer restore in epilogue

@cherrymui cherrymui added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 30, 2023
@cherrymui cherrymui added this to the Go1.22 milestone Oct 30, 2023
@randall77
Copy link
Contributor

randall77 commented Oct 30, 2023

Actually, maybe that old code is correct, if misordered. The old R29 should always be old SP - 8?

@felixge
Copy link
Contributor

felixge commented Oct 31, 2023

Yes, I think that's correct @randall77. Fixing the ordering would be great!

Aside: In non-leaf functions I'm seeing this prologue which gets the ordering right:

LDP -8(RSP), (R29, R30) 
ADD $48, RSP, RSP 

Arguably this could be used here as well? I'm saying arguably because we don't need to restore R30 for leaf functions. But if this instruction isn't more expensive than just restoring R29, perhaps having more consistent codgen would be a good thing?

@randall77 randall77 self-assigned this Nov 1, 2023
@felixge
Copy link
Contributor

felixge commented Jan 26, 2024

Is it possible to get this back ported? We have a few places where we need to build with go1.21 while it's still being supported.

I manually tested that the patch cleanly applies against go1.21.6. But I don't know how to send a back port CL.

Thanks 🙇

@prattmic
Copy link
Member

prattmic commented Feb 2, 2024

@gopherbot Please backport to 1.21. This causes crashes with tracing enabled on arm64. Though the FP code has been broken longer, only in 1.21 did tracing start to depend on the frame pointer.

@gopherbot
Copy link

Backport issue(s) opened: #65449 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link

Change https://go.dev/cl/560735 mentions this issue: [release-branch.go1.21] cmd/internal/obj/arm64: fix frame pointer restore in epilogue

gopherbot pushed a commit that referenced this issue Feb 2, 2024
…tore in epilogue

For leaf but nonzero-frame functions.

Currently we're not restoring it properly. We also need to restore
it before popping the stack frame, so that the frame won't get
clobbered by a signal handler in the meantime.

For #63830
Fixes #65449

Needs a test, but I'm not at all sure how we would actually do that. Leaving for inspiration.

Change-Id: I273a25f2a838f05a959c810145cccc5428eaf164
Reviewed-on: https://go-review.googlesource.com/c/go/+/538635
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Eric Fang <eric.fang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
(cherry picked from commit c9888bd)
Reviewed-on: https://go-review.googlesource.com/c/go/+/560735
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants