Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/trace: frame pointer unwinding during C-to-Go calls can crash on amd64 #59830

Closed
nsrip-dd opened this issue Apr 25, 2023 · 1 comment
Closed
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nsrip-dd
Copy link
Contributor

nsrip-dd commented Apr 25, 2023

While attempting to enable frame pointer unwinding for execution tracing on arm64, I ran the TestTraceUnwindCGO test. This test crashes on arm64 during frame pointer unwinding. The test, unaltered, does not crash on amd64. However, I think amd64 is susceptible to the same issue, namely that there's a trace event during a C-to-Go call which currently gets a call stack via frame pointer unwinding when it probably shouldn't.

We have the following logic in place to determine whether we're in a C-to-Go call:

go/src/runtime/proc.go

Lines 860 to 862 in 715d53c

func (mp *m) incgocallback() bool {
return (!mp.incgo && mp.ncgo > 0) || mp.isextra
}

If so, we fall back to regular unwinding, to avoid possibly following bad frame pointers coming from the C code. However, there is a point during the C-to-Go call where the "incgocallback" check is false, because incgo == true, yet we will still emit a trace event, during reentersyscall:

go/src/runtime/cgocall.go

Lines 241 to 250 in 715d53c

gp.m.incgo = true
if gp.m != checkm {
throw("m changed unexpectedly in cgocallbackg")
}
osPreemptExtEnter(gp.m)
// going back to cgo call
reentersyscall(savedpc, uintptr(savedsp))

I can make TestTraceUnwindCGO crash on amd64 by deliberately putting junk in RBP and confirming that the tracer attempts to unwind through the junk address:

diff --git a/src/runtime/testdata/testprogcgo/trace_unix.c b/src/runtime/testdata/testprogcgo/trace_unix.c
index 0fa55c7215..220fe2d4e3 100644
--- a/src/runtime/testdata/testprogcgo/trace_unix.c
+++ b/src/runtime/testdata/testprogcgo/trace_unix.c
@@ -19,6 +19,7 @@ static void* cCalledFromCThread(void *p) {
 }
 
 void cCalledFromGo(void) {
+       asm ("mov $0xdeadbeef, %rbp\n");
        goCalledFromC();
 
        pthread_t thread;

Then go build ./testdata/testprogcgo, then gdb ./testprogcgo:

(gdb) r Trace
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/ec2-user/repos/go/src/runtime/testprogcgo Trace
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffd03e2700 (LWP 18813)]
[New Thread 0x7fffcf9e1700 (LWP 18814)]
[New Thread 0x7fffcefe0700 (LWP 18815)]
[New Thread 0x7fffce5df700 (LWP 18816)]
[New Thread 0x7fffcdb9e700 (LWP 18817)]
[New Thread 0x7fffcd19d700 (LWP 18818)]

Thread 1 "testprogcgo" received signal SIGSEGV, Segmentation fault.
runtime.traceStackID (mp=<optimized out>, pcBuf=[]uintptr = {...}, skip=<optimized out>, ~r0=<optimized out>) at /home/ec2-user/repos/go/src/runtime/trace.go:916
916                             nstk += 1 + fpTracebackPCs(unsafe.Pointer(curgp.sched.bp), skip, pcBuf[2:])
(gdb) bt
#0  runtime.traceStackID (mp=<optimized out>, pcBuf=[]uintptr = {...}, skip=<optimized out>, ~r0=<optimized out>) at /home/ec2-user/repos/go/src/runtime/trace.go:916
#1  0x000000000045a865 in runtime.traceEventLocked (extraBytes=<optimized out>, mp=0x85f0e0 <runtime.m0>, pid=<optimized out>, bufp=0xc00002bbd0, ev=28 '\034', stackID=0, skip=1, 
    args=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /home/ec2-user/repos/go/src/runtime/trace.go:759
#2  0x000000000045a569 in runtime.traceEvent (ev=28 '\034', skip=1, args=[]uint64) at /home/ec2-user/repos/go/src/runtime/trace.go:691
#3  0x000000000045d225 in runtime.traceGoSysCall () at /home/ec2-user/repos/go/src/runtime/trace.go:1573
#4  0x000000000046a42a in runtime.systemstack () at /home/ec2-user/repos/go/src/runtime/asm_amd64.s:509
#5  0x00007fffffffe268 in ?? ()
#6  0x000000000085f0e0 in ?? ()
#7  0x00007fffffffe250 in ?? ()
#8  0x00000000004ae39d in crosscall2 () at /home/ec2-user/repos/go/src/runtime/cgo/asm_amd64.s:30
#9  0x000000000051780e in goCalledFromC () at _cgo_export.c:212
(gdb) disas
... elided ...
=> 0x000000000045b297 <+535>:   mov    0x8(%rsi),%r11
   0x000000000045b29b <+539>:   mov    %r11,(%r9,%rax,8)
   0x000000000045b29f <+543>:   mov    (%rsi),%rsi
   0x000000000045b2a2 <+546>:   inc    %rax
   0x000000000045b2a5 <+549>:   cmp    %r10,%rax
(gdb) p/x $rsi
$2 = 0xdeadbeef

I think we need something like an incgocallback boolean that's set to be true for the entirety of cgocallbackg, or at least something that covers the reentersyscall call at the end of cgocallbackg.

cc @felixge @mknyszek @prattmic

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 25, 2023
@gopherbot
Copy link

Change https://go.dev/cl/488755 mentions this issue: runtime/trace: avoid frame pointer unwinding for events during cgocallbackg

@dmitshur dmitshur added this to the Backlog milestone Apr 25, 2023
@dmitshur dmitshur added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 25, 2023
@dmitshur dmitshur modified the milestones: Backlog, Go1.21 Apr 28, 2023
@golang golang locked and limited conversation to collaborators Apr 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants