cmd/link: DWARF CFI rejected when function does not allocate stack space on ARM64 #35100

Open
gawen opened this issue Oct 23, 2019 · 5 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@gawen
Contributor

gawen commented Oct 23, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/gawen/.cache/go-build"
GOENV="/home/gawen/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/opt/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/src/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build297764674=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We compiled the following Go code for the android/arm64 target with -ldflags="-compressdwarf=false" and ran it on an Android phone.

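// file: crash.go
package crash

import "time"

// CrashNilPointer dereferences a nil pointer on the calling goroutine.
//go:noinline
func CrashNilPointer() {
	var i2 *int
	*i2 = 0 // segmentation fault
}

// CrashDelayed triggers the same crash from a new goroutine after one second.
//go:noinline
func CrashDelayed() {
	go func() {
		time.Sleep(1 * time.Second)
		CrashNilPointer()
	}()
}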

We made a small Android app which lets the user call CrashNilPointer and CrashDelayed.

We're using Crashlytics which uses Google Breakpad to generate minidumps when a native crash happens.

Using Breakpad's minidump_stackwalk, we are able to unwind the stack of the minidumps.

The minidump_stackwalk output for the crashing thread when CrashDelayed is called is:

Operating system: Android
                  0.0.0 Linux 3.10.73-g89fd15db99aa #1 SMP PREEMPT Thu Oct 11 19:31:31 UTC 2018 aarch64
CPU: arm64
     6 CPUs

GPU: UNKNOWN

Crash reason:  SIGABRT
Crash address: 0x277f00007cd1
Process uptime: not available

Thread 28 (crashed)
 0  libgojni.so!runtime.raise [sys_linux_arm64.s : 156 + 0x0]
     x0 = 0x0000000000000000    x1 = 0x0000000000007cf9
     x2 = 0x0000000000000006    x3 = 0x00000074112b9fcf
     x4 = 0x000000000000001f    x5 = 0x0000007411194f79
     x6 = 0x000000000000003f    x7 = 0x0000000000000030
     x8 = 0x0000000000000083    x9 = 0x000000000000001f
    x10 = 0x000000005da9ad30   x11 = 0x000000001495a96e
    x12 = 0x0000000000000018   x13 = 0x0000000000000000
    x14 = 0x0000000000000000   x15 = 0x001495a96ead900c
    x16 = 0x0000007411153613   x17 = 0x0000007411153610
    x18 = 0x00000074ab628000   x19 = 0x0000000000007cd1
    x20 = 0x000000400003ac80   x21 = 0x0000004000056380
    x22 = 0x0000000000007cd1   x23 = 0x0000000000007cf7
    x24 = 0x000000741118fb74   x25 = 0x000000740e406000
    x26 = 0x000000400003ae18   x27 = 0x0000000000000000
    x28 = 0x0000004000058a80    fp = 0x0000000000000000
     lr = 0x000000741113d604    sp = 0x000000400003adb0
     pc = 0x00000074111540d8
    Found by: given as instruction pointer in context
 1  libgojni.so!runtime.dieFromSignal [signal_unix.go : 437 + 0x0]
     fp = 0x0000000000000000    lr = 0x0000000000000000
     sp = 0x000000400003adb0    pc = 0x000000741113d604
    Found by: previous frame's frame pointer
 2  libgojni.so!runtime.crash [signal_unix.go : 540 + 0x0]
     fp = 0x0000000000000000    sp = 0x000000400003add0
     pc = 0x000000741113d900
    Found by: call frame info
 3  libgojni.so!runtime.fatalpanic [panic.go : 877 + 0x0]
     fp = 0x0000000000000000    sp = 0x000000400003adf0
     pc = 0x0000007411129acc
    Found by: call frame info
 4  libgojni.so!runtime.gopanic [panic.go : 723 + 0x0]
     fp = 0x0000000000000000    sp = 0x000000400003ae50
     pc = 0x0000007411129530
    Found by: call frame info
 5  libgojni.so!runtime.sigpanic [panic.go : 199 + 0x0]
     fp = 0x0000000000000000    sp = 0x000000400003aee0
     pc = 0x000000741113d4c8
    Found by: call frame info
 6  libgojni.so!github.com/.../crashtest/crash.CrashNilPointer [crash.go : 24 + 0x0]
     fp = 0x0000000000000000    sp = 0x000000400003af10
     pc = 0x000000741118a56c
    Found by: call frame info
 7  libgojni.so!github.com/.../crashtest/crash.CrashNilPointer [print.go : 274 + 0x0]
     fp = 0x0000000000000000    sp = 0x000000400003afa0
     pc = 0x000000741118a568
    Found by: call frame info
 8  0x40000b8004
     fp = 0x0000000000000000    sp = 0x000000400003b030
     pc = 0x00000040000b8008
    Found by: call frame info

The minidump_stackwalk output for the crashing thread when CrashNilPointer is called is:

Operating system: Android
                  0.0.0 Linux 3.10.73-g89fd15db99aa #1 SMP PREEMPT Thu Oct 11 19:31:31 UTC 2018 aarch64
CPU: arm64
     6 CPUs

GPU: UNKNOWN

Crash reason:  SIGABRT
Crash address: 0x277f00007b69
Process uptime: not available

Thread 0 (crashed)
 0  libgojni.so!runtime.raise [sys_linux_arm64.s : 156 + 0x0]
     x0 = 0x0000000000000000    x1 = 0x0000000000007b69
     x2 = 0x0000000000000006    x3 = 0x0000007411179fcf
     x4 = 0x000000000000001f    x5 = 0x0000007411054f79
     x6 = 0x000000000000003f    x7 = 0x0000000000000030
     x8 = 0x0000000000000083    x9 = 0x000000000000001f
    x10 = 0x000000005da9ac96   x11 = 0x0000000004e331cc
    x12 = 0x0000000000000018   x13 = 0x0000000000000000
    x14 = 0x0000000000000000   x15 = 0x0004e331ccc7e1d2
    x16 = 0x0000007411013613   x17 = 0x0000007411013610
    x18 = 0x0000000000000008   x19 = 0x0000000000007b69
    x20 = 0x000000400003cad0   x21 = 0x000000400002e000
    x22 = 0x0000007fee94d91c   x23 = 0x0000007412255636
    x24 = 0x0000000000000000   x25 = 0x00000074ad3c9a40
    x26 = 0x000000400003cc68   x27 = 0x0000000000000000
    x28 = 0x0000004000000600    fp = 0x0000007fee94d670
     lr = 0x0000007410ffd604    sp = 0x000000400003cc00
     pc = 0x00000074110140d8
    Found by: given as instruction pointer in context
 1  libgojni.so!runtime.crash [signal_unix.go : 540 + 0x0]
     sp = 0x000000400003cc08    pc = 0x0000007410ffd900
    Found by: stack scanning
 2  0x7400000002
     sp = 0x000000400003cc28    pc = 0x0000007400000006
    Found by: call frame info

The stack trace for CrashDelayed is correct: it unwinds up to the user's code, although not up to CrashDelayed itself, since that function only spawns the goroutine that calls CrashNilPointer. Note that every frame from runtime.crash (frame 2) upward is discovered through "call frame info", that is, DWARF's CFI.

However, the stack trace for CrashNilPointer is not correct: it stops at runtime.crash, and that frame is discovered by "stack scanning" rather than by CFI.

To recover a stack frame from an ARM64 minidump, Breakpad first checks whether CFI exists and is valid. If not, it falls back to extracting the frame from the frame pointer and, if that also fails, to scanning the stack (see breakpad/src/processor/stackwalker_arm64.cc:251).
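
The fallback order described above can be sketched in Go as follows. This is only an illustration of the idea, not Breakpad's code (which is C++): the `frame` and `strategy` types and the `callerFrame` function are hypothetical names, and each strategy is stubbed out to mimic what we observe for CrashNilPointer.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a hypothetical, simplified view of a recovered caller frame.
type frame struct {
	pc, sp  uint64
	foundBy string // name of the strategy that produced the frame
}

// strategy is one way of recovering the caller frame. try returns an
// error when the strategy cannot produce a usable frame.
type strategy struct {
	name string
	try  func() (frame, error)
}

var errNoFrame = errors.New("no caller frame recovered")

// callerFrame tries each strategy in order and returns the first frame
// that is recovered successfully.
func callerFrame(strategies []strategy) (frame, error) {
	for _, s := range strategies {
		f, err := s.try()
		if err != nil {
			continue // fall through to the next, less reliable strategy
		}
		f.foundBy = s.name
		return f, nil
	}
	return frame{}, errNoFrame
}

func main() {
	// Stubs mimicking the CrashNilPointer trace: the CFI is rejected,
	// the frame pointer is unusable, and only stack scanning succeeds.
	reject := func() (frame, error) { return frame{}, errNoFrame }
	scan := func() (frame, error) {
		return frame{pc: 0x7410ffd900, sp: 0x400003cc08}, nil
	}

	f, err := callerFrame([]strategy{
		{"call frame info", reject},
		{"previous frame's frame pointer", reject},
		{"stack scanning", scan},
	})
	if err != nil {
		fmt.Println("unwinding stopped:", err)
		return
	}
	fmt.Printf("pc=%#x sp=%#x found by: %s\n", f.pc, f.sp, f.foundBy)
}
```

The point of the sketch is that a rejected CFI does not stop the walk immediately; it silently degrades to weaker heuristics, which is why the "Found by" line in the output is the useful signal.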

The CrashDelayed stack trace is unwound thanks to DWARF CFI, in particular for runtime.crash (frame 2). The CrashNilPointer stack trace, however, reaches runtime.crash (frame 1) only through stack scanning and then fails, which means Breakpad did not find any usable CFI for it. In addition, runtime.dieFromSignal, which is called by runtime.crash, is not detected by the stack scanner.

Note that Android Studio's LLDB is able to unwind a proper stacktrace for both CrashDelayed and CrashNilPointer when the segmentation fault happens.

What did you expect to see?

We expected to see the full stack trace for a segmentation fault that does not happen in a spawned goroutine (CrashNilPointer), with runtime.dieFromSignal properly unwound using DWARF's CFI.

What did you see instead?

The stacktrace for CrashNilPointer is truncated and incorrect.

@gawen
Contributor Author

gawen commented Oct 23, 2019

I've been able to get the stacktraces for both CrashNilPointer and CrashDelayed.

It seems that the CFI of functions that do not allocate stack space is rejected. runtime.crash is one of them.

Problem seems to come around here:

if pcsp.value > 0 {
	// The return address is preserved at (CFA-frame_size)
	// after a stack frame has been allocated.
	deltaBuf = append(deltaBuf, dwarf.DW_CFA_offset_extended_sf)
	deltaBuf = dwarf.AppendUleb128(deltaBuf, uint64(thearch.Dwarfreglr))
	deltaBuf = dwarf.AppendSleb128(deltaBuf, -spdelta/dataAlignmentFactor)
} else {
	// The return address is restored into the link register
	// when a stack frame has been de-allocated.
	deltaBuf = append(deltaBuf, dwarf.DW_CFA_same_value)
	deltaBuf = dwarf.AppendUleb128(deltaBuf, uint64(thearch.Dwarfreglr))
}

Depending on pcsp.value, which is the stack space allocated by the current function, one of the following DWARF rules for .ra is emitted:

// if function allocates stack space (pcsp.value>0)
DW_CFA_offset_extended_sf
REG_LR    // ==.ra
-<space size>/alignment

// if function does not allocate stack space (pcsp.value==0)
DW_CFA_same_value
REG_LR    // ==.ra

When the function does not allocate stack space, the initial rule for .ra is .ra = .ra, which fails because .ra is not defined, and makes Breakpad reject the CFI. By replacing the condition of the if statement, pcsp.value > 0, with true, we are able to get correct stack traces.
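
To make the two cases concrete, here is a minimal, standalone Go sketch that produces the bytes for each rule. It does not use the linker's internal packages: `appendUleb128`, `appendSleb128` and `lrRule` are hypothetical helpers written for this illustration. The opcode values come from the DWARF standard, register 30 is the ARM64 link register, and the data alignment factor of -4 is assumed to match what cmd/link uses.

```go
package main

import "fmt"

const (
	dwCFASameValue        = 0x08 // DW_CFA_same_value
	dwCFAOffsetExtendedSF = 0x11 // DW_CFA_offset_extended_sf
	dwarfRegLR            = 30   // DWARF number of x30 (LR) on ARM64
	dataAlignmentFactor   = -4   // assumed to match cmd/link
)

// appendUleb128 appends v encoded as unsigned LEB128.
func appendUleb128(b []byte, v uint64) []byte {
	for {
		c := byte(v & 0x7f)
		v >>= 7
		if v != 0 {
			c |= 0x80 // more bytes follow
		}
		b = append(b, c)
		if v == 0 {
			return b
		}
	}
}

// appendSleb128 appends v encoded as signed LEB128.
func appendSleb128(b []byte, v int64) []byte {
	for {
		c := byte(v & 0x7f)
		sign := byte(v & 0x40)
		v >>= 7 // arithmetic shift keeps the sign
		if (v == 0 && sign == 0) || (v == -1 && sign != 0) {
			return append(b, c)
		}
		b = append(b, c|0x80)
	}
}

// lrRule mirrors the branch quoted above: frameSize plays the role of
// pcsp.value (and of spdelta in the first branch).
func lrRule(frameSize int64) []byte {
	var buf []byte
	if frameSize > 0 {
		// LR is saved on the stack at CFA-frameSize.
		buf = append(buf, dwCFAOffsetExtendedSF)
		buf = appendUleb128(buf, dwarfRegLR)
		buf = appendSleb128(buf, -frameSize/dataAlignmentFactor)
	} else {
		// No frame: LR still holds the return address.
		buf = append(buf, dwCFASameValue)
		buf = appendUleb128(buf, dwarfRegLR)
	}
	return buf
}

func main() {
	fmt.Printf("frame size 32: % x\n", lrRule(32))
	fmt.Printf("frame size 0:  % x\n", lrRule(0))
}
```

With a 32-byte frame the factored offset is -32/-4 = 8, so the first sequence is three bytes long, while the frameless case is only the opcode followed by the register number.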

We're digging to understand why DW_CFA_same_value is being assigned to .ra. Any help is much appreciated.

@gawen gawen changed the title DWARF Call Frame Info sometimes incorrect on ARM64 DWARF CFI rejected when function does not allocate stack space on ARM64 Oct 23, 2019
@cherrymui cherrymui changed the title DWARF CFI rejected when function does not allocate stack space on ARM64 cmd/link: DWARF CFI rejected when function does not allocate stack space on ARM64 Oct 23, 2019
@cherrymui
Member

cc @thanm

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 23, 2019
@gawen
Contributor Author

gawen commented Oct 23, 2019

I made a workaround patch for Go 1.13.1 which adds a linker flag, -alwaysRestoreDWARFLinkRegister[=false]. When enabled, it forces the linker to restore the Link Register from the stack, even if the function does not allocate any stack space.

Using this flag, Breakpad is able to use the CFI and unwind the stacktrace properly for both CrashDelayed and CrashNilPointer.

We don't expect this patch to be merged, but if anybody has a similar issue to ours, it might be something to try.

Finally, from our current understanding, we believe Breakpad does not properly handle DW_CFA_same_value for .ra, which should be interpreted as .ra = lr and not .ra = .ra.
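
As a rough illustration of the interpretation we believe is intended, here is a small Go sketch of how an unwinder could resolve the caller's return address from a register rule. None of this is Breakpad's or Go's actual code: the `rule`, `regRule` and `callerRA` names are hypothetical, and the sketch assumes the return address column is the link register.

```go
package main

import "fmt"

// rule is a hypothetical, reduced set of DWARF register rules.
type rule int

const (
	ruleUndefined rule = iota // the register cannot be recovered
	ruleSameValue             // the register was not modified by the callee
	ruleOffset                // the register was saved at CFA+offset
)

// regRule describes how to recover one register in the caller.
type regRule struct {
	kind   rule
	offset int64 // byte offset from the CFA, used by ruleOffset only
}

// callerRA resolves the return address when the return address column
// is the link register. calleeLR is the value of LR in the callee's
// context, cfa is the canonical frame address, and readMem reads one
// 64-bit word from the dumped stack.
func callerRA(r regRule, calleeLR, cfa uint64, readMem func(addr uint64) (uint64, bool)) (uint64, bool) {
	switch r.kind {
	case ruleSameValue:
		// "Same value" means LR is untouched, so it still holds the
		// return address: .ra = lr.
		return calleeLR, true
	case ruleOffset:
		return readMem(uint64(int64(cfa) + r.offset))
	default:
		return 0, false
	}
}

func main() {
	// A made-up one-word stack holding a made-up saved return address.
	stack := map[uint64]uint64{0x400003cbf8: 0x7410ffd123}
	readMem := func(addr uint64) (uint64, bool) {
		v, ok := stack[addr]
		return v, ok
	}

	// Frameless function: the return address is taken from LR.
	ra, ok := callerRA(regRule{kind: ruleSameValue}, 0x7410ffd604, 0x400003cc00, readMem)
	fmt.Printf("same_value: ra=%#x ok=%v\n", ra, ok)

	// Function with a frame: the return address is read from the stack.
	ra, ok = callerRA(regRule{kind: ruleOffset, offset: -8}, 0, 0x400003cc00, readMem)
	fmt.Printf("offset:     ra=%#x ok=%v\n", ra, ok)
}
```

Under this reading, a frameless function such as runtime.crash would unwind through the LR value already present in the register context instead of being rejected.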

@steeve
Contributor

steeve commented Oct 23, 2019

cc @eliasnaur you might be interested in this, @gawen is working with us

@steeve
Contributor

steeve commented Oct 23, 2019

For the record, Breakpad is what Crashlytics uses (and probably others).

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022