Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: "invalid pc-encoded table" throw caused by bad cgo traceback #44971

Closed
prattmic opened this issue Mar 12, 2021 · 16 comments
Closed

runtime: "invalid pc-encoded table" throw caused by bad cgo traceback #44971

prattmic opened this issue Mar 12, 2021 · 16 comments
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

prattmic commented Mar 12, 2021

When executing cgo code, the signal handler will call cgoTraceback via x_cgo_callers in order to call a traceback from the executing cgo code. This calls the C traceback function provided by the application from runtime.SetCgoTraceback.

The frames returned from cgoTraceback are placed on the top of the recorded stack, followed by a Go runtime-provided trace of the preceding Go callers [1].

Later (in the case of CPU profiling), Frames.Next will call cgoSymbolizer to symbolize C frames and use funcInfo to symbolize Go frames.

... at least, that is how it is supposed to work. In practice, there are no guarantees on what cgoTraceback returns. Though it should only return non-Go PCs, there is nothing preventing it from returning a Go PC.

Generally, that would work OK (i.e., not crash, though the actual stack trace may not make sense): Frames.Next will simply follow the Go path and symbolize the PC as normal.

However, if this PC fell in the alignment region between functions (filled with 0xcc, int 3 on amd64), then:

  1. findfunc will find a funcInfo for this PC (the preceding function), as funcInfos cover the entire range from the start of one function to the start of the next, including the alignment region.
  2. If this funcInfo has inline data, we'll do a PCDATA lookup for our PC. PCDATA only cover the actually function range, so that will cause a throw like this:
$ ./cgo_traceback
runtime: invalid pc-encoded table f=internal/cpu.doinit pc=0x402f65 targetpc=0x402f6d tab=[0/0]0x0
        value=-1 until pc=0x402d3e
        value=0 until pc=0x402d45
        value=-1 until pc=0x402d4b
        value=1 until pc=0x402d52
        value=-1 until pc=0x402d5a
        value=0 until pc=0x402d60
        value=-1 until pc=0x402d68
        value=2 until pc=0x402d6f
        value=-1 until pc=0x402db4
        value=3 until pc=0x402dbb
        value=-1 until pc=0x402dff
        value=4 until pc=0x402e03
        value=-1 until pc=0x402e05
        value=5 until pc=0x402e0c
        value=-1 until pc=0x402e0e
        value=2 until pc=0x402e12
        value=-1 until pc=0x402f65
fatal error: invalid runtime symbol table

goroutine 6 [running]:
runtime.throw(0x4ebfa5, 0x1c)
        /usr/lib/google-golang/src/runtime/panic.go:1123 +0x72 fp=0xc00005ea18 sp=0xc00005e9e8 pc=0x436952
runtime.pcvalue(0x5572b8, 0x587a40, 0x1f3, 0x402f6d, 0x0, 0x50df01, 0x50df6b, 0x13)
        /usr/lib/google-golang/src/runtime/symtab.go:827 +0x5ae fp=0xc00005ead8 sp=0xc00005ea18 pc=0x45434e
runtime.pcdatavalue(0x5572b8, 0x587a40, 0x2, 0x402f6d, 0x0, 0x7fb05d417500)
        /usr/lib/google-golang/src/runtime/symtab.go:936 +0x7b fp=0xc00005eb28 sp=0xc00005ead8 pc=0x454c3b
runtime.(*Frames).Next(0xc0001088f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /usr/lib/google-golang/src/runtime/symtab.go:105 +0x495 fp=0xc00005ec38 sp=0xc00005eb28 pc=0x452915
runtime/pprof.allFrames(0x402f6e, 0xc00009c030, 0x402f6e, 0x5bfd00, 0x0)
        /usr/lib/google-golang/src/runtime/pprof/proto.go:285 +0xfc fp=0xc00005ed98 sp=0xc00005ec38 pc=0x4b4cfc
runtime/pprof.(*profileBuilder).appendLocsForStack(0xc000098000, 0x0, 0x0, 0x0, 0xc000126000, 0x6, 0x6, 0xe0323c7f6117d1bb, 0xc00007e180, 0xc00005eee8)
        /usr/lib/google-golang/src/runtime/pprof/proto.go:482 +0x2e6 fp=0xc00005ee48 sp=0xc00005ed98 pc=0x4b6086
runtime/pprof.(*profileBuilder).build(0xc000098000)
        /usr/lib/google-golang/src/runtime/pprof/proto.go:433 +0x151 fp=0xc00005ef58 sp=0xc00005ee48 pc=0x4b5931
runtime/pprof.profileWriter(0x50b1f8, 0xc000010028)
        /usr/lib/google-golang/src/runtime/pprof/pprof.go:813 +0x105 fp=0xc00005efd0 sp=0xc00005ef58 pc=0x4b21a5
runtime.goexit()
        /usr/lib/google-golang/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc00005efd8 sp=0xc00005efd0 pc=0x46a021
created by runtime/pprof.StartCPUProfile
        /usr/lib/google-golang/src/runtime/pprof/pprof.go:784 +0x145

goroutine 1 [chan receive]:
runtime/pprof.StopCPUProfile()
        /usr/lib/google-golang/src/runtime/pprof/pprof.go:829 +0xc5
main.main()
        /usr/local/google/home/mpratt/Downloads/cgo_traceback/main.go:31 +0xff

Source for this repro in https://github.com/prattmic/scratch/tree/main/cgo_traceback_issue44971.

The obvious question here is: why would cgoTraceback include such a bogus PC? The answer depends on the traceback engine in use by cgoTraceback.

For example, https://github.com/ianlancetaylor/cgosymbolizer uses libgcc's unwind functionality, which uses DWARF information to walk the stack. I've not found a way to trick that into providing bogus results (rather than stopping early) short of using flat-out incorrect .cfa directives in assembly.

On the other hand, simpler traceback engines like Abseil's https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/stacktrace.h perform a more naive (but faster) walk simply following RBP frame pointers. They have some heuristics to try to avoid walking off the deep end, but fundamentally can't fully protect against code that has clobbered the frame pointer. This bug was first encountered with an Abseil-based traceback of assembly code that clobbered RBP to use as a simple argument register, thus resulting in garbage frames frames that would occasionally point into the alignment region between Go functions.

I don't think we can reasonably require cgoTraceback to guarantee it always provides valid frames, thus I see a few options here:

  1. Change Frames.Next to perform a non-strict PCDATA lookup. This is the simplest way to prevent crashes and I think the best approach, but it will make it a bit harder to notice bugs in the native runtime tracebacks.
  2. Change either funcInfo or PCDATA to make them consistent: either funcInfo does not cover alignment regions, or PCDATA does. I'm not sure how difficult these would be, but this would also potentially mask bugs in Go's tracebacks.
  3. In Frames, track which callers came from cgoTraceback, and which came from Go's traceback. Only the latter would even attempt to do a funcInfo / PCDATA lookup.

This affects tip, 1.16, and I believe earlier to at least 1.14, though I haven't tested earlier than 1.16 yet.

[1] A similar principle applies to C-to-Go callbacks, except that cgoTraceback is called in the middle of the Go traceback generation.

cc @cherrymui @ianlancetaylor @mknyszek @hyangah

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 12, 2021
@prattmic prattmic added this to the Go1.17 milestone Mar 12, 2021
@prattmic prattmic self-assigned this Mar 12, 2021
@gopherbot
Copy link

Change https://golang.org/cl/301369 mentions this issue: runtime: non-strict InlTreeIndex lookup in Frames.Next

@prattmic
Copy link
Member Author

Using the test from https://golang.org/cl/301369, I can verify this affects tip, 1.16, 1.15, 1.14, 1.13, and I didn't test any earlier than that

@ianlancetaylor ianlancetaylor changed the title runtime: "invalid pc-encoded table" throw caused by back cgo traceback runtime: "invalid pc-encoded table" throw caused by bad cgo traceback Mar 15, 2021
@ianlancetaylor
Copy link
Contributor

Perhaps we could accept bogus data in the traceback only if cgoTraceback is not nil.

@prattmic
Copy link
Member Author

That's an interesting idea, though Cherry made the logically opposite argument on https://golang.org/cl/301369. i.e., Frames.Next should probably not crash the whole runtime if a user passes bad values to CallersFrames. FuncForPC already does a non-strict lookup for the same reason, so I'm slightly inclined to go the same way here.

@prattmic
Copy link
Member Author

@gopherbot please open backport for 1.16 and 1.15. The only workaround is to change the C traceback engine, which isn't usually feasible.

@gopherbot
Copy link

Backport issue(s) opened: #45302 (for 1.15), #45303 (for 1.16).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link

Change https://golang.org/cl/305889 mentions this issue: [release-branch.go1.16] runtime: non-strict InlTreeIndex lookup in Frames.Next

@gopherbot
Copy link

Change https://golang.org/cl/305890 mentions this issue: [release-branch.go1.15] runtime: non-strict InlTreeIndex lookup in Frames.Next

gopherbot pushed a commit that referenced this issue Mar 31, 2021
…ames.Next

When using cgo, some of the frames can be provided by cgoTraceback, a
cgo-provided function to generate C tracebacks. Unlike Go tracebacks,
cgoTraceback has no particular guarantees that it produces valid
tracebacks.

If one of the (invalid) frames happens to put the PC in the alignment
region at the end of a function (filled with int 3's on amd64), then
Frames.Next will find a valid funcInfo for the PC, but pcdatavalue will
panic because PCDATA doesn't cover this PC.

Tolerate this case by doing a non-strict PCDATA lookup. We'll still show
a bogus frame, but at least avoid throwing.

For #44971
Fixes #45302

Change-Id: I9eed728470d6f264179a7615bd19845c941db78c
Reviewed-on: https://go-review.googlesource.com/c/go/+/301369
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
(cherry picked from commit e4a4161)
Reviewed-on: https://go-review.googlesource.com/c/go/+/305890
gopherbot pushed a commit that referenced this issue Mar 31, 2021
…ames.Next

When using cgo, some of the frames can be provided by cgoTraceback, a
cgo-provided function to generate C tracebacks. Unlike Go tracebacks,
cgoTraceback has no particular guarantees that it produces valid
tracebacks.

If one of the (invalid) frames happens to put the PC in the alignment
region at the end of a function (filled with int 3's on amd64), then
Frames.Next will find a valid funcInfo for the PC, but pcdatavalue will
panic because PCDATA doesn't cover this PC.

Tolerate this case by doing a non-strict PCDATA lookup. We'll still show
a bogus frame, but at least avoid throwing.

For #44971
Fixes #45303

Change-Id: I9eed728470d6f264179a7615bd19845c941db78c
Reviewed-on: https://go-review.googlesource.com/c/go/+/301369
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
(cherry picked from commit e4a4161)
Reviewed-on: https://go-review.googlesource.com/c/go/+/305889
@akrylysov
Copy link

This still happens with cgosymbolizer on Go 1.15.11:

Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime: invalid pc-encoded table f=runtime.allocm pc=0x5c7c90 targetpc=0x5c7c9f tab=[0/0]0x0
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=-1 until pc=0x5c7a0f
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=0 until pc=0x5c7a13
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=-1 until pc=0x5c7a14
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=0 until pc=0x5c7a1a
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=-1 until pc=0x5c7a3b
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=1 until pc=0x5c7a3c
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=2 until pc=0x5c7a4c
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=-1 until pc=0x5c7adf
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=3 until pc=0x5c7ae0
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=4 until pc=0x5c7af0
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=-1 until pc=0x5c7b9e
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=5 until pc=0x5c7bcc
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         value=-1 until pc=0x5c7c90
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: fatal error: invalid runtime symbol table
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: goroutine 267720 [running]:
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime.throw(0x29eb42e, 0x1c)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/panic.go:1116 +0x72 fp=0xc098a17ac8 sp=0xc098a17a98 pc=0x5c1ed2
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime.pcvalue(0x2f1fce0, 0x4036580, 0xc10012b53f, 0x5c7c9f, 0xc098a17c28, 0x5e0601, 0x5c7c9f, 0x4036580)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/symtab.go:815 +0x59f fp=0xc098a17b78 sp=0xc098a17ac8 pc=0x5e0d5f
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime.pcdatavalue(0x2f1fce0, 0x4036580, 0x2, 0x5c7c9f, 0xc098a17c28, 0x8f300f)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/symtab.go:903 +0x86 fp=0xc098a17bc8 sp=0xc098a17b78 pc=0x5e14c6
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime/pprof.runtime_expandFinalInlineFrame(0xc0974f4550, 0x0, 0x1, 0x8f7097, 0xc0dbc2245f, 0xc141775d98)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/symtab.go:188 +0x186 fp=0xc098a17d58 sp=0xc098a17bc8 pc=0x5f7a66
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime/pprof.(*profileBuilder).appendLocsForStack(0xc141775ce0, 0xc12f6bf200, 0x0, 0x40, 0xc0974f4550, 0x1, 0x1, 0x0, 0xd, 0x40)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/pprof/proto.go:392 +0x7f fp=0xc098a17e08 sp=0xc098a17d58 pc=0x8f41bf
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime/pprof.(*profileBuilder).build(0xc141775ce0)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/pprof/proto.go:365 +0x137 fp=0xc098a17f58 sp=0xc098a17e08 pc=0x8f3c17
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime/pprof.profileWriter(0x2cc7bc0, 0xc1431ca9f0)
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/pprof/pprof.go:813 +0x105 fp=0xc098a17fd0 sp=0xc098a17f58 pc=0x8f0c45
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: runtime.goexit()
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc098a17fd8 sp=0xc098a17fd0 pc=0x5fb021
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: created by runtime/pprof.StartCPUProfile
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]:         /root/.gimme/versions/go1.15.11.linux.amd64/src/runtime/pprof/pprof.go:784 +0x11f
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: goroutine 1 [chan receive, 10 minutes]:
Apr 07 14:57:52 i-069c39b6e7d046679 mindy[14376]: main.main()

@prattmic
Copy link
Member Author

prattmic commented Apr 7, 2021

Ah, that is a slightly different crash location. I think the same fix will make sense there, but I need to think about it a bit more.

@prattmic prattmic reopened this Apr 9, 2021
@gopherbot
Copy link

Change https://golang.org/cl/309109 mentions this issue: runtime: non-strict InlTreeIndex lookup in expandFinalInlineFrame

@prattmic
Copy link
Member Author

prattmic commented Apr 9, 2021

This is indeed the same situation, I've sent http://golang.org/cl/309109 to fix.

I believe all of the remaining uses of _PCDATA_InlTreeIndex are OK. One use is in isAsyncSafePoint, which gets the PC directly from signal context, so it must be a valid PC. The others are all in gentraceback, which I believe is always used with Go PCs/SPs only.

@gopherbot please open backport for 1.16 and 1.15. The only workaround is to change the C traceback engine, which isn't usually feasible. This is a follow-up CL for a previously missed case.

@bcmills
Copy link
Contributor

bcmills commented Apr 10, 2021

Does this issue perhaps account for #27540? (Can that issue be closed as a duplicate?)

@prattmic
Copy link
Member Author

No, that is definitely a different issue, but I left a comment there about a potential fix.

@gopherbot
Copy link

Change https://golang.org/cl/309550 mentions this issue: [release-branch.go1.15] runtime: non-strict InlTreeIndex lookup in expandFinalInlineFrame

@gopherbot
Copy link

Change https://golang.org/cl/309551 mentions this issue: [release-branch.go1.16] runtime: non-strict InlTreeIndex lookup in expandFinalInlineFrame

gopherbot pushed a commit that referenced this issue May 4, 2021
…pandFinalInlineFrame

This is a follow-up to golang.org/cl/301369, which made the same change
in Frames.Next. The same logic applies here: a profile stack may have
been truncated at an invalid PC provided by cgoTraceback.
expandFinalInlineFrame will then try to lookup the inline tree and
crash.

The same fix applies as well: upon encountering a bad PC, simply leave
it as-is and move on.

For #44971
For #45480
Fixes #45482

Change-Id: I2823c67a1f3425466b05384cc6d30f5fc8ee6ddc
Reviewed-on: https://go-review.googlesource.com/c/go/+/309109
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Trust: Michael Pratt <mpratt@google.com>
(cherry picked from commit aad13cb)
Reviewed-on: https://go-review.googlesource.com/c/go/+/309551
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue May 4, 2021
…pandFinalInlineFrame

This is a follow-up to golang.org/cl/301369, which made the same change
in Frames.Next. The same logic applies here: a profile stack may have
been truncated at an invalid PC provided by cgoTraceback.
expandFinalInlineFrame will then try to lookup the inline tree and
crash.

The same fix applies as well: upon encountering a bad PC, simply leave
it as-is and move on.

For #44971
For #45480
Fixes #45481

Change-Id: I2823c67a1f3425466b05384cc6d30f5fc8ee6ddc
Reviewed-on: https://go-review.googlesource.com/c/go/+/309109
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Trust: Michael Pratt <mpratt@google.com>
(cherry picked from commit aad13cb)
Reviewed-on: https://go-review.googlesource.com/c/go/+/309550
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Apr 12, 2022
@prattmic prattmic self-assigned this Jun 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants