runtime: go1.14rc1 fatal error: invalid runtime symbol table: runtime: unexpected return pc for runtime.sigreturn called from 0x7 #37127

Closed
tonyghita opened this issue Feb 7, 2020 · 16 comments
Labels
FrozenDueToAge, OS-Linux, WaitingForInfo (Issue is not actionable because of missing required information, which needs to be provided.)

@tonyghita

tonyghita commented Feb 7, 2020

What version of Go are you using (go version)?

$ go version
go1.14rc1 linux/amd64

Does this issue reproduce with the latest release?

Indeed

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build146257936=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Built my application with the latest go1.14rc1 image on the official Docker registry, copied it to an alpine-3.8 based image and ran traffic through it.

What did you expect to see?

No crashes. This application has not exhibited this fatal error before go1.14rc1 (currently on go1.13.7).

What did you see instead?

runtime: invalid pc-encoded table f=runtime.recv pc=0x40625f targetpc=0x40625f tab=[0/0]0x0
value=0 until pc=0x4060c7
value=48 until pc=0x40615f
value=0 until pc=0x406160
value=48 until pc=0x406255
value=0 until pc=0x40625f
fatal error: invalid runtime symbol table
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigreturn called from 0x7
stack: frame={sp:0xc29758b380, fp:0xc29758b388} stack=[0xc297584000,0xc29758c000)
000000c29758b280: 000000c297582000 0000000000000000
000000c29758b290: 000000c29758b4b0 000000c29758b380
000000c29758b2a0: 000000c29758b318 0000000000449e69 <runtime.sigtrampgo+409>
000000c29758b2b0: 000000c20000001b 000000c29758b4b0
000000c29758b2c0: 000000c29758b380 000000c297582180
000000c29758b2d0: 0000000000000000 0000000000000000
000000c29758b2e0: 0000000000000000 0000000000000000
000000c29758b2f0: 0000000000000000 0000000000000000
000000c29758b300: 000000c297582180 000000c29758b4b0
000000c29758b310: 000000c29758b380 000000c29758b370
000000c29758b320: 00000000004687e3 <runtime.sigtramp+67> 000000000000001b
000000c29758b330: 000000c29758b4b0 000000c29758b380
000000c29758b340: 0000000000000002 0000000000000002
000000c29758b350: 000000c297593e2c 00000000135c9af0
000000c29758b360: 000000c29758b370 00007fffea5da080
000000c29758b370: 000000c297593e10 00000000004688d0 <runtime.sigreturn+0>
000000c29758b380: <0000000000000007 >0000000000000000
000000c29758b390: 000000c297584000 0000000000000000
000000c29758b3a0: 0000000000008000 00007fffea5db000
000000c29758b3b0: 00065dca43412f3f 00081e2d2dc1e26b
000000c29758b3c0: 0000000000000001 00000000135c9af0
000000c29758b3d0: 000000c297593e2c 0000000000000002
000000c29758b3e0: 0000000000000002 000000c297593e2c
000000c29758b3f0: 0000000000000048 000000c297593e10
000000c29758b400: 00007fffea5da080 0000000000043e86
000000c29758b410: d77c8ba3287f35d6 00000000ffffffff
000000c29758b420: 000000c297593e10 00007fffea5dd9e7
000000c29758b430: 0000000000000a83 002b000000000033
000000c29758b440: 0000000000000000 0000000000000000
000000c29758b450: 0000000000000000 0000000000000000
000000c29758b460: 000000c29758b540 0000000000000000
000000c29758b470: 0000000000000000 0000000000000000
000000c29758b480: 0000000000000000
runtime.throw(0x1faacbd, 0x1c)
/usr/local/go/src/runtime/panic.go:1112 +0x72
runtime.pcvalue(0x25434a8, 0x357aaa0, 0x1027b5, 0x40625f, 0xc29758adc8, 0x2eb7101, 0x0)
/usr/local/go/src/runtime/symtab.go:726 +0x53a
runtime.funcspdelta(0x25434a8, 0x357aaa0, 0x40625f, 0xc29758adc8, 0xc200000000)
/usr/local/go/src/runtime/symtab.go:780 +0x5f
runtime.gentraceback(0x40625f, 0xc297593e98, 0x0, 0xc297582180, 0x0, 0xc29758b020, 0x40, 0x0, 0x0, 0x6, ...)
/usr/local/go/src/runtime/traceback.go:220 +0x15ae
runtime.sigprof(0x7fffea5dd9e7, 0xc297593e10, 0x0, 0xc297582180, 0xc297580000)
/usr/local/go/src/runtime/proc.go:3897 +0x371
runtime.sighandler(0xc20000001b, 0xc29758b4b0, 0xc29758b380, 0xc297582180)
/usr/local/go/src/runtime/signal_unix.go:522 +0x787
runtime.sigtrampgo(0x1b, 0xc29758b4b0, 0xc29758b380)
/usr/local/go/src/runtime/signal_unix.go:444 +0x199
runtime.sigtramp(0x7, 0x0, 0xc297584000, 0x0, 0x8000, 0x7fffea5db000, 0x65dca43412f3f, 0x81e2d2dc1e26b, 0x1, 0x135c9af0, ...)
/usr/local/go/src/runtime/sys_linux_amd64.s:389 +0x43
runtime: unexpected return pc for runtime.sigreturn called from 0x7
stack: frame={sp:0xc29758b380, fp:0xc29758b388} stack=[0xc297584000,0xc29758c000)
000000c29758b280: 000000c297582000 0000000000000000
000000c29758b290: 000000c29758b4b0 000000c29758b380
000000c29758b2a0: 000000c29758b318 0000000000449e69 <runtime.sigtrampgo+409>
000000c29758b2b0: 000000c20000001b 000000c29758b4b0
000000c29758b2c0: 000000c29758b380 000000c297582180
000000c29758b2d0: 0000000000000000 0000000000000000
000000c29758b2e0: 0000000000000000 0000000000000000
000000c29758b2f0: 0000000000000000 0000000000000000
000000c29758b300: 000000c297582180 000000c29758b4b0
000000c29758b310: 000000c29758b380 000000c29758b370
000000c29758b320: 00000000004687e3 <runtime.sigtramp+67> 000000000000001b
000000c29758b330: 000000c29758b4b0 000000c29758b380
000000c29758b340: 0000000000000002 0000000000000002
000000c29758b350: 000000c297593e2c 00000000135c9af0
000000c29758b360: 000000c29758b370 00007fffea5da080
000000c29758b370: 000000c297593e10 00000000004688d0 <runtime.sigreturn+0>
000000c29758b380: <0000000000000007 >0000000000000000
000000c29758b390: 000000c297584000 0000000000000000
000000c29758b3a0: 0000000000008000 00007fffea5db000
000000c29758b3b0: 00065dca43412f3f 00081e2d2dc1e26b
000000c29758b3c0: 0000000000000001 00000000135c9af0
000000c29758b3d0: 000000c297593e2c 0000000000000002
000000c29758b3e0: 0000000000000002 000000c297593e2c
000000c29758b3f0: 0000000000000048 000000c297593e10
000000c29758b400: 00007fffea5da080 0000000000043e86
000000c29758b410: d77c8ba3287f35d6 00000000ffffffff
000000c29758b420: 000000c297593e10 00007fffea5dd9e7
000000c29758b430: 0000000000000a83 002b000000000033
000000c29758b440: 0000000000000000 0000000000000000
000000c29758b450: 0000000000000000 0000000000000000
000000c29758b460: 000000c29758b540 0000000000000000
000000c29758b470: 0000000000000000 0000000000000000
000000c29758b480: 0000000000000000
runtime.sigreturn(0x0, 0xc297584000, 0x0, 0x8000, 0x7fffea5db000, 0x65dca43412f3f, 0x81e2d2dc1e26b, 0x1, 0x135c9af0, 0xc297593e2c, ...)
/usr/local/go/src/runtime/sys_linux_amd64.s:481
@tonyghita changed the title from "runtime: go1.14rc1 fatal error: invalid runtime symbol table" to "runtime: go1.14rc1 fatal error: invalid runtime symbol table: runtime: unexpected return pc for runtime.sigreturn called from 0x7" on Feb 7, 2020
@ianlancetaylor
Contributor

CC @aclements @randall77

Is there a way that we can reproduce the problem ourselves?

@ianlancetaylor added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Feb 7, 2020
@ianlancetaylor added this to the Go1.14 milestone on Feb 7, 2020
@tonyghita
Author

tonyghita commented Feb 7, 2020

In ~25 minutes of running production traffic at just above 700 requests per second per application, this occurred 12 times across the fleet (on a different instance each time). Each fatal error occurred about 1 to 4 minutes after the application started and began receiving traffic. The application is a non-trivial HTTP API (GraphQL) gateway that aggregates responses from other services.

Each time the specific runtime: invalid pc-encoded table value of f is different, but messages fatal error: invalid runtime symbol table and runtime: unexpected return pc for runtime.sigreturn called from 0x7 appear in every case.
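For anyone reading the "invalid pc-encoded table" dump in the report: each "value=V until pc=P" line describes a range of PCs that ends just before P. The sketch below is a simplified model of that lookup, not the runtime's own code; the `pcRange` type and `lookup` function are hypothetical names, and the table values are copied from the output above. It shows that the target PC 0x40625f equals the end of the last range, so no entry covers it.

```go
package main

import "fmt"

// pcRange is one decoded entry of a pc-value table (hypothetical type):
// Value applies to every PC from the end of the previous entry up to,
// but not including, Until.
type pcRange struct {
	Value int32
	Until uintptr
}

// lookup returns the value covering target, or ok=false if target lies at
// or beyond the end of the last entry. For brevity it does not check the
// lower bound (the function's entry PC).
func lookup(table []pcRange, target uintptr) (value int32, ok bool) {
	for _, r := range table {
		if target < r.Until {
			return r.Value, true
		}
	}
	return 0, false
}

// recvTable holds the entries printed in the crash output for runtime.recv.
var recvTable = []pcRange{
	{0, 0x4060c7},
	{48, 0x40615f},
	{0, 0x406160},
	{48, 0x406255},
	{0, 0x40625f},
}

func main() {
	// The last PC is the targetpc from the crash; the first two are
	// arbitrary in-range PCs chosen for comparison.
	for _, pc := range []uintptr{0x406100, 0x40625e, 0x40625f} {
		v, ok := lookup(recvTable, pc)
		fmt.Printf("pc=%#x value=%d ok=%v\n", pc, v, ok)
	}
}
```

In this model the first two PCs resolve to a value, while 0x40625f falls off the end of the table, which is the condition the runtime reports as an invalid symbol table.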

Open to ideas for reproducing this case more minimally. Wild uneducated guess, but I saw a similar issue (#27540) call out profiling as a possible cause, which we do in this application (pprof on an interval with block and mutex profiles enabled). It has not been an issue until go1.14rc1 though.

@laboger
Contributor

laboger commented Feb 11, 2020

We see this error in testing Openshift when compiled with go1.14rc1 on ppc64le. By default this testing is done with the -race option on, and that results in a few dozen errors about unsafe pointer arithmetic, since -race now turns on checkptr testing.

If I then run the tests with checkptr=0 along with -race, I see the same invalid pc-encoded table error as above.

I'm still trying to isolate the conditions that cause this consistently, because the failures sometimes differ from run to run. So far it has only happened when running all the tests; if I run a previously failing test by itself, it does not fail. If I test without -race, different errors occur.

@cherrymui
Member

I think this may be related to CL https://go-review.googlesource.com/c/go/+/212079. The code that saves vdsoPC in walltime1/nanotime1 assumes the function has no frame:
https://go-review.googlesource.com/c/go/+/212079/3/src/runtime/sys_linux_amd64.s#224
Now the functions do have a frame, so the offset needs adjustment.

That said, if my assumption is right, the PPC64 failure is probably a different one.

@gopherbot

Change https://golang.org/cl/219118 mentions this issue: runtime: correct caller PC/SP offsets in walltime1/nanotime1

@dmitshur
Contributor

dmitshur commented Feb 13, 2020

The text in the commit message of CL 219118 included:

May fix #37127.

Unfortunately, GitHub parses that as "Fix[es] #37127." and automatically closed this issue. I don't think that was intended, so re-opening. /cc @cherrymui
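To illustrate why "May fix" still closes the issue, here is a small sketch of keyword matching. GitHub's actual parser is not public, so the regular expression and the `closedIssues` helper are assumptions that only approximate the documented closing keywords (close, fix, resolve and their -s/-d forms). The point is that the matcher looks only at the keyword and the issue reference, so a hedging word in front has no effect.

```go
package main

import (
	"fmt"
	"regexp"
)

// closingRef is an assumed approximation of closing-keyword matching: one of
// the documented keywords, whitespace, then "#" and an issue number.
var closingRef = regexp.MustCompile(`(?i)\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)`)

// closedIssues returns the issue numbers that msg would close under this
// approximation, in order of appearance.
func closedIssues(msg string) []string {
	var out []string
	for _, m := range closingRef.FindAllStringSubmatch(msg, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	for _, msg := range []string{
		"May fix #37127.",
		"Fixes #37127.",
		"Updates #37127.",
	} {
		fmt.Printf("%q -> %v\n", msg, closedIssues(msg))
	}
}
```

Under this approximation the first two messages close issue 37127 and the third does not, which is why Go commit messages use "Updates #N" when a change should reference an issue without closing it.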

@dmitshur reopened this on Feb 13, 2020
@cherrymui
Member

Given that the crash is hard to reproduce, I cannot tell for sure whether the problem goes away with this change, hence "may fix". I guess we can close this and reopen it if the fix didn't work. Or we keep it open and let the reporter confirm.

@laboger
Contributor

laboger commented Feb 13, 2020

I am still working on trying to narrow down the failure that happens on ppc64le. Should I open a separate issue for that?

I believe it began when the page allocator changed in early November, but I'm still trying to verify that and find a smaller reproducer.

@cherrymui
Member

@laboger yeah, I think it's better to open a new one. Quick question: do you have profiling turned on? If not, it is clearly a different problem.

@laboger
Contributor

laboger commented Feb 13, 2020

@cherrymui It is not happening with profiling but fails with -race -d=checkptr=0. If I turn off -race it gets a different error. I'll open a new issue. It is the same error output about the symbol table.

@tonyghita
Author

I'm eager to try your fix @cherrymui. Will this be released in something like an RC2?

@cherrymui
Member

Yeah, there might be one next week or so. You could also check out Go tip, which is very close to 1.14rc1, with just a small number of fixes. Thanks!

@aclements
Member

Removing release-blocker, since we think this is probably fixed. If it turns out not to be, we can continue working on this and perhaps issue a fix in a point release.

@dmitshur added the WaitingForInfo label (Issue is not actionable because of missing required information, which needs to be provided.) and removed the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Feb 14, 2020
@tonyghita
Author

Hey all, just tested the changes in tip and wanted to confirm that I no longer see the problem on linux/amd64 with my workload. Good find and thank you!

@dmitshur
Contributor

Thanks for confirming @tonyghita!

I believe this issue is resolved then. It was waiting on feedback from you, the original reporter. I'll close it because there's nothing left to do. If there's anything else, please let us know!

@cherrymui
Member

Thanks @tonyghita

@golang locked and limited conversation to collaborators on Feb 18, 2021