runtime: invalid pc-encoded table when running Openshift tests using go1.14rc1 on ppc64le #37216

Closed
laboger opened this issue Feb 13, 2020 · 12 comments · Fixed by sthagen/golang-go#155
@laboger
Contributor

laboger commented Feb 13, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14rc1 linux/ppc64le

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

linux/ppc64le

What did you do?

Built and tested OpenShift using Go 1.14rc1.

What did you expect to see?

No failures or legitimate error output.

What did you see instead?

Some failures provide error messages that include the following:

runtime: invalid pc-encoded table f=github.com/openshift/origin/vendor/k8s.io/api/imagepolicy/v1alpha1.(*ImageReview).SetOwnerReferences pc=0x11c0f710 targetpc=0x11c0f9c0 tab=[0/0]0x0
        value=-1 until pc=0x11c0f684
        value=-2 until pc=0x11c0f694
        value=-1 until pc=0x11c0f6c4
        value=0 until pc=0x11c0f6c8
        value=1 until pc=0x11c0f6d4
        value=0 until pc=0x11c0f6d8
        value=1 until pc=0x11c0f6dc
        value=0 until pc=0x11c0f710
fatal error: invalid runtime symbol table

goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0x3fffb7f90478
stack: frame={sp:0xc000d7cf20, fp:0xc000d7cf80} stack=[0xc000d76000,0xc000d7e000)
000000c000d7ce20:  000000001005d728 <runtime.sigtrampgo+296>  0000000000000000
000000c000d7ce30:  0000000000000000  0000000000000000
000000c000d7ce40:  000000001005d624 <runtime.sigtrampgo+36>  0000000000000000
000000c000d7ce50:  0000000000000000  0000000000000000
000000c000d7ce60:  0000000000000000  000000c001588300
000000c000d7ce70:  0000000000000000  000000001005d75c <runtime.sigtrampgo+348>
000000c000d7ce80:  0000000012121c00  000000c000d7dd78
000000c000d7ce90:  000000c000d7d000  000000001007bacc <runtime.sigtramp+60>
000000c000d7cea0:  0000000000000000  0000000000000000
000000c000d7ceb0:  0000000000000000  000000c000000017
000000c000d7cec0:  000000c000d7dd78  000000c000d7d000
000000c000d7ced0:  000000c001542f00  0000000000000000
000000c000d7cee0:  0000000000000000  0000000000000000
000000c000d7cef0:  0000000000000000  0000000000000000
000000c000d7cf00:  0000000000000000  000000c001542f00
000000c000d7cf10:  000000c000d7dd78  000000c000d7d000
000000c000d7cf20: <00003fffb7f90478  0000000000000000
000000c000d7cf30:  0000000000000000  0000000000000000
000000c000d7cf40:  0000000000000017  000000c000d7dd78
000000c000d7cf50:  000000c000d7d000  0000000000000000
000000c000d7cf60:  0000000000000000  0000000000000000
000000c000d7cf70:  0000000000000000  0000000000000000
000000c000d7cf80: >00003ffc94f9e4a0  0000000000000000
000000c000d7cf90:  0000000000000000  0000000000000000
000000c000d7cfa0:  0000000000000000  0000000000000000
000000c000d7cfb0:  0000000000000000  0000000000000000
000000c000d7cfc0:  0000000000000000  0000000000000000
000000c000d7cfd0:  0000000000000000  0000000000000000
000000c000d7cfe0:  0000000000000000  0000000000000000
000000c000d7cff0:  0000000000000000  0000000000000000
000000c000d7d000:  0000000000000000  0000000000000000
000000c000d7d010:  000000c000d76000  0000000000000000
000000c000d7d020:  0000000000008000  0000000000000000
000000c000d7d030:  0000000000000000  0000000000000000
000000c000d7d040:  0000000000000000  0000000000000000
000000c000d7d050:  0000000000000000  0000000000000000
000000c000d7d060:  0000000000000000  0000000000000000
000000c000d7d070:  0000000000000000  0000000000000000
runtime.throw(0x1206f12b, 0x1c)
        /home/boger/golang/go1.14rc1/go/src/runtime/panic.go:1112 +0x5c
runtime.pcvalue(0x135f6568, 0x136c1ec0, 0x4825e7, 0x11c0f9c0, 0x0, 0x11c0f901, 0x136c1ec0)
        /home/boger/golang/go1.14rc1/go/src/runtime/symtab.go:726 +0x4b0
runtime.pcdatavalue(0x135f6568, 0x136c1ec0, 0x0, 0x11c0f9c0, 0x0, 0x2)
        /home/boger/golang/go1.14rc1/go/src/runtime/symtab.go:814 +0x94
runtime.isAsyncSafePoint(0xc001542f00, 0x11c0f9c0, 0x3ffc94f9e4a0, 0x11c39350, 0x0)
        /home/boger/golang/go1.14rc1/go/src/runtime/preempt.go:396 +0x100
runtime.doSigPreempt(0xc001542f00, 0xc000d7ce88)
        /home/boger/golang/go1.14rc1/go/src/runtime/signal_unix.go:329 +0x120
runtime.sighandler(0xc000000017, 0xc000d7dd78, 0xc000d7d000, 0xc001542f00)
        /home/boger/golang/go1.14rc1/go/src/runtime/signal_unix.go:536 +0x708
runtime.sigtrampgo(0x17, 0xc000d7dd78, 0xc000d7d000)
        /home/boger/golang/go1.14rc1/go/src/runtime/signal_unix.go:444 +0x18c
runtime: unexpected return pc for runtime.sigtramp called from 0x3fffb7f90478
stack: frame={sp:0xc000d7cf20, fp:0xc000d7cf80} stack=[0xc000d76000,0xc000d7e000)
000000c000d7ce20:  000000001005d728 <runtime.sigtrampgo+296>  0000000000000000
000000c000d7ce30:  0000000000000000  0000000000000000
000000c000d7ce40:  000000001005d624 <runtime.sigtrampgo+36>  0000000000000000
000000c000d7ce50:  0000000000000000  0000000000000000
000000c000d7ce60:  0000000000000000  000000c001588300
000000c000d7ce70:  0000000000000000  000000001005d75c <runtime.sigtrampgo+348>
000000c000d7ce80:  0000000012121c00  000000c000d7dd78
000000c000d7ce90:  000000c000d7d000  000000001007bacc <runtime.sigtramp+60>
000000c000d7cea0:  0000000000000000  0000000000000000
000000c000d7ceb0:  0000000000000000  000000c000000017
000000c000d7cec0:  000000c000d7dd78  000000c000d7d000
000000c000d7ced0:  000000c001542f00  0000000000000000
000000c000d7cee0:  0000000000000000  0000000000000000
000000c000d7cef0:  0000000000000000  0000000000000000
000000c000d7cf00:  0000000000000000  000000c001542f00
000000c000d7cf10:  000000c000d7dd78  000000c000d7d000
000000c000d7cf20: <00003fffb7f90478  0000000000000000
000000c000d7cf30:  0000000000000000  0000000000000000
000000c000d7cf40:  0000000000000017  000000c000d7dd78
000000c000d7cf50:  000000c000d7d000  0000000000000000
000000c000d7cf60:  0000000000000000  0000000000000000
000000c000d7cf70:  0000000000000000  0000000000000000
000000c000d7cf80: >00003ffc94f9e4a0  0000000000000000
000000c000d7cf90:  0000000000000000  0000000000000000
000000c000d7cfa0:  0000000000000000  0000000000000000
000000c000d7cfb0:  0000000000000000  0000000000000000
000000c000d7cfc0:  0000000000000000  0000000000000000
000000c000d7cfd0:  0000000000000000  0000000000000000
000000c000d7cfe0:  0000000000000000  0000000000000000
000000c000d7cff0:  0000000000000000  0000000000000000
000000c000d7d000:  0000000000000000  0000000000000000
000000c000d7d010:  000000c000d76000  0000000000000000
000000c000d7d020:  0000000000008000  0000000000000000
000000c000d7d030:  0000000000000000  0000000000000000
000000c000d7d040:  0000000000000000  0000000000000000
000000c000d7d050:  0000000000000000  0000000000000000
000000c000d7d060:  0000000000000000  0000000000000000
000000c000d7d070:  0000000000000000  0000000000000000
runtime.sigtramp(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/boger/golang/go1.14rc1/go/src/runtime/sys_linux_ppc64x.s:346 +0x3c

goroutine 1235 [running]:
FAIL    github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/registry/core/replicationcontroller/storage    16.260s

The good news is that this bad symbol table output appears for some test failures every time the tests are run. The bad news is that it is not always the same test that fails, and I cannot get a test to fail when it is run by itself, only when the whole set is run.

The bad symbol table message does not happen before commit 14849f0. If I build and test with anything after f511467 the error message occurs. Between these two commits, testing results in the message about unaligned sysUnused.

I am trying to find a smaller reproducer and to see whether it also fails on x86. It only fails with -race and -d=checkptr=0. If I don't turn off checkptr (by passing -d=checkptr=0), the unsafe pointer message occurs instead.

@cherrymui
Member

@laboger could you take a look at what function PC 0x11c0f9c0 belongs to? Does it belong to function github.com/openshift/origin/vendor/k8s.io/api/imagepolicy/v1alpha1.(*ImageReview).SetOwnerReferences? What are the start and end addresses of that function?
Or whatever targetpc and f are, if you get a different invalid pc-encoded table failure. Thanks!

@laboger
Contributor Author

laboger commented Feb 13, 2020

Here is what I get on another run:

runtime: invalid pc-encoded table f=github.com/openshift/origin/vendor/k8s.io/client-go/tools/clientcmd.LoadFromFile pc=0x11c0f090 targetpc=0x11c0f360 tab=[0/0]0x0

From the objdump of the test, here is the start and end of the LoadFromFile function:

0000000011c0e920 <github.com/openshift/origin/vendor/k8s.io/client-go/tools/clientcmd.LoadFromFile>:
    11c0e920:   10 00 7e e8     ld      r3,16(r30)
    11c0e924:   18 fe 81 38     addi    r4,r1,-488
    11c0e928:   40 20 23 7c     cmpld   r3,r4

.....

    11c0f088:   20 00 80 4e     blr
    11c0f08c:   00 00 00 00     .long 0x0
    11c0f090:   00 00 00 60     nop
    11c0f094:   00 00 00 60     nop
    11c0f098:   00 00 00 60     nop
    11c0f09c:   00 00 00 60     nop

targetpc is this:

0000000011c0f360 <00000042.plt_call.madvise@@GLIBC_2.17>:
    11c0f360:   18 00 41 f8     std     r2,24(r1)
    11c0f364:   70 82 82 e9     ld      r12,-32144(r2)
    11c0f368:   a6 03 89 7d     mtctr   r12
    11c0f36c:   20 04 80 4e     bctr

@cherrymui
Member

@laboger thanks! This is very helpful. Apparently findfunc returns the wrong function for a PC in the PLT. I'll look into it.

@cherrymui
Member

@laboger is the binary internally linked or externally linked? Is it possible that I could get the binary? (I probably don't need to run it, just examining the content may be enough.) Thanks!

@cherrymui
Member

My guess is that our func table assumes the address space of text is contiguous, and if there are functions inserted by the external linker in the middle, the logic falls apart. In particular, if we ask for the pc table for a PC that belongs to an inserted function, it will find the wrong function, because the func table doesn't contain an entry for such a function.
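To illustrate the guess above, here is a minimal sketch of a function table that records only start addresses. It is not the runtime's actual findfunc; the `funcEntry` type, this `findFunc`, and the `nextGoFunc` entry with its address are hypothetical. The LoadFromFile start address and the PLT stub address are the ones reported earlier in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// funcEntry is a hypothetical, simplified row of a function table.
// Only the start address is recorded, so the table implicitly assumes
// that each function extends up to the start of the next one.
type funcEntry struct {
	start uint64
	name  string
}

// findFunc returns the last entry whose start is <= pc. With no end
// address to check, a pc that falls in a gap after a function is
// attributed to that function.
func findFunc(table []funcEntry, pc uint64) (funcEntry, bool) {
	i := sort.Search(len(table), func(i int) bool { return table[i].start > pc })
	if i == 0 {
		return funcEntry{}, false
	}
	return table[i-1], true
}

func main() {
	table := []funcEntry{
		{0x11c0e920, "clientcmd.LoadFromFile"}, // start address from the objdump above
		{0x11c10000, "nextGoFunc"},             // hypothetical next Go function
	}
	// 0x11c0f360 is the madvise PLT stub, which has no entry of its own.
	f, _ := findFunc(table, 0x11c0f360)
	fmt.Println(f.name) // prints "clientcmd.LoadFromFile"
}
```

The lookup succeeds but names the wrong function, so a later pc-value lookup for that PC runs past the end of LoadFromFile's table.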

@laboger
Contributor Author

laboger commented Feb 14, 2020

This program is externally linked. The failure only happens if the -race option is used, which means it is linking in the .syso file, which is C code built by LLVM. So perhaps that combination is somehow causing the pc table to be inconsistent with the actual functions in the final program. Maybe running some significant tests with -race turned on will reproduce it.

One other note, all the failures I have seen with this message have isAsyncSafePoint on the stack that ends up calling pcvalue.

@laboger
Contributor Author

laboger commented Feb 14, 2020

I also found that this program has trampolines due to its text size. The program is very large, so I'm not sure how I would get it to you. You could build it if you want; I put the directions below. It also seems like it would be good to have a test that walks through the pc-encoded table to detect where the errors are.

Building it is not too hard:

mkdir ~/openshift
export GOPATH=~/openshift
cd ~/openshift && mkdir -p src/github.com/openshift
cd src/github.com/openshift
git clone https://github.com/openshift/origin
cd origin
export PERMISSIVE_GO=y
make build-all
GOTEST_FLAGS='-p 8 -gcflags=all=-d=checkptr=0 -c' TIMEOUT=420s TEST_KUBE=true KUBERNETES_SERVICE_HOST= hack/test-go.sh vendor/k8s.io/kubernetes/pkg/registry/core/replicationcontroller/storage
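As a rough illustration of what walking a pc-encoded table involves, the sketch below operates on already-decoded rows, using the "value=... until pc=..." lines from the first failure in this thread. It is a simplified model, not the runtime's pcvalue: the `pcValue` type and `lookup` function are hypothetical, and the check against the function's entry address is omitted because that address was not reported.

```go
package main

import "fmt"

// pcValue is one decoded row of a pc-value table: value applies to every
// pc from the end of the previous row up to, but not including, until.
type pcValue struct {
	value int32
	until uint64
}

// lookup walks the rows in order and returns the value covering target.
// ok is false when target is at or beyond the last row's until, which
// corresponds to the walk running off the end of the table.
func lookup(rows []pcValue, target uint64) (value int32, ok bool) {
	for _, r := range rows {
		if target < r.until {
			return r.value, true
		}
	}
	return 0, false
}

func main() {
	// Rows copied from the first error report in this issue.
	rows := []pcValue{
		{-1, 0x11c0f684},
		{-2, 0x11c0f694},
		{-1, 0x11c0f6c4},
		{0, 0x11c0f6c8},
		{1, 0x11c0f6d4},
		{0, 0x11c0f6d8},
		{1, 0x11c0f6dc},
		{0, 0x11c0f710},
	}
	_, ok := lookup(rows, 0x11c0f9c0) // targetpc from the same report
	fmt.Println(ok)                   // prints "false"
}
```

A test along these lines could flag any PC that a function lookup attributes to a function but that the function's own table does not cover.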

@cherrymui
Member

Thanks, @laboger. Yeah, I think the trampolines inserted by the external linker might be the problem. If a preemption signal lands in such a trampoline, it may get this error. I'll try to make a fix.

@dmitshur added the NeedsInvestigation label Feb 15, 2020
@dmitshur added this to the Backlog milestone Feb 15, 2020
@gopherbot

Change https://golang.org/cl/219717 mentions this issue: cmd/link, runtime: skip holes in func table
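The CL title mentions skipping holes in the func table. The sketch below shows one way that idea could work in general; it is an assumption-laden illustration, not the code in the CL. The `goFunc` and `entry` types, `buildTable`, this `findFunc`, and the `nextGoFunc` addresses are all hypothetical. The LoadFromFile range uses the start address from the objdump and the end PC from the error message in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// goFunc describes a function with a known [start, end) range.
type goFunc struct {
	start, end uint64
	name       string
}

// entry is a table row; hole marks a range not covered by any known function.
type entry struct {
	start uint64
	name  string
	hole  bool
}

// buildTable expects funcs sorted by start. After each function it adds a
// hole marker when the next function does not begin at this one's end, and
// after the last function, so gaps are represented explicitly.
func buildTable(funcs []goFunc) []entry {
	var table []entry
	for i, f := range funcs {
		table = append(table, entry{start: f.start, name: f.name})
		if i+1 == len(funcs) || f.end < funcs[i+1].start {
			table = append(table, entry{start: f.end, hole: true})
		}
	}
	return table
}

// findFunc reports no match when pc lands before the table or in a hole.
func findFunc(table []entry, pc uint64) (string, bool) {
	i := sort.Search(len(table), func(i int) bool { return table[i].start > pc })
	if i == 0 || table[i-1].hole {
		return "", false
	}
	return table[i-1].name, true
}

func main() {
	table := buildTable([]goFunc{
		{0x11c0e920, 0x11c0f090, "clientcmd.LoadFromFile"},
		{0x11c10000, 0x11c10100, "nextGoFunc"}, // hypothetical
	})
	_, ok := findFunc(table, 0x11c0f360) // the madvise PLT stub
	fmt.Println(ok)                      // prints "false"
}
```

With holes marked, a caller such as a preemption safe-point check can treat "no function found" as "not a safe point" instead of consulting the wrong function's tables.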

@cherrymui
Member

@laboger does CL https://go-review.googlesource.com/c/go/+/219717 help? Thanks!

@cherrymui modified the milestones: Backlog, Go1.14 Feb 16, 2020
@laboger
Contributor Author

laboger commented Feb 17, 2020

I ran the full test set 3 times and didn't see the pc-encoded error message. The CL appears to resolve it.

@cherrymui
Member

Thanks @laboger
