Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: invalid test binaries built for Openshift with new linker on ppc64le #38898

Closed
laboger opened this issue May 6, 2020 · 4 comments
Closed

Comments

@laboger
Copy link
Contributor

laboger commented May 6, 2020

What version of Go are you using (go version)?

$ go version
go version devel +0f47c12a29 Wed May 6 04:34:54 2020 +0000 linux/ppc64le (latest)

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
linux/ppc64le
Ubuntu 18.04

What did you do?

Build and test of Openshift on linux/ppc64le

mkdir -p ~/openshift/src/github.com/openshift
cd ~/openshift/src/github.com/openshift
git clone https://github.com/openshift/origin
cd origin; export GOPATH=~/openshift
export PERMISSIVE_GO=y
make build-all

To run all the tests:
GOTEST_FLAGS='-p 8 -gcflags=all=-d=checkptr=0' TIMEOUT=360s TEST_KUBE=true KUBERNETES_SERVICE_HOST= hack/test-go.sh

Building one test that fails with SIGILL:
GOTEST_FLAGS='-c -p 8 -gcflags=all=-d=checkptr=0' TIMEOUT=360s TEST_KUBE=true KUBERNETES_SERVICE_HOST= hack/test-go.sh vendor/k8s.io/kubernetes/pkg/cloudprovider/providers

What did you expect to see?

Mostly correct run of testcases. There are some that fail due to other reasons, but no SIGILLs until recently.

What did you see instead?

Some tests fail with SIGILL. Further investigation shows that the binaries are built incorrectly, some have portions that should contain valid instructions but do not. It appears that these are cases where the text sections are excessively large where special handling is supposed to happen on ppc64le.

The SIGILL happens due to a corrupt binary. At one point the program tries to jump to an address which is not a valid instruction stream. Likely due to size issues.

For example, if I do objdump -D on the providers.test built as shown above:

    12512c98:   20 00 81 f8     std     r4,32(r1)
    12512c9c:   55 d9 6f 4b     bl      11c105f0 <runtime.raceread-tramp0>
    12512ca0:   c0 00 61 e8     ld      r3,192(r1)
    12512ca4:   10 00 83 e8     ld      r4,16(r3)
    12512ca8:   30 01 24 2c     cmpdi   r4,304
    12512cac:   01 00 80 38     li      r4,1
    12512cb0:   9e 00 c4 7c     iseleq  r6,r4,r0
    12512cb4:   58 00 81 e8     ld      r4,88(r1)
    12512cb8:   d0 00 a1 e8     ld      r5,208(r1)
    12512cbc:   20 fa ff 4b     b       125126dc <github.com/openshift/origin/vendor/google.golang.org/api/container/v1.(*ProjectsZonesClustersAddonsCall).Do+0xac>

0000000012512cc0 <github.com/openshift/origin/vendor/google.golang.org/api/container/v1.(*ProjectsZonesClustersService).CompleteIpRotation>:
        ...
    12512cd0:   00 d9 6f 03     .long 0x36fd900
        ...
    12512ce0:   00 d9 6f 03     .long 0x36fd900
    12512ce4:   fc db 6f 03     .long 0x36fdbfc
        ...
    12512cf0:   82 13 00 00     .long 0x1382
    12512cf4:   60 c1 00 00     .long 0xc160
    12512cf8:   00 00 00 00     .long 0x0
    12512cfc:   44 d9 6f 03     .long 0x36fd944
        ...
    12512d14:   3c d9 6f 03     .long 0x36fd93c
    12512d18:   29 16 00 00     .long 0x1629
    12512d1c:   30 5d 00 00     .long 0x5d30
        ...

Using gdb on this program:

Thread 1 "providers.test" received signal SIGILL, Illegal instruction.
0x0000000012544010 in github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init ()
(gdb) x/i $pc
=> 0x12544010 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init>:	.long 0x0
(gdb) info reg $lr
lr             0x1005536c	0x1005536c <runtime.doInit+172>
(gdb) x/20ni $pc-24
   0x12543ff8 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init.0.func1+952>:	.long 0x0
   0x12543ffc <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init.0.func1+956>:	.long 0x0
   0x12544000 <runtime.cmpstring-tramp0>:	.long 0x1001
   0x12544004 <runtime.cmpstring-tramp0+4>:	.long 0x1710
   0x12544008 <runtime.cmpstring-tramp0+8>:	.long 0x0
   0x1254400c <runtime.cmpstring-tramp0+12>:	.long 0x0
=> 0x12544010 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init>:	.long 0x0
   0x12544014 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+4>:	.long 0x0
   0x12544018 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+8>:	.long 0x0
   0x1254401c <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+12>:	.long 0x0
   0x12544020 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+16>:	.long 0x0
   0x12544024 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+20>:	.long 0x36cc5ac
   0x12544028 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+24>:	.long 0x0
   0x1254402c <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+28>:	.long 0x0
   0x12544030 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+32>:	.long 0x0
   0x12544034 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+36>:	.long 0x36cc5ac
   0x12544038 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+40>:	.long 0x139d
   0x1254403c <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+44>:	.long 0x4afa
   0x12544040 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+48>:	.long 0x0
   0x12544044 <github.com/openshift/origin/vendor/github.com/GoogleCloudPlatform/k8s-cloud-provider/pkg/cloud/meta.init+52>:	.long 0x0

I was able to avoid the SIGILL failures by using the options to enable the old linker as @cherrymui requested (-gcflags=all=-go115newobj=false -asmflags=all=-go115newobj=false -ldflags=all=-go115newobj=false).

@cherrymui
Copy link
Member

Thanks for the report. And thanks for trying the old linker.

It seems there is some issue with trampoline generation, which is not surprising. This is the part that I'm not really confident with, as I couldn't test it on some very large binaries.

I'll take a look today. Thanks.

@cherrymui
Copy link
Member

Does the following patch help anything?

diff --git a/src/cmd/link/internal/ppc64/asm.go b/src/cmd/link/internal/ppc64/asm.go
index 4dc50eab79..bd4827ecb5 100644
--- a/src/cmd/link/internal/ppc64/asm.go
+++ b/src/cmd/link/internal/ppc64/asm.go
@@ -1098,11 +1098,6 @@ func asmb(ctxt *ld.Link, _ *loader.Loader) {
 		}
 	}
 
-	for _, sect := range ld.Segtext.Sections[1:] {
-		offset := sect.Vaddr - ld.Segtext.Vaddr + ld.Segtext.Fileoff
-		ld.WriteParallel(&wg, ld.Datblk, ctxt, offset, sect.Vaddr, sect.Length)
-	}
-
 	if ld.Segrodata.Filelen > 0 {
 		ld.WriteParallel(&wg, ld.Datblk, ctxt, ld.Segrodata.Fileoff, ld.Segrodata.Vaddr, ld.Segrodata.Filelen)
 	}

I haven't figured out anything, but that code looks suspicious.

@gopherbot
Copy link

Change https://golang.org/cl/232661 mentions this issue: cmd/link: don't overwrite text sections on PPC64

@laboger
Copy link
Contributor Author

laboger commented May 7, 2020

The CL fixes the problem, thanks.

@golang golang locked and limited conversation to collaborators May 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants