runtime/cgo: arm5 sigill #19674

Closed
zeebo opened this issue Mar 23, 2017 · 36 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.

@zeebo
Contributor

zeebo commented Mar 23, 2017

What version of Go are you using (go version)?

current-ish master. see the linux-arm-arm5spacemonkey column on https://build.golang.org/

What operating system and processor architecture are you using (go env)?

GOARCH="arm"
GOOS="linux"
GOARM=5

What did you do?

ran all.bash

What did you expect to see?

the tests to pass

What did you see instead?

##### ../misc/cgo/testplugin
SIGILL: illegal instruction
PC=0x1bc0b4 m=0 sigcode=1
signal arrived during cgo execution

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x11ce20, 0x10545c54, 0x1050e108)
    /home/builder/stage0scratch/tmp/workdir/go/src/runtime/cgocall.go:132 +0xb8 fp=0x10545c3c sp=0x10545c20
plugin._Cfunc_pluginOpen(0x1db158, 0x1050e108, 0x0)
    plugin/_obj/_cgo_gotypes.go:98 +0x38 fp=0x10545c50 sp=0x10545c3c
plugin.open.func3(0x1db158, 0x1050e108, 0x10510140)
    /home/builder/stage0scratch/tmp/workdir/go/src/plugin/plugin_dlopen.go:61 +0x8c fp=0x10545c6c sp=0x10545c50
plugin.open(0x1422e1, 0xa, 0x0, 0x0, 0x0)
    /home/builder/stage0scratch/tmp/workdir/go/src/plugin/plugin_dlopen.go:61 +0x14c fp=0x10545d9c sp=0x10545c6c
plugin.Open(0x1422e1, 0xa, 0x0, 0x0, 0x0)
    /home/builder/stage0scratch/tmp/workdir/go/src/plugin/plugin.go:30 +0x24 fp=0x10545db4 sp=0x10545d9c
main.main()
    /home/builder/stage0scratch/tmp/workdir/go/misc/cgo/testplugin/src/host/host.go:55 +0x38 fp=0x10545fbc sp=0x10545db4
runtime.main()
    /home/builder/stage0scratch/tmp/workdir/go/src/runtime/proc.go:185 +0x1e0 fp=0x10545fe4 sp=0x10545fbc
runtime.goexit()
    /home/builder/stage0scratch/tmp/workdir/go/src/runtime/asm_arm.s:970 +0x4 fp=0x10545fe4 sp=0x10545fe4

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /home/builder/stage0scratch/tmp/workdir/go/src/runtime/asm_arm.s:970 +0x4
@bradfitz bradfitz changed the title cgo: arm5 sigill runtime/cgo: arm5 sigill Mar 23, 2017
@bradfitz bradfitz added this to the Go1.9 milestone Mar 23, 2017
@bradfitz
Contributor

For example,

linux-arm-arm5spacemonkey at d0ff9ec
https://build.golang.org/log/70a355740ca968705fdd530ea48828136bad9a7a

Notably, these are the first real ARM5 builders we've ever had. The old "arm5" builders we had were just modern-ish ARMv6-or-7-or-something (Scaleway C1) machines running the ARM5 binaries, which meant we got away with cheating (illegal instructions) without realizing it.

So this is probably some codegen not respecting the $GOARM=="5".

/cc @ianlancetaylor @randall77 @josharian @minux @cherrymui

@bradfitz
Contributor

(That's one example, but the whole build column is like that.)

@bradfitz
Contributor

Oh, looking closer finally, this is cgo/testplugin, and during a cgo call, so the C compiler is not generating the ARM5 code. I guess we might need to pass down special flags to the C compiler to change its target processor version?

@ianlancetaylor?

Is this something that cmd/go needs to do?

@ianlancetaylor
Contributor

There are a few places in the toolchain that pass architecture-specific options to the C compiler:

  • Builder.gccArchArgs in cmd/go/internal/work/build.go
  • hostlinkArchArgs in cmd/link/internal/ld/lib.go
  • Package.gccMachine in cmd/cgo/gcc.go

I think it would be reasonable to modify those places (or unify them!) and arrange that if GOARM=5 we pass -march=armv5.
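A minimal sketch of what a unified helper could look like, assuming the three call sites were merged. The name `sharedArchArgs` and its signature are hypothetical and do not exist in the Go tree; only the `-marm`, `-m32`, `-m64`, and `-march=armv5` flags come from this thread.

```go
package main

import "fmt"

// sharedArchArgs is a hypothetical helper (not part of the Go tree) that
// returns the architecture flags to pass to the C compiler for a given
// GOARCH and GOARM value.
func sharedArchArgs(goarch string, goarm int) []string {
	switch goarch {
	case "386":
		return []string{"-m32"}
	case "amd64":
		return []string{"-m64"}
	case "arm":
		args := []string{"-marm"} // ARM mode, not Thumb
		if goarm == 5 {
			// Ask the C compiler to restrict itself to ARMv5 instructions.
			args = append(args, "-march=armv5")
		}
		return args
	}
	return nil
}

func main() {
	fmt.Println(sharedArchArgs("arm", 5)) // [-marm -march=armv5]
	fmt.Println(sharedArchArgs("arm", 7)) // [-marm]
}
```

Keeping the GOARM check in one place would mean cmd/go, cmd/cgo, and cmd/link could not drift apart on which flags they pass.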

@bradfitz
Contributor

@zeebo, interested in tackling this?

@zeebo
Contributor Author

zeebo commented Mar 23, 2017

Sure.

@azdagron

azdagron commented Mar 25, 2017

We have an initial patchset ready for this but even though -march=armv5 is being passed, it still doesn't seem to fix the issue:

# ~/go/bin/go env
GOARCH="arm"
GOBIN=""
GOEXE=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/root/go"
GOTOOLDIR="/root/go/pkg/tool/linux_arm"
GCCGO="gccgo"
GOARM=""
CC="gcc"
GOGCCFLAGS="-fPIC -marm -march=armv5 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build236399254=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

# ~/go/bin/go tool dist test -run testplugin

##### ../misc/cgo/testplugin
SIGILL: illegal instruction
PC=0x1bb608 m=0 sigcode=1
signal arrived during cgo execution

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x11cdd0, 0x10543c54, 0x1050e108)
	/root/go/src/runtime/cgocall.go:132 +0xb8 fp=0x10543c3c sp=0x10543c20
plugin._Cfunc_pluginOpen(0x1da158, 0x1050e108, 0x0)
	plugin/_obj/_cgo_gotypes.go:98 +0x38 fp=0x10543c50 sp=0x10543c3c
plugin.open.func3(0x1da158, 0x1050e108, 0x10512180)
	/root/go/src/plugin/plugin_dlopen.go:61 +0x8c fp=0x10543c6c sp=0x10543c50
plugin.open(0x142241, 0xa, 0x0, 0x0, 0x0)
	/root/go/src/plugin/plugin_dlopen.go:61 +0x14c fp=0x10543d9c sp=0x10543c6c
plugin.Open(0x142241, 0xa, 0x0, 0x0, 0x0)
	/root/go/src/plugin/plugin.go:30 +0x24 fp=0x10543db4 sp=0x10543d9c
main.main()
	/root/go/misc/cgo/testplugin/src/host/host.go:55 +0x38 fp=0x10543fbc sp=0x10543db4
runtime.main()
	/root/go/src/runtime/proc.go:185 +0x1e0 fp=0x10543fe4 sp=0x10543fbc
runtime.goexit()
	/root/go/src/runtime/asm_arm.s:970 +0x4 fp=0x10543fe4 sp=0x10543fe4

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
	/root/go/src/runtime/asm_arm.s:970 +0x4

trap    0x6
error   0x0
oldmask 0x0
r0      0xb5d67c30
r1      0x1b8ae8
r2      0x1b8ae8
r3      0x0
r4      0xb5d670b8
r5      0x1d599c
r6      0x1d5978
r7      0x12e6e8
r8      0xb5d767b0
r9      0xb5d3c2b4
r10     0xb5d3c468
fp      0xb5d76799
ip      0xb5d11430
sp      0xb5d66f74
lr      0x56599
pc      0x1bb608
cpsr    0x10
fault   0x0
2017/03/24 23:55:46 Failed: exit status 2
2017/03/24 23:55:46 FAILED

Here is the diff (with some stuff elided, like the deps changes):

diff --git a/src/cmd/cgo/gcc.go b/src/cmd/cgo/gcc.go
index a740748..849a6ab 100644
--- a/src/cmd/cgo/gcc.go
+++ b/src/cmd/cgo/gcc.go
@@ -9,6 +9,7 @@ package main
 
 import (
        "bytes"
+       "cmd/internal/obj"
        "debug/dwarf"
        "debug/elf"
        "debug/macho"
@@ -1202,7 +1203,11 @@ func (p *Package) gccMachine() []string {
        case "386":
                return []string{"-m32"}
        case "arm":
-               return []string{"-marm"} // not thumb
+               args := []string{"-marm"} // not thumb
+               if obj.GOARM == 5 {
+                       args = append(args, "-march=armv5")
+               }
+               return args
        case "s390":
                return []string{"-m31"}
        case "s390x":
diff --git a/src/cmd/go/internal/work/build.go b/src/cmd/go/internal/work/build.go
index c09d8d3..1177141 100644
--- a/src/cmd/go/internal/work/build.go
+++ b/src/cmd/go/internal/work/build.go
@@ -3074,7 +3074,11 @@ func (b *Builder) gccArchArgs() []string {
        case "amd64", "amd64p32":
                return []string{"-m64"}
        case "arm":
-               return []string{"-marm"} // not thumb
+               args := []string{"-marm"} // not thumb
+               if obj.GOARM == 5 {
+                       args = append(args, "-march=armv5")
+               }
+               return args
        case "s390x":
                return []string{"-m64", "-march=z196"}
        case "mips64", "mips64le":
diff --git a/src/cmd/link/internal/ld/lib.go b/src/cmd/link/internal/ld/lib.go
index 7f05682..8cec542 100644
--- a/src/cmd/link/internal/ld/lib.go
+++ b/src/cmd/link/internal/ld/lib.go
@@ -1257,7 +1257,11 @@ func hostlinkArchArgs() []string {
        case sys.AMD64, sys.PPC64, sys.S390X:
                return []string{"-m64"}
        case sys.ARM:
-               return []string{"-marm"}
+               args := []string{"-marm"}
+               if obj.GOARM == 5 {
+                       args = append(args, "-march=armv5")
+               }
+               return args
        case sys.ARM64:
                // nothing needed
        case sys.MIPS64:

@ianlancetaylor
Contributor

What is the instruction at PC value 0x1bb608?

@zeebo
Contributor Author

zeebo commented Mar 25, 2017

Is this the right thing?

(gdb) disas 0x1bb608
Dump of assembler code for function runtime.finalizer1:
=> 0x001bb608 <+0>:                     ; <UNDEFINED> instruction: 0xf7bdef7b
   0x001bb60c <+4>:     ldrdeq  r0, [r0], -lr
End of assembler dump.

I figured out how to keep the binary/so files around. Would a copy of those be helpful?

@zeebo
Contributor Author

zeebo commented Mar 25, 2017

Poking around with gdb some more, I can't get the program to finish the topmost stack frame at

#0  _dl_init (main_map=main_map@entry=0x1db1a8, argc=1, argv=0xbefff834, env=0xbefff83c) at dl-init.c:86
#1  0xb6fe3838 in dl_open_worker (a=<optimized out>) at dl-open.c:577
#2  0xb6fdf080 in _dl_catch_error (objname=0xb6fdf080 <_dl_catch_error+120>, objname@entry=0xbefff3ec, errstring=0xb6ff5518, errstring@entry=0xbefff3f0, mallocedp=0xbefff3ec, mallocedp@entry=0xbefff3eb, operate=0xbefff3eb, args=args@entry=0xbefff3f4) at dl-error.c:187
#3  0xb6fe2f00 in _dl_open (file=0x1da158 "/root/go/misc/cgo/testplugin/plugin1.so", mode=-2147483390, caller_dlopen=0x11ce20 <_cgo_e64586f1776f_Cfunc_pluginOpen+80>, nsid=-2, argc=1, argv=0xbefff834, env=0xbefff83c) at dl-open.c:661
#4  0xb6fbcbc0 in dlopen_doit (a=0xbefff640) at dlopen.c:66
#5  0xb6fdf080 in _dl_catch_error (objname=0xb6fdf080 <_dl_catch_error+120>, errstring=0xb6ff5518, mallocedp=0x1db16c, operate=0x1db168, args=0xbefff640) at dl-error.c:187
#6  0xb6fbd30c in _dlerror_run (operate=0xb6fbcb40 <dlopen_doit>, args=args@entry=0xbefff640) at dlerror.c:163
#7  0xb6fbcc90 in __dlopen (file=<optimized out>, mode=mode@entry=258) at dlopen.c:87
#8  0x0011ce20 in pluginOpen (err=0x1050e108, path=<optimized out>) at /root/go/src/plugin/plugin_dlopen.go:19
#9  _cgo_e64586f1776f_Cfunc_pluginOpen (v=0x10545c54) at cgo-gcc-prolog:72
#10 0x000c3d5c in runtime.asmcgocall () at /root/go/src/runtime/asm_arm.s:526
#11 0x00000020 in ?? ()

This seems to be the jump that causes stuff to go wrong (it picks a different address to jump to when I have the breakpoint set. No idea):

(gdb) start
Temporary breakpoint 26 at 0x11b824: file /root/go/misc/cgo/testplugin/src/host/host.go, line 51.
Starting program: /root/go/misc/cgo/testplugin/host
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabi/libthread_db.so.1".
[New Thread 0xb6de9460 (LWP 3071)]
[New Thread 0xb65e9460 (LWP 3072)]

Temporary breakpoint 26, main.main () at /root/go/misc/cgo/testplugin/src/host/host.go:51
51              if got, want := common.X, 3*5; got != want {
(gdb) b *0xb6fe3870
Breakpoint 27 at 0xb6fe3870: file dl-open.c, line 601.
(gdb) c
Continuing.
Breakpoint 27, 0xb6fe3870 in dl_open_worker (a=<optimized out>) at dl-open.c:601
(gdb) info registers
r0             0xb5d97c30       3050929200
r1             0x1b8ae8 1805032
r2             0x1b8ae8 1805032
r3             0x0      0
r4             0xb5d970b8       3050926264
r5             0x1d599c 1923484
r6             0x1d5978 1923448
r7             0x12e6e8 1238760
r8             0xb5da67b0       3050989488
r9             0xb5d6c2b4       3050750644
r10            0xb5d6c468       3050751080
r11            0xb5da6799       3050989465
r12            0xb5d41430       3050574896
sp             0xb5d96f74       0xb5d96f74
lr             0x1bb5dc 1816028
pc             0xb6fe3870       0xb6fe3870 <dl_open_worker+876>
cpsr           0x40000010       1073741840
(gdb) disas 0xb6fe3870, 0xb6fe3874
Dump of assembler code from 0xb6fe3870 to 0xb6fe3874:
   0xb6fe3870 <dl_open_worker+876>:     bx      lr
(gdb) si
0x001bb5dc in runtime.lastmoduledatap ()
(gdb) si
0x001bb5e0 in runtime.maxstacksize ()
(gdb)

Program received signal SIGILL, Illegal instruction.
0x001bb5e0 in runtime.maxstacksize ()

Let me know if there's anything useful I can do to help debug this more, because I don't really know what I'm looking at or if any of this is helpful. 😃

@azdagron

@ianlancetaylor, would you like SSH access to the builder? We're happy to get you on there if that will help. If so, get me your SSH pubkey and I'll get an account set up.

@zeebo
Contributor Author

zeebo commented Apr 4, 2017

Ping? We're stuck on this.

@ianlancetaylor
Contributor

If you are stuck on this because misc/cgo/testplugin is failing on ARM5, then I think the answer is to disable that test. It's clear that plugin support is very patchy at the moment, and we shouldn't let plugin test failures hold us up anywhere. (Does the test fail on normal ARM? I know very little about Go on ARM myself.)

The comment above suggests that the program is somehow trying to execute the code at runtime.finalizer1, but that is a variable (a []byte), not a function. So the program is crashing because it is trying to execute data as though it were code.
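One way to check whether a faulting PC lies in code or data is to look up the ELF symbol that covers it and inspect its type. The sketch below uses the real `debug/elf` types but a made-up symbol table: the `runtime.finalizer1` address is taken from the gdb output above, its 8-byte size is only inferred from that dump, and the `runtime.asmcgocall` start and size are invented for illustration (the thread shows only the return address 0xc3d5c inside it).

```go
package main

import (
	"debug/elf"
	"fmt"
)

// symbolAt returns the symbol whose [Value, Value+Size) range contains pc.
func symbolAt(syms []elf.Symbol, pc uint64) (elf.Symbol, bool) {
	for _, s := range syms {
		if s.Size > 0 && pc >= s.Value && pc < s.Value+s.Size {
			return s, true
		}
	}
	return elf.Symbol{}, false
}

// isCode reports whether the symbol is typed as a function (STT_FUNC).
func isCode(s elf.Symbol) bool {
	return elf.ST_TYPE(s.Info) == elf.STT_FUNC
}

// sampleSymbols is an illustrative table, not data read from the real binary.
func sampleSymbols() []elf.Symbol {
	return []elf.Symbol{
		// Start and size are hypothetical.
		{Name: "runtime.asmcgocall", Info: elf.ST_INFO(elf.STB_GLOBAL, elf.STT_FUNC), Value: 0xc3d00, Size: 0x80},
		// Address from the gdb session; size inferred from the two-word dump.
		{Name: "runtime.finalizer1", Info: elf.ST_INFO(elf.STB_GLOBAL, elf.STT_OBJECT), Value: 0x1bb608, Size: 8},
	}
}

func main() {
	for _, pc := range []uint64{0xc3d5c, 0x1bb608} {
		if s, ok := symbolAt(sampleSymbols(), pc); ok {
			fmt.Printf("pc=%#x is in %s (code=%v)\n", pc, s.Name, isCode(s))
		}
	}
}
```

Against a real binary, the symbol slice could come from `(*elf.File).Symbols()` instead of the hand-written table.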

@ianlancetaylor
Contributor

CC @crawshaw because this appears to be plugin related.

@gopherbot

CL https://golang.org/cl/39716 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 6, 2017
Plugin support is patchy at the moment, so disable the test for
now until the test can be fixed. This way, we can get builders
for ARMv5 running for the rest of the code.

Updates #19674

Change-Id: I08aa211c08a85688656afe2ad2e680a2a6e5dfac
Reviewed-on: https://go-review.googlesource.com/39716
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@minux
Member

minux commented Apr 10, 2017 via email

@bradfitz
Contributor

bradfitz commented Jun 7, 2017

@randall77, got any time to check this out?

Looks like it might be the Go compiler's fault and not cmd/go failing to pass down flags to gcc.

@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Jun 7, 2017
@randall77
Contributor

runtime.finalizer1 isn't code, it is data. Disassembling it is sure to give you junk.
If the PC actually reached this global variable, something else is very wrong.

@aclements
Member

(gdb) info registers
lr             0x1bb5dc 1816028
pc             0xb6fe3870       0xb6fe3870 <dl_open_worker+876>
(gdb) disas 0xb6fe3870, 0xb6fe3874
Dump of assembler code from 0xb6fe3870 to 0xb6fe3874:
   0xb6fe3870 <dl_open_worker+876>:     bx      lr
(gdb) si
0x001bb5dc in runtime.lastmoduledatap ()

runtime.lastmoduledatap is also a data symbol. But the branch did what it was told to, so why was lr set to runtime.lastmoduledatap?

@aclements
Member

@zeebo, a few questions:

  1. Which version of libc are you using? (So we can match up source lines.)

  2. Can you post the disassembly of all of dl_open_worker? I'm particularly interested in the instructions leading up to the bx lr, but might as well post the whole thing.

  3. Upon entry to dl_open_worker, what are the registers? (In particular lr, but, again, might as well dump them all.)

@zeebo
Contributor Author

zeebo commented Jun 14, 2017

  1. from dpkg, 2.19-18+deb8u7.

  2. https://gist.github.com/zeebo/9bf6059544521f0b311efe4e6953f312

Breakpoint 1, dl_open_worker (a=0xbefff924) at dl-open.c:197
197	dl-open.c: No such file or directory.
(gdb) info registers
r0             0xbefff924	3204446500
r1             0x0	0
r2             0x97	151
r3             0xb6fe3504	3070113028
r4             0x0	0
r5             0x80000102	2147483906
r6             0xb6fff050	3070226512
r7             0x1d9158	1937752
r8             0xb6fbcbc0	3069955008
r9             0x122e30	1191472
r10            0x1c62c8	1860296
r11            0xbefff964	3204446564
r12            0xbefff7a8	3204446120
sp             0xbefff750	0xbefff750
lr             0xb6fdf080	3070095488
pc             0xb6fe3504	0xb6fe3504 <dl_open_worker>
cpsr           0x60000010	1610612752

I took the liberty of running a disas on the contents of lr there: https://gist.github.com/zeebo/52f7436e9392cc821b6dfe3af05b5963

@zeebo
Contributor Author

zeebo commented Jun 14, 2017

Also, if you would like, I can probably arrange to get you a shell on the hardware that is having the issues. Does that sound like a good idea?

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 Jul 6, 2017
@bradfitz
Contributor

bradfitz commented Jul 6, 2017

@zeebo, yes, that'd probably be helpful, but @ianlancetaylor is on vacation now.

If I ever finish gomote ssh support we'd have access to your existing builders. :-/

@aclements
Member

from dpkg, 2.19-18+deb8u7.

Thanks. Turns out dl-open.c:601 is just the close brace of dl_open_worker. Based on this and the disassembly, this looks like it's just a common epilogue, which is too bad.

https://gist.github.com/zeebo/9bf6059544521f0b311efe4e6953f312

Looks like it's not doing anything funny with LR here. It was pushed on the stack in the prologue and popped from the stack in the epilogue. This suggests either the stack slot got corrupted or the SP itself did.

I took the liberty of running a disas on the contents of lr there: https://gist.github.com/zeebo/52f7436e9392cc821b6dfe3af05b5963

Thanks. This is exactly what I would expect it to be, so the LR is sane on entry.

@zeebo, a shell would be useful. Alternatively, could I get you to break on entry to dl_open_worker again and run this snippet in gdb:

python while True: map(gdb.execute, ["nexti", "bt 1", "info reg", "x/12x $sp"])

It should dump a few thousand lines giving the register state and top-of-stack at every instruction in dl_open_worker so we can trace the exact path through it. You might want to set pagination off and set logging on to capture it.

@zeebo
Contributor Author

zeebo commented Jul 6, 2017

I ran the python gdb snippet and captured the output here: https://gist.github.com/zeebo/c5b3bd0ff0658132e8794ee39bf3df4a

I've also set you up with a user account on the builder. I used your ssh keys from github, so you should be able to ssh in with

ssh -p 11046 aclements@relay002b.spacemonkey.com

There is a checkout of the go repository in ~/go with some modifications to keep the failing binary around, and the binary that is failing is ~/go/misc/cgo/testplugin/host

Let me know if you run into any issues. I'm also on the gophers slack as zeebo if you want a less asynchronous communication environment.

@aclements
Member

I ran the python gdb snippet and captured the output here: https://gist.github.com/zeebo/c5b3bd0ff0658132e8794ee39bf3df4a

Perfect. Curiously, it looks like this traced into _dl_init and then failed without getting back out to dl_open_worker. _dl_init makes all sorts of indirect calls to initialization functions provided in tables in the .so, so it wouldn't surprise me if something is slightly off here and tromping on the stack.
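To see which initialization functions such a table would make _dl_init call, the raw table bytes can be decoded into addresses. This is a sketch that assumes the table in question is an ELF `.init_array` section on 32-bit little-endian ARM; the thread does not name the section, and the sample bytes below are made up.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// initArrayEntries decodes raw table contents as 32-bit little-endian
// function addresses, one per 4-byte slot.
func initArrayEntries(data []byte) ([]uint32, error) {
	if len(data)%4 != 0 {
		return nil, fmt.Errorf("table length %d is not a multiple of 4", len(data))
	}
	entries := make([]uint32, 0, len(data)/4)
	for i := 0; i < len(data); i += 4 {
		entries = append(entries, binary.LittleEndian.Uint32(data[i:i+4]))
	}
	return entries, nil
}

func main() {
	// Made-up section contents holding two entries: 0x1234 and 0x5678.
	data := []byte{0x34, 0x12, 0x00, 0x00, 0x78, 0x56, 0x00, 0x00}
	entries, err := initArrayEntries(data)
	if err != nil {
		panic(err)
	}
	for i, e := range entries {
		fmt.Printf("init[%d] = %#x\n", i, e)
	}
}
```

For a real plugin, the bytes could be read with `debug/elf` via `(*elf.File).Section(".init_array")` and `(*elf.Section).Data()`. In a shared object the stored values are unrelocated, so they are offsets within the file rather than final runtime addresses.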

I've also set you up with a user account on the builder.

Awesome. Though it's giving me "ssh_exchange_identification: read: Connection reset by peer". If I try connecting directly, I don't get the OpenSSH server identification (tried from two different hosts on very different networks).

@zeebo
Contributor Author

zeebo commented Jul 6, 2017

Ok, sorry about the relay issues. Maybe try this one relay005.spacemonkey.com:14975

@aclements
Member

$ ssh -p 14975 aclements@relay005.spacemonkey.com
ssh: connect to host relay005.spacemonkey.com port 14975: Connection refused

:(

@zeebo
Contributor Author

zeebo commented Jul 7, 2017

Alright, apparently the binary I'm using to forward connections silently decides to stop working after some point in time.

Instead, I have set up a wacky system of ssh reverse tunnels and socat, but maybe it's more reliable. The address is now relay001.spacemonkey.com:7001.

@aclements
Member

The address is now relay001.spacemonkey.com:7001.

Perfect. That's working for me.

So far all I've managed to figure out is that SP goes bonkers at some point during dl_open_worker, which leads to it restoring a bonkers LR. Strangely, the SP it winds up at is just slightly below the stack of a different M from the one that's running dl_open_worker.

I haven't been able to figure out where it goes bonkers. However, I suspect this is why my Python snippet fails to nexti over the _dl_init.

@aclements
Member

Found where it goes bonkers as soon as I sent that.

Entering the function epilogue, SP still good:

#0  dl_open_worker (a=<optimized out>) at dl-open.c:584
r0             0xb5d96b98	3050924952
r1             0x1b7800	1800192
r2             0x1b7800	1800192
r3             0x0	0
r4             0x1da1b0	1941936
r5             0xb6fff050	3070226512
r6             0x4	4
r7             0x10	16
r8             0xb6fff568	3070227816
r9             0x194	404
r10            0xb6ffed98	3070225816
r11            0xb5d95f70	3050921840
r12            0xb5d40dd0	3050573264
sp             0xbefff128	0xbefff128
lr             0xb6fe3838	3070113848
pc             0xb6fe3864	0xb6fe3864 <dl_open_worker+864>
cpsr           0x40000010	1073741840
0xbefff128:	0xbefff18c	0xb6fe3b18	0x001da1b0	0x00000000
0xbefff138:	0x00000000	0x00000000	0x00000000	0x00000000
0xbefff148:	0x00000000	0xbefff130	0x00000004	0x00000000

Execute ldr sp, [r11, #-64] ; 0x40 (sp = *(r11 - 64))

#0  dl_open_worker (a=<optimized out>) at dl-open.c:601
r0             0xb5d96b98	3050924952
r1             0x1b7800	1800192
r2             0x1b7800	1800192
r3             0x0	0
r4             0x1da1b0	1941936
r5             0xb6fff050	3070226512
r6             0x4	4
r7             0x10	16
r8             0xb6fff568	3070227816
r9             0x194	404
r10            0xb6ffed98	3070225816
r11            0xb5d95f70	3050921840
r12            0xb5d40dd0	3050573264
sp             0x15d8e8	0x15d8e8 <syscall.statictmp_44>
lr             0xb6fe3838	3070113848
pc             0xb6fe3868	0xb6fe3868 <dl_open_worker+868>
cpsr           0x40000010	1073741840

Execute sub sp, r11, #32 (sp = r11 - 32)

#0  0xb6fe386c in dl_open_worker (a=<optimized out>) at dl-open.c:601
r0             0xb5d96b98	3050924952
r1             0x1b7800	1800192
r2             0x1b7800	1800192
r3             0x0	0
r4             0x1da1b0	1941936
r5             0xb6fff050	3070226512
r6             0x4	4
r7             0x10	16
r8             0xb6fff568	3070227816
r9             0x194	404
r10            0xb6ffed98	3070225816
r11            0xb5d95f70	3050921840
r12            0xb5d40dd0	3050573264
sp             0xb5d95f50	0xb5d95f50
lr             0xb6fe3838	3070113848
pc             0xb6fe386c	0xb6fe386c <dl_open_worker+872>
cpsr           0x40000010	1073741840
0xb5d95f50:	0x001c5fd0	0x001b7928	0x001d6408	0x001d63e0
0xb5d95f60:	0xb5da7530	0xb5d6e3b8	0xb5d6e570	0xb5da7519
0xb5d95f70:	0x001ba2e4	0xb6f8fe00	0x001d63d8	0x001d63dc

Next step is to pop the saved registers, so we're supposed to be back at a sane SP now, but aren't.

(I have no idea why it did the intermediate ldr, since the very next instruction clobbered SP.)

It looks to me like r11 is a frame pointer in this function. It uses a variable-length array, so the stack frame is dynamically adjusted at runtime. And the epilogue sequence that restores SP from r11 mirrors the prologue.

r11 got clobbered by the _dl_init call. I'm not sure where yet, but I have a suspicion some .so init function we're generating isn't following the C ABI correctly.
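The arithmetic can be checked directly from the register dumps above. The sketch below models only the `sub sp, r11, #32` step. The helper names and the 1 MiB "same stack" window are assumptions for illustration, not anything taken from glibc or the Go runtime.

```go
package main

import "fmt"

// epilogueSP models `sub sp, r11, #32`: the epilogue recomputes SP from the
// frame pointer rather than from the current SP.
func epilogueSP(r11 uint32) uint32 {
	return r11 - 32
}

// onSameStack is a crude, hypothetical plausibility check: two addresses are
// treated as being on the same stack if they are within maxDist bytes.
func onSameStack(a, b, maxDist uint32) bool {
	if a > b {
		return a-b <= maxDist
	}
	return b-a <= maxDist
}

func main() {
	const (
		spBefore     = 0xbefff128 // SP on entering the epilogue (from the dump)
		clobberedR11 = 0xb5d95f70 // r11 at the same point (from the dump)
		maxDist      = 1 << 20    // assumed 1 MiB window, illustrative only
	)
	sp := epilogueSP(clobberedR11)
	fmt.Printf("sp after sub sp, r11, #32: %#x\n", sp)
	fmt.Printf("plausibly on the same stack as %#x: %v\n",
		uint32(spBefore), onSameStack(spBefore, sp, maxDist))
}
```

The computed value matches the `sp` in the third dump (0xb5d95f50), which is nowhere near the 0xbefff128 stack the function was running on. That fits r11 being the value that went wrong, with the epilogue itself behaving as written.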

@gopherbot

CL https://golang.org/cl/47831 mentions this issue.

@aclements
Member

I'm pretty sure I found it, but I managed to completely toast the Go tree you set up for me on the ARM host (oops). @zeebo, would you mind testing https://golang.org/cl/47831?

@zeebo
Contributor Author

zeebo commented Jul 7, 2017

No problem. I'll give it a shot right now. Edit: I forgot to mention it will take about 30-40 minutes because it requires a rebuild of the toolchain. These boards aren't the quickest :)

@zeebo
Contributor Author

zeebo commented Jul 7, 2017

It seems fixed. Do we want to revert 168eb9c and let the arm builders chew on it?

@gopherbot

CL https://golang.org/cl/47834 mentions this issue.

gopherbot pushed a commit that referenced this issue Jul 7, 2017
This reverts commit 168eb9c.

CL 47831 fixes the issue with plugins on ARMv5, so we can re-enable the test.

Updates #19674.

Change-Id: Idcb29f93ffb0460413f1fab5bb82fa2605795038
Reviewed-on: https://go-review.googlesource.com/47834
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Jul 7, 2018