runtime: linux-arm cross-compiled binary segfaults at start on raspberry pi #36309

Closed
AndrewGMorgan opened this issue Dec 29, 2019 · 20 comments
Labels: FrozenDueToAge, NeedsInvestigation, release-blocker

@AndrewGMorgan
Contributor

What version of Go are you using (go version)?

go1.14beta1

$ ../bin/go version
go version devel +a5bfd9da1d Tue Dec 17 14:59:30 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

No, this is new since 1.13.5.

[FWIW Binary searching git history, I find that reverting these two commits causes things to work as expected again: git revert 75c839af22a ; git revert f07cbc7f88e]

What operating system and processor architecture are you using (go env)?

go env Output
$ GOARCH=arm ../bin/go env
GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/home/andrew/.cache/go-build"
GOENV="/home/andrew/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/andrew/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/andrew/gits/go-expt/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/andrew/gits/go-expt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
GOARM="5"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/home/andrew/gits/go-expt/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build928208053=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This is the code I'm building:

$ cat hello.go 
package main

import (
        "flag"
        "fmt"
)

var target = flag.String("target", "", "target name")

func main() {
        flag.Parse()
        fmt.Printf("target = %q\n", *target)
}

Building it as follows:

$ GOARCH=arm ../bin/go build hello.go

What did you expect to see?

With the Go toolchain rebuilt at 1.13.5:

pi@raspberrypi:~ $ ./hello
target = ""
pi@raspberrypi:~ $ 

What did you see instead?

pi@raspberrypi:~ $ ./hello
Segmentation fault
pi@raspberrypi:~ $ strace ./hello
execve("./hello", ["./hello"], 0x7eb78700 /* 21 vars */) = 0
getpid()                                = 6170
sched_getaffinity(0, 8192, [0, 1, 2, 3]) = 4
open("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY) = -1 ENOENT (No such file or directory)
mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f42000
mmap2(NULL, 12288, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f3f000
mmap2(0x76f3f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76f3f000
mmap2(NULL, 270667776, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x66d1e000
brk(NULL)                               = 0x700000
mmap2(0x800000, 541065216, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x800000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x17e47a} ---
+++ killed by SIGSEGV +++
Segmentation fault
pi@raspberrypi:~ $ 
@ianlancetaylor
Contributor

CC @cherrymui

@ianlancetaylor added the NeedsInvestigation and release-blocker labels Dec 29, 2019
@ianlancetaylor added this to the Go1.14 milestone Dec 29, 2019
@ianlancetaylor changed the title from "linux-arm cross-compiled binary segfaults at start on raspberry pi" to "runtime: linux-arm cross-compiled binary segfaults at start on raspberry pi" Dec 29, 2019
@ianlancetaylor
Contributor

What Linux kernel version are you running? What glibc version?

@AndrewGMorgan
Contributor Author

Being a cross-compile, it is CGO_ENABLED=0. But the raspberry pi says:

Linux raspberrypi 4.19.75-v7+ #1270 SMP Tue Sep 24 18:45:11 BST 2019 armv7l

pi@raspberrypi:~ $ ldd /bin/cat
linux-vdso.so.1 (0x7efbf000)
/usr/lib/arm-linux-gnueabihf/libarmmem-${PLATFORM}.so => /usr/lib/arm-linux-gnueabihf/libarmmem-v7l.so (0x76f5f000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76e11000)
/lib/ld-linux-armhf.so.3 (0x76f74000)

@AndrewGMorgan
Contributor Author

pi@raspberrypi:~ $ ls -l /lib/arm-linux-gnueabihf/libc.so.6
lrwxrwxrwx 1 root root 12 May 14 2019 /lib/arm-linux-gnueabihf/libc.so.6 -> libc-2.28.so

@AndrewGMorgan
Contributor Author

I tried to build head on this raspberry pi using a fresh clone from go.googlesource.com and the first attempt failed with a bus error:

$ ./all.bash
Building Go cmd/dist using /usr/lib/go-1.11. (go1.11.6 linux/arm)
Building Go toolchain1 using /usr/lib/go-1.11.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
go tool dist: FAILED: /home/pi/gits/go/pkg/tool/linux_arm/go_bootstrap install -gcflags=all= -ldflags=all= -i cmd/asm cmd/cgo cmd/compile cmd/link: signal: bus error

As before all.bash gets a lot further if I do this to the repository: git revert 75c839af22a ; git revert f07cbc7f88e... I've not tried this before, so I'm not 100% sure the rpi is up to the task of running all the tests without locking up in some way.

In this instance, the rpi stalled out after displaying:

...
ok      text/tabwriter  0.040s
ok      text/template   0.521s
ok      text/template/parse     0.177s
ok      time    3.170s
ok      unicode 0.061s
ok      unicode/utf16   0.051s
ok      unicode/utf8    0.061s
ok      cmd/addr2line   11.004s
ok      cmd/api 0.159s
ok      cmd/asm/internal/asm    7.680s
ok      cmd/asm/internal/lex    0.025s

and the syslog contains:

Dec 30 00:48:27 raspberrypi kernel: [196115.411313] INFO: task kworker/3:0:23624 blocked for more than 120 seconds.
Dec 30 00:48:31 raspberrypi kernel: [196115.411322]       Tainted: G         C        4.19.75-v7+ #1270
Dec 30 00:48:32 raspberrypi kernel: [196115.411325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 30 00:48:36 raspberrypi kernel: [196115.411330] kworker/3:0     D    0 23624      2 0x00000000
Dec 30 00:48:41 raspberrypi kernel: [196115.411349] Workqueue: events_freezable mmc_rescan
Dec 30 00:48:44 raspberrypi kernel: [196115.411370] [<80831314>] (__schedule) from [<80831984>] (schedule+0x50/0xa8)
Dec 30 00:48:48 raspberrypi kernel: [196115.411380] [<80831984>] (schedule) from [<80686c38>] (__mmc_claim_host+0x120/0x228)
Dec 30 00:48:54 raspberrypi kernel: [196115.411389] [<80686c38>] (__mmc_claim_host) from [<80686d78>] (mmc_get_card+0x38/0x3c)
Dec 30 00:48:59 raspberrypi kernel: [196115.411398] [<80686d78>] (mmc_get_card) from [<8068fde4>] (mmc_sd_detect+0x24/0x7c)
Dec 30 00:49:04 raspberrypi kernel: [196115.411406] [<8068fde4>] (mmc_sd_detect) from [<806893a0>] (mmc_rescan+0x1cc/0x39c)
Dec 30 00:49:08 raspberrypi kernel: [196115.411417] [<806893a0>] (mmc_rescan) from [<8013bf64>] (process_one_work+0x170/0x458)
Dec 30 00:49:12 raspberrypi kernel: [196115.411426] [<8013bf64>] (process_one_work) from [<8013c2a8>] (worker_thread+0x5c/0x5a4)
Dec 30 00:49:16 raspberrypi kernel: [196115.411436] [<8013c2a8>] (worker_thread) from [<80142594>] (kthread+0x138/0x168)
Dec 30 00:49:21 raspberrypi kernel: [196115.411446] [<80142594>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28)
Dec 30 00:49:24 raspberrypi kernel: [196115.411450] Exception stack(0xa8691fb0 to 0xa8691ff8)
Dec 30 00:49:26 raspberrypi kernel: [196115.411456] 1fa0:                                     00000000 00000000 00000000 00000000
Dec 30 00:49:29 raspberrypi kernel: [196115.411463] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Dec 30 00:49:33 raspberrypi kernel: [196115.411468] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Dec 30 00:50:30 raspberrypi kernel: [196238.291843] INFO: task kworker/3:0:23624 blocked for more than 120 seconds.

@gopherbot

Change https://golang.org/cl/212641 mentions this issue: runtime: not use R11 in nanotime1/walltime1 on ARM

@cherrymui
Member

@AndrewGMorgan thanks for reporting!

Could you test if the CL above fixes the problem? (The code path is not exercised on the builder, so I cannot test it myself.) Thanks!

@cpuguy83

I also get a segfault when cross-compiling for pi.
It only happens with static linking (I do have some cgo); if I produce a dynamically linked binary, there is no segfault.

Tested with the patch and still get a segfault... so far I've only tested with 1.12, 1.13, and master+patch (so no patch on top of 1.12 or 1.13).

@cherrymui
Member

@cpuguy83 that looks like a different issue. Could you open a new issue?

Did you mean that your case fails with Go 1.12 and 1.13 as well?
Does your C code use thread local storage?

@AndrewGMorgan
Contributor Author

I patched in refs/changes/41/212641/2 and cross-compiled a hello that did not segfault!

I also attempted to run all.bash on the rpi with such a patched tree and it completed the test cmd/asm/internal/lex before locking up as before, so I think we're back to where we were before the two changes that seemed to introduce the segfault issue for me.

After rebooting the pi with the not-quite-fully-tested go compiler on it, I was able to compile hello.go and run it without a segfault.

@cherrymui
Member

Thanks @AndrewGMorgan !

After cmd/asm/internal/lex come the compiler's tests, which take a large amount of memory to build and run. How much memory does your machine have? It is possible that it hangs, or is just slow, due to the large memory usage, even with swap.

@AndrewGMorgan
Contributor Author

My rpi is a 3B and has 1GiB of RAM total. free says it has 948304 KiB available for user space. I've been able to run the tests in question one at a time, so I'm wondering if some strategic 'sync's might help the all.bash run complete? But it's likely not a Go intrinsic problem.

@AndrewGMorgan
Contributor Author

For posterity and likely off-topic for extending this bug, I ran this in parallel with an all.bash run on my rpi:

$ while true ; do sync ; sleep 1 ; done

Attached is the output of the fail:

pi -raspberrypi -gits-go-src- all-bash.txt

I'm inclined to agree that this looks like resource limits at work.

@josharian
Contributor

Do you have more luck if you run

GOFLAGS=-p=1 all.bash

@AndrewGMorgan
Contributor Author

Tried the GOFLAGS=-p=1 setting and got a fatal error: runtime: out of memory for the cmd/compile/internal/ssa test. Pretty soon after that the system locked up.

@josharian
Contributor

Ah. Related: #27739.

By way of narrowing it down, would you mind running make.bash, then go test -v -cpu=1 cmd/compile/internal/ssa and see which test is pushing it over the limit?

@AndrewGMorgan
Contributor Author

AndrewGMorgan commented Jan 5, 2020

It seems to pretty much grind to a halt during the compilation phase.

When I set up this rpi I left it with the default swap allocation which appears to be only 100 MiB. Running top while the compilation was occurring, it is clear that this was being quickly exhausted.
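The memory and swap figures discussed here can also be read programmatically. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the helper name parseMeminfo is hypothetical and not part of the Go tree. The kernel labels the values "kB", but they are counted in units of 1024 bytes.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMeminfo (hypothetical helper) extracts "Key:   value kB" entries from
// /proc/meminfo-style text and returns a map from key to the numeric value.
func parseMeminfo(text string) map[string]int64 {
	out := make(map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		key, rest, found := strings.Cut(sc.Text(), ":")
		if !found {
			continue // not a "Key: value" line
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		n, err := strconv.ParseInt(fields[0], 10, 64)
		if err != nil {
			continue // skip anything that is not a plain integer
		}
		out[strings.TrimSpace(key)] = n
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/meminfo:", err)
		return
	}
	m := parseMeminfo(string(data))
	// Print the fields most relevant to a memory-hungry build.
	for _, k := range []string{"MemTotal", "MemAvailable", "SwapTotal", "SwapFree"} {
		fmt.Printf("%-13s %8d KiB\n", k+":", m[k])
	}
}
```

Running this before and during the compile step would show whether SwapFree is heading toward zero, which is the symptom described above.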

I edited the /etc/dphys-swapfile file, commented out the default hard limit (#CONF_SWAPSIZE=100), and un-commented the swap factor parameter, CONF_SWAPFACTOR=2. It took a reboot to take effect, and with that in place the compilation seemed to complete relatively quickly and the test ran until killed by a 10 minute timeout:

...
=== RUN   TestMagicExhaustive16
--- PASS: TestMagicExhaustive16 (235.65s)
=== RUN   TestMagicExhaustive16U
panic: test timed out after 10m0s
...

Once the compilation phase had completed, the memory footprint for the test became quite modest, so I'm guessing the compilation was straying into the footprint of the total mem+swap.

The rpi 3B is a quad-core system, so with extra swap enabled, I tried the same again without -cpu=1. TestMagicExhaustive16 took the same amount of time, but empirically it seemed to take less time to reach that point - i.e., the compilation was faster. The result, however, was the same.

Judging by the way the CPU load seems to be pegged at 100%, I'm guessing this is now a limit of the CPU single thread speed. I wonder if there is a way to test only some fraction of the 16 bit space by default?
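As an illustration of what testing only a fraction of the 16-bit space could look like, here is a minimal sketch. It is not the code of the actual cmd/compile/internal/ssa tests: the names inverse16, divisibleByMul and checkSampled are hypothetical, and the property checked is the standard multiply-by-inverse divisibility test for odd divisors, chosen only as a stand-in. A stride of 1 gives the full exhaustive run; a larger stride checks every stride-th odd divisor.

```go
package main

import "fmt"

// inverse16 returns the multiplicative inverse of odd c modulo 2^16.
// Newton's iteration doubles the number of correct low bits per step;
// the starting guess c is already correct to 3 bits for any odd c.
func inverse16(c uint16) uint16 {
	inv := c
	for i := 0; i < 4; i++ {
		inv *= 2 - c*inv // arithmetic wraps modulo 2^16
	}
	return inv
}

// divisibleByMul reports whether x is a multiple of odd c without dividing x:
// x is a multiple of c exactly when x*inverse16(c) mod 2^16 <= 0xFFFF/c.
func divisibleByMul(x, c uint16) bool {
	return x*inverse16(c) <= 0xFFFF/c
}

// checkSampled compares divisibleByMul against x%c == 0 for every
// stride-th odd divisor c and for all 16-bit x. It returns the number of
// divisors checked and the number of mismatches found.
func checkSampled(stride int) (divisors, mismatches int) {
	for c := 1; c <= 0xFFFF; c += 2 * stride {
		divisors++
		for x := 0; x <= 0xFFFF; x++ {
			if divisibleByMul(uint16(x), uint16(c)) != (x%c == 0) {
				mismatches++
			}
		}
	}
	return divisors, mismatches
}

func main() {
	// Assumption: a stride of 64 is an acceptable default sample;
	// stride 1 would be the full exhaustive run.
	d, bad := checkSampled(64)
	fmt.Printf("checked %d divisors, %d mismatches\n", d, bad)
}
```

In a real test the stride could be chosen from testing.Short(), so that the exhaustive run stays available when wanted while slow machines get the sampled one.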

So, I tried running this test with -timeout=0. This time, the only test cases that seemed to take a significant amount of time (i.e., > 2s) were:

--- PASS: TestMagicExhaustive16 (235.55s)
--- PASS: TestMagicExhaustive16U (402.58s)
--- PASS: TestDivisibleExhaustive16U (502.50s)
--- PASS: TestDivisibleExhaustive16 (285.43s)
...
PASS
ok      cmd/compile/internal/ssa        1428.534s

[FWIW TestPoset, TestPosetStrict, TestPosetCollapse and TestShiftToExtensionAMD64 seem to generate an extraordinary amount of log spam, so I was worried I might have missed another long running test case, but those 4 add to something close to the total time, so I suspect not.]
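When log spam makes it hard to spot slow tests by eye, the result lines of go test -v output can be extracted mechanically. The following is a minimal sketch, assuming the "--- PASS: Name (N.NNs)" line format shown above; the type result, the function parseResults and the embedded sample are illustrative only, and in practice the saved test output would be fed in instead of the sample.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// result holds one test name and its reported duration.
type result struct {
	Name    string
	Seconds float64
}

// resultLine matches lines such as "--- PASS: TestFoo (1.23s)",
// including indented subtest results.
var resultLine = regexp.MustCompile(`^\s*--- (?:PASS|FAIL|SKIP): (\S+) \(([0-9.]+)s\)`)

// parseResults returns all test results found in output, slowest first.
func parseResults(output string) []result {
	var rs []result
	sc := bufio.NewScanner(strings.NewReader(output))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		m := resultLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		secs, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		rs = append(rs, result{Name: m[1], Seconds: secs})
	}
	sort.Slice(rs, func(i, j int) bool { return rs[i].Seconds > rs[j].Seconds })
	return rs
}

// sample reproduces the timing lines quoted in this comment.
const sample = `--- PASS: TestMagicExhaustive16 (235.55s)
--- PASS: TestMagicExhaustive16U (402.58s)
--- PASS: TestDivisibleExhaustive16U (502.50s)
--- PASS: TestDivisibleExhaustive16 (285.43s)
PASS
ok      cmd/compile/internal/ssa        1428.534s
`

func main() {
	total := 0.0
	for _, r := range parseResults(sample) {
		total += r.Seconds
		if r.Seconds > 2 { // same threshold as used above
			fmt.Printf("%8.2fs  %s\n", r.Seconds, r.Name)
		}
	}
	fmt.Printf("total of listed tests: %.2fs\n", total)
}
```

For the four lines quoted above the durations sum to 1426.06s, against the 1428.534s reported for the whole package, which is consistent with no other long-running test having been missed.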

@josharian
Contributor

Thanks, @AndrewGMorgan. If you're up for one more experiment, would you:

git fetch "https://go.googlesource.com/go" refs/changes/03/213703/1 && git checkout FETCH_HEAD

and then try again, and let me know whether things get any farther?

@AndrewGMorgan
Contributor Author

Not exactly sure what you want tried. I ran the test again (-timeout=0), compilation memory footprint seemed lower, and the tests were slightly slower:

--- PASS: TestMagicExhaustive16 (237.08s)
--- PASS: TestMagicExhaustive16U (407.54s)
--- PASS: TestDivisibleExhaustive16U (504.36s)
??? (screen didn't buffer enough to capture 4th long test time but was overwhelmed with log spam)
PASS
ok cmd/compile/internal/ssa 1438.884s

@josharian
Contributor

Thanks. I was hoping all.bash would pass for you. :)
