x/build: cross-compile ARM for speed? #17105

Closed
bradfitz opened this issue Sep 14, 2016 · 32 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Milestone

Comments

@bradfitz
Contributor

As part of #17104 to improve trybot speed and get Trybots down to 5 minutes, I now see that linux-arm is the slowest builder, even sharded 8 machines wide.

The problem is that just make.bash on linux-arm takes 5 minutes itself, even without running any tests, so sharding 8 machines wide doesn't help much.

What do people think about cross-compiling the linux-arm make.bash on linux-amd64 (on Kubernetes) first (which takes about 33 seconds), and then pushing that out to 7 real ARM machines for tests? (instead of pushing out the same everything-built tarball from the ARM machine)

In parallel, we could run a real ARM make.bash (for 5 minutes) to verify it works, but never use its output for testing.

Thoughts?

/cc @ianlancetaylor @quentinmit @davecheney @minux @cherrymui

@bradfitz bradfitz added the Builders x/build issues (builders, bots, dashboards) label Sep 14, 2016
@bradfitz bradfitz added this to the Unreleased milestone Sep 14, 2016
@bradfitz bradfitz self-assigned this Sep 14, 2016
@ianlancetaylor
Contributor

SGTM

@cherrymui
Member

SGTM. It should be fine as there are already many tests that do invoke the compiler/linker/etc.

@minux
Member

minux commented Sep 14, 2016 via email

@bradfitz
Contributor Author

> SGTM. It should be fine as there are already many tests that do invoke the compiler/linker/etc.

Good point. I didn't consider that. So maybe the parallel make.bash on real hardware is a little pointless.

@ianlancetaylor
Contributor

I think running make.bash on real hardware is still useful.

@bradfitz
Contributor Author

Yeah, it makes me feel more comfortable too. And it's easy enough and still basically within my time goal.

@gopherbot

CL https://golang.org/cl/29670 mentions this issue.

gopherbot pushed a commit to golang/build that referenced this issue Sep 23, 2016
This is a new builder in prep for the change to the "linux-arm"
builder where the GOARCH=arm make.bash will be cross-compiled from a
Kubernetes container on fast hardware.

Updates golang/go#17105 (cross-compile ARM builders' make.bash)
Updates golang/go#17104 (5 minute trybots)

Change-Id: Icfd2644d77639f731151abe54839322960418254
Reviewed-on: https://go-review.googlesource.com/29670
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@gopherbot

CL https://golang.org/cl/29677 mentions this issue.

@bradfitz
Contributor Author

With @jfrazelle's help, we've almost got this working.

But we just saw a crash when running the cross-built GOROOT files on the normal Scaleway.com ARM machine:

--- FAIL: TestCgoExternalThreadPanic (0.01s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: signal: segmentation fault
    crash_cgo_test.go:72: want failure containing "panic: BOOM". output:

--- FAIL: TestEnsureDropM (0.01s)
    crash_test.go:105: testprogcgo EnsureDropM exit status: signal: segmentation fault
    crash_cgo_test.go:154: expected "OK\n", got 
FAIL
FAIL    runtime 68.340s

Not sure what to make of that.

@ianlancetaylor?

@bradfitz
Contributor Author

But on the same machine, it seems to work by hand:

# go test -c runtime
#  ./runtime.test -test.v -test.run='Panic$|EnsureDrop'   
=== RUN   TestCallersPanic
--- PASS: TestCallersPanic (0.00s)
    callers_test.go:46: functions seen: runtime.Callers runtime.call16 runtime.gopanic runtime_test.f2 runtime_test.TestCallersPanic testing.tRunner runtime.goexit runtime_test.TestCallersPanic.func1 runtime_test.f3 runtime_test.f1
=== RUN   TestCgoExternalThreadPanic
--- PASS: TestCgoExternalThreadPanic (9.49s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: exit status 2
=== RUN   TestEnsureDropM
--- PASS: TestEnsureDropM (0.01s)
=== RUN   TestRecursivePanic
--- PASS: TestRecursivePanic (2.79s)
    crash_test.go:105: testprog RecursivePanic exit status: exit status 2
=== RUN   TestGoexitInPanic
--- PASS: TestGoexitInPanic (0.04s)
    crash_test.go:105: testprog GoexitInPanic exit status: exit status 2
=== RUN   TestDeferPtrsPanic
--- PASS: TestDeferPtrsPanic (0.04s)
=== RUN   TestStackPanic
--- PASS: TestStackPanic (0.00s)
PASS

@ianlancetaylor
Contributor

It's weird that those tests failed with a segmentation fault but that there was no output. One thing that can cause that is if the signal handler itself gets a signal, but neither of those tests is expected to get a signal. Both of those tests involve a non-Go thread calling a Go function, so my guess is that it has something to do with that. I can't think of anything else useful.

@bradfitz
Contributor Author

And it failed on the staging builder again in the same way. Doesn't seem to be a flake.

@bradfitz
Contributor Author

@ianlancetaylor, the only difference I can see between how I'm running it "by hand" vs by the builders is that when I run it by hand and it works, it's running under bash. The builders run it under the Go buildlet binary.

Is there some environment difference I'm not considering?

@ianlancetaylor
Contributor

For these tests the test itself will run go build for a program that uses cgo (runtime/testdata/testprogcgo), which means that the test will invoke the C compiler. Are you sure that you are getting the same C compiler when you run it by hand as when it fails?

@ianlancetaylor
Contributor

That is, what is PATH for bash and for the buildlet?
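
One way to answer that question is to capture both environments and compare them mechanically. The following is a minimal sketch, assuming each environment is available as a list of `KEY=value` strings (the form `os.Environ` returns). The helper names `envMap` and `diffEnv` and the sample values in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// envMap turns "KEY=value" entries into a map, skipping malformed ones.
func envMap(env []string) map[string]string {
	m := make(map[string]string, len(env))
	for _, kv := range env {
		if k, v, ok := strings.Cut(kv, "="); ok {
			m[k] = v
		}
	}
	return m
}

// diffEnv reports every variable whose value differs between the two
// environments, or which is set in only one of them. A variable that
// is unset on one side is shown with an empty value. The result is
// sorted by variable name.
func diffEnv(a, b []string) []string {
	ma, mb := envMap(a), envMap(b)
	keys := make(map[string]bool)
	for k := range ma {
		keys[k] = true
	}
	for k := range mb {
		keys[k] = true
	}
	var out []string
	for k := range keys {
		va, inA := ma[k]
		vb, inB := mb[k]
		if inA != inB || va != vb {
			out = append(out, fmt.Sprintf("%s: %q vs %q", k, va, vb))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical values, for illustration only.
	shell := []string{"PATH=/usr/local/bin:/usr/bin:/bin", "HOME=/root"}
	buildlet := []string{"PATH=/workdir/go/bin:/usr/local/bin:/usr/bin:/bin", "GOROOT=/workdir/go"}
	for _, d := range diffEnv(shell, buildlet) {
		fmt.Println(d)
	}
}
```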

@bradfitz
Contributor Author

@ianlancetaylor, ah hah! I bet that's the issue. I can totally believe the CC_FOR_TARGET or CGO_ENABLED isn't being set in the tests.

Thanks!

@bradfitz
Contributor Author

Er, on second thought: we're not cross-compiling when running the tests, so we're using the system default:

# gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

In the Kubernetes container where we cross-compile make.bash, we use https://packages.debian.org/stretch/gcc-arm-linux-gnueabihf ...

# arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (Debian 6.1.1-9) 6.1.1 20160705
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Are you saying that mixing those is the problem?

Maybe we need an older arm-linux-gnueabihf-gcc version?

@bradfitz
Contributor Author

The Scaleway machines are running Debian GNU/Linux 8.1 (jessie).

@bradfitz
Contributor Author

gopherbot pushed a commit to golang/build that referenced this issue Sep 24, 2016
The far superior linux distro of champions.

Updates golang/go#17105

Change-Id: I5ea0cd2361753f61bb74bf3d4dea6c181f1427fa
Reviewed-on: https://go-review.googlesource.com/29687
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz
Contributor Author

We switched to Jessie in the cross-compiling Kubernetes container, but no luck. It still fails, and I see this on the Scaleway builder just before it fails while running the test:

16303 ?        Ss   131:35 /lib/systemd/systemd-journald
 7634 ?        Ssl    0:01 /usr/local/bin/buildlet-stage0
 7641 ?        Sl     0:35  \_ ./buildlet.exe --workdir=/workdir --hostname=scaleway-staging-02 --halt=false --reverse=linux-arm,linux-
 7706 ?        Sl     0:00      \_ /workdir/go/bin/go tool dist test --no-rebuild --banner=XXXBANNERXXX: go_test:runtime
 7717 ?        Sl     0:00          \_ /workdir/go/pkg/tool/linux_arm/dist test --no-rebuild --banner=XXXBANNERXXX: go_test:runtime
 7746 ?        Sl     0:00              \_ go test -short -tags= -timeout=6m0s -gcflags= runtime
 7819 ?        Sl     0:02                  \_ /tmp/go-build528979954/runtime/_test/runtime.test -test.short=true -test.timeout=6m0s
 7843 ?        Sl     0:00                      \_ go build -o /tmp/go-build961658414/testprogcgo.exe
 7856 ?        Sl     0:00                          \_ /workdir/go/pkg/tool/linux_arm/cgo -objdir /tmp/go-build444027289/_/workdir/go/s
 7863 ?        S      0:00                              \_ arm-linux-gnueabihf-gcc -w -Wno-error -o/tmp/go-build444027289/_/workdir/go/
 7864 ?        R      0:00                                  \_ /usr/lib/gcc/arm-linux-gnueabihf/4.9/cc1 -quiet -I /tmp/go-build44402728

And on that same Scaleway machine:

# gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# uname -a
Linux buildlet-prep 3.2.34-30 #17 SMP Mon Apr 13 15:53:45 UTC 2015 armv7l GNU/Linux

# lsb_release  -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.1 (jessie)
Release:    8.1
Codename:   jessie

# arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@bradfitz
Contributor Author

And in the Kubernetes container:

root@85ec3929230a:/# arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc ( 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So at least now we seem to be running the same compiler? (albeit one on an amd64 host and one on an armhf host)

@bradfitz
Contributor Author

@jfrazelle and I are stumped. Going to take a break from this for now. Clues welcome.

@ianlancetaylor
Contributor

What system libraries are available on the cross-compiler host and on the real host?

Is there any way for you to snag a copy of the testprogcgo program that is failing?

@crawshaw
Member

Can you run one of the failing binaries under gdb?

@bradfitz
Contributor Author

I'll work on both those things. I just finally now got it to reproduce by hand in a shell.

I made the buildlet log the environment it runs the test with:

In a browser, watching the hacked-up coordinator:

:: Running /workdir/go/bin/go with args ["/workdir/go/bin/go" "tool" "dist" "test" "--no-rebuild" "--banner=XXXBANNERXXX:" "go_test:runtime"] and env ["PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" "GOROOT_BOOTSTRAP=/usr/local/go" "WORKDIR=/workdir" "GO_BUILDER_NAME=linux-arm" "GO_BUILDER_FLAKY_NET=1" "GOROOT=/workdir/go"] in dir /workdir

--- FAIL: TestCgoExternalThreadPanic (0.01s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: signal: segmentation fault
    crash_cgo_test.go:72: want failure containing "panic: BOOM". output:

--- FAIL: TestEnsureDropM (0.01s)
    crash_test.go:105: testprogcgo EnsureDropM exit status: signal: segmentation fault
    crash_cgo_test.go:154: expected "OK\n", got 
FAIL
FAIL    runtime 78.894s
2016/09/24 01:31:09 Failed: exit status 1
:: Running /workdir/go/bin/go with args ["/workdir/go/bin/go" "tool" "dist" "test" "--no-rebuild" "--banner=XXXBANNERXXX:" "go_test:runtime"] and env ["PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" "GOROOT_BOOTSTRAP=/usr/local/go" "WORKDIR=/workdir" "GO_BUILDER_NAME=linux-arm" "GO_BUILDER_FLAKY_NET=1" "GOROOT=/workdir/go"] in dir /workdir

And then I was able to make it do it by hand:

In ssh:

root@buildlet-prep:/workdir# PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin GOROOT_BOOTSTRAP=/usr/local/go WORKDIR=/workdir GO_BUILDER_NAME=linux-arm GO_BUILDER_FLAKY_NET=1 GOROOT=/workdir/go /workdir/go/bin/go tool dist test --no-rebuild go_test:runtime

##### Testing packages.
--- FAIL: TestCgoExternalThreadPanic (0.01s)
    crash_test.go:105: testprogcgo CgoExternalThreadPanic exit status: signal: segmentation fault
    crash_cgo_test.go:72: want failure containing "panic: BOOM". output:

--- FAIL: TestEnsureDropM (0.01s)
    crash_test.go:105: testprogcgo EnsureDropM exit status: signal: segmentation fault
    crash_cgo_test.go:154: expected "OK\n", got 
FAIL
FAIL    runtime 65.839s
2016/09/24 01:36:48 Failed: exit status 1

So now I can actually modify things easily and see what's happening I hope.

@bradfitz
Contributor Author

Mystery/clue: running go test passes but go tool dist test fails !?

# PATH=/workdir/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin GOROOT_BOOTSTRAP=/usr/local/go WORKDIR=/workdir GO_BUILDER_NAME=linux-arm GO_BUILDER_FLAKY_NET=1 GOROOT=/workdir/go /workdir/go/bin/go test -v -short runtime
....
PASS
ok      runtime 137.799s

(Almost all that time is TestCollisions, it turns out... #17217)

@bradfitz
Contributor Author

Okay, got a binary.

root@buildlet-prep:/workdir# file /tmp/go-build332353381/testprogcgo.exe 
/tmp/go-build332353381/testprogcgo.exe: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=52eb3d9211e9b2896578e92f9bce32d439670bc6, not stripped

root@buildlet-prep:/workdir# ldd /tmp/go-build332353381/testprogcgo.exe 
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0x402e5000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x40308000)
    /lib/ld-linux-armhf.so.3 (0x400d6000)

root@buildlet-prep:/workdir# gdb /tmp/go-build332353381/testprogcgo.exe
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
This GDB was configured as "arm-linux-gnueabihf".
...
Reading symbols from /tmp/go-build332353381/testprogcgo.exe...done.
warning: File "/workdir/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /workdir/go/src/runtime/runtime-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
(gdb) run EnsureDropM
Starting program: /tmp/go-build332353381/testprogcgo.exe EnsureDropM
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0x40939460 (LWP 21409)]
[New Thread 0x41199460 (LWP 21410)]
[New Thread 0x41999460 (LWP 21411)]
[New Thread 0x42aff460 (LWP 21413)]
[New Thread 0x42199460 (LWP 21412)]
[New Thread 0x432ff460 (LWP 21414)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x432ff460 (LWP 21414)]
_sfloat () at /workdir/go/src/runtime/vlop_arm.s:75
75      MOVW    m_locks(R8), R1
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
runtime.raise () at /workdir/go/src/runtime/sys_linux_arm.s:137
137     RET
(gdb) c
Continuing.
[Thread 0x432ff460 (LWP 21414) exited]
[Thread 0x42aff460 (LWP 21413) exited]
[Thread 0x42199460 (LWP 21412) exited]
[Thread 0x41199460 (LWP 21410) exited]
[Thread 0x41999460 (LWP 21411) exited]
[Thread 0x40939460 (LWP 21409) exited]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) 

Wrong GOARM= level?

@bradfitz
Contributor Author

From cmd/dist:

func xgetgoarm() string {
//...
        if gohostarch != "arm" || goos != gohostos {
                // Conservative default for cross-compilation.                                                                         
                return "5"
        }

That's my best guess at the moment.
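
The effect of that branch can be sketched in isolation. This is a simplified, hypothetical rendering and not the real cmd/dist code: the function names, the `detect` callback standing in for the native hardware probe, and the rule that an explicitly set GOARM takes precedence are all assumptions made for illustration, and the cases elided by `//...` above are left out.

```go
package main

import "fmt"

// defaultGOARM mirrors the quoted branch: when the build host is not
// an ARM machine, or the target OS differs from the host OS, the
// target hardware cannot be probed, so the most conservative level is
// chosen. detect is a hypothetical stand-in for the native probe.
func defaultGOARM(hostArch, hostOS, targetOS string, detect func() string) string {
	if hostArch != "arm" || targetOS != hostOS {
		// Conservative default for cross-compilation.
		return "5"
	}
	return detect()
}

// effectiveGOARM assumes an explicitly provided GOARM value wins over
// the computed default.
func effectiveGOARM(explicit, hostArch, hostOS, targetOS string, detect func() string) string {
	if explicit != "" {
		return explicit
	}
	return defaultGOARM(hostArch, hostOS, targetOS, detect)
}

func main() {
	probe := func() string { return "7" } // pretend the ARM host has VFPv3

	// Native build on the ARM machine: the probe result is used.
	fmt.Println(effectiveGOARM("", "arm", "linux", "linux", probe))
	// Cross-compile from amd64 with GOARM unset: falls back to 5.
	fmt.Println(effectiveGOARM("", "amd64", "linux", "linux", probe))
	// Cross-compile from amd64 with GOARM=7 set explicitly.
	fmt.Println(effectiveGOARM("7", "amd64", "linux", "linux", probe))
}
```

Under these assumptions, the same source tree yields a different GOARM level depending only on where make.bash runs, which would explain a toolchain that behaves differently once it is built on amd64.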

@bradfitz
Contributor Author

On Scaleway,

# ./go/pkg/tool/linux_arm/dist -check-goarm
VFPv1 OK.
VFPv3 OK.

So I should probably make the Kubernetes cross-compiler set GOARM=7
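
A minimal sketch of how that probe output maps to a GOARM level, assuming the rule is: both VFPv1 and VFPv3 present gives 7, VFPv1 alone gives 6, and anything else gives 5. The function name `goarmFromCheck` is hypothetical, and the mapping is written from memory of cmd/dist rather than copied from it.

```go
package main

import (
	"fmt"
	"strings"
)

// goarmFromCheck derives a GOARM level from the text printed by
// "dist -check-goarm". The mapping used here is an assumption:
// VFPv1 and VFPv3 both OK -> 7, only VFPv1 OK -> 6, otherwise 5.
func goarmFromCheck(out string) string {
	v1ok := strings.Contains(out, "VFPv1 OK.")
	v3ok := strings.Contains(out, "VFPv3 OK.")
	switch {
	case v1ok && v3ok:
		return "7"
	case v1ok:
		return "6"
	default:
		return "5"
	}
}

func main() {
	// The Scaleway output quoted above.
	out := "VFPv1 OK.\nVFPv3 OK.\n"
	fmt.Println("GOARM=" + goarmFromCheck(out)) // prints "GOARM=7"
}
```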

@bradfitz
Contributor Author

Yup! That was it.

Thanks @jfrazelle, @ianlancetaylor, and @crawshaw!

@jessfraz
Contributor

Omg yay!!!

@davecheney
Contributor

That default cross compilation setting gets you every time. We should probably update it to be 6, which is the default for local compiles.

@golang golang locked and limited conversation to collaborators Sep 30, 2017

8 participants