-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile, cmd/link: can't build large arm binaries with external linking, Kubernetes now too big #17028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The file in question is: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/docker/engine-api/types/versions/compare.go @luxas have you done a |
There's always at least an implicit init function in packages to run all the assignments and function calls at the package (global) scope, and to call the init functions of dependent packages. Does the problem reproduce with Go 1.7.1?
What makes you say that? |
@lavalamp breakage at current HEAD with go 1.7.1:
It guess it's hitting some kind of limit in the golang arm implementation right now I don't know much about the lower-level Golang ARM implementation, can you think about something that would panic there? |
can you bisect for the exact sha that started the breakage, not the tag |
Trying to find the commit |
Your faulting address is the pc (program counter) and it is greater than 2**26. I'm not familiar with arm, but could it be that the 32 bit arm platform has a limit on the size of the programs it can run and these programs are exceeding it? These are the same programs we have issues with on ppc64le due to their size. |
@laboger Thanks a lot! also broke with at commit kubernetes/kubernetes@a81732b it didn't break, but with kubernetes/kubernetes@ff9980e it does break. The difference there is only 300 commits, but the binary clearly passes the Well, now we know what the problem is. |
Other interesting observations:
So it might be a limitation in golang... |
running a 32 bit binary on a 64 bit system will run it in 32 bit emulation On Fri, Sep 9, 2016 at 7:53 AM, Lucas Käldström notifications@github.com
|
/cc @minux @cherrymui |
A hard problem to fix because we do instruction encoding in
the compiler front end and at that time there is no way for
the compiler to know if the target is too far away.
Splitting text sections might help, but a real solution is perhaps
to make the front end delay choosing the encoding for a jump
until link time.
Could we modify cmd/internal/obj so that it does instruction
encoding on a basic block basis and let the linker choose
the encoding for jump instructions?
A short term solution is make our liker generate trampolines
if the target is too far away.
|
Thanks everyone for helping with this! I know @laboger has a patch for ppc64le that can be cherrypicked upon go1.7 |
The CL @laboger sent may cover this with minimal changes. I haven't investigated how the external linker is rewriting the jump, it may not work on ARM, but it's worth a try. @luxas, can you patch in CL 27790, modify cmd/link/internal/ld/data.go:1912 to look for either PPC64 or ARM, and see if it works? |
Has any investigation been done to determine if the binaries built by Kubernetes could be reduced in size? I suggested this when the problem was first reported. That would not only resolve your issue with ppc64le and arm but improve compile and link times, save space for your binaries, etc. etc. I looked briefly at the packages included in hyperkube and it appeared to have many duplicate path names rooted at different locations. Could there be new versions of some packages being added without removing the old ones that are no longer used? Or could the binaries get split in any way? |
While we are solving the linker problem, in the meantime, is using Go tip an option for you? The SSA compiler generates smaller code that does not fault.
|
What about we introduce a |
I think that would be a good solution for all platforms that could have this problem, and would fix internal and external linking. My CL only works for external linking, because it depends on the GNU linker to handle the long calls. I believe it needs to be an option because it will degrade call performance, so we wouldn't want to do it all the time but there should be opportunities for optimizing. @minux made similar suggestions in the corresponding issue for ppc64le #15823. |
One other note, on arm isn't this a runtime problem when executing the program that has been built, not a build problem? So while I think the problem is due to the large program size the symptoms on arm 32 bit are different than ppc64le. |
I think it is a build problem: it emits a pc-relative jump and the offset overflows, and the linker did not signal, so the program jumped to a faulty address. |
I don't think a new compiler mode is the answer. That won't fit well with the go tool. If there's no space in the text for the linker to rewrite the pc-relative jump, than as long as no single function is too large, the linker can insert a trampoline. |
That means your suggestion is for internal linking only? Because I assume you mean the internal linker would insert the trampoline. |
@laboger Can you provide a version of your CL as cherrypickable upon go1.6.3 or go1.7.1 so I can test if it fixes this issue? |
@laboger The linker has plenty of opportunities to insert trampolines for both external and internal linking (though it may take some wiring). |
@luxas If you get the patch from CL 23963, that should apply against go 1.7. That was my previous patch before I had to rebase it for master and ended up with a new CL. Please note that this patch depends on using external linking so that the external linker makes the necessary changes to allow for long jumps or calls. |
Does the arm compiler link internally or externally by default? |
@crawshaw But when you say "the linker" you mean the golang linker would generate the trampolines before writing out the elf text sections that are passed on to the external linker? |
Yes. (Of course it may be that the external linker is already smart enough to do that itself, I haven't checked.) |
Automatic merge from submit-queue Use a patched golang version for building linux/arm Fixes: kubernetes#29904 Right now, linux/arm is broken because of an internal limitation in Go. I've filed an issue for it here: golang/go#17028 The affected binaries of this limitation are hyperkube and kube-apiserver, which are the largest binaries. And when we now have a patched go 1.7.1 version for building "unsupported" but important architectures (ref: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/multi-platform.md), we should also include the patch for ppc64le and start building ppc64le again. As soon as @laboger has the patch I need up on Github, I'll include ppc64le to this PR and we'll merge it TODO: - [ ] ~~Update the PR with patches for ppc64le at the same time @luxas~~ - [x] Push the new kube-cross image @ixdy - [x] Run a full `make release` before to verify nothing breaks @luxas + @ixdy - [ ] Cherrypick into the 1.4 branch @luxas + (who?) @lavalamp @smarterclayton @ixdy @rsc @davecheney @wojtek-t @jfrazelle @bradfitz @david-mcmahon @pwittrock
Automatic merge from submit-queue Use a patched golang version for building linux/arm Fixes: kubernetes#29904 Right now, linux/arm is broken because of an internal limitation in Go. I've filed an issue for it here: golang/go#17028 The affected binaries of this limitation are hyperkube and kube-apiserver, which are the largest binaries. And when we now have a patched go 1.7.1 version for building "unsupported" but important architectures (ref: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/multi-platform.md), we should also include the patch for ppc64le and start building ppc64le again. As soon as @laboger has the patch I need up on Github, I'll include ppc64le to this PR and we'll merge it TODO: - [ ] ~~Update the PR with patches for ppc64le at the same time @luxas~~ - [x] Push the new kube-cross image @ixdy - [x] Run a full `make release` before to verify nothing breaks @luxas + @ixdy - [ ] Cherrypick into the 1.4 branch @luxas + (who?) @lavalamp @smarterclayton @ixdy @rsc @davecheney @wojtek-t @jfrazelle @bradfitz @david-mcmahon @pwittrock (cherry picked from commit 9bc7e36)
CL https://golang.org/cl/29397 mentions this issue. |
@cherrymui Many thanks for the CLs you've made. I assume CL 29397 is the "real" one, which can be merged at some point, right? We should definitely have a proper fix for this as soon as possible. |
@luxas Yes, I think CL 29397 will be the "real" one.
What issue? |
I meant this linker issue. Can you write a quick recap of what you're doing in CL 29397? |
In short, the issue is that when resolving a direct jump (or call), if the target is too far so that the offset cannot be encoded into the instruction, the linker simply truncated the address, and the generated binary jumped wild. The fix is to let the linker detect too-far jumps and automatically insert trampolines when needed. |
Isn't this fix for internal linking only? |
I believe it is not too hard to make it also work for external linking. But maybe not in that CL. |
Thanks, what's the ETA on CL 29397? But one step at a time. |
Hopefully by this week. |
@cherrymui Thanks for the patch! Still I think we need to get external linking working, so can you reopen this issue @bradfitz @cherrymui @crawshaw? Do you have any plan to get external linking working or am I missing something? |
I would have thought a similar approach to ppc64 should work, or including the trampolines in the section given to the external linker. I don't know if anyone is working on it. |
I will try to bring up external linking support on ARM soon. |
Clarification on the ppc64le fixes: the fix for this problem when using external linking is already upstream. That is the fix to split text sections if they get too large #15823, because then the GNU linker can create and insert trampolines or long branches as needed. The fix for #16665 solves the problem on ppc64le when using internal linking. |
@laboger But arm still needs this external linking fix? |
Yes it sounds like @cherrymui is working on it. |
CL https://golang.org/cl/31143 mentions this issue. |
Automatic merge from submit-queue Use a patched golang version for building linux/arm Fixes: kubernetes#29904 Right now, linux/arm is broken because of an internal limitation in Go. I've filed an issue for it here: golang/go#17028 The affected binaries of this limitation are hyperkube and kube-apiserver, which are the largest binaries. And when we now have a patched go 1.7.1 version for building "unsupported" but important architectures (ref: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/multi-platform.md), we should also include the patch for ppc64le and start building ppc64le again. As soon as @laboger has the patch I need up on Github, I'll include ppc64le to this PR and we'll merge it TODO: - [ ] ~~Update the PR with patches for ppc64le at the same time @luxas~~ - [x] Push the new kube-cross image @ixdy - [x] Run a full `make release` before to verify nothing breaks @luxas + @ixdy - [ ] Cherrypick into the 1.4 branch @luxas + (who?) @lavalamp @smarterclayton @ixdy @rsc @davecheney @wojtek-t @jfrazelle @bradfitz @david-mcmahon @pwittrock (cherry picked from commit 9bc7e36)
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go1.6.3
What operating system and processor architecture are you using (
go env
)?linux/arm
What did you do?
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
Just run
docker run -it gcr.io/google_containers/kube-apiserver-arm:v1.4.0-alpha.3 /usr/local/bin/kube-apiserver
What did you expect to see?
kube-apiserver starting
What did you see instead?
Seems like Kubernetes with it's deps just grew too big for arm :(
Any help here would be really appreciated. We're releasing Kubernetes in about 10 days
I need help from some Go guru here that knows how the internals work!
I assume this is a Go issue rather than a Kubernetes issue, since the file that's segfaulting doesn't have an
init()
Please take a look as quickly as possible!
-> @lavalamp @smarterclayton @ixdy @rsc @davecheney @wojtek-t @jfrazelle @bradfitz
The text was updated successfully, but these errors were encountered: