cmd/compile: performance problems with many long, exported identifiers #18602
Without a specific repro, I don't foresee many people jumping on this bug. But once you have a repro, it becomes much more exciting to debug and fix.
@bradfitz I was hoping the bug was obvious ("large code can't be compiled"), but I understand the need to reproduce. I will take some time to write an extreme Go code generator to replicate my issue (or maybe post a link to a large set of code causing this issue).
@cretz, no, it's not obvious at all. It's not even obvious whether you're hitting problems in the compiler (e.g. the SSA optimization phases, and there are many) or in the linker. Every time a bug like this has come up, it's had a unique cause and fix.
@bradfitz - Code to reproduce: repro-18602.zip. I was able to get the ~130MB of code to compress down to 10MB so I could attach it to this issue. See the link above. It's a transpiler I am working on here. Simply extract the ZIP, navigate into the extracted folder, and run go build.
I was afraid it was a generic, obvious issue due to the extreme size of the code and the fact that it's all in one package. I don't expect the code to run (I figure it'd panic; I don't know, I never got it to compile). But if it's a nuanced issue, that would be fantastic.
I'll leave a compile running overnight to see whether I can get some useful pprof output.
Building Kubernetes produces several very large programs. There could be a reproducer there. Just a suggestion...
Compile with memory profiling complete. Took almost 7 hrs, 13gb ram:
Longest phases were:
The resulting object file was 1.14gb! Looking at the memory profile, the only unusual things I see are:
As for the extremely long dumpobj time and giant object file (which will lead to massive link times and/or giant binaries), I suspect that that is again the result of very many very long exported identifiers. Hopefully when my cpu profile run completes in five hours, it'll be much clearer whether there are obvious non-linearities in dumpobj.

@cretz though there may be some improvements available in the compiler, I don't anticipate anything breathtaking; you will almost certainly also need to change your generated code. As a start, I suggest altering your code generator to (a) avoid exporting any identifiers you reasonably can and (b) give your symbols shorter names, perhaps using truncated hashes or a global numbering scheme, as in the sketch below. (You'll probably also then need to generate a lookup table or some such to help with demangling, for debugging. Sorry about that.) If you do those things, please report back with the results, and maybe with an updated code dump. Thanks!
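A minimal sketch of that renaming idea - truncated hashes plus a demangling table - assuming the generator can rewrite all references consistently. The mangler type and naming scheme here are invented for illustration, not part of any existing tool:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// mangler replaces long generated identifiers with short, stable
// hash-based names, keeping a table for demangling while debugging.
type mangler struct {
	table map[string]string // short name -> original name
}

// Shorten maps a long identifier to "X" plus 10 hex digits of its
// SHA-256 hash. The capital prefix keeps names that must stay
// exported exported. A real generator should handle the (unlikely)
// truncated-hash collision by lengthening the hash.
func (m *mangler) Shorten(long string) string {
	sum := sha256.Sum256([]byte(long))
	short := "X" + hex.EncodeToString(sum[:5])
	if prev, ok := m.table[short]; ok && prev != long {
		panic("hash collision; use more hash digits")
	}
	m.table[short] = long
	return short
}

func main() {
	m := &mangler{table: make(map[string]string)}
	fmt.Println(m.Shorten("SomeExtremelyLongGeneratedExportedMethodName"))
}
```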
@laboger thanks, but to mangle Tolstoy, every interminable compile is interminable in its own way.
@josharian - Thanks for the update. I figured my use case was just really extreme. I can unexport some things. I can also reduce the identifier size (I wanted my identifiers to help w/ runtime reflection, but no big deal, I can maintain a separate blob w/ that info). I was hoping that even though the obj file is big, DCE would remove a lot of bloat from the final binary due to having most functions as methods, but I am unfamiliar w/ the internals of the linker.

"whether there are obvious non-linearities in dumpobj" - I think this might be a key point in general. Ideally some of the work can be streamed and not held in memory for the life of the package compile. At least now y'all have a good stress test case.
If it's really very long identifiers that cause problems with the compile time, we should try to get to the bottom of this rather than find workarounds. Exporting long identifiers shouldn't cause undue compilation times - there's a bug somewhere.

As one data point: the export data writer uses a map to canonicalize strings - that is, a string that's been seen before (this includes exported identifiers) will only appear once in the export data. But the same identifiers may appear elsewhere in the obj file.
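To illustrate that canonicalization idea (a sketch of the general technique, not the compiler's actual export format): each distinct string is written in full exactly once, and later occurrences become small back-references.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// stringWriter writes each distinct string in full exactly once;
// repeats are encoded as a varint index into the already-seen list.
type stringWriter struct {
	buf   bytes.Buffer
	index map[string]uint64 // string -> index of first occurrence
}

func (w *stringWriter) writeUvarint(x uint64) {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], x)
	w.buf.Write(tmp[:n])
}

func (w *stringWriter) writeString(s string) {
	if i, ok := w.index[s]; ok {
		w.writeUvarint(0) // tag: back-reference
		w.writeUvarint(i) // index of first occurrence
		return
	}
	w.index[s] = uint64(len(w.index))
	w.writeUvarint(1)              // tag: new string
	w.writeUvarint(uint64(len(s))) // length prefix
	w.buf.WriteString(s)
}

func main() {
	w := &stringWriter{index: make(map[string]uint64)}
	w.writeString("VeryLongExportedIdentifier")
	w.writeString("VeryLongExportedIdentifier") // only a few bytes this time
	fmt.Println(w.buf.Len())
}
```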
Here's a CPU profile:

Aha! Hello, gc.testdclstack. This is not the first time we've had problems with testdclstack. See #14781. Robert suggested only enabling it in a special debug mode in the compiler. It is probably time to do that, perhaps even for Go 1.8. I'll see about sending a CL soon.

With gc.testdclstack eliminated, the parse phase drops from 11m to 13s. Still waiting to see how much it helps elsewhere. Eliminating gc.testdclstack won't help with memory usage, though. My compile is still at 7gb and growing.
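The proposed fix follows a common pattern: run the expensive consistency check only when a debug flag is set, so ordinary compiles skip it entirely. A sketch of that pattern (all names invented; this is not the compiler's real code):

```go
package main

import (
	"flag"
	"fmt"
)

// debugCheck gates an expensive internal consistency check, in the
// spirit of gc.testdclstack: when the flag is off, normal compiles
// pay nothing for it.
var debugCheck = flag.Bool("debugcheck", false, "enable expensive sanity checks")

// checkStack stands in for the real testdclstack: it walks an entire
// data structure to verify an invariant.
func checkStack(stack []string) {
	for _, s := range stack {
		if s == "" {
			panic("corrupt declaration stack")
		}
	}
}

func popScope(stack []string) []string {
	stack = stack[:len(stack)-1]
	if *debugCheck {
		checkStack(stack) // O(n) per pop -> O(n^2) over a parse; debug only
	}
	return stack
}

func main() {
	flag.Parse()
	stack := []string{"pkg", "func", "block"}
	fmt.Println(popScope(stack))
}
```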
I don't think it's just very long exported identifiers. It is also the sheer number of them, and probably also some structural things (per the other comments I've made). Squinting at the profiles, the long identifiers are maybe 10% of the memory issue; I just suggested it to @cretz as a good, straightforward first step (and an experiment to confirm what I'm seeing).
@josharian Excellent, thanks for tracking this down. We should perhaps disable testdclstack (in non-debug mode) even for 1.8. It's just a verification step.
@griesemer, @josharian, send a CL soon if so.
CL https://golang.org/cl/35113 mentions this issue.
CL https://golang.org/cl/35114 mentions this issue.
For anyone wanting it, here is a test case w/ smaller identifiers: repro-18602-smaller-idents.zip. Granted, I still think repro-18602.zip is a quality stress test case too.
After the CLs above, time is reduced to a half hour, and max rss is down a bit:
For reference, here's an alloc_space profile output:

Aside from the things I've already mentioned, disabling inlining would probably help noticeably with memory usage now. There might be optimizations available to further speed up dumpobj or shrink the object file size by reusing more strings somewhere, but I'll leave that to @griesemer (export info) and @crawshaw (reflect info). Thanks for the new test case; I'll take a look at that later or tomorrow.
This reduces compilation time for the program in #18602 from 7 hours to 30 min.

Updates #14781
Updates #18602

Change-Id: I3c4af878a08920e6373d3b3b0c4453ee002e32eb
Reviewed-on: https://go-review.googlesource.com/35113
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Was testdclstack involved in dumpobj? Or what's the reason for the 55x reduction there?
Yes.
At least on my machine, the new code you posted compiles ~10-15% faster, but memory usage doesn't shrink significantly; I guess I was wrong. The object file is still 856mb, though, so you're probably still going to have slow linking and a very large (and probably slow) binary. I don't plan to investigate this further at the moment.
No prob. I appreciate that y'all are leaving this open so it can be revisited in the future if anyone wants a crazy test case for compiler performance.
Instead of always appending to c.Values, choose whichever slice is larger; b.Values will be set to nil anyway. Appending once instead of in a loop also limits slice growth to once per function call and is more efficient.

Reduces max rss for the program in #18602 by 6.5%, and eliminates fuseBlockPlain from the alloc_space pprof output. fuseBlockPlain previously accounted for 16.74% of allocated memory.

Updates #18602.

Change-Id: I417b03722d011a59a679157da43dc91f4425210e
Reviewed-on: https://go-review.googlesource.com/35114
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
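A sketch of the slice trick described in that commit message (simplified types, not the compiler's actual fuseBlockPlain code): pick whichever slice is larger as the append destination and merge in a single call. The commit relies on either slice being usable as the destination, since b.Values is discarded afterward.

```go
package main

import "fmt"

// Value stands in for an SSA value; real values carry ops, args, a
// block pointer, and so on.
type Value struct{ ID int }

// mergeValues merges the values of two blocks being fused. Rather
// than always appending b's values onto c's slice one by one, it
// appends the smaller slice onto the larger one in a single append
// call, so the backing array grows at most once per merge.
func mergeValues(bVals, cVals []*Value) []*Value {
	if len(cVals) >= len(bVals) {
		return append(cVals, bVals...)
	}
	return append(bVals, cVals...) // reuse b's larger backing array
}

func main() {
	b := []*Value{{1}, {2}, {3}}
	c := []*Value{{4}}
	fmt.Println(len(mergeValues(b, c))) // 4
}
```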
What version of Go are you using (`go version`)?

1.7, 1.8+

What operating system and processor architecture are you using (`go env`)?

win64 and nix64
What did you do?
Tried to compile 150MB of code in a single package. Really it was a simple main.go that referenced this huge package, and I executed "go build", but it is the compile executable that consumed ridiculous resources. This even happens in my case when I get the code down to half that size (~75MB). Removing inlining and/or optimizations only delays the extreme resource usage; it does not eliminate it.
What did you expect to see?
A successful result and modest RAM usage that does not climb to extreme proportions - ideally a build that "streams" and is limited only by the disk space available for the result.
What did you see instead?
Memory consumption proportional to the code size, running out of memory (Win) or swapping forever until receiving a kill signal (Nix).
I am afraid I cannot provide the large set of code at the immediate moment. But if necessary, I can build a Go generator that generates a ton of interfaces, structs, and functions (see the sketch below). Orig thread: https://groups.google.com/forum/#!topic/golang-nuts/sBBkQ1_xf2Q
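For example, a generator along these lines would reproduce the shape of the problem - one huge package full of long exported identifiers. All names and counts here are invented; adjust the loop count to taste:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Writes bigpkg.go: tens of thousands of structs, methods, and
// interfaces, all with very long exported names, to stress the
// compiler the way the original code dump does.
func main() {
	f, err := os.Create("bigpkg.go")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fmt.Fprintln(f, "package bigpkg")
	long := strings.Repeat("VeryLongIdentifierPart", 8) // ~176-char names
	for i := 0; i < 50000; i++ {
		fmt.Fprintf(f, "type %s%d struct{ A, B int }\n", long, i)
		fmt.Fprintf(f, "func (x %s%d) %sMethod%d() int { return x.A + x.B }\n", long, i, long, i)
		fmt.Fprintf(f, "type %sIface%d interface{ %sMethod%d() int }\n", long, i, long, i)
	}
}
```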