Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: big binary and slow compilation times with maps & []interface{} in static code #20095

Open
philpennock opened this issue Apr 24, 2017 · 10 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. ToolSpeed
Milestone

Comments

@philpennock
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8.1 darwin/amd64

What operating system and processor architecture are you using (go env)?

darwin_amd64

What did you do?

I switched some static data from being encoded in the binary as a large raw string to generating Golang code representing data structures, where the data represented covers "all assigned Unicode codepoints", at 31618 elements.

I removed a couple of map[rune]whatever items and constructed them at runtime, to get compiler times down from one minute back to a couple of seconds.

Even after making that change, I did not initially notice that the unicode.a file had gone up to 509MB and the executable to 491MB. I tried a couple of things, but the key was to switch a slice of []interface{} to []ConcreteType. At runtime I need the []interface{} (to feed to Ferret for substring search) but switching the compiled-in slice from elements of interface{} to elements of concrete types brought unicode.a down to 4.4MB (from 509MB) and the executable down to 9.4MB.

Code is: https://github.com/philpennock/character
(go get compatible, though make will do some embedding)

What did you expect to see?

Non-degenerate performance in compile times or library/executable sizes.

What did you see instead?

Something which made me think I was compiling C++, not Golang. 😄

@ALTree ALTree changed the title Poor size/perf with maps & []interface{} in static code cmd/compile: big binary and slow compilation times with maps & []interface{} in static code Apr 24, 2017
@josharian
Copy link
Contributor

Thanks for filing a report. I tried to reproduce by cloning that repo, checking out commit 32b8af781432a258c4440b9937500c249d24d200, cd unicode, removing external dependencies (Ferret, aux) by commenting out a bit of code, and running go build. The build completed in less than a second for me.

I assume I guessed wrong on the commit or the package or something. Would you mind providing more specific reproduction instructions?

@bradfitz bradfitz added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 24, 2017
@philpennock
Copy link
Author

Sure, sorry. 4am bug-reports apparently not my forte. Looks like the only issue with 1.8.1 might be with maps, at an older commit ( 239cc5c ) and then only compiler time, not binary size.


That commit id was correct for my report. Today, I can only reproduce the slowness/size with Go 1.7.5 so I'm beginning to think that I had one terminal window lingering on that Go version after examining why Travis was failing on older versions of Go. So with 1.7.5:

go install  81.87s user 15.21s system 108% cpu 1:29.26 total

-rwxr-xr-x  1 pdp  staff   491M Apr 24 12:06 character
-rw-r--r--   1 pdp  staff   491M Apr 24 12:06 unicode.a

whereas building with 1.8.1 gives 16MB and 6.0MB respectively. Let's assume that for this, PEBKAC.


Checking out 239cc5c63dec285f897ce32cfbf8fd78e8fcda05 and with ~/go/pkg/darwin_amd64 blown away:

% time go build
go build  54.60s user 5.23s system 129% cpu 46.296 total
% go version
go version go1.8.1 darwin/amd64
% ls -lh character
-rwxr-xr-x  1 pdp  staff    27M Apr 24 12:13 character
% ./character version
character: version <unknown>
Golang: Runtime: go1.8.1
tabular: API version: 1.0

So that's definitely built with 1.8.1, the binary size is roughly fine, but 54s of user CPU time to complete the build.

At this older version of character, I was trying to embed maps into the binary at compile time. That was bad enough that I fixed it in the follow-up commit before merging back to master.

*   5cd64f8 Merge branch 'generate_structured_unicode_data'
|\
| * 2404491 Performance compromise: pre-gen some Unicode data
| * 239cc5c Finish the switch to structured Unicode
|/
* e903416 Pregenerate blocks data at go generate time

@bradfitz bradfitz added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Apr 24, 2017
@bradfitz
Copy link
Contributor

@philpennock, and how's the performance with Go tip?

@josharian
Copy link
Contributor

@philpennock thanks, I can reproduce now.

This is an interesting one. If you comment out either of the map fields (ByRune, ByName), it compiles using tip in 10-15s. But if both are present...well, it's still running. :) My guess is CSE, but will need to dig.

Based on that, if you need a quick workaround, you could try doing:

var global Unicode

func init() {
  global.ByRune = // ...
}

func init() {
  global.ByName = // ...
}

Also, you'll get better compiler performance and better runtime performance if ByRune is an array instead of a map.

I'll keep poking at the original issue.

@philpennock
Copy link
Author

philpennock commented Apr 24, 2017

I don't need a workaround, I get acceptable performance doing the map init at runtime instead of compile-time, but I figured the Golang issue should be reported. :)

So I'm using a slice at compile time. I tried an array, didn't see any noticeable difference.

@bradfitz

% go version
go version devel +3fa133f482 Mon Apr 24 15:45:15 2017 +0000 darwin/amd64
% rm -rf ~/go/pkg/darwin_amd64
% cd ~gopdp/character
% git checkout 239cc5c
% time go build
^C
go build  6.90s user 1.73s system 1% cpu 9:09.30 total

I killed it after 9 minutes at 38GB of RAM in use as my laptop was struggling under the memory pressure. I'd call that a regression. (failed humor)

@bradfitz
Copy link
Contributor

I never said it wasn't a regression or not a bug. I was just asking for information for completeness.

@philpennock
Copy link
Author

Sorry, wasn't trying to be combative, it was intended as a humorous aside.

@josharian
Copy link
Contributor

This is similar in many ways to #19751. We start with a single giant block filled with values. The writebarrier pass breaks this up into lots of blocks, each with a few values. The liveness tracking in regalloc ends up quadratic, which is what we're seeing.

As with #19751, I see two fixes. We should probably do both.

There are also some smaller optimizations we should consider, based on the SSA html. One is making deadcode cheap enough to do after CSE: #15306. Another is doing OffPtr collapsing during dec, and then using the func const cache to avoid introducing new OffPtr SP values. But while worthwhile, they're not going to really going to move the needle on the quadratic behavior.

@josharian josharian added this to the Go1.9Maybe milestone May 1, 2017
@josharian
Copy link
Contributor

Marked Go1.9Maybe to match #19751. There's an outstanding CL for that (CL 42178) that doesn't help with issue directly, but if that CL can be made to work, a similar fix might work here.

@josharian
Copy link
Contributor

Looks like CL 42178 (and thus #19751) is unlikely to make 1.9. Moving this to 1.10.

@josharian josharian modified the milestones: Go1.10, Go1.9Maybe May 18, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 29, 2017
@gopherbot gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. ToolSpeed
Projects
None yet
Development

No branches or pull requests

4 participants