Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: recent allocs regression #16897

Closed
josharian opened this issue Aug 27, 2016 · 16 comments
Closed

cmd/compile: recent allocs regression #16897

josharian opened this issue Aug 27, 2016 · 16 comments

Comments

@josharian
Copy link
Contributor

Going from commit c85b77c (just before the introduction of a printer type to cmd/compile/internal/gc/fmt.go) to 0d23c28 (current tip), I see:

name       old time/op      new time/op      delta
Template        297ms ± 6%       296ms ± 5%    ~           (p=1.000 n=14+13)
Unicode         150ms ± 4%       155ms ±10%    ~           (p=0.068 n=13+14)
GoTypes         965ms ± 4%       986ms ± 1%  +2.16%        (p=0.017 n=15+13)

name       old alloc/op     new alloc/op     delta
Template       47.7MB ± 0%      47.2MB ± 0%  -1.11%        (p=0.000 n=15+15)
Unicode        37.9MB ± 0%      37.9MB ± 0%  +0.08%        (p=0.000 n=15+13)
GoTypes         143MB ± 0%       145MB ± 0%  +1.39%        (p=0.000 n=15+15)

name       old allocs/op    new allocs/op    delta
Template         450k ± 0%        463k ± 0%  +3.00%        (p=0.000 n=15+15)
Unicode          368k ± 0%        374k ± 0%  +1.63%        (p=0.000 n=15+15)
GoTypes         1.34M ± 0%       1.45M ± 0%  +8.58%        (p=0.000 n=15+15)

I haven't dug into why, but I suspect that it is because formatting routines that used to simply return a constant string now append that constant string to a byte slice and then convert that byte slice back into a string. If this guess is correct, it could probably be reversed by adding a string field to printer and doing some careful special-casing code in the printer methods.

cc @griesemer @mdempsky

@josharian josharian added this to the Go1.8 milestone Aug 27, 2016
@griesemer
Copy link
Contributor

@josharian You're probably right but I suggest not jumping to fixing this right now. The fmt.go cleanup is not finished yet, it's just a start. Let's complete the transition and cleanups and then revisit.

@josharian
Copy link
Contributor Author

Sounds good. Just wanted to flag it, since this is a known performance hot spot.

@griesemer
Copy link
Contributor

@josharian How did you obtain the results above? I like to re-measure.

@josharian
Copy link
Contributor Author

josharian commented Aug 30, 2016

$ # Make sure you're using the tip compiler for this go get! It memorizes GOROOT.
$ go get -u rsc.io/compilebench
$ git checkout c85b77c
$ ./make.bash
$ compilebench -short -count 30 -alloc | tee before
$ git checkout master
$ ./make.bash
$ compilebench -short -count 30 -alloc | tee after
$ benchstat before after

I now get from that (with master at d6098e4):

name       old time/op      new time/op      delta
Template        312ms ±16%       291ms ± 9%   -6.57%        (p=0.000 n=30+28)
Unicode         163ms ±20%       157ms ± 9%     ~           (p=0.088 n=30+30)
GoTypes         1.05s ±21%       0.97s ± 5%   -7.24%        (p=0.000 n=30+26)
Compiler        5.67s ±10%       5.21s ± 7%   -8.12%        (p=0.000 n=30+26)

name       old user-ns/op   new user-ns/op   delta
Template   415user-ms ±13%  398user-ms ± 6%   -4.02%        (p=0.002 n=30+29)
Unicode    232user-ms ± 9%  228user-ms ± 3%     ~           (p=0.123 n=29+25)
GoTypes    1.37user-s ±16%  1.37user-s ± 3%   -0.04%        (p=0.030 n=30+27)
Compiler   7.55user-s ± 8%  7.22user-s ± 9%   -4.44%        (p=0.000 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       46.8MB ± 0%      47.9MB ± 0%   +2.39%        (p=0.000 n=30+28)
Unicode        37.8MB ± 0%      38.2MB ± 0%   +1.00%        (p=0.000 n=29+30)
GoTypes         143MB ± 0%       147MB ± 0%   +3.13%        (p=0.000 n=30+30)
Compiler        681MB ± 0%       710MB ± 0%   +4.33%        (p=0.000 n=29+30)

name       old allocs/op    new allocs/op    delta
Template         444k ± 0%        500k ± 0%  +12.68%        (p=0.000 n=30+30)
Unicode          369k ± 0%        385k ± 0%   +4.55%        (p=0.000 n=30+29)
GoTypes         1.35M ± 0%       1.60M ± 0%  +18.43%        (p=0.000 n=30+30)
Compiler        5.63M ± 0%       6.49M ± 0%  +15.23%        (p=0.000 n=30+30)

The alloc numbers are the ones that I think are probably due to the gc/fmt changes. The CPU numbers may be because I'm running a browser, etc., but I hope they're real! :)

@griesemer
Copy link
Contributor

@josharian Thanks. My pending change ( https://go-review.googlesource.com/28072 ) reduces the allocations again somewhat. Following the same script, I now get:

name       old time/op     new time/op     delta
Template       307ms ± 4%      315ms ± 4%   +2.55%        (p=0.000 n=29+29)
Unicode        164ms ± 4%      165ms ± 4%     ~           (p=0.057 n=30+30)
GoTypes        1.01s ± 3%      1.03s ± 3%   +1.72%        (p=0.000 n=30+30)
Compiler       5.49s ± 1%      5.62s ± 2%   +2.31%        (p=0.000 n=30+28)

name       old user-ns/op  new user-ns/op  delta
Template        397M ± 3%       406M ± 6%   +2.21%        (p=0.000 n=28+30)
Unicode         225M ± 4%       226M ± 3%     ~           (p=0.230 n=29+30)
GoTypes        1.31G ± 3%      1.34G ± 5%   +2.79%        (p=0.000 n=30+30)
Compiler       7.39G ± 2%      7.50G ± 2%   +1.43%        (p=0.000 n=30+29)

name       old alloc/op    new alloc/op    delta
Template      46.8MB ± 0%     47.5MB ± 0%   +1.48%        (p=0.000 n=29+28)
Unicode       37.8MB ± 0%     38.1MB ± 0%   +0.64%        (p=0.000 n=30+28)
GoTypes        143MB ± 0%      145MB ± 0%   +1.72%        (p=0.000 n=30+30)
Compiler       683MB ± 0%      706MB ± 0%   +3.31%        (p=0.000 n=30+29)

name       old allocs/op   new allocs/op   delta
Template        444k ± 0%       481k ± 0%   +8.38%        (p=0.000 n=30+30)
Unicode         369k ± 0%       379k ± 0%   +2.74%        (p=0.000 n=30+30)
GoTypes        1.35M ± 0%      1.50M ± 0%  +10.78%        (p=0.000 n=30+30)
Compiler       5.66M ± 0%      6.25M ± 0%  +10.31%        (p=0.000 n=29+29)

gopherbot pushed a commit that referenced this issue Aug 30, 2016
Internal objects that satisfy the Printable interface can print
directly to a printer w/o going through the conversion to a string
first.

Made printer.f understand and special-case %v so that Printable
objects use the printer directly.

This is work in progress and we may end up doing something else
eventually (perhaps using fmt.Formatter) - or even undo these
changes if this exploration doesn't get us to a significantly
better place.

Allocations numbers relative to commit c85b77c (still up, but
reduced from most recent change):

name       old time/op     new time/op     delta
Template       307ms ± 4%      315ms ± 4%   +2.55%        (p=0.000 n=29+29)
Unicode        164ms ± 4%      165ms ± 4%     ~           (p=0.057 n=30+30)
GoTypes        1.01s ± 3%      1.03s ± 3%   +1.72%        (p=0.000 n=30+30)
Compiler       5.49s ± 1%      5.62s ± 2%   +2.31%        (p=0.000 n=30+28)

name       old user-ns/op  new user-ns/op  delta
Template        397M ± 3%       406M ± 6%   +2.21%        (p=0.000 n=28+30)
Unicode         225M ± 4%       226M ± 3%     ~           (p=0.230 n=29+30)
GoTypes        1.31G ± 3%      1.34G ± 5%   +2.79%        (p=0.000 n=30+30)
Compiler       7.39G ± 2%      7.50G ± 2%   +1.43%        (p=0.000 n=30+29)

name       old alloc/op    new alloc/op    delta
Template      46.8MB ± 0%     47.5MB ± 0%   +1.48%        (p=0.000 n=29+28)
Unicode       37.8MB ± 0%     38.1MB ± 0%   +0.64%        (p=0.000 n=30+28)
GoTypes        143MB ± 0%      145MB ± 0%   +1.72%        (p=0.000 n=30+30)
Compiler       683MB ± 0%      706MB ± 0%   +3.31%        (p=0.000 n=30+29)

name       old allocs/op   new allocs/op   delta
Template        444k ± 0%       481k ± 0%   +8.38%        (p=0.000 n=30+30)
Unicode         369k ± 0%       379k ± 0%   +2.74%        (p=0.000 n=30+30)
GoTypes        1.35M ± 0%      1.50M ± 0%  +10.78%        (p=0.000 n=30+30)
Compiler       5.66M ± 0%      6.25M ± 0%  +10.31%        (p=0.000 n=29+29)

For #16897.

Change-Id: I37f95ab60508018ee6d29a98d238482b60e3e4b5
Reviewed-on: https://go-review.googlesource.com/28072
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@davecheney
Copy link
Contributor

Just as a data point. I've been keeping my benchmarks up to date and as of
this morning Go tip is slightly slower than 1.7, an average of no more than
2%.

On Wed, 31 Aug 2016, 04:05 Robert Griesemer notifications@github.com
wrote:

@josharian https://github.com/josharian Thanks. My pending change (
https://go-review.googlesource.com/28072 ) reduces the allocations again
somewhat. Following the same script, I now get:

name old time/op new time/op delta
Template 307ms ± 4% 315ms ± 4% +2.55% (p=0.000 n=29+29)
Unicode 164ms ± 4% 165ms ± 4% ~ (p=0.057 n=30+30)
GoTypes 1.01s ± 3% 1.03s ± 3% +1.72% (p=0.000 n=30+30)
Compiler 5.49s ± 1% 5.62s ± 2% +2.31% (p=0.000 n=30+28)

name old user-ns/op new user-ns/op delta
Template 397M ± 3% 406M ± 6% +2.21% (p=0.000 n=28+30)
Unicode 225M ± 4% 226M ± 3% ~ (p=0.230 n=29+30)
GoTypes 1.31G ± 3% 1.34G ± 5% +2.79% (p=0.000 n=30+30)
Compiler 7.39G ± 2% 7.50G ± 2% +1.43% (p=0.000 n=30+29)

name old alloc/op new alloc/op delta
Template 46.8MB ± 0% 47.5MB ± 0% +1.48% (p=0.000 n=29+28)
Unicode 37.8MB ± 0% 38.1MB ± 0% +0.64% (p=0.000 n=30+28)
GoTypes 143MB ± 0% 145MB ± 0% +1.72% (p=0.000 n=30+30)
Compiler 683MB ± 0% 706MB ± 0% +3.31% (p=0.000 n=30+29)

name old allocs/op new allocs/op delta
Template 444k ± 0% 481k ± 0% +8.38% (p=0.000 n=30+30)
Unicode 369k ± 0% 379k ± 0% +2.74% (p=0.000 n=30+30)
GoTypes 1.35M ± 0% 1.50M ± 0% +10.78% (p=0.000 n=30+30)
Compiler 5.66M ± 0% 6.25M ± 0% +10.31% (p=0.000 n=29+29)


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#16897 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAAcA4wBVxDV-IYGlha2D4Q0y53FbR7Oks5qlHAdgaJpZM4JunmF
.

@griesemer
Copy link
Contributor

@davecheney I wouldn't be surprised if some of it is due to these changes. I'm playing a bit more with it but if I cannot make it net-negative, I'll undo all the fmt changes that didn't explicitly remove code.

@davecheney
Copy link
Contributor

@robert I don't think the formatting changes had any effect for or against,
since I started looking a week or so ago go tip has been ~2% slower in my
compile benchmarks than 1.7.

On Wed, Aug 31, 2016 at 9:12 AM, Robert Griesemer notifications@github.com
wrote:

@davecheney https://github.com/davecheney I wouldn't be surprised if
some of it is due to these changes. I'm playing a bit more with it but if I
cannot make it net-negative, I'll undo all the fmt changes that didn't
explicitly remove code.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16897 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAAcA4oqCSfQt8FStKvz0qKTh_VhCF5Sks5qlLjwgaJpZM4JunmF
.

@josharian
Copy link
Contributor Author

If you give up on it, Robert, please let me know before reverting everything.

Dave, if you could identify when the 2% regression started, that'd be useful. But let's leave this bug specifically for the gc/fmt changes.

@davecheney
Copy link
Contributor

Dave, if you could identify when the 2% regression started, that'd be
useful. But let's leave this bug specifically for the gc/fmt changes.

I'm going to assume it was when the dev.ssa branch was merged in. To be
clear, we're talking about this much

go version go1.7 linux/amd64

real 1m2.428s
user 3m5.858s
sys 0m11.943s

go version devel +b2e0e96 Tue Aug 30 23:10:43 2016 +0000 linux/amd64

real 1m4.310s
user 3m13.133s
sys 0m12.396s

I don't think it's worth alerting the national guard at this early stage of
the game.

On Wed, Aug 31, 2016 at 9:23 AM, Josh Bleecher Snyder <
notifications@github.com> wrote:

If you give up on it, Robert, please let me know before reverting
everything.

Dave, if you could identify when the 2% regression started, that'd be
useful. But let's leave this bug specifically for the gc/fmt changes.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#16897 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAAcA9EALy8TmF__dzd2vwnwxlLpllbWks5qlLtYgaJpZM4JunmF
.

@griesemer
Copy link
Contributor

@josharian I'm trying to use fmt.Formatter instead of the local printer - that would be nicer and more flexible. So far it looks good.

@griesemer
Copy link
Contributor

@josharian I have now implemented fmt.Formatters for all types handled by fmt.go and also converted all source code to use formatted printing. For instance, an sconv(s, FmtSign) for the format %v becomes just s with the format %+v (I'm going to make these formats %+S for better readability).

Running the above benchmark above against commit c85b77c and current tip (commit cd0ba4c, excl. my new changes) results in:

name       old time/op     new time/op     delta
Template       307ms ± 4%      301ms ± 5%   -1.84%        (p=0.003 n=29+29)
Unicode        164ms ± 4%      160ms ± 4%   -2.18%        (p=0.000 n=30+29)
GoTypes        1.01s ± 3%      1.01s ± 3%     ~           (p=0.398 n=30+30)
Compiler       5.49s ± 1%      5.47s ± 2%     ~           (p=0.181 n=30+29)

name       old alloc/op    new alloc/op    delta
Template      46.8MB ± 0%     47.2MB ± 0%   +0.98%        (p=0.000 n=29+30)
Unicode       37.8MB ± 0%     38.0MB ± 0%   +0.55%        (p=0.000 n=30+29)
GoTypes        143MB ± 0%      145MB ± 0%   +1.59%        (p=0.000 n=30+30)
Compiler       683MB ± 0%      700MB ± 0%   +2.42%        (p=0.000 n=30+30)

name       old allocs/op   new allocs/op   delta
Template        444k ± 0%       479k ± 0%   +7.86%        (p=0.000 n=30+30)
Unicode         369k ± 0%       378k ± 0%   +2.69%        (p=0.000 n=30+30)
GoTypes        1.35M ± 0%      1.50M ± 0%  +10.68%        (p=0.000 n=30+30)
Compiler       5.66M ± 0%      6.21M ± 0%   +9.72%        (p=0.000 n=29+30)

This is about what we had yesterday.

Running the above benchmark above against commit c85b77c and and my local tip (incl. my new changes) results in:

name       old time/op     new time/op     delta
Template       307ms ± 4%      304ms ± 3%    ~           (p=0.053 n=29+29)
Unicode        164ms ± 4%      160ms ± 8%  -2.34%        (p=0.000 n=30+30)
GoTypes        1.01s ± 3%      1.05s ± 3%  +3.60%        (p=0.000 n=30+29)
Compiler       5.49s ± 1%      5.52s ± 1%  +0.42%        (p=0.009 n=30+29)

name       old alloc/op    new alloc/op    delta
Template      46.8MB ± 0%     46.8MB ± 0%  +0.14%        (p=0.000 n=29+30)
Unicode       37.8MB ± 0%     37.9MB ± 0%  +0.24%        (p=0.000 n=30+30)
GoTypes        143MB ± 0%      144MB ± 0%  +0.50%        (p=0.000 n=30+30)
Compiler       683MB ± 0%      688MB ± 0%  +0.69%        (p=0.000 n=30+30)

name       old allocs/op   new allocs/op   delta
Template        444k ± 0%       471k ± 0%  +6.05%        (p=0.000 n=30+30)
Unicode         369k ± 0%       376k ± 0%  +2.00%        (p=0.000 n=30+30)
GoTypes        1.35M ± 0%      1.47M ± 0%  +8.53%        (p=0.000 n=30+30)
Compiler       5.66M ± 0%      6.07M ± 0%  +7.22%        (p=0.000 n=29+29)

This shows an improvement - albeit not as much as I hoped. But the changes should have removed many function calls to the various Xconv routines all over the place. There should also much fewer strings be allocated and concatenated.

For final comparison: Current master tip as above against my local tip:

name       old time/op     new time/op     delta
Template       301ms ± 5%      304ms ± 3%    ~           (p=0.110 n=29+29)
Unicode        160ms ± 4%      160ms ± 8%    ~           (p=0.769 n=29+30)
GoTypes        1.01s ± 3%      1.05s ± 3%  +3.82%        (p=0.000 n=30+29)
Compiler       5.47s ± 2%      5.52s ± 1%  +0.82%        (p=0.000 n=29+29)

name       old alloc/op    new alloc/op    delta
Template      47.2MB ± 0%     46.8MB ± 0%  -0.83%        (p=0.000 n=30+30)
Unicode       38.0MB ± 0%     37.9MB ± 0%  -0.31%        (p=0.000 n=29+30)
GoTypes        145MB ± 0%      144MB ± 0%  -1.07%        (p=0.000 n=30+30)
Compiler       700MB ± 0%      688MB ± 0%  -1.69%        (p=0.000 n=30+30)

name       old allocs/op   new allocs/op   delta
Template        479k ± 0%       471k ± 0%  -1.67%        (p=0.000 n=30+30)
Unicode         378k ± 0%       376k ± 0%  -0.68%        (p=0.000 n=30+30)
GoTypes        1.50M ± 0%      1.47M ± 0%  -1.94%        (p=0.000 n=30+30)
Compiler       6.21M ± 0%      6.07M ± 0%  -2.28%        (p=0.000 n=30+29)

I'm going to send out a series of CLs (about one dozen in total) with the respective changes.

@gopherbot
Copy link

CL https://golang.org/cl/29087 mentions this issue.

@griesemer
Copy link
Contributor

For the record: With (pending) https://go-review.googlesource.com/29089, we are now at or below the original levels:

name       old time/op     new time/op     delta
Template       297ms ± 5%      284ms ± 3%  -4.53%  (p=0.000 n=27+29)
Unicode        159ms ± 5%      151ms ± 5%  -4.91%  (p=0.000 n=28+30)
GoTypes        985ms ± 5%      935ms ± 2%  -5.13%  (p=0.000 n=28+29)

name       old alloc/op    new alloc/op    delta
Template      46.8MB ± 0%     45.7MB ± 0%  -2.37%  (p=0.000 n=30+30)
Unicode       37.8MB ± 0%     37.9MB ± 0%  +0.29%  (p=0.000 n=29+30)
GoTypes        143MB ± 0%      138MB ± 0%  -3.64%  (p=0.000 n=29+30)

name       old allocs/op   new allocs/op   delta
Template        444k ± 0%       440k ± 0%  -0.94%  (p=0.000 n=30+30)
Unicode         369k ± 0%       369k ± 0%  +0.19%  (p=0.000 n=29+30)
GoTypes        1.35M ± 0%      1.34M ± 0%  -1.24%  (p=0.000 n=30+30)

(Most of the benefits came from https://go-review.googlesource.com/29087.)

Considering this particular issue fully resolved at this point.

@josharian
Copy link
Contributor Author

Thanks, Robert.

@gopherbot
Copy link

CL https://golang.org/cl/29089 mentions this issue.

gopherbot pushed a commit that referenced this issue Sep 13, 2016
Also: update fmt_test.go.

Together with the previous commits, we are now at or below c85b77c
levels in terms of allocation for the benchmark described in #16897
(old = c85b77c, new = this commit):

name       old time/op     new time/op     delta
Template       297ms ± 5%      284ms ± 3%  -4.53%  (p=0.000 n=27+29)
Unicode        159ms ± 5%      151ms ± 5%  -4.91%  (p=0.000 n=28+30)
GoTypes        985ms ± 5%      935ms ± 2%  -5.13%  (p=0.000 n=28+29)

name       old alloc/op    new alloc/op    delta
Template      46.8MB ± 0%     45.7MB ± 0%  -2.37%  (p=0.000 n=30+30)
Unicode       37.8MB ± 0%     37.9MB ± 0%  +0.29%  (p=0.000 n=29+30)
GoTypes        143MB ± 0%      138MB ± 0%  -3.64%  (p=0.000 n=29+30)

name       old allocs/op   new allocs/op   delta
Template        444k ± 0%       440k ± 0%  -0.94%  (p=0.000 n=30+30)
Unicode         369k ± 0%       369k ± 0%  +0.19%  (p=0.000 n=29+30)
GoTypes        1.35M ± 0%      1.34M ± 0%  -1.24%  (p=0.000 n=30+30)

For #16897.

Change-Id: Iedbeb408e2f1e68dd4a3201bf8813c8066ebf7ed
Reviewed-on: https://go-review.googlesource.com/29089
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@golang golang locked and limited conversation to collaborators Sep 13, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants