Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: autogenerated wrapper is not inlined #25338

Closed
TocarIP opened this issue May 10, 2018 · 6 comments
Closed

cmd/compile: autogenerated wrapper is not inlined #25338

TocarIP opened this issue May 10, 2018 · 6 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@TocarIP
Copy link
Contributor

TocarIP commented May 10, 2018

What version of Go are you using (go version)?

Master:

go version devel +5bd88b0 Thu May 10 18:02:07 2018 +0000 linux/amd64

Running encoding/binary benchmarks I see significant slowdown (compared to 1.10):

WriteSlice1000Int32s-4                           9.51µs ± 2%    12.83µs ± 2%    +34.92%  (p=0.008 n=5+5)
ReadSlice1000Int32s-4                            9.34µs ± 2%    12.60µs ± 2%    +34.90%  (p=0.008 n=5+5)

This is caused by compiler not inlining autogenerated binary.(*bigEndian).PutUint32, which does nothing but shuffles arguments and calls binary.bigEndian.PutUint32.

Bisect points to ca2f85f, which is IMHO strange, because it doesn't enable new export format by default.
@mdempsky any ideas?

@TocarIP
Copy link
Contributor Author

TocarIP commented May 10, 2018

In genwrapper (cmd/compile/internal/gc/subr.go) I see:

        // TODO(mdempsky): Investigate why this doesn't work with
        // indexed export. For now, we disable even in non-indexed
        // mode to ensure fair benchmark comparisons and to track down
        // unintended compilation differences.
        if false {
                inlcalls(fn)
        }

Should we at least change false to flagiexport, since benchmarks are done?

@josharian
Copy link
Contributor

@TocarIP I see you are systematically checking benchmarks and running down significant regressions. Just wanted to say a hearty thank you for doing this!

@quasilyte
Copy link
Contributor

quasilyte commented May 11, 2018

Additional regressions caused by that CL:

sort
StableInt1K-4         148µs ± 0%      237µs ± 0%    +59.40%  (p=0.008 n=5+5)
StableInt64K-4       13.2ms ± 1%     21.0ms ± 1%    +59.11%  (p=0.008 n=5+5)
SortInt1K-4           134µs ± 0%      201µs ± 1%    +50.75%  (p=0.008 n=5+5)
SortInt64K-4         13.0ms ± 1%     19.3ms ± 1%    +48.86%  (p=0.008 n=5+5)
Stable1e6-4           9.76s ± 0%     13.39s ± 0%    +37.26%  (p=0.008 n=5+5)
Sort1e6-4             2.46s ± 0%      3.37s ± 0%    +37.08%  (p=0.008 n=5+5)
Sort1e4-4            16.2ms ± 0%     22.2ms ± 0%    +36.50%  (p=0.016 n=5+4)
Stable1e2-4           155µs ± 0%      211µs ± 0%    +35.67%  (p=0.008 n=5+5)

bufio
WriterFlush-4       26.8ns ± 0%     31.6ns ± 0%    +18.09%  (p=0.016 n=5+4)

There are more regressions, but these are the most significant ones (highest delta).

@agnivade
Copy link
Contributor

I see you are systematically checking benchmarks and running down significant regressions.

Probably we should automate this through a nightly job so that offfending CLs are caught quicker.

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 13, 2018
@agnivade agnivade added this to the Go1.11 milestone May 13, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 30, 2018
@gopherbot
Copy link

Change https://golang.org/cl/135697 mentions this issue: cmd/compile/internal/gc: inline autogenerated (*T).M wrappers

gopherbot pushed a commit that referenced this issue Oct 16, 2018
Currently all inlining of autogenerated wrappers is disabled,
because it causes build failures, when indexed export format is enabled.
Turns out we can reenable it for common case of (*T).M wrappers.
This fixes most performance degradation of 1.11 vs 1.10.

encoding/binary:
name                    old time/op   new time/op   delta
ReadSlice1000Int32s-6    14.8µs ± 2%   11.5µs ± 2%  -22.01%  (p=0.000 n=10+10)
WriteSlice1000Int32s-6   14.8µs ± 2%   11.7µs ± 2%  -20.95%  (p=0.000 n=10+10)

bufio:
name           old time/op    new time/op    delta
WriterFlush-6    32.4ns ± 1%    28.8ns ± 0%  -11.17%  (p=0.000 n=9+10)

sort:
SearchWrappers-6       231ns ± 1%   231ns ± 0%     ~     (p=0.129 n=9+10)
SortString1K-6         365µs ± 1%   298µs ± 1%  -18.43%  (p=0.000 n=9+10)
SortString1K_Slice-6   274µs ± 2%   276µs ± 1%     ~     (p=0.105 n=10+10)
StableString1K-6       490µs ± 1%   373µs ± 1%  -23.73%  (p=0.000 n=10+10)
SortInt1K-6            210µs ± 1%   142µs ± 1%  -32.69%  (p=0.000 n=10+10)
StableInt1K-6          243µs ± 0%   151µs ± 1%  -37.75%  (p=0.000 n=10+10)
StableInt1K_Slice-6    130µs ± 1%   130µs ± 0%     ~     (p=0.237 n=10+8)
SortInt64K-6          19.9ms ± 1%  13.5ms ± 1%  -32.32%  (p=0.000 n=10+10)
SortInt64K_Slice-6    11.5ms ± 1%  11.5ms ± 1%     ~     (p=0.912 n=10+10)
StableInt64K-6        21.5ms ± 0%  13.5ms ± 1%  -37.30%  (p=0.000 n=9+10)
Sort1e2-6              108µs ± 2%    83µs ± 3%  -23.26%  (p=0.000 n=10+10)
Stable1e2-6            218µs ± 0%   161µs ± 1%  -25.99%  (p=0.000 n=8+9)
Sort1e4-6             22.6ms ± 1%  16.8ms ± 0%  -25.45%  (p=0.000 n=10+7)
Stable1e4-6           67.6ms ± 1%  49.7ms ± 0%  -26.48%  (p=0.000 n=10+10)
Sort1e6-6              3.44s ± 0%   2.55s ± 1%  -26.05%  (p=0.000 n=8+9)
Stable1e6-6            13.7s ± 0%    9.9s ± 1%  -27.68%  (p=0.000 n=8+10)

Fixes #27621
Updates #25338

Change-Id: I6fe633202f63fa829a6ab849c44d7e45f8835dff
Reviewed-on: https://go-review.googlesource.com/c/135697
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@randall77
Copy link
Contributor

I believe this is fixed now. The CL above, and possibly the slightly more aggressive mid-stack inlining makes the wrapper inline successfully.
Tip is still slower than 1.10, but only by a few percent.

@golang golang locked and limited conversation to collaborators Dec 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

7 participants