Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: NilCheckDeep performance regression #25389

Closed
TocarIP opened this issue May 14, 2018 · 5 comments
Closed

cmd/compile: NilCheckDeep performance regression #25389

TocarIP opened this issue May 14, 2018 · 5 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@TocarIP
Copy link
Contributor

TocarIP commented May 14, 2018

After 00c8e14 NilCheckDeep benchmark shows significant regression:

NilCheckDeep1-8         358ns ± 2%     374ns ± 1%   +4.64%  (p=0.000 n=10+10)
NilCheckDeep10-8        641ns ± 3%     690ns ± 2%   +7.69%  (p=0.000 n=10+10)
NilCheckDeep100-8      3.52µs ± 4%    4.03µs ± 3%  +14.44%  (p=0.000 n=10+10)
NilCheckDeep1000-8     39.1µs ± 2%    45.4µs ± 1%  +15.96%  (p=0.000 n=10+9)
NilCheckDeep10000-8     412µs ± 2%     484µs ± 2%  +17.53%  (p=0.000 n=10+10)

name                 old speed      new speed      delta
NilCheckDeep1-8      2.79MB/s ± 2%  2.67MB/s ± 1%   -4.48%  (p=0.000 n=10+10)
NilCheckDeep10-8     15.6MB/s ± 3%  14.5MB/s ± 2%   -7.14%  (p=0.000 n=10+10)
NilCheckDeep100-8    28.4MB/s ± 4%  24.8MB/s ± 3%  -12.64%  (p=0.000 n=10+10)
NilCheckDeep1000-8   25.5MB/s ± 2%  22.0MB/s ± 1%  -13.77%  (p=0.000 n=10+9)
NilCheckDeep10000-8  24.3MB/s ± 2%  20.7MB/s ± 2%  -14.92%  (p=0.000 n=10+10)

name                 old alloc/op   new alloc/op   delta
NilCheckDeep1-8          101B ± 0%       93B ± 0%   -7.92%  (p=0.000 n=10+10)
NilCheckDeep10-8         192B ± 0%      184B ± 0%   -4.17%  (p=0.000 n=10+10)
NilCheckDeep100-8      1.17kB ± 0%    1.16kB ± 0%   -0.68%  (p=0.000 n=10+10)
NilCheckDeep1000-8     10.3kB ± 0%    10.3kB ± 0%   -0.08%  (p=0.000 n=10+10)
NilCheckDeep10000-8     102kB ± 0%     102kB ± 0%   -0.01%  (p=0.000 n=10+10)

name                 old allocs/op  new allocs/op  delta
NilCheckDeep1-8          6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep10-8         6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep100-8        6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep1000-8       6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)
NilCheckDeep10000-8      6.00 ± 0%      5.00 ± 0%  -16.67%  (p=0.000 n=10+10)

This is caused by extra time spent in zeroing 256 bytes for:

stores := make([]*Value, 0, 64)

Reducing 64 to 8 removes degradation

@TocarIP
Copy link
Contributor Author

TocarIP commented May 14, 2018

Running compilerbench with reduced size, shows compile time degradation, so NilCheckDeep looks like a bad benchmark. Should we just remove it altogether?
cc @randall77 @ALTree

@ALTree
Copy link
Member

ALTree commented May 14, 2018

Running compilerbench with reduced size, shows compile time degradation

This doesn't surprise me. For those stack-allocate changes I always tried to choose the smallest cap constant that would cover most of the calls. Reducing the cap will negate the effect and actually make things worse, because if we stack-allocate small (like with cap = 8) and then most calls actually go over 8, we'll have to do even more work because we'll often need to move the worklist to the heap.

NilCheckDeep looks like a bad benchmark.

Even if it doesn't reflect real-world compiler performance, it could still be useful as a microbenchmark for that specific part of the compiler. The comment says:

benchmarkNilCheckDeep is a stress test of nilcheckelim.
...
Run with multiple depths to observe big-O behavior.

which makes sense. My vote is to keep it, and also keep my stack-allocate change (since it makes the compiler faster); and ignore the benchmarkNilCheckDeep regression, since it's a stress-test by its own admission.

@ALTree ALTree added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label May 14, 2018
@ALTree ALTree added this to the Go1.11 milestone May 14, 2018
@OneOfOne
Copy link
Contributor

OneOfOne commented May 15, 2018

@ALTree I don't have time to test right now, but wouldn't something like this help a bit? or would it still trigger the zeroing?

var stash [64]*Value
store := stash[:0]

@ALTree
Copy link
Member

ALTree commented May 15, 2018

@OneOfOne Just checked, they generate basically the exact same code. Both go on the stack and then there's a DUFFZERO $152 call.

@TocarIP
Copy link
Contributor Author

TocarIP commented May 15, 2018

My vote is to keep it, and also keep my stack-allocate change (since it makes the compiler faster); and ignore the benchmarkNilCheckDeep regression, since it's a stress-test by its own admission.

Agreed, closing.

@TocarIP TocarIP closed this as completed May 15, 2018
@golang golang locked and limited conversation to collaborators May 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

4 participants