-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: optimize append performance #14758
Comments
Dup of #11419? I'm not sure I'd expect them to be equal, though. append is a runtime call and will be more expensive than not making a call. |
similar to #14718 |
This is unrelated to either #11419 or #14718. It should be fixable. We inline the easy part of append in the compiler.
The branch is predicted not taken (and will never be taken in this case), so the grow path is out-of-line. The inner loop of BenchmarkFoo is:
The inner loop of BenchmarkBar is:
I suspect that most of the time is lost with the extra writes to those autotmp variables. We might be able to lift those writes to after the loop, or get rid of them altogether. David was thinking about this at some point. |
Just for context, this came up in CL/20513, and provided a very small performance boost of ~1% when the slice length was tracked manually. This is not a priority, but a nice to have. |
CL https://golang.org/cl/20570 mentions this issue. |
* Shaves about 10k from pkg/tools/linux_amd64. * Was suggested by drchase before * Found by looking at ssa output of #14758 Change-Id: If2c4ddf3b2603d4dfd8fb4d9199b9a3dcb05b17d Reviewed-on: https://go-review.googlesource.com/20570 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro> TryBot-Result: Gobot Gobot <gobot@golang.org>
On tip (to-be-1.9), 10 runs, I see:
I guess this is fixed, so I'm closing this issue. |
Looks like actually is |
The original bug was fixed recently with better spill code generation: https://go-review.googlesource.com/c/34822/ |
@randall77, does that mean this bug should be closed, or ... ? |
Well, I suspect the regression is because of #15837. Leave this open so I can double-check after I get a fix together for that issue. |
CL https://golang.org/cl/38431 mentions this issue. |
In Go 1.8.x, panics are generally scheduled at the very end of functions. That property was lost in Go 1.9; this CL restores it. This helps with the Fannkuch benchmark: name old time/op new time/op delta Fannkuch11-8 2.74s ± 2% 2.55s ± 2% -7.03% (p=0.000 n=20+20) This increases the fannkuch function size from 801 bytes to 831 bytes, but that is still smaller than Go 1.8.1 at 844 bytes. It generally increases binary size a tiny amount. Negligible compiler performance impact. For the code in #14758: name old time/op new time/op delta Foo-8 326ns ± 3% 312ns ± 3% -4.32% (p=0.000 n=28+30) Bar-8 560ns ± 2% 565ns ± 2% +0.96% (p=0.002 n=30+27) Updates #18977 name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.690 n=5+5) Unicode 28.7MB ± 0% 28.7MB ± 0% ~ (p=0.841 n=5+5) GoTypes 109MB ± 0% 109MB ± 0% ~ (p=0.690 n=5+5) Compiler 457MB ± 0% 457MB ± 0% ~ (p=0.841 n=5+5) SSA 1.10GB ± 0% 1.10GB ± 0% +0.03% (p=0.032 n=5+5) Flate 24.4MB ± 0% 24.5MB ± 0% ~ (p=0.690 n=5+5) GoParser 30.9MB ± 0% 30.9MB ± 0% ~ (p=0.421 n=5+5) Reflect 73.3MB ± 0% 73.3MB ± 0% ~ (p=1.000 n=5+5) Tar 25.5MB ± 0% 25.5MB ± 0% ~ (p=0.095 n=5+5) XML 40.8MB ± 0% 40.9MB ± 0% ~ (p=0.056 n=5+5) [Geo mean] 71.6MB 71.6MB +0.01% name old allocs/op new allocs/op delta Template 395k ± 0% 394k ± 1% ~ (p=1.000 n=5+5) Unicode 344k ± 0% 344k ± 0% ~ (p=0.690 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.421 n=5+5) Compiler 4.41M ± 0% 4.41M ± 0% ~ (p=0.841 n=5+5) SSA 9.79M ± 0% 9.79M ± 0% ~ (p=0.310 n=5+5) Flate 237k ± 0% 237k ± 0% ~ (p=0.841 n=5+5) GoParser 321k ± 0% 321k ± 1% ~ (p=0.421 n=5+5) Reflect 956k ± 0% 956k ± 0% ~ (p=1.000 n=5+5) Tar 251k ± 1% 252k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 400k ± 0% ~ (p=0.222 n=5+5) [Geo mean] 741k 741k +0.03% name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% +0.05% (p=0.008 n=5+5) Unicode 202k ± 0% 202k ± 0% +0.02% (p=0.008 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% +0.07% (p=0.008 n=5+5) Compiler 3.91M ± 0% 3.91M ± 0% +0.05% (p=0.008 n=5+5) SSA 7.86M ± 0% 7.87M ± 0% +0.07% (p=0.008 n=5+5) Flate 227k ± 0% 227k ± 0% +0.10% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% +0.04% (p=0.008 n=5+5) Reflect 950k ± 0% 951k ± 0% +0.04% (p=0.008 n=5+5) Tar 187k ± 0% 187k ± 0% -0.03% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% +0.04% (p=0.008 n=5+5) [Geo mean] 647k 647k +0.04% Change-Id: I2015aa26338b90cf41e47f89564e336dc02608df Reviewed-on: https://go-review.googlesource.com/43293 Reviewed-by: Keith Randall <khr@golang.org>
looprotate finds loop headers and arranges for them to be placed after the body of the loop. This eliminates a jump from the body. However, if the loop header is a series of contiguously laid out blocks, the rotation introduces a new jump in that series. This CL expands the "loop header" to move to be the entire run of contiguously laid out blocks in the same loop. This shrinks object files a little, and actually speeds up the compiler noticeably. Numbers below. Fannkuch performance seems to vary a lot by machine. On my laptop: name old time/op new time/op delta Fannkuch11-8 2.89s ± 2% 2.85s ± 3% -1.22% (p=0.000 n=50+50) This has a significant affect on the append benchmarks in #14758: name old time/op new time/op delta Foo-8 312ns ± 3% 276ns ± 2% -11.37% (p=0.000 n=30+29) Bar-8 565ns ± 2% 456ns ± 2% -19.27% (p=0.000 n=27+28) Updates #18977 Fixes #20355 name old time/op new time/op delta Template 205ms ± 5% 204ms ± 8% ~ (p=0.903 n=92+99) Unicode 85.3ms ± 4% 85.1ms ± 3% ~ (p=0.191 n=92+94) GoTypes 512ms ± 4% 507ms ± 4% -0.93% (p=0.000 n=95+97) Compiler 2.38s ± 3% 2.35s ± 3% -1.27% (p=0.000 n=98+95) SSA 4.67s ± 3% 4.64s ± 3% -0.62% (p=0.000 n=95+96) Flate 117ms ± 3% 117ms ± 3% ~ (p=0.099 n=84+86) GoParser 139ms ± 4% 137ms ± 4% -0.90% (p=0.000 n=97+98) Reflect 329ms ± 5% 326ms ± 6% -0.97% (p=0.002 n=99+98) Tar 102ms ± 6% 101ms ± 5% -0.97% (p=0.006 n=97+97) XML 198ms ±10% 196ms ±13% ~ (p=0.087 n=100+100) [Geo mean] 318ms 316ms -0.72% name old user-time/op new user-time/op delta Template 250ms ± 7% 250ms ± 7% ~ (p=0.850 n=94+92) Unicode 107ms ± 8% 106ms ± 5% -0.76% (p=0.005 n=98+91) GoTypes 665ms ± 5% 659ms ± 5% -0.85% (p=0.003 n=93+98) Compiler 3.15s ± 3% 3.10s ± 3% -1.60% (p=0.000 n=99+98) SSA 6.82s ± 3% 6.72s ± 4% -1.55% (p=0.000 n=94+98) Flate 138ms ± 8% 138ms ± 6% ~ (p=0.369 n=94+92) GoParser 170ms ± 5% 168ms ± 6% -1.13% (p=0.002 n=96+98) Reflect 412ms ± 8% 416ms ± 8% ~ (p=0.169 n=100+100) Tar 123ms ±18% 123ms ±14% ~ (p=0.896 n=100+100) XML 236ms ± 9% 234ms ±11% ~ (p=0.124 n=100+100) [Geo mean] 401ms 398ms -0.63% name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.222 n=5+5) Unicode 28.7MB ± 0% 28.7MB ± 0% ~ (p=0.421 n=5+5) GoTypes 109MB ± 0% 109MB ± 0% ~ (p=0.056 n=5+5) Compiler 457MB ± 0% 457MB ± 0% +0.07% (p=0.008 n=5+5) SSA 1.10GB ± 0% 1.10GB ± 0% +0.05% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.5MB ± 0% ~ (p=0.222 n=5+5) GoParser 30.9MB ± 0% 31.0MB ± 0% +0.21% (p=0.016 n=5+5) Reflect 73.4MB ± 0% 73.4MB ± 0% ~ (p=0.421 n=5+5) Tar 25.5MB ± 0% 25.5MB ± 0% ~ (p=0.548 n=5+5) XML 40.9MB ± 0% 40.9MB ± 0% ~ (p=0.151 n=5+5) [Geo mean] 71.6MB 71.6MB +0.07% name old allocs/op new allocs/op delta Template 394k ± 0% 394k ± 0% ~ (p=1.000 n=5+5) Unicode 344k ± 0% 343k ± 0% ~ (p=0.310 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=1.000 n=5+5) Compiler 4.42M ± 0% 4.42M ± 0% ~ (p=1.000 n=5+5) SSA 9.80M ± 0% 9.80M ± 0% ~ (p=0.095 n=5+5) Flate 237k ± 1% 238k ± 1% ~ (p=0.310 n=5+5) GoParser 320k ± 0% 322k ± 1% +0.50% (p=0.032 n=5+5) Reflect 958k ± 0% 957k ± 0% ~ (p=0.548 n=5+5) Tar 252k ± 1% 252k ± 0% ~ (p=1.000 n=5+5) XML 400k ± 0% 400k ± 0% ~ (p=0.841 n=5+5) [Geo mean] 741k 742k +0.06% name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% -0.05% (p=0.008 n=5+5) Unicode 202k ± 0% 202k ± 0% -0.01% (p=0.008 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% -0.06% (p=0.008 n=5+5) Compiler 3.91M ± 0% 3.91M ± 0% -0.06% (p=0.008 n=5+5) SSA 7.91M ± 0% 7.92M ± 0% +0.01% (p=0.008 n=5+5) Flate 228k ± 0% 227k ± 0% -0.04% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% -0.06% (p=0.008 n=5+5) Reflect 952k ± 0% 951k ± 0% -0.02% (p=0.008 n=5+5) Tar 187k ± 0% 187k ± 0% -0.04% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% -0.05% (p=0.008 n=5+5) [Geo mean] 648k 648k -0.04% Change-Id: I8630c4291a0eb2f7e7927bc04d7cc0efef181094 Reviewed-on: https://go-review.googlesource.com/43491 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
For the append benchmarks in golang#14758: name old time/op new time/op delta Foo-8 276ns ± 2% 275ns ± 1% -0.52% (p=0.008 n=29+29) Bar-8 456ns ± 2% 456ns ± 2% ~ (p=0.290 n=28+29) Change-Id: I6bc74ad26ff632388f0ab355ec998f7dc9eaee8d
Change https://golang.org/cl/43492 mentions this issue: |
For the append benchmarks in golang#14758: name old time/op new time/op delta Foo-8 276ns ± 2% 275ns ± 1% -0.52% (p=0.008 n=29+29) Bar-8 456ns ± 2% 456ns ± 2% ~ (p=0.290 n=28+29) Change-Id: I6bc74ad26ff632388f0ab355ec998f7dc9eaee8d
Using
go devel +ed4a27a
I ran the following benchmarks:
Currently I see:
I expect to see these two to perform about the same.
@klauspost
The text was updated successfully, but these errors were encountered: